Prevent Python lxml From Wrapping Plain Text in a <p> Tag
I don't want lxml to add anything to plain text; I left those values as they are on purpose. But lxml wraps plain text in a <p> tag. Here, value might be HTML or plain text. I need lxml to process ...
Solution 1:
Try this library. It saved me from having to use the "re" module when dealing with an XML page where, for some reason, the Scrapy selectors were behaving unreliably:
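from scrapy.selector import Selector
from w3lib.html import remove_tags

def parse(self, response):
    # Selector replaces the deprecated HtmlXPathSelector
    hxs = Selector(response)
    # Keep only the <loc> entries whose text matches the pattern
    follow = hxs.xpath('//loc').re('.*type=videos.*')
    # Strip any remaining markup from each matched string
    follow = [remove_tags(x) for x in follow]
    # Note: remove_tags only strips tags; characters such as \n are left in place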