
Prevent Python Lxml From Adding Plain Text A <p> Tag

I don't want lxml to add anything to plain text; I left those values as they are on purpose. However, lxml wraps plain text in a <p> tag. Here, value might be HTML or plain text. I need lxml to process…

Solution 1:

Try the w3lib library. It saved my butt from having to use the "re" module when dealing with an XML page where, for some dumb reason, Scrapy selectors were acting wonky.

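from scrapy.selector import Selector  # replaces the deprecated HtmlXPathSelector
from w3lib.html import remove_tags

def parse(self, response):  # method of a Scrapy spider class
    sel = Selector(response)
    # .re() matches against the serialized element, so results still carry <loc> tags
    follow = sel.xpath('//loc').re('.*type=videos.*')
    # Strip the markup and keep only the text
    follow = [remove_tags(x) for x in follow]
    # remove_tags strips tags only; whitespace such as \n is left in place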
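
The snippet above strips tags after the fact. It does not cover the case in the question, where a value may be either HTML or plain text. A minimal sketch of one possible approach, assuming it is acceptable to check for markup before handing the value to lxml at all: the names `contains_markup` and `process_value` are hypothetical and not part of lxml, Scrapy, or w3lib, and the check uses only the standard library's `html.parser`.

```python
from html.parser import HTMLParser


class _TagDetector(HTMLParser):
    """Records whether the parser saw at least one start or end tag."""

    def __init__(self):
        super().__init__()
        self.found_tag = False

    def handle_starttag(self, tag, attrs):
        self.found_tag = True

    def handle_endtag(self, tag):
        self.found_tag = True


def contains_markup(value):
    """Return True if `value` contains anything the parser treats as a tag."""
    detector = _TagDetector()
    detector.feed(value)
    detector.close()
    return detector.found_tag


def process_value(value, html_handler):
    """Run `html_handler` only on values with markup; return plain text as-is."""
    if contains_markup(value):
        return html_handler(value)
    return value
```

Here `html_handler` stands for whatever lxml-based processing you already have (for example, a function built around `lxml.html.fromstring`). Plain text never reaches the parser, so nothing gets wrapped around it. This is a heuristic: text that happens to contain something tag-like will be treated as HTML.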
