
CDATA Getting Stripped in lxml Even After Using strip_cdata=False

I have a requirement in which I need to read an XML file and replace a string with a certain value. The XML contains a CDATA section and I need to preserve it. I have tried using par

Solution 1:

This is because you are doing

elem.text = elem.text.replace('Bundled Manager 2.2(8b)', '123456')

which replaces the CDATA with a normal text node.

The documentation states

Note how the .text property does not give any indication that the text content is wrapped by a CDATA section. If you want to make sure your data is wrapped by a CDATA block, you can use the CDATA() text wrapper.
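To make the quoted behaviour concrete, here is a small sketch. The one-element document is made up for illustration; it assumes lxml is installed and imported as ET.

```python
from lxml import etree as ET

# Illustrative input: a single element whose content is a CDATA section.
parser = ET.XMLParser(strip_cdata=False)
root = ET.fromstring(b"<a><![CDATA[x < y]]></a>", parser)

# .text is a plain str; nothing marks it as having come from CDATA.
text = root.text

# Untouched, the CDATA section survives serialization.
before = ET.tostring(root)

# Assigning any plain string, even the same one, stores a normal text node.
root.text = text
after_plain = ET.tostring(root)

# Wrapping the string in CDATA() asks lxml to emit a CDATA section again.
root.text = ET.CDATA(text)
after_cdata = ET.tostring(root)

print(before)
print(after_plain)
print(after_cdata)
```

The second serialization shows the escaped plain text, which is the symptom described in the question.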

Therefore, if you want to keep the CDATA section, you should only assign to elem.text if you are modifying it, and instruct lxml to use a CDATA section:

# ET is assumed to be lxml.etree; elem.text can be None for empty elements
if elem.text and 'Bundled Manager 2.2(8b)' in elem.text:
    elem.text = ET.CDATA(elem.text.replace('Bundled Manager 2.2(8b)', '123456'))

Because of how the ElementTree API works (all text and CDATA content is concatenated and exposed as a str in the .text property), it is not really possible to know whether CDATA was originally used. (See "Figuring out where CDATA is in lxml element?" and the source code.)
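Putting the pieces together, a minimal end-to-end sketch might look like the following. The sample XML and the helper name replace_in_cdata are hypothetical, chosen only for illustration. Because .text cannot tell whether CDATA was originally used, this version wraps every modified text in CDATA and leaves unmodified elements alone.

```python
from lxml import etree as ET

# Hypothetical sample document: one CDATA element and one plain-text element.
XML = (
    b"<root>"
    b"<version><![CDATA[Bundled Manager 2.2(8b)]]></version>"
    b"<name>plain text</name>"
    b"</root>"
)


def replace_in_cdata(xml_bytes, old, new):
    """Replace `old` with `new` in element text, writing modified text as CDATA."""
    # strip_cdata=False keeps CDATA sections from being turned into plain text on parse.
    parser = ET.XMLParser(strip_cdata=False)
    root = ET.fromstring(xml_bytes, parser)
    # Passing ET.Element restricts iteration to elements (no comments or PIs).
    for elem in root.iter(ET.Element):
        # Only assign when something changes; untouched elements keep their CDATA.
        if elem.text and old in elem.text:
            elem.text = ET.CDATA(elem.text.replace(old, new))
    return ET.tostring(root)


result = replace_in_cdata(XML, "Bundled Manager 2.2(8b)", "123456")
print(result.decode())
```

Note that an element whose matching text was originally plain (not CDATA) would also come out wrapped in CDATA with this approach; if that matters, restrict the loop to the tags known to use CDATA.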
