CDATA Getting Stripped in lxml Even After Using strip_cdata=False
I have a requirement to read an XML file and replace a string with a certain value. The XML contains a CDATA section and I need to preserve it. I have tried using par
Solution 1:
This is because you are doing
elem.text = elem.text.replace('Bundled Manager 2.2(8b)', '123456')
which replaces the CDATA with a normal text node.
The documentation states:
Note how the .text property does not give any indication that the text content is wrapped by a CDATA section. If you want to make sure your data is wrapped by a CDATA block, you can use the CDATA() text wrapper.
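To see the behaviour described in that note, here is a minimal sketch. The one-element document is made up for illustration and is not taken from the question. It parses with strip_cdata=False, reads .text, and then assigns a plain string back:

```python
from lxml import etree as ET

# strip_cdata=False keeps CDATA sections intact while parsing.
parser = ET.XMLParser(strip_cdata=False)
root = ET.fromstring(b"<name><![CDATA[a < b]]></name>", parser)

# .text is an ordinary str; nothing in it says it came from a CDATA section.
text_value = root.text

# Serializing the untouched element still emits the CDATA section.
before = ET.tostring(root)

# Assigning a plain str replaces the CDATA section with a normal text node,
# so special characters are escaped on output instead.
root.text = root.text.replace("a", "x")
after = ET.tostring(root)

print(text_value)
print(before)
print(after)
```

The last line printed no longer contains a CDATA section, which is the symptom in the question.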
Therefore, if you want to keep the CDATA section, you should only assign to elem.text if you are modifying it, and instruct lxml to use a CDATA section:
if 'Bundled Manager 2.2(8b)' in elem.text:
    elem.text = ET.CDATA(elem.text.replace('Bundled Manager 2.2(8b)', '123456'))
Due to how the ElementTree library works (the entire text and CDATA content is concatenated and exposed as a str in the .text property), it's not really possible to know whether CDATA was originally used or not (see "Figuring out where CDATA is in lxml element?" and the source code).
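Putting the pieces together, a self-contained sketch might look like the following. The sample document, its element names, and the helper replace_keeping_cdata are hypothetical, since the original XML is not shown in the question. Because lxml cannot tell whether the text originally came from a CDATA section, this sketch wraps every modified text in CDATA and leaves all other elements untouched:

```python
from lxml import etree as ET

# Hypothetical sample input; the real file from the question is not shown.
SAMPLE = (
    b"<root>"
    b"<name><![CDATA[Bundled Manager 2.2(8b)]]></name>"
    b"<other>plain text</other>"
    b"</root>"
)


def replace_keeping_cdata(xml_bytes, old, new):
    """Replace `old` with `new` in element text, re-wrapping changed text in CDATA."""
    # strip_cdata=False keeps CDATA sections of untouched elements intact.
    parser = ET.XMLParser(strip_cdata=False)
    root = ET.fromstring(xml_bytes, parser)
    for elem in root.iter():
        # Skip comments and processing instructions (their tag is not a str)
        # and elements with no text at all.
        if isinstance(elem.tag, str) and elem.text and old in elem.text:
            # Only assign when something changes, and ask for a CDATA section.
            elem.text = ET.CDATA(elem.text.replace(old, new))
    return ET.tostring(root)


result = replace_keeping_cdata(SAMPLE, "Bundled Manager 2.2(8b)", "123456")
print(result.decode())
```

Elements whose text does not contain the search string are never assigned to, so whatever form they had in the input is preserved.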