Skip to content Skip to sidebar Skip to footer

Python: Update Xml-file Using Elementtree While Conserving Layout As Much As Possible

I have a document which uses an XML namespace for which I want to increase /group/house/dogs by one: (the file is called houses.xml)
from xml.etree import ElementTree as ET
import re


classElementTreeHelper():
    def__init__(self, xml_file_name):
        xml_file = open(xml_file_name, "rb")

        self.__parse_xml_declaration(xml_file)

        self.element_tree = ET.parse(xml_file)
        xml_file.seek(0)

        root_tag_namespace = self.__root_tag_namespace(self.element_tree)
        self.namespace = Noneif root_tag_namespace isnotNone:
            self.namespace = '{' + root_tag_namespace + '}'# Register the root tag namespace as having an empty prefix, as# this has to be done before parsing xml_file we re-parse.
            ET.register_namespace('', root_tag_namespace)
            self.element_tree = ET.parse(xml_file)

    deffind(self, xpath_query):
        return self.element_tree.find(xpath_query)

    defwrite(self, xml_file_name):
        xml_file = open(xml_file_name, "wb")
        if self.xml_declaration_line isnotNone:
            xml_file.write(self.xml_declaration_line + '\n')

        return self.element_tree.write(xml_file)

    def__parse_xml_declaration(self, xml_file):
        first_line = xml_file.readline().strip()
        if first_line.startswith('<?xml') and first_line.endswith('?>'):
            self.xml_declaration_line = first_line
        else:
            self.xml_declaration_line = None
        xml_file.seek(0)

    def__root_tag_namespace(self, element_tree):
        namespace_search = re.search('^{(\S+)}', element_tree.getroot().tag)
        if namespace_search isnotNone:
            return namespace_search.group(1)
        else:
            returnNonedef__main():
    el_tree_hlp = ElementTreeHelper('houses.xml')

    dogs_tag = el_tree_hlp.element_tree.getroot().find(
                   '{ns}house/{ns}dogs'.format(
                         ns=el_tree_hlp.namespace))
    one_dog_added = int(dogs_tag.text.strip()) + 1
    dogs_tag.text = str(one_dog_added)

    el_tree_hlp.write('hejsan.xml')

if __name__ == '__main__':
    __main()

The output:

<?xml version="1.0"?><groupxmlns="http://dogs.house.local"><house><id>2821</id><dogs>3</dogs></house></group>

If someone has an improvement to this solution please don´t hesitate to grab the code and improve it.

Solution 2:

Round-tripping, unfortunately, isn't a trivial problem. With XML, it's generally not possible to preserve the original document unless you use a special parser (like DecentXML but that's for Java).

Depending on your needs, you have the following options:

  • If you control the source and you can secure your code with unit tests, you can write your own, simple parser. This parser doesn't accept XML but only a limited subset. You can, for example, read the whole document as a string and then use Python's string operations to locate <dogs> and replace anything up to the next <. Hack? Yes.

  • You can filter the output. XML allows the string <ns0: only in one place, so you can search&replace it with < and then the same with <group xmlns:ns0="<group xmlns=". This is pretty safe unless you can have CDATA in your XML.

  • You can write your own, simple XML parser. Read the input as a string and then create Elements for each pair of <> plus their positions in the input. That allows you to take the input apart quickly but only works for small inputs.

Solution 3:

when Save xml add default_namespace argument is easy to avoid ns0, on my code

key code: xmltree.write(xmlfiile,"utf-8",default_namespace=xmlnamespace)

if os.path.isfile(xmlfiile):
            xmltree = ET.parse(xmlfiile)
            root = xmltree.getroot()
            xmlnamespace = root.tag.split('{')[1].split('}')[0]  //get namespace

            initwin=xmltree.find("./{"+ xmlnamespace +"}test")
            initwin.find("./{"+ xmlnamespace +"}content").text = "aaa"
            xmltree.write(xmlfiile,"utf-8",default_namespace=xmlnamespace)

Solution 4:

etree from lxml provides this feature.

  1. elementTree.write('houses2.xml',encoding = "UTF-8",xml_declaration = True) helps you in not omitting the declaration

  2. While writing into the file it does not change the namespaces.

http://lxml.de/parsing.html is the link for its tutorial.

P.S : lxml should be installed separately.

Post a Comment for "Python: Update Xml-file Using Elementtree While Conserving Layout As Much As Possible"