
Python xml.dom.minidom: get the full content of a child node that contains both child elements and text

I want to extract the content of an XML file with xml.dom.minidom. Here is the example:

<parent>
   <child>
        text1 
        <subchild>text2 </subchild> 
        text3
   </child>
</parent>

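The following code extracts only 'text1':

  import xml.dom.minidom

  DOMTree = xml.dom.minidom.parse('file.xml')
  # Search from the document node so that a root <parent> element is matched too
  parents = DOMTree.getElementsByTagName('parent')
  for parent in parents:
    # Look up <child> on the current parent, not on the whole node list
    child = parent.getElementsByTagName('child')[0]
    print(child.childNodes[0].nodeValue) # shows text1 (plus surrounding whitespace)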

I can get text1 and text2, but not text3.
How can I get the full content of my child element and its subchild element (text1 text2 text3)?

MeBex asked Dec 14 '25 03:12


1 Answer

Iterate over the child nodes, taking the .data property of each Text node and firstChild.nodeValue of any other node:

print([node.data.strip() if isinstance(node, xml.dom.minidom.Text) else node.firstChild.nodeValue
       for node in child.childNodes])

Prints ['text1', 'text2 ', 'text3'].
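The one-liner above assumes that every non-text child has a text node as its first child, so it would fail on an empty element or miss text nested more deeply. As a rough sketch of a more general variant (the helper name `collect_text` is hypothetical, not part of minidom, and the XML is inlined here instead of being read from file.xml), the child nodes can be walked recursively:

```python
import xml.dom.minidom
from xml.dom import Node

# Sample document from the question, inlined so the sketch is self-contained
XML = """<parent>
   <child>
        text1
        <subchild>text2 </subchild>
        text3
   </child>
</parent>"""


def collect_text(node):
    """Return the stripped, non-empty text pieces under node, in document order.

    Hypothetical helper for illustration; it is not a minidom API.
    """
    pieces = []
    for sub in node.childNodes:
        if sub.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            text = sub.data.strip()
            if text:
                pieces.append(text)
        elif sub.nodeType == Node.ELEMENT_NODE:
            # Recurse so that deeper nesting and empty elements are handled
            pieces.extend(collect_text(sub))
    return pieces


dom = xml.dom.minidom.parseString(XML)
child = dom.getElementsByTagName("child")[0]
texts = collect_text(child)
print(" ".join(texts))  # prints: text1 text2 text3
```

Unlike the list comprehension above, this sketch strips every piece, so 'text2 ' comes back as 'text2'.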


That said, I would consider switching to something more straightforward and easier to use and understand than the minidom library. For example, see how easy it is with BeautifulSoup in XML mode (the "xml" parser requires lxml to be installed):

>>> from bs4 import BeautifulSoup
>>> data = """
... <parent>
...    <child>
...         text1 
...         <subchild>text2 </subchild> 
...         text3
...    </child>
... </parent>
... """
>>> soup = BeautifulSoup(data, "xml")
>>> print(soup.child.get_text())

        text1 
        text2  
        text3
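If installing a third-party package is not an option, a similar result can probably be had from the standard library. The following is only a sketch of one alternative, assuming the same sample document; it uses `xml.etree.ElementTree`, whose `itertext()` method yields the text and tail strings inside an element in document order:

```python
import xml.etree.ElementTree as ET

# Sample document from the question, inlined so the sketch is self-contained
XML = """<parent>
   <child>
        text1
        <subchild>text2 </subchild>
        text3
   </child>
</parent>"""

root = ET.fromstring(XML)      # root is the <parent> element
child = root.find("child")     # first direct <child> of the root

# itertext() walks the element and its descendants, yielding each text piece;
# whitespace-only pieces are dropped and the rest are stripped
et_texts = [t.strip() for t in child.itertext() if t.strip()]
print(et_texts)  # prints: ['text1', 'text2', 'text3']
```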
alecxe answered Dec 16 '25 18:12


