XML filtering with Python

Tags: python, xml, xpath, elementtree

I have the following XML document:

<node0>
    <node1>
      <node2 a1="x1"> ... </node2>
      <node2 a1="x2"> ... </node2>
      <node2 a1="x1"> ... </node2>
    </node1>
</node0>

I want to filter out node2 when a1="x2". The user provides the XPath and the attribute values that need to be tested and filtered out. I looked at some Python solutions such as BeautifulSoup, but they are too complicated and don't preserve the case of the text. I want to keep the document the same as before, with only the matching elements removed.

Can you recommend a simple and succinct solution? This should not be too complicated from the looks of it. The actual XML document is not as simple as the one above, but the idea is the same.

603

asked May 19 '10 21:05

user236215

1 Answer

This uses xml.etree.ElementTree, which is in the standard library:

import xml.etree.ElementTree as xee
data='''\
<node1>
  <node2 a1="x1"> ... </node2>
  <node2 a1="x2"> ... </node2>
  <node2 a1="x1"> ... </node2>
</node1>
'''
doc=xee.fromstring(data)

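# findall() returns a list, so removing elements inside the loop is safe
for tag in doc.findall('node2'):
    # get() returns None instead of raising KeyError when a1 is absent
    if tag.get('a1')=='x2':
        doc.remove(tag)
# encoding='unicode' returns a str rather than bytes on Python 3
print(xee.tostring(doc, encoding='unicode'))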
# <node1>
#   <node2 a1="x1"> ... </node2>
#   <node2 a1="x1"> ... </node2>
# </node1>
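
As a side note, the standard-library ElementTree also understands a small XPath subset, including attribute predicates such as [@attrib='value'], so the attribute test can be folded into the path itself. A minimal sketch, assuming node2 is a direct child of the root as in the sample data (the variable names are just for illustration):

```python
import xml.etree.ElementTree as ET

data = '''\
<node1>
  <node2 a1="x1"> ... </node2>
  <node2 a1="x2"> ... </node2>
  <node2 a1="x1"> ... </node2>
</node1>
'''
root = ET.fromstring(data)

# The predicate selects only node2 children whose a1 attribute equals "x2".
# findall() returns a list, so removing inside the loop is safe.
for match in root.findall("node2[@a1='x2']"):
    root.remove(match)

# encoding="unicode" gives a str instead of bytes on Python 3.
result = ET.tostring(root, encoding="unicode")
print(result)
```

This keeps the stdlib version about as short as the lxml one, at the cost of a much more limited path language.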

This uses lxml, which is not in the standard library but has more powerful XPath support:

import lxml.etree
data='''\
<node1>
  <node2 a1="x1"> ... </node2>
  <node2 a1="x2"> ... </node2>
  <node2 a1="x1"> ... </node2>
</node1>
'''
doc = lxml.etree.XML(data)
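# findall() removes every match; find() would stop at the first one
# and return None (breaking remove()) when nothing matches
for e in doc.findall('node2[@a1="x2"]'):
    doc.remove(e)
print(lxml.etree.tostring(doc, encoding='unicode'))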

# <node1>
#   <node2 a1="x1"> ... </node2>
#   <node2 a1="x1"> ... </node2>
# </node1>

Edit: If node2 is buried more deeply in the XML, you can iterate through all the elements, check whether each one has a matching node2 among its children, and remove it if so:

Using only xml.etree.ElementTree:

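doc=xee.fromstring(data)
# iter() replaces getiterator(), which was removed in Python 3.9;
# list() snapshots the tree so removals don't disturb the iteration
for parent in list(doc.iter()):
    for child in parent.findall('node2'):
        if child.get('a1')=='x2':
            parent.remove(child)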

Using lxml:

doc = lxml.etree.XML(data)
for parent in list(doc.iter('*')):
    # findall() catches every matching child, not just the first
    for child in parent.findall('node2[@a1="x2"]'):
        parent.remove(child)
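
Since the question mentions that the user supplies the path and attribute values, here is one way the deep-search version could be wrapped up with only the standard library. This is a rough sketch, not a definitive implementation: remove_matching is a hypothetical helper name, and it assumes the supplied path stays within the XPath subset that ElementTree supports.

```python
import xml.etree.ElementTree as ET

def remove_matching(root, path):
    """Remove every element matched by `path` and return how many were removed.

    Hypothetical helper, not part of any library. `path` must use the
    limited XPath subset understood by xml.etree.ElementTree.
    """
    # ElementTree elements have no parent pointer, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    removed = 0
    for match in root.findall(path):
        parent = parent_of.get(match)
        if parent is not None:  # the root has no parent, so it is skipped
            parent.remove(match)
            removed += 1
    return removed

data = '''\
<node0>
    <node1>
      <node2 a1="x1"> ... </node2>
      <node2 a1="x2"> ... </node2>
      <node2 a1="x1"> ... </node2>
    </node1>
</node0>
'''
root = ET.fromstring(data)
count = remove_matching(root, ".//node2[@a1='x2']")
filtered = ET.tostring(root, encoding="unicode")
print(count)
print(filtered)
```

The parent map is built once up front, so each removal is a dictionary lookup rather than another walk over the tree.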

151

answered Oct 25 '22 01:10

unutbu
