In an XML file, I need the content of a tag that appears several times at different levels of the tag hierarchy. I want the highest-level occurrence of the tag, but my XML parser (BeautifulSoup for Python) keeps giving me the wrong one.
Here is the concrete problem. This is part of the XML file (condensed to the parts I believe are relevant):
<object>
<name>person</name>
<part>
<name>head</name>
<bndbox>
<xmin>337</xmin>
<ymin>2</ymin>
<xmax>382</xmax>
<ymax>66</ymax>
</bndbox>
</part>
<bndbox>
<xmin>334</xmin>
<ymin>1</ymin>
<xmax>436</xmax>
<ymax>373</ymax>
</bndbox>
</object>
I'm interested in getting the content of the <bndbox> tag at the very end of this snippet via the command
box = object.bndbox
But if I print out box, I keep getting this:
<bndbox>
<xmin>337</xmin>
<ymin>2</ymin>
<xmax>382</xmax>
<ymax>66</ymax>
</bndbox>
This makes no sense to me. The box above that I keep getting is one hierarchy level lower than what I'm asking for, under a <part> tag, so I should only be able to access this box via
object.part.bndbox
while
object.bndbox
should give me the only box that is hierarchically directly under the object tag, which is the last box in the snippet above.
As stated by @mjsqu in the comments:
BeautifulSoup returns the first tag matching that name, so object.bndbox refers to the first bndbox in the XML, regardless of position in the hierarchy.
So, to get the second <bndbox> tag, that is, the <bndbox> that is a direct child of the <object> tag, pass recursive=False to find(). The search then considers only the direct children of the current tag.
from bs4 import BeautifulSoup

xml = '''
<object>
<name>person</name>
<part>
<name>head</name>
<bndbox>
<xmin>337</xmin>
<ymin>2</ymin>
<xmax>382</xmax>
<ymax>66</ymax>
</bndbox>
</part>
<bndbox>
<xmin>334</xmin>
<ymin>1</ymin>
<xmax>436</xmax>
<ymax>373</ymax>
</bndbox>
</object>'''
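# The 'xml' parser requires the lxml package to be installed.
soup = BeautifulSoup(xml, 'xml')
# recursive=False restricts the search to direct children of <object>.
print(soup.object.find('bndbox', recursive=False))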
Output:
<bndbox>
<xmin>334</xmin>
<ymin>1</ymin>
<xmax>436</xmax>
<ymax>373</ymax>
</bndbox>
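If BeautifulSoup is not a hard requirement, the same lookup can be sketched with the standard library's xml.etree.ElementTree, where find() with a bare tag name only matches direct children, so no extra flag is needed. This is an alternative to the approach above, not part of the original answer; the names XML and direct_child_box are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Same snippet as above; XML and direct_child_box are illustrative names.
XML = """
<object>
  <name>person</name>
  <part>
    <name>head</name>
    <bndbox>
      <xmin>337</xmin>
      <ymin>2</ymin>
      <xmax>382</xmax>
      <ymax>66</ymax>
    </bndbox>
  </part>
  <bndbox>
    <xmin>334</xmin>
    <ymin>1</ymin>
    <xmax>436</xmax>
    <ymax>373</ymax>
  </bndbox>
</object>
"""


def direct_child_box(xml_text):
    """Return the coordinates of the <bndbox> directly under the root element."""
    root = ET.fromstring(xml_text)  # root is the <object> element
    # A bare tag name in find() matches direct children only, so the
    # <bndbox> nested inside <part> is skipped.
    box = root.find("bndbox")
    # Iterating over an element yields its child elements.
    return {coord.tag: int(coord.text) for coord in box}


print(direct_child_box(XML))
# {'xmin': 334, 'ymin': 1, 'xmax': 436, 'ymax': 373}
```

With ElementTree, the nested box would be reached explicitly with root.find("part/bndbox").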