How to escape the parent attribute in BeautifulSoup to reach a tag actually named <parent>?

OK, this is kind of funny. Here is the XML:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

    <parent>
        <groupId>com.parent</groupId>
        <artifactId>parent</artifactId>
        <version>1.0-SNAPSHOT</version>
        <relativePath>../pom.xml</relativePath>
    </parent>

    <build>
        <sourceDirectory>src</sourceDirectory>
    </build>
</project>

I want to use the simple BeautifulSoup hierarchical (dotted) notation to get to the node actually named <parent>, but parent is a reserved attribute name in this API.

from bs4 import BeautifulSoup

with open(pom) as pomHandle:
    # The "xml" parser requires lxml to be installed
    soup = BeautifulSoup(pomHandle, "xml")

# This returns the proper build node
buildNode = soup.project.build
# This does not return the <parent> node but the XML parent of the project node
# (which is the whole doc), because 'parent' is reserved
parentNode = soup.project.parent

How do I override this limitation?

asked Jan 22 '26 05:01 by amphibient


1 Answer

You can use find() instead:

soup.project.find('parent')
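Here is a minimal, self-contained sketch of that call next to the attribute lookup from the question. It assumes bs4 is installed, inlines a trimmed copy of the POM as a string instead of reading a file, and uses Python's built-in "html.parser" (which lowercases tag names) only to avoid an extra dependency; none of these choices come from the original post.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the POM from the question, inlined for the example.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
    <parent>
        <groupId>com.parent</groupId>
        <artifactId>parent</artifactId>
        <version>1.0-SNAPSHOT</version>
    </parent>
    <build>
        <sourceDirectory>src</sourceDirectory>
    </build>
</project>"""

# Assumption: html.parser is used here because it ships with Python.
# It lowercases tag names, so <groupId> would be looked up as "groupid".
soup = BeautifulSoup(POM, "html.parser")

# Attribute access hits the real .parent attribute: the enclosing document.
enclosing = soup.project.parent

# find() searches by tag name, so it reaches the <parent> element.
parent_node = soup.project.find("parent")

# recursive=False restricts the search to direct children of <project>.
direct_parent = soup.project.find("parent", recursive=False)

print(enclosing is soup)
print(parent_node.name)
print(parent_node.find("version").get_text())
```

Passing recursive=False is optional; it may be worth considering for POM files, where a nested element with the same name could otherwise be matched first.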

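This is essentially the same thing as the dotted notation, since BeautifulSoup calls find() under the hood in the __getattr__() method of the Tag class. Python only invokes __getattr__() when normal attribute lookup fails, and parent is a real attribute of every Tag, so soup.project.parent never reaches that fallback.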
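To make the lookup order concrete, here is a toy sketch of the mechanism. ToyTag is a hypothetical stand-in written for this illustration, not BeautifulSoup's actual Tag class, and its find() only looks at direct children, whereas the real one also searches descendants.

```python
class ToyTag:
    """Hypothetical stand-in for a tag; not BeautifulSoup's actual class."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # a real instance attribute, like Tag.parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def find(self, name):
        # Return the first direct child with the given tag name, or None.
        for child in self.children:
            if child.name == name:
                return child
        return None

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails.
        if name.startswith("__"):
            raise AttributeError(name)
        return self.find(name)


doc = ToyTag("[document]")
project = ToyTag("project", parent=doc)
parent_tag = ToyTag("parent", parent=project)
build_tag = ToyTag("build", parent=project)

# "build" is not a real attribute, so __getattr__ falls back to find().
print(project.build is build_tag)

# "parent" is a real attribute, so __getattr__ is never consulted.
print(project.parent is doc)

# Calling find() directly sidesteps the name clash.
print(project.find("parent") is parent_tag)
```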

Hope that helps.

answered Jan 23 '26 19:01 by alecxe


