Is there any way to find a non-recursive DOM subnode in Python using BeautifulSoup? E.g., consider parsing a pom.xml file:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <parent>
        <groupId>com.parent</groupId>
        <artifactId>parent</artifactId>
        <version>1.0-SNAPSHOT</version>
        <relativePath>../pom.xml</relativePath>
    </parent>
    <modelVersion>2.0.0</modelVersion>
    <groupId>com.parent.somemodule</groupId>
    <artifactId>some_module</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>Some Module</name>
    ...
If I want to get groupId at the top level (specifically project->groupId, not project->parent->groupId), I use:
from bs4 import BeautifulSoup

with open(pom) as pomHandle:
    soup = BeautifulSoup(pomHandle, 'html.parser')  # explicit parser; it lowercases tag names
    groupId = soup.groupid.text
But unfortunately, that finds the first physical occurrence of groupId in the file regardless of its level in the hierarchy, which here is project->parent->groupId. I actually want a non-recursive find that looks ONLY among a node's direct children, not deeper within its descendants. Is there a way to do that in BeautifulSoup?
BeautifulSoup has limited support for CSS selectors, but it covers the most commonly used ones. Use the select() method to find multiple elements and select_one() to find a single element; the child combinator (>) restricts a match to direct children, which is exactly what is needed here.
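For instance, a minimal sketch of that selector approach (tag names are written in lowercase because the HTML parser normalizes them; pom is assumed to hold the path to the file above):

from bs4 import BeautifulSoup

with open(pom) as pomHandle:  # 'pom' is assumed to be the file path
    soup = BeautifulSoup(pomHandle, 'html.parser')

# 'project > groupid' matches only <groupId> elements that are direct
# children of <project>, so <parent>'s <groupId> is skipped
groupId = soup.select_one('project > groupid').text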
You can search inside the "project" node with recursive=False:

groupId = soup.project.find('groupid', recursive=False).text
Hope that helps.
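For completeness, a runnable end-to-end sketch against the pom.xml above (the 'pom.xml' path is an assumption):

from bs4 import BeautifulSoup

with open('pom.xml') as pomHandle:  # path is an assumption
    soup = BeautifulSoup(pomHandle, 'html.parser')

# Recursive search (the default) descends into <parent> first:
print(soup.find('groupid').text)  # com.parent

# Non-recursive search checks only direct children of <project>:
print(soup.project.find('groupid', recursive=False).text)  # com.parent.somemodule

The same recursive=False keyword works with find_all() when you need to collect several direct children.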