Is there any way to find a non-recursive DOM subnode in Python using BeautifulSoup? E.g., consider parsing a pom.xml file:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <parent>
        <groupId>com.parent</groupId>
        <artifactId>parent</artifactId>
        <version>1.0-SNAPSHOT</version>
        <relativePath>../pom.xml</relativePath>
    </parent>
    <modelVersion>2.0.0</modelVersion>
    <groupId>com.parent.somemodule</groupId>
    <artifactId>some_module</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>Some Module</name>
    ...
If I want to get groupId at the top level (specifically project->groupId, not project->parent->groupId), I use:
from bs4 import BeautifulSoup

with open(pom) as pomHandle:
    soup = BeautifulSoup(pomHandle, 'html.parser')  # explicit parser; it lowercases tag names
    groupId = soup.groupid.text
But unfortunately, that finds the first physical occurrence of groupId in the file regardless of its level in the hierarchy, which here is project->parent->groupId. I actually want a non-recursive find that looks ONLY among a node's direct children, not deeper within its descendants. Is there a way to do that in BeautifulSoup?
BeautifulSoup has limited support for CSS selectors, but it covers the most commonly used ones. Use the select() method to find multiple elements and select_one() to find a single element; the child combinator (>) restricts a match to direct children, which is exactly what is needed here.
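For instance, a minimal sketch of that selector approach (tag names are written in lowercase because the HTML parser normalizes them; pom is assumed to hold the path to the file above):

from bs4 import BeautifulSoup

with open(pom) as pomHandle:  # 'pom' is assumed to be the file path
    soup = BeautifulSoup(pomHandle, 'html.parser')

# 'project > groupid' matches only <groupId> elements that are direct
# children of <project>, so <parent>'s <groupId> is skipped
groupId = soup.select_one('project > groupid').text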
You can search inside the "project" node with recursive=False:

groupId = soup.project.find('groupid', recursive=False).text
Hope that helps.
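For completeness, a runnable end-to-end sketch against the pom.xml above (the 'pom.xml' path is an assumption):

from bs4 import BeautifulSoup

with open('pom.xml') as pomHandle:  # path is an assumption
    soup = BeautifulSoup(pomHandle, 'html.parser')

# Recursive search (the default) descends into <parent> first:
print(soup.find('groupid').text)  # com.parent

# Non-recursive search checks only direct children of <project>:
print(soup.project.find('groupid', recursive=False).text)  # com.parent.somemodule

The same recursive=False keyword works with find_all() when you need to collect several direct children.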