I have XML files with a defined structure but a varying number of tags, for example
file1.xml:
<document>
<subDoc>
<id>1</id>
<myId>1</myId>
</subDoc>
</document>
file2.xml:
<document>
<subDoc>
<id>2</id>
</subDoc>
</document>
Now I would like to check whether the tag myId exists. So I did the following:
data = open("file1.xml",'r').read()
xml = BeautifulSoup(data)
hasAttrBs = xml.document.subdoc.has_attr('myID')
hasAttrPy = hasattr(xml.document.subdoc,'myID')
hasType = type(xml.document.subdoc.myid)
The result is for file1.xml:
hasAttrBs -> False
hasAttrPy -> True
hasType -> <class 'bs4.element.Tag'>
file2.xml:
hasAttrBs -> False
hasAttrPy -> True
hasType -> <type 'NoneType'>
Okay, <myId> is not an attribute of <subdoc>.
But how can I test whether a sub-tag exists?
//Edit: By the way, I don't really want to iterate through the whole subdoc, because that would be very slow. I hope to find a way to address that element directly.
Beautiful Soup's find_all() method returns a list of all the tags or strings that match particular criteria.
A tag object in BeautifulSoup corresponds to an HTML or XML tag in the actual page or document. Tags have many attributes and methods; the two most important features of a tag are its name and its attributes.
BeautifulSoup is a Python package that can parse broken HTML; lxml offers similar support, based on the libxml2 parser.
if tag.find('child_tag_name'):
The simplest way to find if a child tag exists is simply
childTag = xml.find('childTag')
if childTag is not None:
    pass  # do stuff
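As a minimal, self-contained sketch of that check, here are the two files from the question as inline strings. The helper name has_tag is made up for illustration, and the sketch assumes the built-in "html.parser" backend, which lowercases tag names (so the lookup uses "myid"):

```python
from bs4 import BeautifulSoup

# Inline copies of the two files from the question.
FILE1 = "<document><subDoc><id>1</id><myId>1</myId></subDoc></document>"
FILE2 = "<document><subDoc><id>2</id></subDoc></document>"


def has_tag(markup, name):
    """Return True if a tag called `name` occurs anywhere in `markup`."""
    soup = BeautifulSoup(markup, "html.parser")
    # find() returns the first matching Tag, or None if there is none.
    return soup.find(name) is not None


# html.parser lowercases tag names, hence "myid" rather than "myId".
print(has_tag(FILE1, "myid"))  # True
print(has_tag(FILE2, "myid"))  # False
```

Comparing against None makes the intent explicit: find() signals "not found" by returning None.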
More specifically to OP's question:
If you don't know the structure of the XML doc, you can use the .find() method of the soup. Something like this:
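from bs4 import BeautifulSoup
with open("file1.xml", 'r') as data, open("file2.xml", 'r') as data2:
    xml = BeautifulSoup(data.read(), "html.parser")
    xml2 = BeautifulSoup(data2.read(), "html.parser")
# html.parser lowercases tag names, so search for "myid", not "myId"
hasAttrBs = xml.find("myid")
hasAttrBs2 = xml2.find("myid")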
If you do know the structure, you can get the desired element by accessing the tag name as an attribute, like this: xml.document.subdoc.myid. So the whole thing would go something like this:
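from bs4 import BeautifulSoup
with open("file1.xml", 'r') as data, open("file2.xml", 'r') as data2:
    xml = BeautifulSoup(data.read(), "html.parser")
    xml2 = BeautifulSoup(data2.read(), "html.parser")
# Attribute access returns the first matching tag, or None if it is missing
hasAttrBs = xml.document.subdoc.myid
hasAttrBs2 = xml2.document.subdoc.myid
print(hasAttrBs)
print(hasAttrBs2)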
Prints
<myid>1</myid>
None
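One caveat worth spelling out for both variants: the output shows <myid>, not <myId>, because an HTML parser lowercases tag names. The sketch below assumes the built-in "html.parser" backend and an inline copy of file1.xml; the variable names are only illustrative. (Parsing with BeautifulSoup(data, "xml") keeps the original case, but that backend needs lxml installed, so it is left out here.)

```python
from bs4 import BeautifulSoup

# Inline copy of file1.xml from the question.
markup = "<document><subDoc><id>1</id><myId>1</myId></subDoc></document>"
soup = BeautifulSoup(markup, "html.parser")

# The parser stored the tag as "myid", so the mixed-case name does not match.
mixed_case = soup.find("myId")
lower_case = soup.find("myid")

print(mixed_case)  # None
print(lower_case)  # <myid>1</myid>
```

So with an HTML parser, a lookup for "myId" reports the tag as missing even when it is present.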
Here's an example that checks whether an h2 tag exists on an Instagram page. Hope you find it useful:
import requests
from bs4 import BeautifulSoup
instagram_url = 'https://www.instagram.com/p/BHijrYFgX2v/?taken-by=findingmero'
html_source = requests.get(instagram_url).text
soup = BeautifulSoup(html_source, "lxml")
# find() returns None when the page contains no h2 tag
if not soup.find('h2'):
    print("didn't find h2")
You can handle it like this:
def has_my_id(xml):
    # An HTML parser lowercases tag names, so compare against 'myid'
    for child in xml.document.subdoc.children:
        if child.name == 'myid':
            return True
    return False
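If writing the loop by hand is undesirable, find() accepts recursive=False, which limits the search to the direct children of a tag. The following is only a sketch: has_direct_child is a hypothetical helper, the documents are inline copies of the question's files, and "html.parser" is assumed as the backend.

```python
from bs4 import BeautifulSoup

# Inline copies of the two files from the question.
FILE1 = "<document><subDoc><id>1</id><myId>1</myId></subDoc></document>"
FILE2 = "<document><subDoc><id>2</id></subDoc></document>"


def has_direct_child(parent, name):
    """Return True if `parent` has an immediate child tag called `name`."""
    # recursive=False restricts the search to direct children only.
    return parent.find(name, recursive=False) is not None


for markup in (FILE1, FILE2):
    subdoc = BeautifulSoup(markup, "html.parser").document.subdoc
    print(has_direct_child(subdoc, "myid"))  # True, then False
```

This still scans the children of subdoc internally, but it does not descend into deeper levels of the tree.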