I am having problems using Python (2.7). The code basically consists of:
from BeautifulSoup import BeautifulStoneSoup

str = '<el at="some">ABC</el><el>DEF</el>'
z = BeautifulStoneSoup(str)
for x in z.findAll('el'):
    if 'at' in x:              # first attempt
    # if hasattr(x, 'at'):     # second attempt
        print x['at']
    else:
        print 'nothing'
I expected the first if statement to work correctly (i.e., if 'at' doesn't exist, print "nothing"), but it always prints nothing (i.e., the test is always False). The second if, on the other hand, is always True, which causes the code to raise a KeyError when trying to access 'at' from the second <el> element, where of course it doesn't exist.
BeautifulSoup is a Python package for parsing broken HTML; lxml offers similar support, built on the libxml2 parser.
To extract the attributes of an element in Beautiful Soup, use the [] notation. For instance, el["id"] retrieves the value of the id attribute.
A typical scraping workflow uses Python's Requests library to fetch a page, parses it with BeautifulSoup, stores the data in a structure such as a dict or list, and then examines the HTML tags and their attributes, such as class and id.
The in operator is for sequence and mapping types; what makes you think the object returned by BeautifulSoup is supposed to implement it correctly? According to the BeautifulSoup docs, you should access attributes using the [] syntax.
Re hasattr: I think you confused HTML/XML attributes with Python object attributes. hasattr is for the latter, and BeautifulSoup, AFAIK, doesn't reflect the HTML/XML attributes it parsed in its own object attributes.
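The difference between the two kinds of attribute can be sketched with a toy class. ToyTag and PermissiveTag are hypothetical stand-ins written for this illustration, not BeautifulSoup's own classes. PermissiveTag models one possible reason hasattr could always return True, as the question reports; that explanation is an assumption here, not something confirmed above.

```python
class ToyTag(object):
    """Hypothetical stand-in for a parsed element (not BeautifulSoup's Tag)."""

    def __init__(self, name, attrs):
        self.name = name            # an ordinary Python attribute
        self._attrs = dict(attrs)   # markup attributes are kept in a mapping

    def __getitem__(self, key):
        # tag['at'] reads a markup attribute; a missing key raises KeyError.
        return self._attrs[key]


class PermissiveTag(ToyTag):
    """Assumed variant whose attribute lookup never fails."""

    def __getattr__(self, item):
        # Called only when normal lookup fails. Returning None instead of
        # raising AttributeError makes hasattr() report True for any name.
        return None


second = ToyTag('el', {})              # models <el>DEF</el>, which has no 'at'
permissive = PermissiveTag('el', {})

python_attr = hasattr(second, 'name')      # True: set in __init__
markup_attr = hasattr(second, 'at')        # False: 'at' is not a Python attribute
always_true = hasattr(permissive, 'at')    # True, although permissive['at'] raises KeyError
```

In both classes, hasattr says nothing about what is stored in the markup-attribute mapping, so it cannot guard the x['at'] lookup.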
P.S. Note that the Tag object in BeautifulSoup does implement __contains__, so maybe you're trying with the wrong object? Can you show a complete but minimal example that demonstrates the problem?
Running this:
from BeautifulSoup import BeautifulSoup
str = '<el at="some">ABC</el><el>DEF</el>'
z = BeautifulSoup(str)
for x in z.findAll('el'):
    print type(x)
    print x['at']
I get:
<class 'BeautifulSoup.Tag'>
some
<class 'BeautifulSoup.Tag'>
Traceback (most recent call last):
  File "soup4.py", line 8, in <module>
    print x['at']
  File "C:\Python26\lib\site-packages\BeautifulSoup.py", line 601, in __getitem__
    return self._getAttrMap()[key]
KeyError: 'at'
Which is what I expected. The first el has an at attribute, the second doesn't, and this throws a KeyError.
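Since a missing attribute surfaces as a KeyError, one way to get the "print nothing" behaviour from the question is to catch that exception. The helper name attr_or_default is made up for this sketch, and plain dicts stand in for the two parsed elements; the only assumption is that the real tag objects raise KeyError from [] the same way, as the traceback above shows.

```python
def attr_or_default(tag, name, default='nothing'):
    """Return tag[name], or default when the lookup raises KeyError."""
    try:
        return tag[name]
    except KeyError:
        return default


# Plain dicts stand in for <el at="some">ABC</el> and <el>DEF</el>.
tags = [{'at': 'some'}, {}]

results = [attr_or_default(tag, 'at') for tag in tags]
for value in results:
    print(value)
# prints "some", then "nothing"
```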
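Update 2: BeautifulSoup.Tag.__contains__ looks inside the contents of the tag, not its attributes, which is why 'at' in x is always False. To check whether an attribute exists, test the tag's attributes rather than the tag itself, for example with x.has_key('at').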
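The behaviour described in this update can be mimicked with a small toy class. ToyTag and its attrs and contents fields are hypothetical names chosen for the sketch and only model what the update describes; they are not BeautifulSoup's implementation.

```python
class ToyTag(object):
    """Hypothetical element whose 'in' operator searches its contents."""

    def __init__(self, attrs, contents):
        self.attrs = dict(attrs)        # markup attributes, e.g. {'at': 'some'}
        self.contents = list(contents)  # child nodes, e.g. ['ABC']

    def __contains__(self, item):
        # Mirrors the behaviour described above: contents, not attributes.
        return item in self.contents


first = ToyTag({'at': 'some'}, ['ABC'])   # models <el at="some">ABC</el>

in_tag = 'at' in first            # False: 'at' is not a child of the element
in_contents = 'ABC' in first      # True: 'ABC' is a child
in_attrs = 'at' in first.attrs    # True: membership test on the attribute mapping
```

The membership test only answers the attribute question when it is applied to the attribute mapping rather than to the element itself.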