How would I extract the value of this HTML element attribute with Beautiful Soup?

Question

I am developing a small tool to scrape a webpage using Beautiful Soup. I would like to fetch the value of the class attribute from the page. The HTML looks something like this:

<span class='class_id' id='New_line'></span>

How would I obtain class_id?

wal-o-mat · Accepted Answer

This answer refers to an older version of the question in which Beautiful Soup was not mentioned.

You can use lxml and iterate over all span elements, reading the value of each one's "class" attribute. lxml is a library for parsing XML and HTML documents.

For example:

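from lxml import etree

# Use lxml's HTML parser: real-world pages are usually not well-formed XML
root = etree.parse(filename, etree.HTMLParser()).getroot()

for span in root.iterdescendants("span"):
    cls = span.attrib.get("class")  # None if the span has no class attribute
    print(cls)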
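If you do not have a file to hand, the following is a small self-contained sketch that parses the sample markup from the question as a string with `lxml.html` and pulls the attribute out with XPath instead of a loop. Wrapping the span in a minimal `<html><body>` document is an assumption made only so the example is complete.

```python
from lxml import html

# Sample markup from the question, wrapped in a minimal document (an assumption)
page = "<html><body><span class='class_id' id='New_line'></span></body></html>"
root = html.fromstring(page)

# XPath: the class attribute of every <span> in the document
all_classes = [str(v) for v in root.xpath("//span/@class")]
print(all_classes)  # ['class_id']

# XPath: the class attribute of the span whose id is New_line
by_id = root.xpath("//span[@id='New_line']/@class")
print(str(by_id[0]))  # class_id
```

Note that lxml returns the raw attribute string, so a span with `class="a b"` yields `'a b'` rather than a list.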

luc · Answer

Does the following example help?

>>> from BeautifulSoup import BeautifulSoup as B
>>> s = B("<span class='class_id' id='New_line'></span>")
>>> s.span.attrs
[(u'class', u'class_id'), (u'id', u'New_line')]
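The session above uses the old Beautiful Soup 3 API (`from BeautifulSoup import ...`), where `attrs` is a list of tuples. A rough equivalent for Beautiful Soup 4 (the `bs4` package) is sketched below, assuming the built-in `"html.parser"` backend. In bs4, `attrs` is a dict, and `class` is treated as a multi-valued attribute, so its value comes back as a list.

```python
from bs4 import BeautifulSoup

# Sample markup from the question
markup = "<span class='class_id' id='New_line'></span>"
soup = BeautifulSoup(markup, "html.parser")

span = soup.find("span")
print(span["class"])   # ['class_id']  (multi-valued attribute -> list)
print(span.get("id"))  # New_line      (.get() returns None if missing)
print(span.attrs)      # {'class': ['class_id'], 'id': 'New_line'}

# Looking the element up by its id instead of its tag position
same_span = soup.find("span", id="New_line")
print(same_span["class"][0])  # class_id
```

Use `span.get("class")` rather than `span["class"]` when some spans may lack the attribute, since indexing raises `KeyError` in that case.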

Tags: python, html, beautifulsoup, web-scraping, screen-scraping

Kiran
