Get attribute values by BeautifulSoup

Question

I want to get all data-js attribute values from the content below using BeautifulSoup.

Input:

<p data-js="1, 2, 3">some text..</p><p data-js="5">some 1 text</p><p data-js="4"> some 2 text. </p>

Output:

['1, 2, 3', '5', '4']

I've done it with lxml:

>>> content = """<p data-js="1, 2, 3">some text..</p><p data-js="5">some 1 text</p><p data-js="4"> some 2 text. </p>"""
>>> import lxml.html as PARSER
>>> root = PARSER.fromstring(content)
>>> root.xpath("//*/@data-js")
['1, 2, 3', '5', '4']

I want the same result using BeautifulSoup.

alecxe · Accepted Answer

The idea is to find all elements that have a data-js attribute and collect the attribute values in a list:

from bs4 import BeautifulSoup


data = """
<p data-js="1, 2, 3">some text..</p><p data-js="5">some 1 text</p><p data-js="4"> some 2 text. </p>
"""

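# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data, "html.parser")
# attrs={"data-js": True} matches every tag that has a data-js attribute, whatever its value
print([elm['data-js'] for elm in soup.find_all(attrs={"data-js": True})])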

Prints ['1, 2, 3', '5', '4'].
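
As an alternative sketch (not part of the original answer), the same values can be collected with a CSS attribute selector through select(). The variable names and the extra <p> without a data-js attribute are illustrative assumptions, added only to show that tags lacking the attribute are skipped:

```python
from bs4 import BeautifulSoup

# Same markup as in the question, plus one <p> without data-js (hypothetical addition)
html = (
    '<p data-js="1, 2, 3">some text..</p>'
    '<p data-js="5">some 1 text</p>'
    '<p data-js="4"> some 2 text. </p>'
    '<p>no attribute here</p>'
)

soup = BeautifulSoup(html, "html.parser")

# "[data-js]" selects any tag that carries a data-js attribute
values = [tag["data-js"] for tag in soup.select("[data-js]")]
print(values)  # ['1, 2, 3', '5', '4']
```

Both approaches return the attribute values in document order; which one to use is mostly a matter of taste.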

Tags: python, html, html-parsing, beautifulsoup

Vivek Sable
