
How to find a specific data attribute from an HTML tag in BeautifulSoup4?

Is there a way to find an element using only the data attribute in html, and then grab that value?

For example, with this line inside an html doc:

<ul data-bin="Sdafdo39">

How do I retrieve Sdafdo39 by searching the entire html doc for the element that has the data-bin attribute?

user21398 asked Jun 13 '14 04:06


3 Answers

A slightly more precise approach:

[item['data-bin'] for item in bs.find_all('ul', attrs={'data-bin' : True})]


This way, the list being iterated contains only the ul elements that have the attribute you are looking for.

from bs4 import BeautifulSoup

# The markup must be defined before it is parsed
html_doc = """<ul class="foo">foo</ul><ul data-bin="Sdafdo39">"""
bs = BeautifulSoup(html_doc, "html.parser")
[item['data-bin'] for item in bs.find_all('ul', attrs={'data-bin' : True})]
# ['Sdafdo39']
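
If the tag name is not known in advance, the same attribute filter can probably be applied without naming a tag at all. The following is only a minimal sketch, assuming bs4 is installed and the built-in "html.parser" is used; the sample markup (including the extra div) and the variable names are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: two different tags carry a data-bin attribute.
html_doc = (
    '<ul class="foo">foo</ul>'
    '<ul data-bin="Sdafdo39"></ul>'
    '<div data-bin="Xyz123"></div>'
)
soup = BeautifulSoup(html_doc, "html.parser")

# attrs={"data-bin": True} matches any tag that has the attribute,
# whatever its value and whatever the tag name.
values = [tag["data-bin"] for tag in soup.find_all(attrs={"data-bin": True})]

# A CSS attribute selector expresses the same filter.
values_css = [tag["data-bin"] for tag in soup.select("[data-bin]")]

print(values)
# ['Sdafdo39', 'Xyz123']
```

Both forms return the matches in document order, so which one to use is mostly a matter of taste.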


xecgr answered Oct 20 '22 05:10


You can use the find_all method to get every tag, then keep only the tags that have "data-bin" among their attributes. From each matching tag you can then read the attribute's value, like this:

from bs4 import BeautifulSoup
html_doc = """<ul data-bin="Sdafdo39">"""
bs = BeautifulSoup(html_doc, "html.parser")
# Walk every tag and keep the value wherever data-bin is present
print([item["data-bin"] for item in bs.find_all() if "data-bin" in item.attrs])
# ['Sdafdo39']
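
When only the first match is needed, find() might be a simpler fit than filtering the whole list. This is a rough sketch under the same assumptions (bs4 installed, "html.parser" as the parser); the data-missing attribute is a made-up name used only to show the no-match case.

```python
from bs4 import BeautifulSoup

html_doc = """<ul data-bin="Sdafdo39"></ul>"""
soup = BeautifulSoup(html_doc, "html.parser")

# find() returns the first tag that has the attribute, or None if there is none.
tag = soup.find(attrs={"data-bin": True})
value = tag["data-bin"] if tag is not None else None
print(value)
# Sdafdo39

# Hypothetical attribute that does not occur in the markup: find() gives None,
# so check for None before indexing into the tag.
missing = soup.find(attrs={"data-missing": True})
print(missing)
# None
```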
thefourtheye answered Oct 20 '22 04:10


You could solve this with gazpacho in just a couple of lines:

First, import and turn the html into a Soup object:

from gazpacho import Soup

html = """<ul data-bin="Sdafdo39">"""
soup = Soup(html)

Then you can just search for the "ul" tag and extract the data-bin attribute:

soup.find("ul").attrs["data-bin"]
# Sdafdo39
emehex answered Oct 20 '22 05:10