Is there a way to find an element using only the data attribute in html, and then grab that value?
For example, with this line inside an html doc:
<ul data-bin="Sdafdo39">
How do I retrieve Sdafdo39
by searching the entire html doc for the element that has the data-bin
attribute?
A little bit more accurate
[item['data-bin'] for item in bs.find_all('ul', attrs={'data-bin' : True})]
This way, the iterated list only has the ul elements that has the attr you want to find
from bs4 import BeautifulSoup
bs = BeautifulSoup(html_doc)
html_doc = """<ul class="foo">foo</ul><ul data-bin="Sdafdo39">"""
[item['data-bin'] for item in bs.find_all('ul', attrs={'data-bin' : True})]
You can use find_all
method to get all the tags and filtering based on "data-bin" found in its attributes will get us the actual tag which has got it. Then we can simply extract the value corresponding to it, like this
from bs4 import BeautifulSoup
html_doc = """<ul data-bin="Sdafdo39">"""
bs = BeautifulSoup(html_doc)
print [item["data-bin"] for item in bs.find_all() if "data-bin" in item.attrs]
# ['Sdafdo39']
You could solve this with gazpacho in just a couple of lines:
First, import and turn the html into a Soup
object:
from gazpacho import Soup
html = """<ul data-bin="Sdafdo39">"""
soup = Soup(html)
Then you can just search for the "ul" tag and extract the href attribute:
soup.find("ul").attrs["data-bin"]
# Sdafdo39
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With