
How to select tags by attribute value with Beautiful Soup

I have the following HTML fragment:

>>> a
<div class="headercolumn">
<h2>
<a class="results" data-name="result-name" href="/xxy"> my text</a>
</h2>

I am trying to select the header column only if its link has the attribute data-name="result-name".

I've tried:

>>> a.select('a["data-name="result-name""]')

This gives:

ValueError: Unsupported or invalid CSS selector: 

How can I get this working?

user1592380 asked Jul 27 '14 18:07



2 Answers

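You can simply do this:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
# Match <a> tags whose data-name attribute equals "result-name".
results = soup.find_all("a", {"data-name": "result-name"})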

Source: How to find tags with only certain attributes - BeautifulSoup
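
The selector from the question can also be made to work with select(). In a CSS attribute selector the attribute name is not quoted; only the value is. The following is a minimal sketch, assuming the question's markup with the href quote closed and the built-in "html.parser" backend; the variable names by_css and by_attrs are only illustrative.

```python
from bs4 import BeautifulSoup

# Sample markup based on the question (assumption: the href quote is closed).
html = """
<div class="headercolumn">
<h2>
<a class="results" data-name="result-name" href="/xxy"> my text</a>
</h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS attribute selector: bare attribute name, quoted value.
by_css = soup.select('a[data-name="result-name"]')

# The same lookup expressed with find_all and an attrs dictionary.
by_attrs = soup.find_all("a", attrs={"data-name": "result-name"})

# Both lookups return the same tag object from the parsed tree.
print(by_css[0] is by_attrs[0])
print(by_css[0]["href"])
```

Both approaches should find the same single link here; which one to use is mostly a matter of whether a CSS selector or keyword-style filtering reads better in the surrounding code.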

Azwr answered Nov 15 '22 08:11


html = """
<div class="headercolumn">
<h2>
<a class="results" data-name="result-name" href="/xxy"> my text</a>
</h2>
"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
for d in soup.find_all("div", {"class": "headercolumn"}):
    # Read the attribute value from the first <a> inside the div.
    print(d.a.get("data-name"))
    # Select the link by its CSS class.
    print(d.select("a.results"))

Output:

result-name
[<a class="results" data-name="result-name" href="/xxy"> my text</a>]
Padraic Cunningham answered Nov 15 '22 07:11
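
Neither answer shows how to get the header column itself only when its link carries the attribute, which is what the question asks for. One possible approach is to find the matching links first and then walk up to the enclosing div with find_parent(). This is a sketch under assumptions: the second div with data-name="other-name" is hypothetical, added only to show that non-matching columns are skipped, and matching_divs is an illustrative name.

```python
from bs4 import BeautifulSoup

# First div follows the question; the second div is a hypothetical
# non-matching column added for contrast.
html = """
<div class="headercolumn">
<h2><a class="results" data-name="result-name" href="/xxy"> my text</a></h2>
</div>
<div class="headercolumn">
<h2><a class="results" data-name="other-name" href="/zzz"> other text</a></h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

matching_divs = []
for link in soup.select('a[data-name="result-name"]'):
    # Walk up from the link to its enclosing header column, if any.
    div = link.find_parent("div", class_="headercolumn")
    if div is not None:
        matching_divs.append(div)

print(len(matching_divs))
print(matching_divs[0].a["href"])
```

Only the first column should end up in matching_divs, since the second link has a different data-name value.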