How to find a tag without a specific attribute using BeautifulSoup?

I'm trying to get the content of the 'p' tags that don't have a specific attribute.

I have some tags with 'class'='cost', and some tags with both 'class'='cost' and 'itemprop'='price'.

all_cars = soup.find_all('div', attrs={'class': 'listdata'})
...
...
total_cost = car.findChildren('p', attrs={'class': 'cost'})
cost = car.findChildren('p', attrs={'class': 'cost', 'itemprop': 'price'})

I am trying to find the 'p' tags without the 'itemprop' attribute, but I can't find a solution.

rodahviing asked Jan 12 '19 19:01



1 Answer

BeautifulSoup's built-in attribute filters are enough for this. Pass True as the value to simply check that the attribute is present, and None to require that the attribute is absent. Any other value (e.g. 'cost') matches tags whose attribute has that value.

from bs4 import BeautifulSoup

html = """
<p class="cost">paragraph 1</p>
<p class="cost">paragraph 2</p>
<p class="cost">paragraph 3</p>
<p class="cost" itemprop="1">paragraph 4</p>
<p class="somethingelse">paragraph 5</p>
"""
soup = BeautifulSoup(html, 'html.parser')

print("---without 'itemprop' attribute")
print(soup.find_all('p', itemprop=None))
print("---with class = 'cost' and without 'itemprop' attribute----")
print(soup.find_all('p', attrs={'itemprop': None, 'class': 'cost'}))
# An alternative way to specify the same filter:
# print(soup.find_all('p', itemprop=None, class_='cost'))

Output

---without 'itemprop' attribute
[<p class="cost">paragraph 1</p>, <p class="cost">paragraph 2</p>, <p class="cost">paragraph 3</p>, <p class="somethingelse">paragraph 5</p>]
---with class = 'cost' and without 'itemprop' attribute----
[<p class="cost">paragraph 1</p>, <p class="cost">paragraph 2</p>, <p class="cost">paragraph 3</p>]
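
Applied to the structure described in the question, the same filters might look like the sketch below. The markup is hypothetical (the question does not show its HTML), so the div contents and the text values are assumptions made for illustration. The CSS-selector variant at the end assumes a Beautiful Soup version with select() backed by soupsieve (4.7 or later).

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question: each listing holds a total
# cost (no itemprop) and a price (itemprop="price").
html = """
<div class="listdata">
  <p class="cost">Total: 12000</p>
  <p class="cost" itemprop="price">10000</p>
</div>
<div class="listdata">
  <p class="cost">Total: 8500</p>
  <p class="cost" itemprop="price">7000</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

total_costs = []
prices = []
for car in soup.find_all("div", attrs={"class": "listdata"}):
    # itemprop=None keeps only the tags that lack the attribute.
    total = car.find("p", attrs={"class": "cost", "itemprop": None})
    # itemprop=True keeps only the tags that carry it, whatever its value.
    price = car.find("p", attrs={"class": "cost", "itemprop": True})
    total_costs.append(total.get_text())
    prices.append(price.get_text())

# CSS-selector alternative for the same "without itemprop" filter.
via_css = [
    p.get_text()
    for p in soup.select("div.listdata p.cost:not([itemprop])")
]

print(total_costs)
print(prices)
print(via_css)
```

With this sample markup, total_costs and via_css both hold the two "Total: ..." strings, and prices holds the two itemprop="price" values.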
Bitto Bennichan answered Nov 15 '22 08:11
