
What's the equivalent of '*' for Beautiful Soup's find_all?

I am trying to get all <tr class="colour blue attr1 attr2"> elements from a page.

The attr classes are different each time, and some of the sibling <tr>s have classes such as colour red or colour pink.

So I want to match any class value that starts with colour blue, whatever comes after it. I've tried using *, but it didn't work:

soup.find_all('tr', {'class': 'colour blue*'})

Thank you

asked Feb 26 '17 08:02 by StevenH




2 Answers

You can use common CSS selectors with Beautiful Soup:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''
...     <tr class="colour blue attr1 attr2"></tr>
...     <tr class="colour red attr1 attr2"></tr>
...     <tr class="unwanted attr1 attr2"></tr>
...     <tr class="colour blue attr3"></tr>
...     <tr class="another attr1 attr2"></tr>
... ''', 'html.parser')
>>> soup.select('tr.colour.blue')
[<tr class="colour blue attr1 attr2"></tr>, <tr class="colour blue attr3"></tr>]

The tr.colour.blue selector matches any tr that has both the colour and blue classes, regardless of their order or of any other classes on the element.
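
To illustrate the point about class selectors, here is a minimal sketch. It assumes bs4 is installed with its default selector support; the sample HTML and the helper name has_colour_blue are made up for the example. It shows that select() ignores class order, and that the same rule can be written as a find_all() function filter.

```python
from bs4 import BeautifulSoup

# Made-up sample markup; the second row lists its classes in a different order.
html = """
<table>
  <tr class="colour blue attr1 attr2"></tr>
  <tr class="attr3 blue colour"></tr>
  <tr class="colour red attr1"></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS class selectors test for the presence of each class token,
# so token order and extra classes do not matter.
by_selector = soup.select("tr.colour.blue")


def has_colour_blue(tag):
    """Hypothetical helper: True for <tr> tags carrying both classes."""
    classes = tag.get("class") or []
    return tag.name == "tr" and {"colour", "blue"} <= set(classes)


# find_all() also accepts a function that receives each tag.
by_function = soup.find_all(has_colour_blue)

matched_classes = [tr["class"] for tr in by_selector]
print(matched_classes)
```

The function filter is more verbose, but it may be handy when the condition cannot be expressed as a selector.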

answered Sep 24 '22 08:09 by falsetru


Use a regex filter:

import re

soup.find_all('tr', class_=re.compile(r'colour blue.+'))
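  • Beautiful Soup applies the compiled pattern with its search() method, so the pattern can match anywhere in the class value.

  • . matches any character except a newline.

  • + repeats the preceding . one or more times, so at least one character must follow colour blue.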
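
The pattern's behaviour can be checked with the re module alone. The sketch below assumes the pattern ends up being tested against the whole class attribute string; the sample strings and the stricter variant are made up for illustration and are not part of the original answer.

```python
import re

# Pattern from the answer: "colour blue" followed by at least one character.
loose = re.compile(r"colour blue.+")

# Hypothetical stricter variant: the value must start with "colour blue",
# followed by whitespace or the end of the string.
strict = re.compile(r"^colour blue(\s|$)")

# Made-up class attribute values.
samples = [
    "colour blue attr1 attr2",
    "colour blue",
    "colour blueish attr1",
    "colour red attr1 attr2",
]

loose_hits = [s for s in samples if loose.search(s)]
strict_hits = [s for s in samples if strict.search(s)]

print(loose_hits)   # ['colour blue attr1 attr2', 'colour blueish attr1']
print(strict_hits)  # ['colour blue attr1 attr2', 'colour blue']
```

With the original pattern, a row whose class is exactly colour blue is skipped, while one with a class like blueish would slip through; the stricter variant is one possible way to avoid both.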

answered Sep 25 '22 08:09 by 宏杰李