What's the equivalent of '*' in BeautifulSoup's find_all?

Tags:

python

beautifulsoup

I am trying to get all <tr class="colour blue attr1 attr2"> elements from a page.

The attr classes are different each time, and some of the sibling <tr>s have colour red, colour pink, etc. classes instead.

So I want to match any <tr> whose class starts with colour blue, whatever classes come after it. I've tried using *, but it didn't work:

soup.find_all('tr', {'class': 'colour blue*'})

Thank you


asked Feb 26 '17 08:02

StevenH

2 Answers

You can use CSS selectors with Beautiful Soup's select() method:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''
...     <tr class="colour blue attr1 attr2"></tr>
...     <tr class="colour red attr1 attr2"></tr>
...     <tr class="unwanted attr1 attr2"></tr>
...     <tr class="colour blue attr3"></tr>
...     <tr class="another attr1 attr2"></tr>
... ''', 'html.parser')
>>> soup.select('tr.colour.blue')
[<tr class="colour blue attr1 attr2"></tr>, <tr class="colour blue attr3"></tr>]

The tr.colour.blue selector matches any tr element that has both the colour and blue classes, regardless of their order or of any other classes it carries.
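To make that matching rule concrete without depending on Beautiful Soup, here is a stdlib-only sketch using Python's html.parser that mimics what tr.colour.blue selects: a tr is kept when its class list contains both colour and blue, in any order and alongside any extra classes. ClassSetCollector and select_by_classes are hypothetical helpers for illustration only; in real code soup.select('tr.colour.blue') does this for you.

```python
from html.parser import HTMLParser

# Sample markup taken from the answer above.
SAMPLE = """
<tr class="colour blue attr1 attr2"></tr>
<tr class="colour red attr1 attr2"></tr>
<tr class="unwanted attr1 attr2"></tr>
<tr class="colour blue attr3"></tr>
<tr class="another attr1 attr2"></tr>
"""


class ClassSetCollector(HTMLParser):
    """Hypothetical helper: collect class lists of <tag> elements carrying every required class."""

    def __init__(self, tag, required):
        super().__init__()
        self.tag = tag
        self.required = set(required)
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        # Like tr.colour.blue: every required class must be present,
        # in any order, with any number of extra classes.
        if self.required.issubset(classes):
            self.matches.append(classes)


def select_by_classes(markup, tag, required):
    """Hypothetical helper: return the class lists of matching elements."""
    parser = ClassSetCollector(tag, required)
    parser.feed(markup)
    parser.close()
    return parser.matches


if __name__ == "__main__":
    for classes in select_by_classes(SAMPLE, "tr", ["colour", "blue"]):
        print(" ".join(classes))
```

Because the check is a set inclusion rather than a string comparison, a row such as class="blue x colour" would also be selected, which is the same behaviour the CSS selector gives.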


answered Sep 24 '22 08:09

falsetru

Use a regular-expression filter:

import re

soup.find_all('tr', class_=re.compile(r'colour blue.+'))

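- Beautiful Soup applies the compiled pattern with re.search(), so it can match anywhere in the class value.
- . matches any character except a newline.
- + means match the preceding . one or more times, so .+ requires at least one character after colour blue; a tr whose class is exactly colour blue is not matched.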
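The effect of the pattern can be checked with plain re, independently of Beautiful Soup. This sketch assumes, as the bullets describe, that the pattern is applied with re.search() to the whole space-separated class string. It compares the answer's pattern with a hypothetical word-boundary variant, r'\bcolour blue\b', on a few made-up class values; matching is an illustrative helper, not a library function.

```python
import re

# The pattern from the answer above.
PATTERN = re.compile(r'colour blue.+')
# Hypothetical alternative: \b word boundaries, no required suffix.
WORD_PATTERN = re.compile(r'\bcolour blue\b')

CLASS_VALUES = [
    "colour blue attr1 attr2",
    "colour blue attr3",
    "colour blue",            # nothing after "colour blue"
    "colour red attr1 attr2",
    "colour bluegreen x",     # "blue" is only the start of another class
]


def matching(pattern, values):
    # re.search scans the whole string, so the pattern may match anywhere.
    return [v for v in values if pattern.search(v)]


if __name__ == "__main__":
    print(matching(PATTERN, CLASS_VALUES))
    print(matching(WORD_PATTERN, CLASS_VALUES))
```

The original pattern skips the bare "colour blue" value but accepts "colour bluegreen x"; the word-boundary version does the opposite. Both still assume colour comes directly before blue in the attribute string, a constraint the CSS selector in the other answer does not have.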

answered Sep 25 '22 08:09

宏杰李
