How to find tags with only certain attributes - BeautifulSoup

Tags:

python

beautifulsoup

How would I, using BeautifulSoup, search for tags containing ONLY the attributes I search for?

For example, I want to find all <td valign="top"> tags.

The following code: raw_card_data = soup.fetch('td', {'valign':re.compile('top')})

gets all of the data I want, but it also grabs any <td> tag that has valign="top" alongside other attributes.

I also tried: raw_card_data = soup.findAll(re.compile('<td valign="top">')) and this returns nothing (probably because of bad regex)

I was wondering if there was a way in BeautifulSoup to say "Find <td> tags whose only attribute is valign:top"

UPDATE: For example, if an HTML document contained the following <td> tags:

<td valign="top">.....</td>
<td width="580" valign="top">.......</td>
<td>.....</td>

I would want only the first <td> tag (<td valign="top">) to be returned.

asked Jan 19 '12 21:01

Snaxib

3 Answers

As explained in the BeautifulSoup documentation, you can use this:

soup = BeautifulSoup(html)
results = soup.findAll("td", {"valign" : "top"})

EDIT:

To return tags that have only the valign="top" attribute, you can check the length of the tag's attrs property:

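# BeautifulSoup 3 import style; the print statement below is Python 2 syntax
from BeautifulSoup import BeautifulSoup

html = '<td valign="top">.....</td>\
        <td width="580" valign="top">.......</td>\
        <td>.....</td>'

soup = BeautifulSoup(html)
# First narrow down to <td> tags that carry valign="top"
results = soup.findAll("td", {"valign": "top"})

for result in results:
    # Keep only tags with exactly one attribute, i.e. valign alone
    if len(result.attrs) == 1:
        print result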

That returns:

<td valign="top">.....</td>
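For readers on Python 3, here is a minimal sketch of the same idea using the bs4 package and the built-in "html.parser" backend. The sample markup, the cell text, and the variable name only_valign are assumptions made for illustration; they are not part of the original answer.

```python
from bs4 import BeautifulSoup

# Sample markup (assumed for illustration), mirroring the question's three cells
html = """
<table><tr>
<td valign="top">first</td>
<td width="580" valign="top">second</td>
<td>third</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all narrows the search to <td> tags with valign="top".
# In bs4, tag.attrs is a dict, so len() counts the attributes on the tag.
only_valign = [
    td for td in soup.find_all("td", {"valign": "top"})
    if len(td.attrs) == 1
]

for td in only_valign:
    print(td)
```

With this sample input, only the cell whose sole attribute is valign="top" should be printed.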

answered Oct 11 '22 12:10

Loïc G.

You can use lambda functions in findAll, as explained in the documentation. In your case, to search for td tags whose only attribute is valign="top", use the following:

# tag.get() avoids a KeyError on tags whose single attribute is not valign
td_tag_list = soup.findAll(
                lambda tag: tag.name == "td" and
                len(tag.attrs) == 1 and
                tag.get("valign") == "top")
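As a possible variation, assuming bs4 (where tag.attrs is a plain dict), the length check and the value check can be folded into a single dictionary comparison. The helper name has_only_valign_top and the sample markup are hypothetical, chosen only for this sketch.

```python
from bs4 import BeautifulSoup

# Sample markup (assumed for illustration)
html = (
    '<td valign="top">a</td>'
    '<td width="580" valign="top">b</td>'
    '<td width="580">c</td>'
    '<td>d</td>'
)
soup = BeautifulSoup(html, "html.parser")


def has_only_valign_top(tag):
    # Hypothetical helper: exact dict equality means the tag carries
    # valign="top" and no other attribute.
    return tag.name == "td" and tag.attrs == {"valign": "top"}


matches = soup.find_all(has_only_valign_top)
print(matches)
```

Note that the third cell, whose single attribute is width, is rejected without raising an error, because the comparison never indexes into the tag.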

answered Oct 11 '22 12:10

Yogesh

If you want to match on the attribute name alone, with any value:

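from bs4 import BeautifulSoup
import re

# html is assumed to be a response object with a .text attribute
soup = BeautifulSoup(html.text, 'lxml')
# The regex matches any value, so every <td> that has a valign attribute is returned
results = soup.findAll("td", {"valign": re.compile(r".*")})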

As per Steve Lorimer, it is better to pass True instead of a regex:

results = soup.findAll("td", {"valign" : True})
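To illustrate what passing True does, here is a small self-contained sketch, assuming bs4 with the built-in "html.parser" backend. The sample markup and the names with_valign and values are made up for this example.

```python
from bs4 import BeautifulSoup

# Sample markup (assumed for illustration)
html = (
    '<td valign="top">a</td>'
    '<td valign="bottom">b</td>'
    '<td width="580">c</td>'
)
soup = BeautifulSoup(html, "html.parser")

# True matches any tag that has the attribute, whatever its value
with_valign = soup.find_all("td", {"valign": True})
values = [td["valign"] for td in with_valign]
print(values)
```

Here the cell that has only a width attribute is skipped, while both valign cells are returned regardless of their value.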

answered Oct 11 '22 12:10

Amr
