How to find all anchor tags inside a div using Beautifulsoup in Python

Tags: python, html, beautifulsoup, python-2.7, web-scraping

This is the HTML I'm parsing. It all sits inside a table and the same pattern repeats many times. I just want the href attribute value inside the div with the attribute class="Special_Div_Name". These divs are inside table rows, and there are lots of rows.

<tr>
   <div class="Special_Div_Name">
      <a href="something.mp3">text</a>
   </div>
</tr>

What I want is only the href attribute values that end in ".mp3" that are inside the div with the attribute class="Special_Div_Name".

So far I was able to come up with this code:

download = soup.find_all('a', href = re.compile('.mp3'))
for text in download:
    hrefText = (text['href'])
    print hrefText

This code currently prints every href attribute value on the page that ends in ".mp3", which is very close to what I want. It's just that I only want the ".mp3"s that are inside that div class.


asked Feb 18 '16 01:02

ddschmitz

2 Answers

This minor adjustment should get you what you want:

special_divs = soup.find_all('div', {'class': 'Special_Div_Name'})
for div in special_divs:
    # Search only inside this div for anchors whose href ends in ".mp3"
    download = div.find_all('a', href=re.compile(r'\.mp3$'))
    for link in download:
        hrefText = link['href']
        print(hrefText)

answered Oct 29 '22 15:10

rofls

Since Beautiful Soup accepts most CSS selectors with the .select() method, I'd suggest using the attribute selector [href$=".mp3"] in order to select a elements with an href attribute ending with .mp3.

Then you can just prepend the selector div.Special_Div_Name in order to select only the anchor elements that are descendants of that div:

for a in soup.select('div.Special_Div_Name a[href$=".mp3"]'):
    print (a['href'])
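
To try the selector in isolation, here is a minimal self-contained sketch. The markup in SAMPLE_HTML and the helper name mp3_links are made up for illustration (they are not from the question), and the built-in "html.parser" backend is assumed so that no extra parser needs to be installed:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, modelled on the snippet in the question.
SAMPLE_HTML = """
<table>
  <tr><td><div class="Special_Div_Name"><a href="first.mp3">one</a></div></td></tr>
  <tr><td><div class="Special_Div_Name"><a href="notes.txt">two</a></div></td></tr>
  <tr><td><div class="Other_Div"><a href="outside.mp3">three</a></div></td></tr>
</table>
"""


def mp3_links(html):
    """Return the hrefs ending in .mp3 that sit inside div.Special_Div_Name."""
    soup = BeautifulSoup(html, "html.parser")
    anchors = soup.select('div.Special_Div_Name a[href$=".mp3"]')
    return [a["href"] for a in anchors]


print(mp3_links(SAMPLE_HTML))
```

Only first.mp3 should come back: notes.txt fails the [href$=".mp3"] test, and outside.mp3 is not a descendant of a div with the Special_Div_Name class.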

In a more general case, if you would just like to select a elements with an [href] attribute that are a descendant of a div element, then you would use the selector div a[href]:

for a in soup.select('div a[href]'):
    print (a)

If you would rather stay close to the original code that you provided, select all the elements with a class of Special_Div_Name, then iterate over those elements and select the descendant anchor elements:

for div in soup.select('.Special_Div_Name'):
    # Raw string so the backslash reaches the regex engine unchanged
    for a in div.find_all('a', href=re.compile(r'\.mp3$')):
        print (a['href'])

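As a side note, re.compile('.mp3') should be re.compile('\.mp3$') since . has special meaning in a regular expression (it matches any character). In addition, you will also want the anchor $ in order to match at the end of the string (rather than anywhere in the string). Writing the pattern as a raw string, r'\.mp3$', keeps Python from treating the backslash as a string escape.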
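
The difference between the two patterns can be checked with the standard re module alone. This is a small sketch; the href values are invented for the comparison, and search() is used because that is how Beautiful Soup applies a compiled pattern passed as a filter:

```python
import re

loose = re.compile('.mp3')       # the question's pattern: "." matches any character
strict = re.compile(r'\.mp3$')   # literal dot, anchored at the end of the string

# Hypothetical href values, chosen to show where the two patterns disagree.
hrefs = ["song.mp3", "song.mp3.bak", "xmp3/index.html"]

loose_hits = [h for h in hrefs if loose.search(h)]
strict_hits = [h for h in hrefs if strict.search(h)]

print(loose_hits)
print(strict_hits)
```

The loose pattern accepts all three values, since any character followed by "mp3" anywhere in the string is enough. The strict pattern keeps only song.mp3.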

answered Oct 29 '22 15:10

Josh Crozier
