Get href Attribute Link from td tag BeautifulSoup Python

Tags: python, beautifulsoup

I am new to Python, and someone suggested that I use Beautiful Soup for scraping. I am stuck on a problem: how to fetch the href attribute from the link inside the td in column 2, based on the year in column 4.

<table class="tableFile2" summary="Results">
         <tr>
            <th width="7%" scope="col">Filings</th>
            <th width="10%" scope="col">Format</th>
            <th scope="col">Description</th>
            <th width="10%" scope="col">Filing Date</th>
            <th width="15%" scope="col">File/Film Number</th>
         </tr>
<tr>
<td nowrap="nowrap">8-K</td>
<td nowrap="nowrap"><a href="/Archives/edgar/data/320193/000119312513199324/0001193125-13-199324-index.htm" id="documentsbutton">&nbsp;Documents</a></td>
<td class="small" >Current report, items 8.01 and 9.01
<br />Acc-no: 0001193125</td>
            <td>2013-05-03</td>
            <td nowrap="nowrap"><a href="/cgi-bin/browse-edgar?action=getcompany&amp;filenum=000-10030&amp;owner=include&amp;count=40">000-10030</a><br>13813281         </td>
         </tr>
<tr class="blueRow">
<td nowrap="nowrap">424B2</td>
<td nowrap="nowrap"><a href="/Archives/edgar/data/320193/000119312513191849/0001193125-13-191849-index.htm" id="documentsbutton">&nbsp;Documents</a></td>
<td class="small" >Prospectus [Rule 424(b)(2)]<br />Acc-no: 0001193125</td>
            <td>2013-05-01</td>
            <td nowrap="nowrap"><a href="/cgi-bin/browse-edgar?action=getcompany&amp;filenum=333-188191&amp;owner=include&amp;count=40">333-188191</a><br>13802405         </td>
         </tr>
<tr>
<td nowrap="nowrap">FWP</td>
<td nowrap="nowrap"><a href="/Archives/edgar/data/320193/000119312513189053/0001193125-13-189053-index.htm" id="documentsbutton">&nbsp;Documents</a></td>
<td class="small" >Filing under Securities Act Rules 163/433 of free writing prospectuses<br />Acc-no: 0001193125-13-189053&nbsp;(34 Act)&nbsp; Size: 52 KB            </td>
            <td>2013-05-01</td>
            <td nowrap="nowrap"><a href="/cgi-bin/browse-edgar?action=getcompany&amp;filenum=333-188191&amp;owner=include&amp;count=40">333-188191</a><br>13800170         </td>
         </tr>
</table>



table = soup.find('table', class="tableFile2")

rows = table.findAll('tr')
for tr in rows:
  cols = tr.findAll('td')
  if "2013" in cols[3]
    link = cols[1].find('a').get('href')
  print

asked May 24 '13 10:05

Zaid Iqbal

1 Answer

This works for me in Python 2.7:

table = soup.find('table', {'class': 'tableFile2'})
rows = table.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    if len(cols) >= 4 and "2013" in cols[3].text:
        link = cols[1].find('a').get('href')
        print link

A few issues with your previous code:

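class is a reserved word in Python, so soup.find('table', class="tableFile2") is a syntax error. Pass a dictionary of attributes instead (e.g., {'class': 'tableFile2'}); Beautiful Soup 4.1.2 and later also accept the keyword argument class_='tableFile2'.
Not every row has at least 4 td cells (the header row contains only th cells, so cols is empty there), so you need to check the length before indexing cols[3].
cols[3] is a Tag, not a string, so test against cols[3].text. The if line also needs a trailing colon, and print needs link as its argument.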
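
For Python 3 with the bs4 package, the same approach can be sketched roughly as below. This is only an illustration under some assumptions: the function name filing_links and the HTML constant are made up for the example, the table is a trimmed copy of the one in the question, and the filing date is assumed to start with the four-digit year (as in 2013-05-03).

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table from the question (header row plus two data rows).
HTML = """
<table class="tableFile2" summary="Results">
  <tr>
    <th scope="col">Filings</th>
    <th scope="col">Format</th>
    <th scope="col">Description</th>
    <th scope="col">Filing Date</th>
    <th scope="col">File/Film Number</th>
  </tr>
  <tr>
    <td>8-K</td>
    <td><a href="/Archives/edgar/data/320193/000119312513199324/0001193125-13-199324-index.htm">Documents</a></td>
    <td>Current report, items 8.01 and 9.01</td>
    <td>2013-05-03</td>
    <td>000-10030</td>
  </tr>
  <tr>
    <td>424B2</td>
    <td><a href="/Archives/edgar/data/320193/000119312513191849/0001193125-13-191849-index.htm">Documents</a></td>
    <td>Prospectus [Rule 424(b)(2)]</td>
    <td>2013-05-01</td>
    <td>333-188191</td>
  </tr>
</table>
"""


def filing_links(html, year):
    """Hypothetical helper: hrefs from column 2 of rows whose column 4 date starts with `year`."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ (with a trailing underscore) avoids the reserved word `class`.
    table = soup.find("table", class_="tableFile2")
    links = []
    if table is None:
        return links
    for tr in table.find_all("tr"):
        cols = tr.find_all("td")
        # The header row has only th cells, so skip rows with fewer than 4 td cells.
        if len(cols) < 4:
            continue
        # Compare against the cell text, not the Tag object itself.
        if not cols[3].get_text(strip=True).startswith(str(year)):
            continue
        anchor = cols[1].find("a")
        if anchor is not None and anchor.get("href"):
            links.append(anchor["href"])
    return links


print(filing_links(HTML, 2013))
```

Calling filing_links(HTML, 2013) collects the two /Archives/... hrefs from the sample rows, while a year with no matching rows gives an empty list. The hrefs are relative, so they still need to be joined with the site's base URL before they can be fetched.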

answered Sep 17 '22 19:09

Charles Marsh
