
python/beautifulsoup to find all <a href> with specific anchor text

I am trying to use Beautiful Soup to parse HTML and find the href of every link that has a specific anchor text:

<a href="http://example.com">TEXT</a>
<a href="http://example.com/link">TEXT</a>
<a href="http://example.com/page">TEXT</a>

All the links I am looking for have exactly the same anchor text, in this case TEXT. I am NOT searching for the word TEXT itself; I want to use it to find all the different href values.

Edit:

To clarify, I am looking for something similar to using a class to find the links:

<a href="http://example.com" class="visible">TEXT</a>
<a href="http://example.com/link" class="visible">TEXT</a>
<a href="http://example.com/page" class="visible">TEXT</a>

and then using

findAll('a', 'visible') 

except that the HTML I am parsing has no class on the links, only the same anchor text every time.

cwal asked Nov 05 '12 21:11



People also ask

What is the find() method in Beautiful Soup?

The find() method returns the first tag that matches the given name or attributes (such as an id) as a bs4 object, or None if nothing matches.

Which Beautiful Soup method can find all the instances of a tag on a page?

Use contentTable.find_all('a', string='Alamo') to extract all anchor tags whose text is Alamo. By default, find_all() searches every descendant of the tag it is called on, not only its direct children.
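To make the difference between find() and find_all() concrete, here is a minimal sketch. The sample HTML and the variable names are made up for illustration, and it assumes Beautiful Soup 4.4.0 or later, where the `string=` argument is available (older releases call it `text=`).

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, not taken from any real page.
html = """
<a href="http://example.com">TEXT</a>
<a href="http://example.com/link">TEXT</a>
<a href="http://dontmatchme.com/page">WRONGTEXT</a>
"""

# Naming the parser explicitly avoids the "no parser specified" warning.
soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching tag, or None when nothing matches.
first = soup.find("a", string="TEXT")
missing = soup.find("a", string="NOPE")

# find_all() returns every matching tag as a list-like result.
all_matches = soup.find_all("a", string="TEXT")

first_href = first["href"]
match_count = len(all_matches)
print(first_href, match_count, missing)
```

A plain string passed to `string=` is compared for an exact match, so WRONGTEXT is not picked up here.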


1 Answer

Would something like this work?

In [39]: from bs4 import BeautifulSoup

In [40]: s = """\
   ....: <a href="http://example.com">TEXT</a>
   ....: <a href="http://example.com/link">TEXT</a>
   ....: <a href="http://example.com/page">TEXT</a>
   ....: <a href="http://dontmatchme.com/page">WRONGTEXT</a>"""

In [41]: soup = BeautifulSoup(s)

In [42]: for link in soup.findAll('a', href=True, text='TEXT'):
   ....:     print link['href']
   ....:
http://example.com
http://example.com/link
http://example.com/page
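The session above is Python 2. For Python 3, a rough equivalent might look like the sketch below. The helper name `hrefs_with_anchor_text` and the extra sample links are my own additions for illustration, not part of the original answer. Instead of an exact `text=` match, it compares the stripped link text, on the assumption that real pages may pad the anchor text with whitespace or wrap it in a nested tag.

```python
from bs4 import BeautifulSoup

# Sample markup extended with hypothetical edge cases:
# padded text, text inside a nested tag, and an anchor without href.
html = """\
<a href="http://example.com">TEXT</a>
<a href="http://example.com/link"> TEXT </a>
<a href="http://example.com/page"><b>TEXT</b></a>
<a href="http://dontmatchme.com/page">WRONGTEXT</a>
<a name="no-href">TEXT</a>"""


def hrefs_with_anchor_text(markup, anchor_text):
    """Return the href of every <a> whose visible text equals anchor_text."""
    soup = BeautifulSoup(markup, "html.parser")
    return [
        link["href"]
        # href=True skips anchors that have no href attribute at all.
        for link in soup.find_all("a", href=True)
        # get_text(strip=True) trims whitespace and flattens nested tags.
        if link.get_text(strip=True) == anchor_text
    ]


hrefs = hrefs_with_anchor_text(html, "TEXT")
for href in hrefs:
    print(href)
```

If the anchor text is always exact, as in the question, `soup.find_all('a', href=True, string='TEXT')` should be enough and is closer to the original answer.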
RocketDonkey answered Sep 19 '22 12:09
