I am trying to use beautiful soup to parse html and find all href with a specific anchor tag
<a href="http://example.com">TEXT</a> <a href="http://example.com/link">TEXT</a> <a href="http://example.com/page">TEXT</a>
all the links I am looking for have the exact same anchor text, in this case TEXT. I am NOT looking for the word TEXT, I want to use the word TEXT to find all the different HREF
edit:
for clarification looking for something similar to using the class to parse for the links
<a href="http://example.com" class="visible">TEXT</a> <a href="http://example.com/link" class="visible">TEXT</a> <a href="http://example.com/page" class="visible">TEXT</a>
and then using
findAll('a', 'visible')
except the HTML I am parsing doesn't have a class but always the same anchor text
find() method The find method is used for finding out the first tag with the specified name or id and returning an object of type bs4. Example: For instance, consider this simple HTML webpage having different paragraph tags.
Use contentTable. find_all('a', string = 'Alamo') to extract all anchor tags with text Alamo. By default, Beautiful Soup searches through all of the child elements.
Would something like this work?
In [39]: from bs4 import BeautifulSoup In [40]: s = """\ ....: <a href="http://example.com">TEXT</a> ....: <a href="http://example.com/link">TEXT</a> ....: <a href="http://example.com/page">TEXT</a> ....: <a href="http://dontmatchme.com/page">WRONGTEXT</a>""" In [41]: soup = BeautifulSoup(s) In [42]: for link in soup.findAll('a', href=True, text='TEXT'): ....: print link['href'] ....: ....: http://example.com http://example.com/link http://example.com/page
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With