python/beautifulsoup to find all <a href> with specific anchor text

I am trying to use Beautiful Soup to parse HTML and find the href of every link that has a specific anchor text:

    <a href="http://example.com">TEXT</a>
    <a href="http://example.com/link">TEXT</a>
    <a href="http://example.com/page">TEXT</a>

All the links I am looking for have exactly the same anchor text, in this case TEXT. I am not searching for the word TEXT itself; I want to use it to find all the different href values.

Edit: to clarify, I am looking for something similar to selecting the links by class, as with markup like this:

    <a href="http://example.com" class="visible">TEXT</a>
    <a href="http://example.com/link" class="visible">TEXT</a>
    <a href="http://example.com/page" class="visible">TEXT</a>

and then using

findAll('a', 'visible')

except that the HTML I am parsing has no class on the links, only the same anchor text every time.
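For reference, the class-based lookup described above might look like the following in current Beautiful Soup. This is only a sketch: the helper name `hrefs_by_class`, the `html.parser` choice, and the extra `hidden` link are assumptions added for illustration, not part of the original markup.

```python
from bs4 import BeautifulSoup

# Markup from the question, plus one hypothetical link with a different
# class so the filter has something to exclude.
HTML = """
<a href="http://example.com" class="visible">TEXT</a>
<a href="http://example.com/link" class="visible">TEXT</a>
<a href="http://example.com/page" class="visible">TEXT</a>
<a href="http://example.com/other" class="hidden">TEXT</a>
"""


def hrefs_by_class(html, css_class):
    """Return the href of every <a> tag that carries the given CSS class."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ has a trailing underscore because `class` is a Python keyword;
    # href=True skips anchors that have no href attribute at all.
    return [a["href"] for a in soup.find_all("a", class_=css_class, href=True)]


class_links = hrefs_by_class(HTML, "visible")
print(class_links)
```

The question is how to get the same kind of filter when the only thing the links share is their anchor text.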

asked Nov 05 '12 21:11

cwal

1 Answer

Would something like this work?

    In [39]: from bs4 import BeautifulSoup

    In [40]: s = """\
       ....: <a href="http://example.com">TEXT</a>
       ....: <a href="http://example.com/link">TEXT</a>
       ....: <a href="http://example.com/page">TEXT</a>
       ....: <a href="http://dontmatchme.com/page">WRONGTEXT</a>"""

    In [41]: soup = BeautifulSoup(s, 'html.parser')

    In [42]: for link in soup.findAll('a', href=True, text='TEXT'):
       ....:     print(link['href'])
       ....:
    http://example.com
    http://example.com/link
    http://example.com/page
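The session above relies on the older `findAll` and `text=` spellings. As a hedged alternative, the sketch below wraps the same idea in a function and compares each link's text explicitly with `get_text(strip=True)`. That is a different matching technique from the `text='TEXT'` filter, chosen here because it tolerates surrounding whitespace and nested tags. The helper name `hrefs_by_anchor_text` and the extra `name="no-href"` anchor are assumptions made for illustration.

```python
from bs4 import BeautifulSoup

# Markup from the answer, plus one hypothetical anchor without an href.
HTML = """
<a href="http://example.com">TEXT</a>
<a href="http://example.com/link">TEXT</a>
<a href="http://example.com/page">TEXT</a>
<a href="http://dontmatchme.com/page">WRONGTEXT</a>
<a name="no-href">TEXT</a>
"""


def hrefs_by_anchor_text(html, anchor_text):
    """Return the href of every <a> whose visible text equals anchor_text."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True restricts the search to anchors that actually have an href.
    for a in soup.find_all("a", href=True):
        # get_text(strip=True) trims whitespace and also collects text that
        # sits inside nested tags such as <b>TEXT</b>.
        if a.get_text(strip=True) == anchor_text:
            links.append(a["href"])
    return links


text_links = hrefs_by_anchor_text(HTML, "TEXT")
for href in text_links:
    print(href)
```

The comparison is an exact match, so WRONGTEXT is skipped even though it contains TEXT as a substring.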

answered Sep 19 '22 12:09

RocketDonkey
