beautiful soup getting href based on a text

Tags:

python

beautifulsoup

Say there's a page with hundreds of links, each with unique text in the a tag. How can I specify an a tag's text and then get the href from there? For example,

Click to copy

for a in soup.findAll('a', href=True):
  print(a['href'])

This gets all the href throughout the page, which is overkill. When I do this:

Click to copy

for a in soup.findAll('a', href=True text="Some Value"):
  print(a['href'])

I can't grab the href tag because it no longer returns a Tag object, but instead an Navigable object. Any idea how I can achieve what I want?

564

asked Jan 06 '12 07:01

tipu

1 Answers

Instead of passing the text parameter, you can pass a callable as the name parameter that checks both the tag name and the text:

Click to copy

for tag in soup.findAll(lambda tag: (tag.name == 'a'
                                     and tag.text == 'Some Value'),
                        href=True):
    print tag['href']

This way, the returned value is a Tag instead of a NavigableString.

Note also that, according to the documentation:

If you use text, then any values you give for name and the keyword arguments are ignored.

So probably the second example in your question doesn't work as expected even if you just want to get the NavigableString.

126

answered Oct 05 '22 16:10

jcollado

Related questions
                            
                                What are the advantages of MongoDB over MySQL and PostgreSQL? [closed]
                            
                                Python metaclass and the object base class
                            
                                How can I call python program from VBA?
                            
                                how to apply patch in mercurial and show diff tool if failed apply
                            
                                What is the most simple / lightest-weight WSGI framework?
                            
                                Longest common prefix using buffer?
                            
                                Convert path to polygon
                            
                                How do I avoid naming clashes within Python's module system?
                            
                                IE9 hangs local Flask instance
                            
                                python: urllib2 using different network interface
                            
                                How to do a memset with Python buffer object?
                            
                                Find the largest divisor of N that is less than sqrt(N)
                            
                                How to make Tkinter message expand when I resize the window?
                            
                                How to vectorize the evaluation of bilinear & quadratic forms?
                            
                                Is it possible to concatenate QuerySets?
                            
                                Python Class Variable Initialization
                            
                                Python imports being overridden by the standard library (Python 2.4)
                            
                                Python subprocess to Bash: curly braces
                            
                                control memory alignment in python ctypes
                            
                                Remove default "Python" submenu with Tkinter Menu on Mac OSX

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

beautiful soup getting href based on a text

Tags:

python

beautifulsoup

tipu

People also ask

1 Answers

jcollado

Recent Activity

Donate For Us