
Beautiful Soup: getting an href based on the link text

Say there's a page with hundreds of links, each with unique text in the a tag. How can I specify an a tag's text and then get the href from there? For example,

for a in soup.findAll('a', href=True):
  print(a['href'])

This gets every href on the page, which is overkill. When I do this:

for a in soup.findAll('a', href=True, text="Some Value"):
  print(a['href'])

I can't grab the href attribute, because the call no longer returns Tag objects but NavigableString objects instead. Any idea how I can achieve what I want?

asked Jan 06 '12 07:01 by tipu



People also ask

How do you extract links with Beautiful Soup?

Steps to be followed: Fetch the page with the requests library's get() method, passing it the URL. Create a parse tree object (the soup object) with the BeautifulSoup() constructor, passing it the HTML document fetched above and Python's built-in HTML parser. Then use the a tag to extract the links from the BeautifulSoup object.
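A minimal sketch of those steps, assuming the beautifulsoup4 package is installed. To keep it runnable offline, the HTML is a hypothetical inline string standing in for what requests.get(url).text would return; extract_links is an illustrative helper name, not part of any library:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; in practice this would be requests.get(url).text
html = """
<html><body>
  <a href="https://example.com/a">First</a>
  <a href="/b">Second</a>
  <a name="anchor-without-href">Third</a>
</body></html>
"""

def extract_links(markup):
    """Return the href of every <a> tag that has one, in document order."""
    soup = BeautifulSoup(markup, "html.parser")  # Python's built-in parser
    # href=True skips <a> tags that have no href attribute at all
    return [a["href"] for a in soup.find_all("a", href=True)]

links = extract_links(html)
print(links)  # ['https://example.com/a', '/b']
```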

How do you scrape text using Beautiful Soup?

For web scraping to work in Python, we're going to perform three basic steps: Extract the HTML content using the requests library. Analyze the HTML structure and identify the tags which have our content. Extract the tags using Beautiful Soup and put the data in a Python list.
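The last two steps might look roughly like this. The markup, the item class and the variable names are hypothetical, and the HTML would normally come from the requests library as described above:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a fetched page
html = """
<ul id="items">
  <li class="item">Apples</li>
  <li class="item"> Pears </li>
  <li class="other">Not me</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
# Pick the tags that hold the content, then keep only their stripped text
items = [li.get_text(strip=True) for li in soup.find_all("li", class_="item")]
print(items)  # ['Apples', 'Pears']
```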

Is a NavigableString editable in Beautiful Soup?

A NavigableString object represents the text contents of a tag. To access it, use the tag's .string attribute. You can replace the string with another string, but you can't edit the existing string in place.
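A short sketch of the replacement route, assuming Beautiful Soup 4, where PageElement.replace_with() swaps one element for another; the sample tag is hypothetical:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<p>Old text</p>", "html.parser")
tag = soup.p
print(type(tag.string).__name__)  # NavigableString

# NavigableString subclasses str, so it is immutable: replace it instead
tag.string.replace_with("New text")
print(tag)  # <p>New text</p>
```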


1 Answer

Instead of passing the text parameter, you can pass a callable as the name parameter that checks both the tag name and the text:

# The lambda is called with each Tag; href=True is still applied as an attribute filter
for tag in soup.findAll(lambda tag: (tag.name == 'a'
                                     and tag.text == 'Some Value'),
                        href=True):
    print(tag['href'])

This way, the returned value is a Tag instead of a NavigableString.
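For reference, a self-contained version of the same idea, written for Python 3 and Beautiful Soup 4 (find_all is the bs4 spelling; findAll remains as an alias). The sample HTML and the is_target_link helper are hypothetical; a named function replaces the lambda only for readability:

```python
from bs4 import BeautifulSoup

# Hypothetical page: only one link has both the wanted text and an href
html = """
<a href="/one">Other link</a>
<a href="/two">Some Value</a>
<a>Some Value</a>
"""

soup = BeautifulSoup(html, "html.parser")

def is_target_link(tag):
    # Called with every Tag in the tree; returning True keeps the tag
    return tag.name == "a" and tag.text == "Some Value"

# The function filters on name and text; href=True drops the <a> without an href
hrefs = [tag["href"] for tag in soup.find_all(is_target_link, href=True)]
print(hrefs)  # ['/two']
```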

Note also that, according to the documentation:

If you use text, then any values you give for name and the keyword arguments are ignored.

So the second example in your question probably doesn't work as expected, even if all you want is the NavigableString.
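That quote appears to come from the Beautiful Soup 3 documentation. As far as I know, Beautiful Soup 4 does let string (called text before version 4.4.0) be combined with tag arguments, in which case it matches tags whose .string equals the value, so it only catches tags with a single text child. The sketch below assumes bs4 4.4.0 or later and a hypothetical sample page, and also shows a version-independent alternative: search for the NavigableString itself and climb to its .parent Tag:

```python
from bs4 import BeautifulSoup

# Hypothetical sample page
html = """
<a href="/one">Other link</a>
<a href="/two">Some Value</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Option 1 (bs4 >= 4.4.0): tag name plus string= returns Tag objects
by_string = [a["href"] for a in soup.find_all("a", href=True, string="Some Value")]

# Option 2: find the matching NavigableStrings, then use each one's parent Tag
by_parent = [s.parent["href"] for s in soup.find_all(string="Some Value")
             if s.parent.name == "a" and s.parent.has_attr("href")]

print(by_string, by_parent)  # ['/two'] ['/two']
```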

answered Oct 05 '22 16:10 by jcollado
