Say there's a page with hundreds of links, each with unique text in the a tag. How can I specify an a tag's text and then get the href from there? For example,
for a in soup.findAll('a', href=True):
    print(a['href'])
This gets every href on the page, which is overkill. When I do this:
for a in soup.findAll('a', href=True, text="Some Value"):
    print(a['href'])
I can't grab the href attribute because it no longer returns a Tag object, but a NavigableString object instead. Any idea how I can achieve what I want?
Steps to follow: Fetch the HTML document with the requests get() method by passing the URL to it. Create a parse tree object, i.e. a soup object, using BeautifulSoup(), passing it the HTML document extracted above and Python's built-in HTML parser. Use the a tag to extract the links from the BeautifulSoup object.
Web scraping in Python comes down to three basic steps: extract the HTML content using the requests library, analyze the HTML structure to identify the tags that hold the content, then extract those tags with Beautiful Soup and put the data in a Python list.
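The steps above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: SAMPLE_HTML and extract_links are hypothetical names, and the inline HTML stands in for the text you would normally get back from requests.get(url).text, so the sketch runs without network access.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; in practice this string would come from
# requests.get(url).text (assumption: the page is plain HTML).
SAMPLE_HTML = """
<html><body>
<a href="/first">First</a>
<a href="/second">Some Value</a>
<a name="anchor-only">No link here</a>
</body></html>
"""


def extract_links(html):
    """Parse the HTML and return the href of every <a> tag that has one."""
    # Build the parse tree with Python's built-in HTML parser.
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only <a> tags that actually carry an href attribute.
    return [a["href"] for a in soup.find_all("a", href=True)]


links = extract_links(SAMPLE_HTML)
print(links)
```

The third a tag is skipped because it has no href attribute, which is what href=True is for.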
The NavigableString object represents the text contents of a tag. To access it, use .string on the tag. You can replace the string with another string (using replace_with()), but you can't edit the existing string in place.
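A small sketch of that behaviour, assuming Beautiful Soup 4 and a made-up one-link document. It also shows that a NavigableString keeps a reference to its enclosing tag in .parent, which is one way to get from matched text back to the href.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Hypothetical one-link document used only for illustration.
soup = BeautifulSoup('<a href="/second">Some Value</a>', "html.parser")
tag = soup.a

# .string returns the tag's text as a NavigableString, not a plain str.
text = tag.string
is_navigable = isinstance(text, NavigableString)

# The string still knows which tag it lives in.
href_via_parent = text.parent["href"]

# The string cannot be edited in place, but it can be swapped out.
tag.string.replace_with("Other Value")
print(is_navigable, href_via_parent, tag)
```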
Instead of passing the text parameter, you can pass a callable as the name parameter that checks both the tag name and the text:
for tag in soup.findAll(lambda tag: (tag.name == 'a'
                                     and tag.text == 'Some Value'),
                        href=True):
    print(tag['href'])
This way, the returned value is a Tag instead of a NavigableString.
Note also that, according to the documentation:
If you use text, then any values you give for name and the keyword arguments are ignored.
So the second example in your question probably doesn't work as expected, even if you just want to get the NavigableString.
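Here is a self-contained sketch of the callable approach, run against a made-up snippet (SAMPLE_HTML and is_target_link are hypothetical names). The second query is an assumption about your environment: in Beautiful Soup 4.4 and later the argument is called string, and when it is combined with a tag name the documented behaviour is to return the Tag objects whose .string matches, so the quoted limitation may not apply to newer releases.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two links share the text, only one has an href.
SAMPLE_HTML = """
<a href="/first">First</a>
<a href="/second">Some Value</a>
<a name="no-href">Some Value</a>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")


def is_target_link(tag):
    """True for <a> tags that have an href and exactly the wanted text."""
    return (tag.name == "a"
            and tag.has_attr("href")
            and tag.get_text() == "Some Value")


# Callable filter: find_all returns Tag objects, so ['href'] works.
hrefs_callable = [tag["href"] for tag in soup.find_all(is_target_link)]

# Assumption: Beautiful Soup 4.4+, where string= combined with a tag name
# also returns Tag objects rather than NavigableString objects.
hrefs_string = [
    tag["href"]
    for tag in soup.find_all("a", href=True, string="Some Value")
]

print(hrefs_callable, hrefs_string)
```

Putting the href check inside the callable keeps all the matching logic in one place, so it doesn't depend on how a given version combines a callable with keyword filters.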