I am using beautifulsoup to get all the links from a page. My code is:
import requests
from bs4 import BeautifulSoup
url = 'http://www.acontecaeventos.com.br/marketing-promocional-sao-paulo'
r = requests.get(url)
html_content = r.text
soup = BeautifulSoup(html_content, 'lxml')
soup.find_all('href')
All that I get is:
[]
How can I get a list of all the href links on that page?
Steps to be followed: get() method by passing URL to it. Create a Parse Tree object i.e. soup object using of BeautifulSoup() method, passing it HTML document extracted above and Python built-in HTML parser. Use the a tag to extract the links from the BeautifulSoup object.
Method 1: Using descendants and find() First, import the required modules, then provide the URL and create its requests object that will be parsed by the beautifulsoup object. Now with the help of find() function in beautifulsoup we will find the <body> and its corresponding <ul> tags.
You are telling the find_all
method to find href
tags, not attributes.
You need to find the <a>
tags, they're used to represent link elements.
links = soup.find_all('a')
Later you can access their href
attributes like this:
link = links[0] # get the first link in the entire page
url = link['href'] # get value of the href attribute
url = link.get('href') # or like this
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With