
Getting all Links from a page Beautiful Soup

I am using beautifulsoup to get all the links from a page. My code is:

import requests
from bs4 import BeautifulSoup


url = 'http://www.acontecaeventos.com.br/marketing-promocional-sao-paulo'
r = requests.get(url)
html_content = r.text
soup = BeautifulSoup(html_content, 'lxml')

soup.find_all('href')

All that I get is:

[]

How can I get a list of all the href links on that page?

asked Sep 29 '17 by user1922364

People also ask

How do you extract links from beautiful soup?

Steps to be followed: fetch the page with the get() method by passing the URL to it. Create a parse tree (a soup object) with BeautifulSoup(), passing it the HTML document fetched above and a parser such as Python's built-in html.parser. Then use the <a> tag to extract the links from the BeautifulSoup object.
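The steps above can be sketched as follows; an inline HTML snippet stands in for a live requests.get() call, and the URLs are made-up placeholders:

```python
from bs4 import BeautifulSoup

# In practice this would come from: requests.get(url).text
html_doc = """
<html><body>
<a href="https://example.com/one">One</a>
<a href="https://example.com/two">Two</a>
</body></html>
"""

# Build the parse tree with Python's built-in parser ('lxml' also works)
soup = BeautifulSoup(html_doc, 'html.parser')

# Extract the href attribute from every <a> tag
links = [a.get('href') for a in soup.find_all('a')]
print(links)
```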

Which method in BeautifulSoup is used to check all URL or images?

Method 1: Using descendants and find(). First, import the required modules, then provide the URL and create a requests object whose response will be parsed by BeautifulSoup. Then, with the help of BeautifulSoup's find() function, locate the <body> tag and its corresponding <ul> tag.
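A minimal sketch of that approach, again using an inline HTML snippet instead of a fetched page, with hypothetical image file names:

```python
from bs4 import BeautifulSoup, Tag

html_doc = """
<body><ul>
<li><img src="cat.png"></li>
<li><img src="dog.png"></li>
</ul></body>
"""

soup = BeautifulSoup(html_doc, 'html.parser')

# find() returns the first matching tag; here we drill into <body>, then <ul>
ul = soup.find('body').find('ul')

# .descendants yields tags and text nodes, so keep only <img> tags
srcs = [d['src'] for d in ul.descendants if isinstance(d, Tag) and d.name == 'img']
print(srcs)
```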


1 Answer

You are telling the find_all method to look for tags named href, but href is an attribute, not a tag.

You need to find the <a> tags; they are the elements used to represent links.

links = soup.find_all('a')

Later you can access their href attributes like this:

link = links[0]          # get the first link in the entire page
url  = link['href']      # get value of the href attribute
url  = link.get('href')  # or like this
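Putting it together, here is a self-contained sketch that collects every href on a page; an inline HTML snippet replaces the live request, and the paths are placeholders. Note that .get('href') returns None for <a> tags without an href (e.g. named anchors), which is why indexing with link['href'] can raise a KeyError:

```python
from bs4 import BeautifulSoup

html_doc = '<p><a href="/a">A</a> <a name="anchor">no href</a> <a href="/b">B</a></p>'
soup = BeautifulSoup(html_doc, 'html.parser')

# Filter out <a> tags that have no href attribute
hrefs = [a.get('href') for a in soup.find_all('a') if a.get('href')]
print(hrefs)
```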
answered Sep 20 '22 by Anonta