Web scraping with Python and Beautiful Soup

I am practicing building web scrapers. The one I am working on now goes to a site, scrapes the links for the various cities on that site, then follows each city link and scrapes the links for all the properties in that city.

I'm using the following code:

import requests

from bs4 import BeautifulSoup

main_url = "http://www.chapter-living.com/"

# Getting individual cities url
re = requests.get(main_url)
soup = BeautifulSoup(re.text, "html.parser")
city_tags = soup.find_all('a', class_="nav-title")  # Bottom page not loaded dynamically
cities_links = [main_url + tag["href"] for tag in city_tags.find_all("a")]  # Links to cities

If I print out city_tags I get the HTML I want. However, when I print cities_links I get AttributeError: 'ResultSet' object has no attribute 'find_all'.

I gather from other questions on here that this error occurs because city_tags is None, but that can't be the case if it prints out the desired HTML. I have noticed that the printed HTML is wrapped in [] - does this make a difference?

Maverick asked Mar 09 '23 14:03


1 Answer

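city_tags is a bs4.element.ResultSet (essentially a list of Tag objects), and you are calling find_all on the list itself rather than on a tag. That is why you get an AttributeError even though printing city_tags shows the expected HTML: it is not None, and the [] you see is simply the list's representation. Either call find_all on each element of the ResultSet or, since the matched elements are already the a tags, just read each one's href attribute: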

import requests
from bs4 import BeautifulSoup

main_url = "http://www.chapter-living.com/"

# Getting individual cities url
response = requests.get(main_url)  # named response so it does not shadow the stdlib re module
soup = BeautifulSoup(response.text, "html.parser")
city_tags = soup.find_all('a', class_="nav-title")  # Bottom page not loaded dynamically
# city_tags is already a list of <a> tags, so read href from each element directly
cities_links = [main_url + tag["href"] for tag in city_tags]  # Links to cities
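
To see the difference without hitting the network, here is a minimal sketch that runs against a small inline HTML snippet. The markup, the example.com base URL, and the variable names are made up for illustration and are not taken from the real site. It shows that find_all gives back a list-like ResultSet (never None), that calling find_all on that list raises the error from the question, and that urljoin from the standard library can build absolute URLs, which sidesteps a doubled slash when the base ends in / and the href starts with one.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = """
<ul>
  <li><a class="nav-title" href="/london">London</a></li>
  <li><a class="nav-title" href="/paris">Paris</a></li>
  <li><a class="other" href="/about">About</a></li>
</ul>
"""
base_url = "http://www.example.com/"  # assumed base URL, for illustration only

soup = BeautifulSoup(html, "html.parser")

# find_all returns a ResultSet, which is a list subclass; it is never None.
city_tags = soup.find_all("a", class_="nav-title")
is_list = isinstance(city_tags, list)

# find returns a single Tag, or None when nothing matches.
missing = soup.find("a", class_="no-such-class")

# Calling find_all on the ResultSet itself reproduces the error in the question.
try:
    city_tags.find_all("a")
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__

# Each element is a Tag, so read href per element and resolve it against the base.
cities_links = [urljoin(base_url, tag["href"]) for tag in city_tags]

# If the matched elements were containers rather than links,
# find_all would be called on each element, not on the ResultSet.
nested_links = [a["href"] for li in soup.find_all("li") for a in li.find_all("a")]

print(is_list, missing, error_name)
print(cities_links)
print(nested_links)
```

The same per-element pattern applies to the second stage of the scraper: fetch each city URL, parse it, and loop over the tags that find_all returns on that page.
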
Giannis Spiliopoulos answered Mar 19 '23 05:03