
Scrape Multiple URLs using Beautiful Soup

I'm trying to extract elements with specific classes from multiple URLs. The tags and classes are the same on every page, but I need my Python program to scrape all of the pages instead of only the single link I type in.

Here's a sample of my work:

from bs4 import BeautifulSoup
import requests
import pprint
import re
import pyperclip

url = input('insert URL here: ')
#scrape elements
response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

#print titles only
h1 = soup.find("h1", class_= "class-headline")
print(h1.get_text())

This works for individual URLs but not for a batch. Thanks for helping me. I learned a lot from this community.

Rudolph Musngi asked Nov 16 '16 10:11



1 Answer

Put the URLs in a list and iterate over it.

from bs4 import BeautifulSoup
import requests

# requests needs a scheme (http:// or https://) on every URL
urls = ['http://www.website1.com', 'http://www.website2.com', 'http://www.website3.com']  # ..... add more URLs here
# scrape elements
for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")

    # print titles only
    h1 = soup.find("h1", class_="class-headline")
    if h1 is not None:  # find() returns None when no matching tag exists
        print(h1.get_text())
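If some of the pages may be unreachable or may not contain the headline tag, the loop can be wrapped in small functions that skip failures instead of crashing. This is only a sketch: the function names `extract_headline` and `scrape_headlines` and the 10-second timeout are my own assumptions, not something from the question.

```python
from bs4 import BeautifulSoup
import requests


def extract_headline(html):
    """Return the text of the first <h1 class="class-headline">, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    h1 = soup.find("h1", class_="class-headline")
    return h1.get_text(strip=True) if h1 is not None else None


def scrape_headlines(urls, timeout=10):
    """Map each URL to its headline; pages that fail or lack the tag map to None."""
    results = {}
    for url in urls:
        try:
            response = requests.get(url, timeout=timeout)
            response.raise_for_status()  # treat 4xx/5xx responses as failures
        except requests.RequestException as exc:
            print(f"Skipping {url}: {exc}")
            results[url] = None
            continue
        results[url] = extract_headline(response.content)
    return results
```

Calling `scrape_headlines(urls)` with your list returns a dictionary, so you can print the headlines or save them afterwards.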

If you would rather prompt the user for each URL, it can be done this way:

from bs4 import BeautifulSoup
import requests

# scrape elements
msg = 'Enter Url, to exit type q and hit enter: '
url = input(msg)
while url != 'q':
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")

    # print titles only
    h1 = soup.find("h1", class_="class-headline")
    if h1 is not None:  # find() returns None when no matching tag exists
        print(h1.get_text())
    url = input(msg)  # store the next URL; without the assignment the loop never ends
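Another option, if you want to paste the whole batch at one prompt, is to split a single line of input into a list of URLs. A minimal sketch, assuming the URLs are separated by spaces or commas; the helper name `parse_url_batch` and the default `http` scheme are hypothetical choices of mine.

```python
import re


def parse_url_batch(text, default_scheme="http"):
    """Split whitespace- or comma-separated URLs and add a scheme where one is missing."""
    urls = []
    for item in re.split(r"[,\s]+", text.strip()):
        if not item:
            continue  # skip empty pieces from blank input or stray separators
        if not re.match(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", item):
            item = f"{default_scheme}://{item}"  # requests rejects URLs without a scheme
        urls.append(item)
    return urls
```

For example, `urls = parse_url_batch(input('Paste URLs: '))` gives a list that can be fed to a `for url in urls:` loop.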
Falloutcoder answered Oct 13 '22 00:10
