 

Prevent being banned by Google when scraping with BeautifulSoup

I want to make a Google News scraper with Python and BeautifulSoup, but I have read that there is a chance I can be banned.

I have also read that I can prevent this by using rotating proxies and rotating IP addresses. The only thing I managed to do is a rotating User-Agent. Can you show me how to add a rotating proxy and a rotating IP address?

I know that it should be added in the requests.get() call, but I do not know how.

This is my code:

from bs4 import BeautifulSoup
import requests

headers = {'User-Agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}

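term = 'usa'

# Google paginates with start=0, 10, 20, ...; range(0, 4) fetches the first four pages
# (the original range(1, 5) started at start=10 and skipped the first page)
for page in range(0, 4):

    start = page * 10

    url = 'https://www.google.com/search?q={}&tbm=nws&sxsrf=ACYBGNTx2Ew_5d5HsCvjwDoo5SC4U6JBVg:1574261023484&ei=H1HVXf-fHfiU1fAP65K6uAU&start={}&sa=N&ved=0ahUKEwi_q9qog_nlAhV4ShUIHWuJDlcQ8tMDCF8&biw=1280&bih=561&dpr=1.5'.format(term, start)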
    print(url)

    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')

    headline_text = soup.find_all('h3', class_= "r dO0Ag")

    snippet_text = soup.find_all('div', class_='st')

    news_date = soup.find_all('div', class_='slp')

    print(len(news_date))
taga asked Nov 01 '25 15:11


2 Answers

Instead of scraping the results page, you can do the searches through Google's official API:

https://developers.google.com/custom-search/v1/overview
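A minimal sketch of calling the Custom Search JSON API with only the standard library, assuming you have created an API key and a Programmable Search Engine (its ID is the `cx` parameter). `API_KEY`-style values such as `"YOUR_API_KEY"` and `"YOUR_ENGINE_ID"` are placeholders, and the helper names `build_search_url`, `parse_results` and `search` are hypothetical. The API returns at most 10 results per request, and `start` is the 1-based index of the first result, so page 2 starts at 11.

```python
import json
import urllib.parse
import urllib.request

# Endpoint of the Custom Search JSON API
API_URL = "https://www.googleapis.com/customsearch/v1"


def build_search_url(api_key, engine_id, query, start=1):
    """Build the request URL for one page of results (start is 1-based)."""
    params = {"key": api_key, "cx": engine_id, "q": query, "start": start}
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_results(payload):
    """Extract (title, link, snippet) tuples from a decoded JSON response."""
    return [
        (item.get("title", ""), item.get("link", ""), item.get("snippet", ""))
        for item in payload.get("items", [])  # "items" is absent when there are no results
    ]


def search(api_key, engine_id, query, start=1):
    """Fetch one page of results; raises urllib.error.HTTPError on API errors."""
    url = build_search_url(api_key, engine_id, query, start)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_results(json.load(resp))
```

Usage, mirroring the four pages in the question: `for page in range(4): search("YOUR_API_KEY", "YOUR_ENGINE_ID", "usa", start=1 + page * 10)`. Because this goes through the documented API with your own key, there are no result-page HTML class names to break and no need for proxies.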

Ramon Medeiros answered Nov 03 '25 06:11


Another simple trick is to use Google Colab in Brave's Tor browser window; you will then see in the results that you get different IP addresses.

Once you get the data you want, you can use it in your Jupyter notebook, VS Code, or elsewhere.

See the results in the screenshots:

Using free proxies results in errors because they are overloaded with requests, so each time you have to pick a different proxy that is getting less traffic, and choosing one out of hundreds is a terrible task.
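The manual picking described above can be automated, and it is also the answer to where a proxy goes in `requests.get()`: requests accepts a `proxies` dict mapping a URL scheme to a proxy URL. Below is a minimal sketch, assuming you have a list of proxies you are allowed to use; the `203.0.113.x` addresses are documentation placeholders, not working proxies, and `fetch_with_rotation` is a hypothetical helper. It cycles round-robin through the pool, sends a randomly chosen User-Agent with each attempt, and moves on to the next proxy when one fails. It does not guarantee Google will not block you.

```python
import itertools
import random

# Placeholder proxies (documentation IP range): replace with proxies you can use
PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:3128",
    "http://203.0.113.12:8000",
]

USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0",
]


def fetch_with_rotation(url, pool, attempts, get=None, timeout=10):
    """Try up to `attempts` proxies from the shared iterator `pool`.

    `get` defaults to requests.get; it is injectable so the rotation
    logic can be exercised without network access.
    """
    if get is None:
        import requests  # third-party: pip install requests
        get = requests.get
    last_error = None
    for _ in range(attempts):
        proxy = next(pool)  # the shared cycle keeps rotating across calls
        headers = {"User-Agent": random.choice(USER_AGENTS)}
        try:
            response = get(url, headers=headers,
                           proxies={"http": proxy, "https": proxy},
                           timeout=timeout)
            response.raise_for_status()
            return response
        except OSError as exc:  # requests' exceptions subclass OSError
            last_error = exc  # dead, overloaded or blocked proxy: try the next one
    raise RuntimeError(f"all {attempts} proxy attempts failed") from last_error
```

In the question's loop you would create `pool = itertools.cycle(PROXIES)` once, before the `for` loop, and replace `requests.get(url, headers=headers)` with `fetch_with_rotation(url, pool, attempts=len(PROXIES))`.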


Getting correct results with Brave Tor VPN:

Mayur Gupta answered Nov 03 '25 07:11


