Google Search Web Scraping with Python

Tags: python, python-2.7, google-search, google-search-api

I've been learning a lot of Python lately for some projects at work.

Currently I need to scrape Google search results. I found several sites that demonstrate how to search with the Google AJAX Search API, but when I tried it, it appeared to be no longer supported. Any suggestions?

I've been searching for quite a while but can't find any solution that currently works.

asked Jul 27 '16 17:07

pbell

1 Answer

You can always scrape Google's results page directly. Request the URL https://google.com/search?q=<Query>, where <Query> is your URL-encoded search term; it returns the top 10 search results.
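A minimal sketch of building that URL, assuming the search term may contain spaces or special characters that need URL-encoding. The helper name build_search_url is hypothetical; urlencode comes from the standard library.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://google.com/search"  # base URL quoted in the answer


def build_search_url(query: str) -> str:
    """Return the search URL with the query URL-encoded (hypothetical helper)."""
    return SEARCH_URL + "?" + urlencode({"q": query})


print(build_search_url("python web scraping"))
# prints https://google.com/search?q=python+web+scraping
```

When using requests, passing params={"q": query} to get() should give the same encoding without building the string by hand.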

Then you can parse the page with, for example, lxml. Depending on the library you use, you can query the resulting node tree with either a CSS selector (.r a) or an XPath selector (//h3[@class="r"]/a).
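To illustrate what the XPath selector matches, here is a small sketch that runs it against a hand-written fragment. It swaps in the standard library's ElementTree (which supports only a subset of XPath and needs well-formed markup) so that it runs without extra packages; with lxml, page.xpath('//h3[@class="r"]/a') takes the expression as written above. The SAMPLE markup and the result_links name are made up for the example, and real result pages may be structured differently.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a results page (an assumption, not real output).
SAMPLE = """
<html><body>
  <h3 class="r"><a href="https://stackoverflow.com/">Stack Overflow</a></h3>
  <h3 class="r"><a href="/url?q=https://example.com/&amp;sa=U">Example</a></h3>
  <h3 class="other"><a href="https://ignored.example/">Not a result</a></h3>
</body></html>
"""


def result_links(markup: str) -> list:
    """Return the href of every <a> directly inside an <h3 class="r">."""
    root = ET.fromstring(markup)
    # ".//" searches all descendants of the root element.
    return [a.get("href") for a in root.findall(".//h3[@class='r']/a")]


for href in result_links(SAMPLE):
    print(href)
```

The third link is skipped because its parent h3 does not have the class r.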

In some cases the result's URL is a redirect through Google. It then usually contains a query parameter q, which holds the actual target URL.
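A small sketch of unwrapping such a redirect link with the standard library. The helper name unwrap_redirect and the extra sa=U parameter in the sample link are illustrative assumptions; links without a q parameter are returned unchanged.

```python
from urllib.parse import parse_qs, urlparse


def unwrap_redirect(href: str) -> str:
    """Return the target of a /url?q=... link, or href unchanged."""
    if not href.startswith("/url?"):
        return href
    # parse_qs maps each parameter name to a list of values.
    targets = parse_qs(urlparse(href).query).get("q")
    return targets[0] if targets else href


print(unwrap_redirect("/url?q=https://stackoverflow.com/&sa=U"))
# prints https://stackoverflow.com/
print(unwrap_redirect("https://example.com/"))
# prints https://example.com/
```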

Example code using lxml and requests:

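from urllib.parse import urlparse, parse_qs

from lxml.html import fromstring  # cssselect() also needs the cssselect package
from requests import get

# Fetch the results page for the query "StackOverflow".
raw = get("https://www.google.com/search?q=StackOverflow").text
page = fromstring(raw)

# Each result link is an <a> inside an element with class "r".
for result in page.cssselect(".r a"):
    url = result.get("href")
    if url is None:
        continue
    # Redirect links look like /url?q=<target>&...; take the target from q.
    if url.startswith("/url?"):
        url = parse_qs(urlparse(url).query)["q"][0]
    print(url)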

A note on Google banning your IP: in my experience, Google only bans you if you start spamming it with search requests. It responds with a 503 if it thinks you are a bot.
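One way to react to a 503 is to slow down and retry with increasing pauses. The sketch below is an assumption about how you might do that, not part of the approach above: fetch_with_backoff, its parameters, and the delay values are all hypothetical. It takes the fetching function as an argument so it can be tried without touching the network.

```python
import time
from typing import Callable, Tuple


def fetch_with_backoff(
    fetch: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch() until it stops returning 503 or attempts run out.

    fetch must return a (status_code, body) pair. The last result is
    returned even if it is still a 503, so the caller can decide what to do.
    """
    status, body = fetch()
    for attempt in range(1, max_attempts):
        if status != 503:
            break
        # Double the pause before each retry: base_delay, 2x, 4x, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = fetch()
    return status, body
```

With requests, fetch could be a small function that calls get(url) and returns (response.status_code, response.text). If the final status is still 503, it is probably better to stop for a while than to keep retrying.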

answered Oct 04 '22 03:10

StuxCrystal
