
How to scrape Google Maps using Python

I am trying to scrape the number of reviews of a place from Google Maps using Python. For example, the restaurant Pike's Landing (see the Google Maps URL below) has 162 reviews. I want to pull this number in Python.

URL: https://www.google.com/maps?cid=15423079754231040967

I am not very well versed in HTML, but based on some basic examples from the internet I wrote the following code. However, I get a blank variable after running it. If you could let me know what I am doing wrong, that would be much appreciated.

from urllib.request import urlopen
from bs4 import BeautifulSoup

quote_page ='https://www.google.com/maps?cid=15423079754231040967'
page = urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
price_box = soup.find_all('button',attrs={'class':'widget-pane-link'})
print(price_box.text)
user3510503 asked Dec 29 '17 12:12



2 Answers

This is hard to do in pure Python without an API. Here is what I ended up with (note that I added &hl=en to the end of the URL to get results in English rather than in my own language):

import re
import requests
from ast import literal_eval

urls = [
'https://www.google.com/maps?cid=15423079754231040967&hl=en',
'https://www.google.com/maps?cid=16168151796978303235&hl=en']

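for url in urls:
    # Find each embedded array that starts with an escaped URL and contains "N review(s)".
    for g in re.findall(r'\[\\"http.*?\d+ reviews?.*?]', requests.get(url).text):
        # Turn the JavaScript literal into a Python one: null -> None, \" -> ".
        data = literal_eval(g.replace('null', 'None').replace('\\"', '"'))
        # data[0] is the reviews URL; decode its remaining escape sequences.
        print(bytes(data[0], 'utf-8').decode('unicode_escape'))
        # data[1] is the review count text, e.g. "469 reviews".
        print(data[1])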

Prints:

http://www.google.com/search?q=Pike's+Landing,+4438+Airport+Way,+Fairbanks,+AK+99709,+USA&ludocid=15423079754231040967#lrd=0x51325b1733fa71bf:0xd609c9524d75cbc7,1
469 reviews
http://www.google.com/search?q=Sequoia+TreeScape,+Newmarket,+ON+L3Y+8R5,+Canada&ludocid=16168151796978303235#lrd=0x882ad2157062b6c3:0xe060d065957c4103,1
42 reviews
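
The question asks for the number itself, while the output above is text such as "469 reviews". A minimal sketch of converting that text to an integer could look like the following. The helper name parse_review_count is hypothetical, and the handling of thousands separators (e.g. "1,024 reviews") is an assumption about how larger counts might be formatted, not something confirmed by the output above.

```python
import re

def parse_review_count(text):
    """Return the review count from text like '469 reviews', or None if absent."""
    # Assumption: the count is a run of digits, optionally with commas
    # as thousands separators, followed by "review" or "reviews".
    match = re.search(r"(\d[\d,]*)\s+reviews?", text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))

print(parse_review_count("469 reviews"))
print(parse_review_count("42 reviews"))
```

In the loop above, this could be applied to data[1] instead of printing the raw string.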
Andrej Kesely answered Oct 23 '22 18:10



You need to view the source code of the page and parse the window.APP_INITIALIZATION_STATE variable block using a regular expression. There you'll find all the needed data.
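
As a rough sketch of that approach, the snippet below pulls the block out of an HTML string and searches it for review counts. It runs on a small made-up sample rather than a live page. The function names are hypothetical, and both the assumption that the assignment is followed by another `window.` statement and the "N reviews" wording inside the block are guesses about the page layout, which Google may change at any time.

```python
import re

def extract_app_state(html):
    """Return the raw text assigned to window.APP_INITIALIZATION_STATE, or None."""
    # Assumption: the assignment ends at the next ";window." statement.
    match = re.search(
        r"window\.APP_INITIALIZATION_STATE\s*=\s*(.*?);\s*window\.",
        html,
        re.DOTALL,
    )
    return match.group(1) if match else None

def find_review_counts(raw_state):
    """Return every 'N reviews' number found in the raw state text as integers."""
    return [
        int(number.replace(",", ""))
        for number in re.findall(r"(\d[\d,]*) reviews?", raw_state)
    ]

# Made-up sample standing in for the real page source.
sample_html = (
    '<script>window.APP_INITIALIZATION_STATE='
    '[[["x","469 reviews"]]];window.APP_FLAGS=[];</script>'
)

state = extract_app_state(sample_html)
print(find_review_counts(state))
```

With a real page, sample_html would be replaced by the text of the HTTP response, and the patterns would likely need adjusting to whatever the block actually contains.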


Alternatively, you can use the Google Maps Reviews API from SerpApi.

Example JSON output:

"place_results": {
  "title": "Pike's Landing",
  "data_id": "0x51325b1733fa71bf:0xd609c9524d75cbc7",
  "reviews_link": "https://serpapi.com/search.json?engine=google_maps_reviews&hl=en&place_id=0x51325b1733fa71bf%3A0xd609c9524d75cbc7",
  "gps_coordinates": {
    "latitude": 64.8299557,
    "longitude": -147.8488774
  },
  "place_id_search": "https://serpapi.com/search.json?data=%214m5%213m4%211s0x51325b1733fa71bf%3A0xd609c9524d75cbc7%218m2%213d64.8299557%214d-147.8488774&engine=google_maps&google_domain=google.com&hl=en&type=place",
  "thumbnail": "https://lh5.googleusercontent.com/p/AF1QipNtwheOCQ97QFrUNIwKYUoAPiV81rpiW5cIiQco=w152-h86-k-no",
  "rating": 3.9,
  "reviews": 839,
  "price": "$$",
  "type": [
    "American restaurant"
  ],
  "description": "Burgers, seafood, steak & river views. Pub fare alongside steak & seafood, served in a dining room with river views & a waterfront patio.",
  "service_options": {
    "dine_in": true,
    "curbside_pickup": true,
    "delivery": false
  }
}

Code to integrate:

import os
from serpapi import GoogleSearch

params = {
    "engine": "google_maps",
    "type": "search",
    "q": "pike's landing",
    "ll": "@40.7455096,-74.0083012,14z",
    "google_domain": "google.com",
    "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()

reviews = results["place_results"]["reviews"]

print(reviews)

Output:

839

Disclaimer: I work for SerpApi.

Dmitriy Zub answered Oct 23 '22 18:10
