I am trying to scrape the number of reviews of a place from google maps using python. For example the restaurant Pike's Landing (see google maps URL below) has 162 reviews. I want to pull this number in python.
URL: https://www.google.com/maps?cid=15423079754231040967
I am not vert well versed with HTML, but from some basic examples on the internet I wrote the following code, but what I get is a black variable after running this code. If you could let me know what am I dong wrong in this that would be much appreciated.
from urllib.request import urlopen
from bs4 import BeautifulSoup
quote_page ='https://www.google.com/maps?cid=15423079754231040967'
page = urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
price_box = soup.find_all('button',attrs={'class':'widget-pane-link'})
print(price_box.text)
It's hard to do it in pure Python and without an API, here's what I ended with (note that I added &hl=en
at the end of the url, to get English results and not in my language):
import re
import requests
from ast import literal_eval
urls = [
'https://www.google.com/maps?cid=15423079754231040967&hl=en',
'https://www.google.com/maps?cid=16168151796978303235&hl=en']
for url in urls:
for g in re.findall(r'\[\\"http.*?\d+ reviews?.*?]', requests.get(url).text):
data = literal_eval(g.replace('null', 'None').replace('\\"', '"'))
print(bytes(data[0], 'utf-8').decode('unicode_escape'))
print(data[1])
Prints:
http://www.google.com/search?q=Pike's+Landing,+4438+Airport+Way,+Fairbanks,+AK+99709,+USA&ludocid=15423079754231040967#lrd=0x51325b1733fa71bf:0xd609c9524d75cbc7,1
469 reviews
http://www.google.com/search?q=Sequoia+TreeScape,+Newmarket,+ON+L3Y+8R5,+Canada&ludocid=16168151796978303235#lrd=0x882ad2157062b6c3:0xe060d065957c4103,1
42 reviews
You need to view the source code of the page and parse window.APP_INITIALIZATION_STATE
variable block using a regular expression, there you'll find all needed data.
Alternatively, you can use Google Maps Reviews API from SerpApi.
Example JSON output:
"place_results": {
"title": "Pike's Landing",
"data_id": "0x51325b1733fa71bf:0xd609c9524d75cbc7",
"reviews_link": "https://serpapi.com/search.json?engine=google_maps_reviews&hl=en&place_id=0x51325b1733fa71bf%3A0xd609c9524d75cbc7",
"gps_coordinates": {
"latitude": 64.8299557,
"longitude": -147.8488774
},
"place_id_search": "https://serpapi.com/search.json?data=%214m5%213m4%211s0x51325b1733fa71bf%3A0xd609c9524d75cbc7%218m2%213d64.8299557%214d-147.8488774&engine=google_maps&google_domain=google.com&hl=en&type=place",
"thumbnail": "https://lh5.googleusercontent.com/p/AF1QipNtwheOCQ97QFrUNIwKYUoAPiV81rpiW5cIiQco=w152-h86-k-no",
"rating": 3.9,
"reviews": 839,
"price": "$$",
"type": [
"American restaurant"
],
"description": "Burgers, seafood, steak & river views. Pub fare alongside steak & seafood, served in a dining room with river views & a waterfront patio.",
"service_options": {
"dine_in": true,
"curbside_pickup": true,
"delivery": false
}
}
Code to integrate:
import os
from serpapi import GoogleSearch
params = {
"engine": "google_maps",
"type": "search",
"q": "pike's landing",
"ll": "@40.7455096,-74.0083012,14z",
"google_domain": "google.com",
"api_key": os.getenv("API_KEY"),
}
search = GoogleSearch(params)
results = search.get_dict()
reviews = results["place_results"]["reviews"]
print(reviews)
Output:
839
Disclaimer, I work for SerpApi.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With