I am trying to write a Python program to gather data from Google Trends (GT). Specifically, I want to automatically open URLs and access the specific values that are displayed in the line graphs.
I would be happy with either downloading the CSV files or web-scraping the values (based on what I see in Inspect Element, cleaning the data would only require a simple split or two). I have many searches to run, with many different keywords.
I am generating many URLs to gather data from Google Trends, based on the actual URL from a test search. Example URL: https://trends.google.com/trends/explore?q=sports%20cars&geo=US Opening this URL manually in a browser shows the relevant GT page. The problem comes when I try to access it from a program.
Most responses I have seen suggest using third-party modules from pip (e.g. PyTrends and the "Unofficial Google Trends API"). My project manager has insisted that I do not use modules that are not created directly by the site (i.e. APIs are acceptable, but only an official Google API). Only BeautifulSoup has been sanctioned as a plugin (don't ask why).
Below is an example of the code I have tried. I know it is basic, but on the very first request I got:
HTTPError: HTTP Error 429: unknown": too many requests.
Some responses to other questions mention a Google Trends API. Is this real? I could not find any documentation on an official API.
Here is another post which outlined a solution that I have tried that did not work for me:
https://codereview.stackexchange.com/questions/208277/web-scraping-google-trends-in-python
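from urllib.request import urlopen
from bs4 import BeautifulSoup as bs

def get_divs(url):
    # Fetch the page and return every <div> element found in it
    html = urlopen(url).read()
    soup = bs(html, 'html.parser')
    return soup.find_all('div')

divs = get_divs('https://trends.google.com/trends/explore?q=sports%20cars&geo=US')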
Google Trends is a public platform that you can use to analyze interest over time for a given topic, search term, or even company. Pytrends is an unofficial Google Trends API that provides different methods to download reports of trending results from Google Trends.
Google Trends does not have an API, but Google Trends Scraper creates an unofficial Google Trends API to let you extract data from Google Trends directly and at scale. It is built on the powerful Apify SDK and you can run it on the Apify platform and locally.
The page loads its data from an API that you can find in the network tab of your browser's developer tools:
import requests
import json
r = requests.get('https://trends.google.com/trends/api/widgetdata/multiline?hl=en-GB&tz=-60&req=%7B%22time%22:%222018-05-29+2019-05-29%22,%22resolution%22:%22WEEK%22,%22locale%22:%22en-GB%22,%22comparisonItem%22:%5B%7B%22geo%22:%7B%22country%22:%22US%22%7D,%22complexKeywordsRestriction%22:%7B%22keyword%22:%5B%7B%22type%22:%22BROAD%22,%22value%22:%22sports+cars%22%7D%5D%7D%7D%5D,%22requestOptions%22:%7B%22property%22:%22%22,%22backend%22:%22IZG%22,%22category%22:0%7D%7D&token=APP6_UEAAAAAXO-yaYekqJ7Tf2nuoLBAigMSW7axoLTL&tz=-60')
data = json.loads(r.text.lstrip(")]}\',\n"))
# Print the axis label and value of every point on the line graph
for item in data['default']['timelineData']:
    print(item['formattedAxisTime'], item['value'])
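The snippet above strips the leading characters before the JSON with lstrip, which removes a set of characters rather than a fixed prefix. A slightly more defensive sketch is to cut the body at the first opening brace instead. The helper names `parse_trends_body` and `timeline_rows` are made up for illustration, and the sample body is invented test data that only mimics the keys used above (`default`, `timelineData`, `formattedAxisTime`, `value`); it is not a real response.

```python
import json


def parse_trends_body(body):
    """Parse a response body that may have non-JSON characters before the payload."""
    start = body.find("{")
    if start == -1:
        raise ValueError("no JSON object found in response body")
    return json.loads(body[start:])


def timeline_rows(data):
    """Return (label, value) pairs for each point in the timeline."""
    return [
        (item["formattedAxisTime"], item["value"])
        for item in data["default"]["timelineData"]
    ]


# Invented sample body (assumption): a short prefix followed by JSON.
sample_body = ")]}',\n" + json.dumps(
    {"default": {"timelineData": [{"formattedAxisTime": "label-1", "value": [71]}]}}
)

rows = timeline_rows(parse_trends_body(sample_body))
for label, value in rows:
    print(label, value)
```

With a real request you would pass `r.text` to `parse_trends_body` in place of `sample_body`.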
We can unquote the URL to get a better idea of what is going on:
import urllib.parse
url = 'https://trends.google.com/trends/api/widgetdata/multiline?hl=en-GB&tz=-60&req=%7B%22time%22:%222018-05-29+2019-05-29%22,%22resolution%22:%22WEEK%22,%22locale%22:%22en-GB%22,%22comparisonItem%22:%5B%7B%22geo%22:%7B%22country%22:%22US%22%7D,%22complexKeywordsRestriction%22:%7B%22keyword%22:%5B%7B%22type%22:%22BROAD%22,%22value%22:%22sports+cars%22%7D%5D%7D%7D%5D,%22requestOptions%22:%7B%22property%22:%22%22,%22backend%22:%22IZG%22,%22category%22:0%7D%7D&token=APP6_UEAAAAAXO-yaYekqJ7Tf2nuoLBAigMSW7axoLTL&tz=-60'
print(urllib.parse.unquote(url))
This yields:
'https://trends.google.com/trends/api/widgetdata/multiline?hl=en-GB&tz=-60&req={"time":"2018-05-29+2019-05-29","resolution":"WEEK","locale":"en-GB","comparisonItem":[{"geo":{"country":"US"},"complexKeywordsRestriction":{"keyword":[{"type":"BROAD","value":"sports+cars"}]}}],"requestOptions":{"property":"","backend":"IZG","category":0}}&token=APP6_UEAAAAAXO-yaYekqJ7Tf2nuoLBAigMSW7axoLTL&tz=-60'
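To inspect the request rather than just read it, the `req` query parameter can be decoded into a Python dict. This is only a sketch: `decode_req` is a hypothetical helper name, and it relies on `urllib.parse.parse_qs`, which treats a `+` in the query string as a space (so the time range and keyword come back with spaces instead of `+`).

```python
import json
from urllib.parse import parse_qs, urlsplit

# The same URL as above, split across lines for readability.
URL = (
    "https://trends.google.com/trends/api/widgetdata/multiline?hl=en-GB&tz=-60"
    "&req=%7B%22time%22:%222018-05-29+2019-05-29%22,%22resolution%22:%22WEEK%22,"
    "%22locale%22:%22en-GB%22,%22comparisonItem%22:%5B%7B%22geo%22:%7B%22country%22:%22US%22%7D,"
    "%22complexKeywordsRestriction%22:%7B%22keyword%22:%5B%7B%22type%22:%22BROAD%22,"
    "%22value%22:%22sports+cars%22%7D%5D%7D%7D%5D,"
    "%22requestOptions%22:%7B%22property%22:%22%22,%22backend%22:%22IZG%22,%22category%22:0%7D%7D"
    "&token=APP6_UEAAAAAXO-yaYekqJ7Tf2nuoLBAigMSW7axoLTL&tz=-60"
)


def decode_req(url):
    """Return the JSON payload carried in the `req` query parameter as a dict."""
    query = parse_qs(urlsplit(url).query)
    return json.loads(query["req"][0])


payload = decode_req(URL)
print(json.dumps(payload, indent=2))
```

Having the payload as a dict makes it easier to see which fields (time range, country, keyword) change from one search to the next.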
You'll need to explore how well the elements of this URL (for example the req payload and the token) transfer to other searches.
For example, I looked at the search term banana, and this was the result:
unquoted:
'https://trends.google.com/trends/api/explore?hl=en-GB&tz=-60&req={"comparisonItem":[{"keyword":"banana","geo":"US","time":"today+12-m"}],"category":0,"property":""}&tz=-60'
quoted:
'https://trends.google.com/trends/api/explore?hl=en-GB&tz=-60&req=%7B%22comparisonItem%22:%5B%7B%22keyword%22:%22banana%22,%22geo%22:%22US%22,%22time%22:%22today+12-m%22%7D%5D,%22category%22:0,%22property%22:%22%22%7D&tz=-60'
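Since the question involves many keywords, URLs of the shape shown for banana could be generated rather than written by hand. The following is a minimal sketch under several assumptions: `build_explore_url` is a hypothetical helper, the defaults simply copy the values from the banana example, and the `+` in `today+12-m` is taken to be the query-string encoding of a space. `urlencode` also percent-encodes the colons and commas, which decodes to the same payload. Whether the server accepts these requests, and how the token seen in the earlier URL is obtained, is not something this sketch verifies.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

EXPLORE_ENDPOINT = "https://trends.google.com/trends/api/explore"


def build_explore_url(keyword, geo="US", time="today 12-m", hl="en-GB", tz="-60"):
    """Build an explore URL shaped like the banana example for one keyword."""
    req = {
        "comparisonItem": [{"keyword": keyword, "geo": geo, "time": time}],
        "category": 0,
        "property": "",
    }
    params = {
        "hl": hl,
        "tz": tz,
        # Compact separators keep the JSON free of extra spaces.
        "req": json.dumps(req, separators=(",", ":")),
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


keywords = ["banana", "sports cars"]
urls = [build_explore_url(k) for k in keywords]
for u in urls:
    print(u)
```

Decoding a generated URL with `parse_qs` and `json.loads` is a quick way to check that it carries the same payload as the one copied from the browser.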