I'm trying to gather information from Google Analytics to build a recommendation engine for my site. The site consists of many pages, so I'm tracking the number of times a user clicks, for example, from page A to page B. Currently I can measure the A -> B transitions on Google Analytics with previousPagePath = '/A' and nextPagePath = '/B', but the question I really want to answer is, "Of all the visits to the site that included viewing page A, how many times were pages B, C, ... viewed in the same visit?"
For example, if the flow was A -> homepage -> B, then that would not be captured by my current methodology, but would be captured by the broader measure. It looks like the "Visitors Flow" report on the Google Analytics web interface has the data I'm looking for, but I can't figure out how to access it programmatically via the API.
What is the best way to get this data?
This is a really great idea. I'm a little late to this, but you should be able to accomplish it by downloading all of the data using the Google Analytics Reporting API, storing it in a local database or file, and then building your recommendation engine by aggregating the statistics yourself.
To get the data from the Reporting API, try playing with the query explorer and extracting the number of visits to pages between all pairs of paths using a method similar to @carlsoja:
dimensions=ga:previousPagePath,ga:pagePath&metrics=ga:visits
In order to get all of the data, you will have to use one of the Core Reporting Client Libraries to paginate through the results (which you can experiment with in the query explorer).
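As a rough sketch of that pagination loop, assuming the Core Reporting API (v3) conventions of a 1-based start-index, a max-results page size, and a response carrying "rows" and "totalResults": the fetch_page callable below is a hypothetical stand-in for whatever client-library call you end up using, and fake_fetch_page only simulates it with made-up rows.

```python
from typing import Callable, Dict, List

# Hypothetical signature: fetch_page(start_index, max_results) returns a dict
# shaped like a Core Reporting API response ("rows" and "totalResults").
FetchPage = Callable[[int, int], Dict]


def fetch_all_rows(fetch_page: FetchPage, page_size: int = 10000) -> List[List[str]]:
    """Collect every row by requesting successive pages until none remain."""
    rows: List[List[str]] = []
    start_index = 1  # assumption: start-index is 1-based
    while True:
        response = fetch_page(start_index, page_size)
        batch = response.get("rows", [])  # "rows" may be absent on an empty page
        rows.extend(batch)
        start_index += len(batch)
        if not batch or start_index > response.get("totalResults", 0):
            break
    return rows


def fake_fetch_page(start_index: int, max_results: int) -> Dict:
    """Stand-in for a real API call, serving made-up (previous, next, count) rows."""
    data = [["/A", "/B", "3"], ["/A", "/C", "1"], ["/B", "/A", "2"]]
    begin = start_index - 1
    return {"rows": data[begin:begin + max_results], "totalResults": len(data)}


all_rows = fetch_all_rows(fake_fetch_page, page_size=2)
print(len(all_rows))
```

In real use you would replace fake_fetch_page with a function that issues the query above and returns the parsed response.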
Once you have all of the data, you can pretty easily calculate the Markov chain transition probabilities that a person visits page /A after they have visited page /B, or p(/A | /B). Then it would be pretty straightforward to estimate the probability that someone visits page /A if they visited page /B at some point in the past. If you wanted to get really fancy, you could use their complete history {H} to make recommendations for pages by estimating p(/A | {H}), but I'll leave that as an exercise for the reader ;)
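To make the first two steps concrete, here is a minimal sketch, assuming the downloaded rows have the form (previous path, next path, count) as in the query above. The function names and the sample rows are made up for illustration. Note that the second function ignores exits, so it treats every pageview as being followed by another one and will overestimate on real data.

```python
from collections import defaultdict


def transition_probabilities(rows):
    """Estimate p(next | previous) from (previous, next, count) rows."""
    counts = defaultdict(lambda: defaultdict(int))
    for prev, nxt, n in rows:
        counts[prev][nxt] += int(n)  # counts may arrive as strings
    probs = {}
    for prev, nexts in counts.items():
        total = sum(nexts.values())
        if total > 0:
            probs[prev] = {nxt: c / total for nxt, c in nexts.items()}
    return probs


def reach_probability(probs, start, target, max_steps):
    """Probability that a walk from `start` hits `target` within `max_steps` clicks.

    Assumption: the walk follows the first-order chain in `probs`, and mass on
    pages with no recorded outgoing transitions is simply dropped.
    """
    mass = {start: 1.0}  # probability of being on each page without having hit target
    hit = 0.0
    for _ in range(max_steps):
        new_mass = defaultdict(float)
        for page, m in mass.items():
            for nxt, p in probs.get(page, {}).items():
                if nxt == target:
                    hit += m * p
                else:
                    new_mass[nxt] += m * p
        mass = new_mass
    return hit


# Made-up sample data, not real Analytics output.
rows = [["/B", "/A", "3"], ["/B", "/C", "1"], ["/C", "/A", "1"], ["/C", "/B", "1"]]
probs = transition_probabilities(rows)
print(probs["/B"]["/A"])                          # 0.75
print(reach_probability(probs, "/B", "/A", 2))    # 0.875
```

Ranking the candidate pages by reach_probability from the page a user is currently on would give a simple first recommender.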
Hope this helps!