Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Selecting different customDimensions on BigQuery integration of GA

Sorry for the specific question but I feel like I've hit a dead end because my knowledge of SQL doesn't go that far.

The data that comes out from the BigQuery implementation of GoogleAnalytics raw data looks like this:

|-visitId
|- date
|- (....)
+- hits
   |- time
   +- customDimensions
      |- index
      |- value
   +- customMetrics
      |- index
      |- value

I know there are hits that always send some data to GA. Specifically I want customDimensions.index= 43, customDimensions.index= 24 and customMetrics.index=14. To specify, Dimension 43 is the object being seen or sold, dimension 24 tells me if they are being seen and metric 14, the value has 1 when it's just been sold. My final result should look like this:

customDimension.value( when index=43)    count(when customDimension.index=24 and customDimension.value=='ficha')      count(when customMetrics.index=14 and customMetrics.value ==1))

Grouped by customDimension.value (when index=43) I know that everytime that a hit is sent with customMetrics.index=14, the same hit has customDimensions.index=43, the same way, customDimensions.index=24 always has a customDimensions.index=43. I actually managed to create an SQL that does what I want but, at what cost? It's big, it's slow, it's ugly. What I'm currently doing is:

  • Create three tables, all having visitId, hit.time and the value when index=14,24,43
  • Left join 43 with 24 ON 43.visitId==24.visitId AND 43.hits.time==24.hits.time as result
  • Left join result with 14 ON 14.visitId==result.visitId AND 14.hits.time==result.hits.time

I'm not interested in visitId or hits.time, it's just a way to relate the same hits (and know which product they bought when the customMetrics.index=14 and value=1.

This is my code:

SELECT Tviviendasvisitas.viviendaId as ViviendaID ,sum(Tviviendasvisitas.NumeroVisitas) as NumeroVisitas,sum(Ttransacciones.Transactions) as Transactions FROM (
SELECT Tviviendas.visitId as visitId, Tviviendas.hits.time as visitTime, Tviviendas.ViviendaID as viviendaId,Tvisitas.visitas as NumeroVisitas FROM (
SELECT  visitId,hits.time,hits.customDimensions.value as ViviendaID FROM ((TABLE_DATE_RANGE([-------.ga_sessions_], TIMESTAMP('2014-09-01'), TIMESTAMP('2014-09-30'))))
WHERE hits.customDimensions.index = 43 
GROUP EACH BY visitId,hits.time, ViviendaID)as Tviviendas

LEFT JOIN EACH(
SELECT  visitId,hits.time,count(*) as visitas FROM ((TABLE_DATE_RANGE([-------.ga_sessions_],  TIMESTAMP('2014-09-01'), TIMESTAMP('2014-09-30'))))
WHERE hits.customDimensions.index = 24 AND hits.customDimensions.value=='ficha'
GROUP EACH BY visitId,hits.time) as Tvisitas
ON Tvisitas.visitId==Tviviendas.visitId AND Tvisitas.time==Tviviendas.time) as Tviviendasvisitas

LEFT JOIN EACH (
SELECT  visitId ,hits.time as transactionTime, sum(hits.customMetrics.value) as Transactions FROM(TABLE_DATE_RANGE([-------.ga_sessions_], TIMESTAMP('2014-09-01'), TIMESTAMP('2014-09-30')))
WHERE hits.customMetrics.index = 14 AND hits.customMetrics.value=1
GROUP BY visitId, transactionTime) as Ttransacciones
ON Tviviendasvisitas.visitId==Ttransacciones.visitId AND Tviviendasvisitas.visitTime==Ttransacciones.transactionTime
GROUP BY ViviendaID

Running this query takes way too much time for me to create a propper dashboard with the results.

So help me God if that's my final result. I feel like there should be a WAY more elegant solution to this problem but I can't seem to find it on my own.

Help?

like image 926
Datasetter Avatar asked Oct 29 '14 11:10

Datasetter


1 Answers

You should be able to structure this query without the joins by using BigQuery's scoped aggregation (the WITHIN clause). Here is a small example, which may not be exactly the logic you want, but should illustrate some of the possibilities:

SELECT  visitId, hits.time,
        SOME(hits.customDimensions.index = 43) WITHIN RECORD AS has43,
        SUM(IF(hits.customDimensions.index = 24 AND hits.customDimensions.value = 'ficha', 1, 0)) WITHIN RECORD AS numFichas,
        SUM(IF(hits.customMetrics.index = 14, hits.customMetrics.value, 0)) WITHIN RECORD AS totalValues
FROM ((TABLE_DATE_RANGE([-------.ga_sessions_], TIMESTAMP('2014-09-01'), TIMESTAMP('2014-09-30'))))
HAVING has43

The example shows three WITHIN RECORD aggregations, meaning they will be computed over the repeated fields of a single record. SOME() takes a boolean expression and returns true if any field within the record satisfies that expression. So has43 will be true for visits that have one or more hits with customDimensions.index = 43. The HAVING clause filters out records where that is false.

The SUM(IF(...)) expressions compute the total number of customDimensions with index = 24 and value = 'ficha' and the total values associated with the customMetrics with index = 14.

like image 189
sprocket Avatar answered Nov 03 '22 01:11

sprocket