Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What is the unique identifier in BigQuery GA4 events data?

Is there a column or combination of columns within BigQuery GA4 events data that uniquely identifies a record? I'm asking because the event_params, user_properties and items fields contain nested json and when parsing them, I'm trying to ensure the related records are kept together.

I've appended a picture to the description below which shows the output when checking for multiple rows using those 4 columns you mentioned.

Notice how, for those event_name, event_timestamp, user_pseudo_id, event_bundle_sequence_id values, there are multiple rows with differing event_parms values?

Should we be combining the event_parms data from the multiple rows into one? I guess we were expecting all related event_parms values to display within one event_name, event_timestamp, user_pseudo_id, event_bundle_sequence_id record, if that makes sense?

enter image description here

select d.cnt,
  g.event_date
, g.event_timestamp
, g.event_name
, g.event_previous_timestamp
, g.event_value_in_usd
, g.event_bundle_sequence_id
, g.event_server_timestamp_offset
, g.user_id
, g.user_pseudo_id
, g.user_first_touch_timestamp
, g.event_params
from dbo.GA4Events g
    inner join (
                select 
                  event_date
                , event_timestamp
                , event_name
                , event_previous_timestamp
                , event_value_in_usd
                , event_bundle_sequence_id
                , event_server_timestamp_offset
                , user_id
                , user_pseudo_id
                , user_first_touch_timestamp
                , count(*) as cnt
                from dbo.GA4Events
                group by 
                  event_date
                , event_timestamp
                , event_name
                , event_previous_timestamp
                , event_value_in_usd
                , event_bundle_sequence_id
                , event_server_timestamp_offset
                , user_id
                , user_pseudo_id
                , user_first_touch_timestamp
                having count(*) > 1) d on 
                isnull(g.event_date,'')                              = isnull(d.event_date,'')
                and isnull(g.event_timestamp,'')                     = isnull(d.event_timestamp,'')
                and isnull(g.event_name ,'')                         = isnull(d.event_name,'')
                and isnull(g.event_previous_timestamp,'')            = isnull(d.event_previous_timestamp,'')
                and isnull(g.event_value_in_usd,'')                  = isnull(d.event_value_in_usd,'')
                and isnull(g.event_bundle_sequence_id,'')            = isnull(d.event_bundle_sequence_id,'')
                and isnull(g.event_server_timestamp_offset,'')       = isnull(d.event_server_timestamp_offset,'')
                and isnull(g.user_id,'')                             = isnull(d.user_id,'')
                and isnull(g.user_pseudo_id,'')                      = isnull(d.user_pseudo_id,'')
                and isnull(g.user_first_touch_timestamp,'')          = isnull(d.user_first_touch_timestamp,'')
order by 
  g.event_date
, g.event_timestamp
, g.event_name
, g.event_previous_timestamp
, g.event_value_in_usd
, g.event_bundle_sequence_id
, g.event_server_timestamp_offset
, g.user_id
, g.user_pseudo_id
, g.user_first_touch_timestamp

enter image description here

like image 523
Matthew Walk Avatar asked Sep 03 '25 16:09

Matthew Walk


2 Answers

Would you try below ? Documentation says,

User

The user fields contain information that uniquely identifies the user associated with the event.

Field name Data type Description
user_id STRING The user ID set via the setUserId API.
user_pseudo_id STRING The pseudonymous id (e.g., app instance ID) for the user.
user_first_touch_timestamp INTEGER The time (in microseconds) at which the user first opened the app or visited the site.
Event

The event fields contain information that uniquely identifies an event.

Field name Data type Description
event_date STRING The date when the event was logged (YYYYMMDD format in the registered timezone of your app).
event_timestamp INTEGER The time (in microseconds, UTC) when the event was logged on the client.
event_previous_timestamp INTEGER The time (in microseconds, UTC) when the event was previously logged on the client.
event_name STRING The name of the event.
event_value_in_usd FLOAT The currency-converted value (in USD) of the event's "value" parameter.
event_bundle_sequence_id INTEGER The sequential ID of the bundle in which these events were uploaded.
event_server_timestamp_offset INTEGER Timestamp offset between collection time and upload time in micros.
like image 135
Jaytiger Avatar answered Sep 05 '25 14:09

Jaytiger


Google Analytics added batch_event_index in the schema on July 16, 2024. With the addition of this, the combination of these five fields should be unique for each record in the GA4 BigQuery event export table: event_name, event_timestamp, user_pseudo_id, event_bundle_sequence_id, batch_event_index.

like image 35
Minhaz Kazi Avatar answered Sep 05 '25 14:09

Minhaz Kazi