Introduction.
When setting up GA4 Export to BigQuery and analyzing GA4 data on BigQuery, one of the most common stumbling blocks is how to parse GA4 event_params.
In this issue, we will summarize on this subject.
What is event_params?
Events in GA4 record specific user actions or changes in site or app state; GA4 uses two main elements for event tracking and analysis: event_name and event_params.
event_name
- A string that uniquely identifies a particular event. It represents the type of event, e.g.,
page_view
,purchase
,sign_up, etc.
, and can be set by the user.
event_params
- A repeating record that contains each parameter of the event.
Fields in GA4’s event table are exported to BigQuery (partial).
Field (excerpt) | type (e.g. of machine, goods, etc.) | memo |
event_date | STRING | Date the event was recorded |
event_name | STRING | Event name (session_start , page_view, etc. ) |
event_params | RECORD | A repeating record that contains each parameter of the event. |
The event_params column contains an array
event_params is
an array of records (or objects) that store additional information related to the event. Each record has a key/value
pair that provides context or additional details about the event.
Specifically, multiple event_params are stored as key, value pairs for one event_name as shown below.
{
"event_name":"session_start",
"event_params":[
{"key":"page_referrer", "string_value":"https://www.bing.com/"},
{"key":"ga_session_id", "int_value":1670474519},
{"key":"page_location", "string_value":"https://exammple.com/page1"},
{"key":"page_title", "string_value":"page1"},
・・・(abridgement)
]
}
The table exported to BigQuery will show the following.
event_date | event_timestamp | event_name | event_params.key | event_params_value.string_value |
20220222 | 1643673600483790 | page_view | page_location | https://example.com |
page_title | Home | |||
medium | referral | |||
source | ||||
page_referrer | ||||
Frequently set event_params items
- page_location (key): URL of the page visited by the user.
- page_referrer (key): URL of the page previously visited by the user.
- event_category (key): Category of the event, UA era parameter
- event_label (key): Label of the event. parameter in UA era
- value: Numeric value associated with the event (e.g., purchase amount).
Incidentally, in GA4, parameters such as
event_category and
event_label
from the Universal Analytics (UA) era are not used as standard, but can be set arbitrarily.
4 different types ofevent_paramsvalues (string, int, float, double)
To complicate matters further, there are actually four different types of value in
event_params
: string_value
, int_value
, float_value
, and double_value
.
They are used to store different types of data depending on the context of the event.
string_value
- Stores text information. Most commonly used.
- The URL of the page, the type of action the user performed, the name of the product, the name of the category, etc. For example, this includes information such as which buttons the user clicked and which products they viewed.
int_value
- Stores integers. Suitable for numerical data that does not contain a decimal point, such as specific counts or IDs that are not large in number.
- Specific cases: number of page views, number of times a certain event occurred, product ID or category ID, etc.
float_value
- Stores floating-point numbers. Suitable for numbers containing a decimal point or for very large or small numbers.
- Specific cases: product review scores or user ratings, average time (in seconds) for a certain action, or discount rates associated with a particular event.
double_value
- Stores floating-point numbers, but with greater range and precision than
float_value
. Used for more precise calculations or when a larger range of numbers is needed. - Specific case: Used for numerical data that requires higher precision, such as amounts of money, distances, and times. For example, the price of an expensive product, the total time (in milliseconds) a user has used the application, or a long distance (in meters).
The value type is automatically inferred when sending event_params.
When sending GA4 events, gtag.js does not provide the ability to explicitly specify the type of value. Instead, it is automatically inferred based on the type of value passed. In other words, the JavaScript variable type is used as is.
gtag('event', 'purchase', {
"transaction_id": "T12345", // String
"value": 9.99, // Number (in JavaScript this would be a float, but GA4 interprets this automatically)
"currency": "USD", // String
"items_sold": 5, // Integer
"shipping_method": "express", // String
"purchase_time_seconds": 360.5 // Float
});
It is not necessary to explicitly specify the type of value to be sent to the value of event_params in the gtag as shown below.
gtag('event', 'purchase', {
// There is no need to specify the type explicitly in this way.
"transaction_id": String "T12345",
"items_sold": Int 5
});
Expand and analyze the contents of the event_params column using the UNNEST operator
Since the data is nested in event_params in a key-value format array, the UNNEST operator in SQL is used to expand the elements in the data and convert them to a single row before aggregation.
UNNEST
is a function that expands elements in an array or structured data and converts them into a single row.
If you try to access event_params
without expanding it, you may get the following type of error
[ERROR] Cannot access field value on a value with type ARRAY<STRUCT>
Expand the following event_params
array with the UNNEST function.
Here, key and
it’s corresponding value
are extracted from event_params
. since value
may have multiple types, various values
( string_value
, int_value, etc.
) are selectively obtained.
{
"event_name":"session_start",
"event_params":[
{"key":"page_referrer", "string_value":"https://www.bing.com/"},
{"key":"ga_session_id", "int_value":1670474519},
{"key":"page_location", "string_value":"https://exammple.com/page1"},
{"key":"page_title", "string_value":"page1"},
・・・(abridgement)
]
}
SELECT
event_name,
param.key AS param_key,
param.value.string_value AS string_value,
param.value.int_value AS int_value,
param.value.float_value AS float_value,
param.value.double_value AS double_value
FROM
`your_dataset.your_table`,
UNNEST(event_params) AS param
Execution of this query yields the following tabular results.
event_name | param_key | string_value | int_value | float_value | double_value |
session_start | page_referrer | https://www.bing.com/ | null | null | null |
session_start | ga_session_id | null | 1670474519 | null | null |
session_start | page_location | https://exammple.com/page1 | null | null | null |
session_start | page_title | page1 | null | null | null |
Filter by event_name, filter by event_params
In actual data analysis, it is common to first filter data by event_name
and then further filter by a specific param_key and
its string_value
(or other value
type).
This is because different event_name
may have the same param_key
, and filtering by param_key and
value alone
will include unrelated event data. For example, a session_start
event and a page_view
event may both have a param_key of
page_location
, but the context of these events is completely different, so filtering by event_name
first is important.
In the case of event_params, subqueries are often created using theWITH
clause to manage the complexity of the query. Subqueries allow each part of the query to be modularized to create a more readable and maintainable SQL query.
The following identifies and filters events by event_name
in a subquery, and then further narrows it down by specific param_key and
string_value
in event_params in the subsequent main query.
// Sub-query: filter by event_name
WITH EventDetails AS (
SELECT
event_name,
param.key AS param_key,
param.value.string_value AS string_value
FROM
`your_dataset.your_table`,
UNNEST(event_params) AS param
WHERE
event_name = 'session_start'
)
// Main query: further filtering by event_params
SELECT *
FROM EventDetails
WHERE
param_key = 'page_location' AND string_value = 'https://exammple.com/page1'
Incidentally, the above query can also bring param_key into the subquery as shown below. Only events whose event_name
is session_start
and whose param_key is
page_location
are selected in the subquery. As a result, the main query only needs to check if page_location
matches a specific URL.
Using param_key
in a subquery may reduce the amount of data handled in subsequent processing and improve query execution efficiency.
WITH EventDetails AS (
SELECT
event_name,
param.key AS param_key,
param.value.string_value AS string_value
FROM
`your_dataset.your_table`,
UNNEST(event_params) AS param
WHERE
event_name = 'session_start'
AND param.key = 'page_location'
)
SELECT *
FROM EventDetails
WHERE
string_value = 'https://exammple.com/page1'
Bad and correct queries for filters in event_params
Examples of bad queries
This query example attempts to apply multiple filters by event_category and event_label to the same params
entry, which does not work as expected.
The problem with this query is that it expects params.key
to be 'event_category'
and 'event_label'
at the same time. This is not possible, and as a result, the query returns zero results.
SELECT
event_name,
COUNT(*) as event_count
FROM
`your_project.your_dataset.your_table`,
UNNEST(event_params) as params
WHERE
_TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 4 DAY)) AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
AND event_name = 'show'
AND (params.key = 'event_category' AND params.value.string_value = 'dummy_category')
AND (params.key = 'event_label' AND params.value.string_value = 'dummy_label')
GROUP BY
event_name;
Example of a correct query
Extract event_category and
event_label
separately using subqueries and filtering accordingly.
The query first extracts categories and labels from event_params
with WITH
clause and stores them as separate columns in the
intermediate table filtered_events
. Then, from this intermediate table, records whose category is
'dummy_category'
and whose label is
'dummy_label'
are selected and counted. This allows only events with specific categories and labels to be accurately counted.
WITH filtered_events AS (
SELECT
event_name,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'event_category') as category,
(SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'event_label') as label
FROM
`your_project.your_dataset.your_table`
WHERE
event_name = 'show'
AND _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 4 DAY)) AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
)
SELECT
event_name,
COUNT(*) as event_count
FROM
filtered_events
WHERE
category = 'dummy_category'
AND label = 'dummy_label'
GROUP BY
event_name;
コメント