[BigQuery] Expansion and analysis of GA4 event_params with UNNEST: Bad and right queries

bigquery-expansion-and-analysis-of-ga4
This article can be read in about 20 minutes.

Introduction.

When setting up GA4 Export to BigQuery and analyzing GA4 data on BigQuery, one of the most common stumbling blocks is how to parse GA4 event_params.

In this issue, we will summarize on this subject.

What is event_params?

Events in GA4 record specific user actions or changes in site or app state; GA4 uses two main elements for event tracking and analysis: event_name and event_params.

event_name

  • A string that uniquely identifies a particular event. It represents the type of event, e.g., page_view, purchase, sign_up, etc., and can be set by the user.

event_params

  • A repeating record that contains each parameter of the event.

Fields in GA4’s event table are exported to BigQuery (partial).

Field (excerpt)type (e.g. of machine, goods, etc.)memo
event_dateSTRINGDate the event was recorded
event_nameSTRINGEvent name (session_start, page_view, etc. )
event_paramsRECORDA repeating record that contains each parameter of the event.

The event_params column contains an array

event_params is an array of records (or objects) that store additional information related to the event. Each record has a key/value pair that provides context or additional details about the event.

Specifically, multiple event_params are stored as key, value pairs for one event_name as shown below.

JavaScript
{
	"event_name":"session_start",
	"event_params":[
		{"key":"page_referrer", "string_value":"https://www.bing.com/"},
		{"key":"ga_session_id", "int_value":1670474519},
		{"key":"page_location", "string_value":"https://exammple.com/page1"},
		{"key":"page_title", "string_value":"page1"},
		・・・(abridgement)
	]
}

The table exported to BigQuery will show the following.

event_dateevent_timestampevent_nameevent_params.keyevent_params_value.string_value
202202221643673600483790page_viewpage_locationhttps://example.com
page_titleHome
mediumreferral
sourcegoogle
page_referrerGoogle

Frequently set event_params items

  1. page_location (key): URL of the page visited by the user.
  2. page_referrer (key): URL of the page previously visited by the user.
  3. event_category (key): Category of the event, UA era parameter
  4. event_label (key): Label of the event. parameter in UA era
  5. value: Numeric value associated with the event (e.g., purchase amount).

Incidentally, in GA4, parameters such as event_category and event_label from the Universal Analytics (UA) era are not used as standard, but can be set arbitrarily.

4 different types ofevent_paramsvalues (string, int, float, double)

To complicate matters further, there are actually four different types of value inevent_params: string_value, int_value, float_value, and double_value.

They are used to store different types of data depending on the context of the event.

string_value

  • Stores text information. Most commonly used.
  • The URL of the page, the type of action the user performed, the name of the product, the name of the category, etc. For example, this includes information such as which buttons the user clicked and which products they viewed.

int_value

  • Stores integers. Suitable for numerical data that does not contain a decimal point, such as specific counts or IDs that are not large in number.
  • Specific cases: number of page views, number of times a certain event occurred, product ID or category ID, etc.

float_value

  • Stores floating-point numbers. Suitable for numbers containing a decimal point or for very large or small numbers.
  • Specific cases: product review scores or user ratings, average time (in seconds) for a certain action, or discount rates associated with a particular event.

double_value

  • Stores floating-point numbers, but with greater range and precision than float_value. Used for more precise calculations or when a larger range of numbers is needed.
  • Specific case: Used for numerical data that requires higher precision, such as amounts of money, distances, and times. For example, the price of an expensive product, the total time (in milliseconds) a user has used the application, or a long distance (in meters).

The value type is automatically inferred when sending event_params.

When sending GA4 events, gtag.js does not provide the ability to explicitly specify the type of value. Instead, it is automatically inferred based on the type of value passed. In other words, the JavaScript variable type is used as is.

JavaScript
gtag('event', 'purchase', {
  "transaction_id": "T12345", // String
  "value": 9.99, // Number (in JavaScript this would be a float, but GA4 interprets this automatically)
  "currency": "USD", // String
  "items_sold": 5, // Integer
  "shipping_method": "express", // String
  "purchase_time_seconds": 360.5 // Float
});

It is not necessary to explicitly specify the type of value to be sent to the value of event_params in the gtag as shown below.

JavaScript
gtag('event', 'purchase', {
		// There is no need to specify the type explicitly in this way.
		"transaction_id": String "T12345",
		"items_sold": Int 5 
});

Expand and analyze the contents of the event_params column using the UNNEST operator

Since the data is nested in event_params in a key-value format array, the UNNEST operator in SQL is used to expand the elements in the data and convert them to a single row before aggregation.

UNNEST is a function that expands elements in an array or structured data and converts them into a single row.

If you try to access event_params without expanding it, you may get the following type of error

SQL
[ERROR] Cannot access field value on a value with type ARRAY<STRUCT>

Expand the following event_params array with the UNNEST function.

Here, key and it’s corresponding value are extracted from event_params. since value may have multiple types, various values ( string_value, int_value, etc. ) are selectively obtained.

JavaScript
{
	"event_name":"session_start",
	"event_params":[
		{"key":"page_referrer", "string_value":"https://www.bing.com/"},
		{"key":"ga_session_id", "int_value":1670474519},
		{"key":"page_location", "string_value":"https://exammple.com/page1"},
		{"key":"page_title", "string_value":"page1"},
		・・・(abridgement)
	]
}
SQL
SELECT 
  event_name,
  param.key AS param_key,
  param.value.string_value AS string_value,
  param.value.int_value AS int_value,
  param.value.float_value AS float_value,
  param.value.double_value AS double_value
FROM 
  `your_dataset.your_table`,
  UNNEST(event_params) AS param

Execution of this query yields the following tabular results.

event_nameparam_keystring_valueint_valuefloat_valuedouble_value
session_startpage_referrerhttps://www.bing.com/nullnullnull
session_startga_session_idnull1670474519nullnull
session_startpage_locationhttps://exammple.com/page1nullnullnull
session_startpage_titlepage1nullnullnull

Filter by event_name, filter by event_params

In actual data analysis, it is common to first filter data by event_name and then further filter by a specific param_key and its string_value (or other value type).

This is because different event_name may have the same param_key, and filtering by param_key andvalue alone will include unrelated event data. For example, a session_start event and a page_view event may both have a param_key ofpage_location, but the context of these events is completely different, so filtering by event_name first is important.

In the case of event_params, subqueries are often created using theWITHclause to manage the complexity of the query. Subqueries allow each part of the query to be modularized to create a more readable and maintainable SQL query.

The following identifies and filters events by event_name in a subquery, and then further narrows it down by specific param_key andstring_value in event_params in the subsequent main query.

SQL
// Sub-query: filter by event_name
WITH EventDetails AS (
  SELECT
    event_name,
    param.key AS param_key,
    param.value.string_value AS string_value
  FROM
    `your_dataset.your_table`,
    UNNEST(event_params) AS param
  WHERE
    event_name = 'session_start'
)

// Main query: further filtering by event_params
SELECT *
FROM EventDetails
WHERE
  param_key = 'page_location' AND string_value = 'https://exammple.com/page1'

Incidentally, the above query can also bring param_key into the subquery as shown below. Only events whose event_name is session_start and whose param_key ispage_location are selected in the subquery. As a result, the main query only needs to check if page_location matches a specific URL.

Using param_key in a subquery may reduce the amount of data handled in subsequent processing and improve query execution efficiency.

SQL
WITH EventDetails AS (
  SELECT
    event_name,
    param.key AS param_key,
    param.value.string_value AS string_value
  FROM
    `your_dataset.your_table`,
    UNNEST(event_params) AS param
  WHERE
    event_name = 'session_start'
    AND param.key = 'page_location'
)

SELECT *
FROM EventDetails
WHERE
  string_value = 'https://exammple.com/page1'

Bad and correct queries for filters in event_params

Examples of bad queries

This query example attempts to apply multiple filters by event_category and event_label to the same params entry, which does not work as expected.

The problem with this query is that it expects params.key to be 'event_category' and 'event_label' at the same time. This is not possible, and as a result, the query returns zero results.

JavaScript
SELECT
  event_name,
  COUNT(*) as event_count
FROM
  `your_project.your_dataset.your_table`,
  UNNEST(event_params) as params
WHERE
  _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 4 DAY)) AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
  AND event_name = 'show'
  AND (params.key = 'event_category' AND params.value.string_value = 'dummy_category')
  AND (params.key = 'event_label' AND params.value.string_value = 'dummy_label')
GROUP BY
  event_name;

Example of a correct query

Extract event_category andevent_label separately using subqueries and filtering accordingly.

The query first extracts categories and labels from event_params with WITH clause and stores them as separate columns in the intermediate table filtered_events. Then, from this intermediate table, records whose category is'dummy_category' and whose label is'dummy_label' are selected and counted. This allows only events with specific categories and labels to be accurately counted.

JavaScript
WITH filtered_events AS (
  SELECT
    event_name,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'event_category') as category,
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'event_label') as label
  FROM
    `your_project.your_dataset.your_table`
  WHERE
    event_name = 'show'
    AND _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 4 DAY)) AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
)

SELECT
  event_name,
  COUNT(*) as event_count
FROM
  filtered_events
WHERE
  category = 'dummy_category'
  AND label = 'dummy_label'
GROUP BY
  event_name;

コメント

Copied title and URL