Log into your Google Cloud console at console.cloud.google.com
Choose your relevant Google Cloud Project from the top bar. If you do not have a Google Cloud Project yet, you can create one following the instructions here.
From the top search bar, search for bigquery and pick the BigQuery service.
In the BigQuery Explorer, click on Add.
From the side pane, choose Google Cloud Storage; this will open the Create table modal.
In the Create table modal, choose Empty table as the Source. Create a new dataset and call it converge_data, and in the Table field choose a name for your table, e.g. events.
In the Schema section, flip the switch to Edit as text and paste the schema below, then click Create Table.
Schema
JSON
[ { "name": "id", "type": "STRING", "mode": "REQUIRED", "description": "Event ID" }, { "name": "timestamp", "type": "DATETIME", "mode": "REQUIRED", "description": "Timestamp when the event occurred" }, { "name": "event_name", "type": "STRING", "mode": "REQUIRED" }, { "name": "profile_id", "type": "INTEGER", "mode": "REQUIRED", "description": "Profile ID" }, { "name": "event_properties", "type": "JSON", "mode": "NULLABLE", "description": "Properties of the event" }, { "name": "profile_properties", "type": "JSON", "mode": "NULLABLE", "description": "Properties of the profile this event belongs to" }, { "name": "session_id", "type": "STRING", "mode": "NULLABLE", "description": "Session Id of the session this event belongs to" }, { "name": "session_properties", "type": "JSON", "mode": "NULLABLE", "description": "Properties of the session this event belongs to" }, { "name": "source_id", "type": "STRING", "mode": "NULLABLE", "description": "ID of the source that generated this event prefixed with 'datasource_' or 'pixel_' depending on the source type" }, { "name": "source_name", "type": "STRING", "mode": "NULLABLE", "description": "Name of the source that generated this event" }]
Great, you now have a table for your Converge events. Follow the guide below to set up a destination that streams Converge events into this BigQuery table.
2. Stream Converge Data into your BigQuery table
The first thing we need to do in Google Cloud is set up a service account. Start by going to the sidebar menu and choosing IAM & Admin > Service accounts.
Click on Create Service Account.
Fill in a service account name, e.g. converge-bigq-account, and click on Create and continue.
Under Grant this service account access to the project, choose the BigQuery Data Owner role and click on Done.
Pick your newly created service account from the list of Service Accounts and choose Manage keys.
Click on Add Key and Create new key. In the popup choose JSON and click on Create. This will download a file that ends with .json.
In the Converge sidebar, click on Event streams, then click Create new Destination.
Select the BigQuery destination and click Next.
Give your destination a name (e.g. BigQuery Destination) and click Next.
Fill in the Credentials from the service account key you downloaded, along with the Dataset ID and Table ID from Google Cloud, and click Next.
You can also stream profile merges to BigQuery. This is optional, but recommended for constructing accurate profile journeys. Enter a Merged Profiles Table ID to enable it.
Select the sources you want to forward events from, then click Next.
Review any issues and click Submit.
Converge automatically sends all implemented events from the Converge event spec and any custom events to BigQuery.
Converge preprocesses all events and enriches them with all available profile and session information to make your attribution data warehouse queries as simple as possible.
Every event that is forwarded includes all available information Converge has on that profile up to that moment, ranging from $ip_address all the way to $email. Events that typically do not have an $email property associated with them (e.g. $page_load) will still contain this property if we were able to identify this profile before the $page_load happened.
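For example, the sketch below (assuming the converge_data.events table from this guide and profile properties stored under their $-prefixed names) counts how many $page_load events already carry an identified email:
SQL
SELECT
  COUNTIF(JSON_VALUE(profile_properties, '$."$email"') IS NOT NULL) AS identified_page_loads,
  COUNT(*) AS total_page_loads
FROM `your-project-name.converge_data.events`
WHERE event_name = '$page_load';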
Converge may combine multiple profiles into one if there is an overlap in aliases.
This means that the profile_id field in your events table may become out of date.
Under the ‘Advanced’ tab of the BigQuery destination configuration, you can configure a table to stream these profile merges to.
You can create this table as follows:
Under your Converge dataset (converge_data), create a new table, e.g. profile_merges.
In the Schema section, flip the switch to Edit as text and paste the schema below, then click Create Table.
Schema
JSON
[ { "name": "timestamp", "type": "DATETIME", "mode": "REQUIRED", "description": "Timestamp when the event occurred" }, { "name": "old_profile_id", "type": "INTEGER", "mode": "REQUIRED", "description": "Old Profile ID that was merged" }, { "name": "new_profile_id", "type": "INTEGER", "mode": "REQUIRED", "description": "New Profile ID resulting from the merge" }]
To use this table, it’s important to note that profiles may be merged multiple times. For example, profiles A and B could be mapped to C as follows:
(A, C)
(B, C)
Later on, profile C could be merged into profile D:
(C, D)
In this case, profiles A, B, and C would all be mapped to D. To deal with this, we recommend creating a (materialized) view that recursively resolves these mappings. Here’s an example query to achieve this:
SQL
WITH RECURSIVE profile_merges AS (
  SELECT
    timestamp,
    old_profile_id,
    new_profile_id,
    ARRAY[old_profile_id] AS visited_profiles,
    1 AS depth,
    FALSE AS is_circular_reference
  FROM `your-project-name.your-dataset-name.profile_merges`

  UNION ALL

  SELECT
    pmr.timestamp,
    pm.old_profile_id,
    pmr.new_profile_id,
    ARRAY_CONCAT(pm.visited_profiles, [pmr.new_profile_id]) AS visited_profiles,
    pm.depth + 1 AS depth,
    IF(pmr.new_profile_id IN UNNEST(pm.visited_profiles), TRUE, pm.is_circular_reference) AS is_circular_reference
  FROM profile_merges pm
  JOIN `your-project-name.your-dataset-name.profile_merges` pmr
    ON pm.new_profile_id = pmr.old_profile_id
  WHERE pmr.timestamp > pm.timestamp
    AND NOT pm.is_circular_reference
)
SELECT
  old_profile_id,
  MAX_BY(new_profile_id, depth) AS new_profile_id
FROM profile_merges
GROUP BY old_profile_id;
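Once that mapping is available, you can use it to look up the latest profile for each event. The sketch below assumes you have saved the query above as a view named resolved_profile_merges (an illustrative placeholder name):
SQL
-- Resolve each event's profile_id to its most recent merged profile.
-- `resolved_profile_merges` is a hypothetical view containing the recursive query above.
SELECT
  e.* EXCEPT (profile_id),
  COALESCE(m.new_profile_id, e.profile_id) AS profile_id
FROM `your-project-name.converge_data.events` AS e
LEFT JOIN `your-project-name.converge_data.resolved_profile_merges` AS m
  ON e.profile_id = m.old_profile_id;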
The attributed first_touch_properties etc. columns are deprecated and are no longer being filled; see here.
The events table contains all the data you need to build custom attribution models.
You can easily construct the customer’s touchpoints through the session_id and session_properties columns. A multi-touch model can be broken down into the following steps:
Selecting conversion events
Selecting touchpoints
Joining touchpoints with conversion events according to the attribution window
Applying the attribution model
Aggregating conversion metrics by touchpoint
This model can be adapted with full flexibility to build custom attribution insights.
The examples below show how to build basic first-touch and last-touch models and count the number of orders in BigQuery.
First Touch
For example, the following query represents a 28-day first-touch model.
SQL
WITH _events AS (
  SELECT * FROM `your-project-name.your-dataset-name.your-table-name`
),
-- Conversion events from the last 7 days
_1_conversions AS (
  SELECT id, profile_id, timestamp
  FROM _events
  WHERE timestamp > CURRENT_DATETIME() - INTERVAL 7 day
    AND event_name = 'Placed Order'
),
-- One row per session, going back far enough to cover the 28-day attribution window
_2_touchpoints AS (
  SELECT
    profile_id,
    session_id,
    MIN_BY(session_properties, timestamp) AS session_properties,
    MIN(timestamp) AS timestamp
  FROM _events
  WHERE timestamp > CURRENT_DATETIME() - INTERVAL 7 day - INTERVAL 28 day
    AND event_name = '$page_load'
  GROUP BY profile_id, session_id
),
-- Join touchpoints to conversions within the 28-day attribution window
_3_conversions_touchpoints AS (
  SELECT
    c.id AS id,
    t.timestamp AS touchpoint_timestamp,
    t.session_properties AS touchpoint_properties
  FROM _1_conversions AS c
  LEFT JOIN _2_touchpoints AS t
    ON c.profile_id = t.profile_id
    AND t.timestamp BETWEEN c.timestamp - INTERVAL 28 day AND c.timestamp
),
-- First touch: keep the earliest touchpoint per conversion
_4_attributed_conversions AS (
  SELECT
    id,
    MIN_BY(touchpoint_properties, touchpoint_timestamp) AS touchpoint_properties,
    MIN(touchpoint_timestamp) AS touchpoint_timestamp
  FROM _3_conversions_touchpoints
  GROUP BY id
)
SELECT
  JSON_VALUE(touchpoint_properties, '$.utm_source') AS first_touch_utm_source,
  JSON_VALUE(touchpoint_properties, '$.utm_medium') AS first_touch_utm_medium,
  COUNT(*) AS conversions
FROM _4_attributed_conversions
GROUP BY first_touch_utm_source, first_touch_utm_medium
Last Touch
For example, the following query represents a 7-day last-touch model.
SQL
WITH _events AS (
  SELECT * FROM `your-project-name.your-dataset-name.your-table-name`
),
-- Conversion events from the last 7 days
_1_conversions AS (
  SELECT id, profile_id, timestamp
  FROM _events
  WHERE timestamp > CURRENT_DATETIME() - INTERVAL 7 day
    AND event_name = 'Placed Order'
),
-- One row per session, going back far enough to cover the 7-day attribution window
_2_touchpoints AS (
  SELECT
    profile_id,
    session_id,
    MIN_BY(session_properties, timestamp) AS session_properties,
    MIN(timestamp) AS timestamp
  FROM _events
  WHERE timestamp > CURRENT_DATETIME() - INTERVAL 7 day - INTERVAL 7 day
    AND event_name = '$page_load'
  GROUP BY profile_id, session_id
),
-- Join touchpoints to conversions within the 7-day attribution window
_3_conversions_touchpoints AS (
  SELECT
    c.id AS id,
    t.timestamp AS touchpoint_timestamp,
    t.session_properties AS touchpoint_properties
  FROM _1_conversions AS c
  LEFT JOIN _2_touchpoints AS t
    ON c.profile_id = t.profile_id
    AND t.timestamp BETWEEN c.timestamp - INTERVAL 7 day AND c.timestamp
),
-- Last touch: keep the latest touchpoint per conversion
_4_attributed_conversions AS (
  SELECT
    id,
    MAX_BY(touchpoint_properties, touchpoint_timestamp) AS touchpoint_properties,
    MAX(touchpoint_timestamp) AS touchpoint_timestamp
  FROM _3_conversions_touchpoints
  GROUP BY id
)
SELECT
  JSON_VALUE(touchpoint_properties, '$.utm_source') AS last_touch_utm_source,
  JSON_VALUE(touchpoint_properties, '$.utm_medium') AS last_touch_utm_medium,
  COUNT(*) AS conversions
FROM _4_attributed_conversions
GROUP BY last_touch_utm_source, last_touch_utm_medium
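The same structure extends to other models. The sketch below is not part of the Converge examples above, but shows one way to adapt them into a linear multi-touch model over a 28-day window, splitting each conversion’s credit evenly across its touchpoints:
SQL
WITH _events AS (
  SELECT * FROM `your-project-name.your-dataset-name.your-table-name`
),
_1_conversions AS (
  SELECT id, profile_id, timestamp
  FROM _events
  WHERE timestamp > CURRENT_DATETIME() - INTERVAL 7 day
    AND event_name = 'Placed Order'
),
_2_touchpoints AS (
  SELECT
    profile_id,
    session_id,
    MIN_BY(session_properties, timestamp) AS session_properties,
    MIN(timestamp) AS timestamp
  FROM _events
  WHERE timestamp > CURRENT_DATETIME() - INTERVAL 7 day - INTERVAL 28 day
    AND event_name = '$page_load'
  GROUP BY profile_id, session_id
),
_3_conversions_touchpoints AS (
  SELECT
    c.id AS id,
    t.session_properties AS touchpoint_properties,
    -- Each touchpoint of a conversion receives an equal share of the credit.
    1 / COUNT(*) OVER (PARTITION BY c.id) AS credit
  FROM _1_conversions AS c
  LEFT JOIN _2_touchpoints AS t
    ON c.profile_id = t.profile_id
    AND t.timestamp BETWEEN c.timestamp - INTERVAL 28 day AND c.timestamp
)
SELECT
  JSON_VALUE(touchpoint_properties, '$.utm_source') AS utm_source,
  JSON_VALUE(touchpoint_properties, '$.utm_medium') AS utm_medium,
  SUM(credit) AS conversions
FROM _3_conversions_touchpoints
GROUP BY utm_source, utm_medium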
Converge previously supported the following columns:
first_touch_properties and first_touch_session_id
last_touch_properties and last_touch_session_id
first_touch_paid_properties and first_touch_paid_session_id
last_touch_paid_properties and last_touch_paid_session_id
These columns are deprecated and are no longer being filled.
The recommended way to build multi-touch attribution models is to use the session_id and session_properties columns in the Converge data.
These allow full flexibility in building custom attribution models while maintaining correctness.
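For example, the sketch below (assuming the converge_data.events table from this guide; the profile ID 12345 is just a placeholder) lists a single profile’s touchpoints in chronological order:
SQL
-- One row per session for a single profile, ordered by when the session started.
SELECT
  session_id,
  MIN(timestamp) AS session_start,
  JSON_VALUE(MIN_BY(session_properties, timestamp), '$.utm_source') AS utm_source,
  JSON_VALUE(MIN_BY(session_properties, timestamp), '$.utm_medium') AS utm_medium
FROM `your-project-name.converge_data.events`
WHERE profile_id = 12345  -- placeholder profile ID
  AND event_name = '$page_load'
GROUP BY session_id
ORDER BY session_start;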