Google Analytics Universal
Google Analytics was a web analytics service offered by Google that tracked and reported website traffic; it has since been replaced by Google Analytics 4. This telescope obtained data from Google Analytics for one view ID per publisher and for several combinations of metrics and dimensions. A regular expression can be added to filter on pagepaths, so that only data for relevant pagepaths is collected. Note that Google Analytics data is only available for the last 26 months; see Data retention - Analytics Help for more information.
To get access to the analytics data, a publisher needs to add the relevant Google service account as a user.
ANU Press was using custom dimensions in their Google Analytics data. To ensure that the telescope processes these custom dimensions, the organisation name needs to be set to exactly 'ANU Press'. The organisation name is used directly inside the telescope: if it matches 'ANU Press', additional dimensions are added and a different BigQuery schema is used.
We use the Python client for the Google Analytics API to retrieve data on several metrics (such as page views) per country. The API does not appear to return a result for every country. We would have expected any data without a country field to be labelled with a country name of "not set", but this does not appear to be the case. At this time we have no other way of retrieving country-level data on the desired metrics, so the numbers returned by the API differ slightly from those shown on the Google Analytics web page. A ticket has been created with Google in the hope of resolving this issue.
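One way to quantify this gap is to compare the sum of the per-country values with the overall total for the same metric. The following is a minimal sketch only: the function name `country_shortfall` and the numbers are hypothetical, and the rows are assumed to use the name/value layout of the schema described in this document.

```python
from typing import Dict, List


def country_shortfall(total: int, per_country: List[Dict[str, object]]) -> int:
    """Return how many views are not attributed to any country.

    `per_country` is a list of rows such as {"name": "Australia", "value": 10}.
    """
    attributed = sum(int(row["value"]) for row in per_country)
    return total - attributed


# Hypothetical numbers, for illustration only.
rows = [{"name": "Australia", "value": 120}, {"name": "Germany", "value": 30}]
missing = country_shortfall(total=160, per_country=rows)
print(missing)
```

A non-zero result indicates views that the API did not assign to any country row.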
The name of the organisation as displayed on Google Analytics.
The View ID points to the specific view on which Google Analytics data is collected. See the Google support page for more information on the hierarchy of the Analytics account.
This is a regular expression that is used to filter the pagepaths for which analytics data is collected. It can be set to an empty string if no filtering is required. Note that the Google Analytics API uses RE2 syntax, so constructs such as negative lookaheads are not available. See the Google support page and the GitHub wiki for more information.
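As an illustration of a filter that stays within the RE2 subset, the sketch below uses only anchors and character classes. The pattern and the page paths are hypothetical. Python's `re` module is used here purely to try the pattern locally; it accepts more syntax than RE2, so a pattern that works in Python is not guaranteed to be accepted by the API.

```python
import re

# Hypothetical pattern: match only book landing pages such as /books/my-title.
# It relies on anchors and character classes only, which RE2 supports.
PAGEPATH_REGEX = r"^/books/[^/]+/?$"


def filter_pagepaths(pagepaths, pattern=PAGEPATH_REGEX):
    """Keep the page paths matching the pattern; an empty pattern keeps everything."""
    if pattern == "":
        return list(pagepaths)
    compiled = re.compile(pattern)
    return [path for path in pagepaths if compiled.search(path)]


paths = ["/books/my-title", "/books/my-title/", "/about", "/books/my-title/download"]
print(filter_pagepaths(paths))
```

Rather than excluding paths with a negative lookahead, the pattern describes the wanted paths positively, which is the usual workaround when lookaheads are unavailable.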
Create a service account from IAM & Admin - Service Accounts
Create a JSON key and download the key file
For each organisation/publisher of interest, ask them to add this service account as a user for the correct view ID
Downloads a single month of reporting data using the GA3 analytics reporting API. Transforms the data from the report structure into the schema structure and saves it to a .jsonl file.
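The download and transform steps could look roughly like the sketch below. It is not the telescope's actual code: the function names, the choice of metric and dimensions, the view ID and the sample report are all assumptions, and only a small subset of the schema fields is produced. The request body follows the `reports.batchGet` layout of the Analytics Reporting API v4; the call that sends it is left out.

```python
import json
import os
import tempfile
from typing import Dict, List


def build_report_request(view_id: str, start_date: str, end_date: str, pagepath_regex: str = "") -> Dict:
    """Build a reports.batchGet body asking for page views per page path and country."""
    request: Dict = {
        "viewId": view_id,
        "dateRanges": [{"startDate": start_date, "endDate": end_date}],
        "metrics": [{"expression": "ga:pageviews"}],
        "dimensions": [{"name": "ga:pagePath"}, {"name": "ga:country"}],
    }
    if pagepath_regex:  # an empty string means no filtering
        request["dimensionFilterClauses"] = [
            {
                "filters": [
                    {
                        "dimensionName": "ga:pagePath",
                        "operator": "REGEXP",
                        "expressions": [pagepath_regex],
                    }
                ]
            }
        ]
    return {"reportRequests": [request]}


def report_to_records(report: Dict, release_date: str) -> List[Dict]:
    """Group report rows by page path into records with nested name/value pairs."""
    by_path: Dict[str, Dict] = {}
    for row in report.get("data", {}).get("rows", []):
        pagepath, country = row["dimensions"]
        views = int(row["metrics"][0]["values"][0])  # the API returns values as strings
        record = by_path.setdefault(
            pagepath,
            {"url": pagepath, "page_views": {"country": []}, "release_date": release_date},
        )
        record["page_views"]["country"].append({"name": country, "value": views})
    return list(by_path.values())


def write_jsonl(records: List[Dict], path: str) -> None:
    """Write one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


# Hypothetical report, shaped like one entry of the "reports" list in a response.
sample_report = {
    "data": {
        "rows": [
            {"dimensions": ["/books/a", "Australia"], "metrics": [{"values": ["12"]}]},
            {"dimensions": ["/books/a", "Germany"], "metrics": [{"values": ["3"]}]},
            {"dimensions": ["/books/b", "Australia"], "metrics": [{"values": ["7"]}]},
        ]
    }
}

body = build_report_request("12345678", "2022-01-01", "2022-01-31", r"^/books/")
records = report_to_records(sample_report, release_date="2022-01-31")
out_path = os.path.join(tempfile.mkdtemp(), "google_analytics.jsonl")
write_jsonl(records, out_path)
```

Each line of the resulting file is one record, which is the newline-delimited JSON layout that BigQuery load jobs accept.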
The transformed data is loaded from the Google Cloud bucket into a partitioned BigQuery table under the google dataset (which will be created should it not exist yet). Since the data is partitioned on the release month, there will only be a single table named google_analytics3.
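The partition value is the `release_date` field of the schema, described as the last day of the release month. A minimal sketch of how such a date can be derived is shown below; the helper name `release_date_for` is hypothetical.

```python
import calendar
import datetime


def release_date_for(year: int, month: int) -> datetime.date:
    """Return the last day of the given month, used here as the partition value."""
    last_day = calendar.monthrange(year, month)[1]
    return datetime.date(year, month, last_day)


print(release_date_for(2022, 2))  # 2022-02-28
print(release_date_for(2024, 2))  # 2024-02-29 (leap year)
```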
Name | Description |
---|---|
oaebu_service_account | The credentials for the service account that has been given access to the Google Analytics view. |

name | type | mode | description |
---|---|---|---|
url | STRING | REQUIRED | Base URL of the book pages. |
title | STRING | REQUIRED | Title of the book. |
start_date | DATE | REQUIRED | Start date for period of analytics info. |
end_date | DATE | REQUIRED | End date for period of analytics info. |
average_time | FLOAT | REQUIRED | Average time (in seconds) spent on each page. |
unique_views | RECORD | NULLABLE | Unique views for several different dimensions. Unique views is the number of sessions during which the specified page was viewed at least once. A unique pageview is counted for each page URL + page title combination. |
unique_views.country | RECORD | REPEATED | Unique views per users' country, derived from their IP addresses or Geographical IDs. |
unique_views.country.name | STRING | NULLABLE | Country name. |
unique_views.country.value | INTEGER | NULLABLE | Number of unique views. |
unique_views.referrer | RECORD | REPEATED | Unique views per referrer, the full referring URL including the hostname and path. |
unique_views.referrer.name | STRING | NULLABLE | Referrer name. |
unique_views.referrer.value | INTEGER | NULLABLE | Number of unique views. |
unique_views.social_network | RECORD | REPEATED | Unique views per social network. This is related to the referring social network for traffic sources; e.g., Google+, Blogger. |
unique_views.social_network.name | STRING | NULLABLE | Social network name. |
unique_views.social_network.value | INTEGER | NULLABLE | Number of unique views. |
page_views | RECORD | NULLABLE | The total number of pageviews for the property. |
page_views.country | RECORD | REPEATED | Page views per users' country, derived from their IP addresses or Geographical IDs. |
page_views.country.name | STRING | NULLABLE | Country name. |
page_views.country.value | INTEGER | NULLABLE | Number of page views. |
page_views.referrer | RECORD | REPEATED | Page views per referrer, the full referring URL including the hostname and path. |
page_views.referrer.name | STRING | NULLABLE | Referrer name. |
page_views.referrer.value | INTEGER | NULLABLE | Number of page views. |
page_views.social_network | RECORD | REPEATED | Page views per social network. This is related to the referring social network for traffic sources; e.g., Google+, Blogger. |
page_views.social_network.name | STRING | NULLABLE | Social network name. |
page_views.social_network.value | INTEGER | NULLABLE | Number of page views. |
sessions | RECORD | NULLABLE | Total number of sessions for several different dimensions. |
sessions.country | RECORD | REPEATED | Sessions per users' country, derived from their IP addresses or Geographical IDs. |
sessions.country.name | STRING | NULLABLE | Country name. |
sessions.country.value | INTEGER | NULLABLE | Number of sessions. |
sessions.source | RECORD | REPEATED | Sessions per source of referrals. For manual campaign tracking, it is the value of the utm_source campaign tracking parameter. For AdWords autotagging, it is google. If you use neither, it is the domain of the source (e.g., document.referrer) referring the users. It may also contain a port address. If users arrived without a referrer, its value is (direct). |
sessions.source.name | STRING | NULLABLE | Source name. |
sessions.source.value | INTEGER | NULLABLE | Number of sessions. |
release_date | DATE | REQUIRED | Last day of the release month. Table is partitioned on this column. |
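To make the nesting of RECORD and REPEATED fields concrete, here is a hypothetical row written as a Python dictionary, together with a small check for the REQUIRED fields listed above. All values are invented for illustration, and the helper `missing_required` is an assumption rather than part of the telescope.

```python
import json

# Field names marked REQUIRED in the schema above.
REQUIRED_FIELDS = ["url", "title", "start_date", "end_date", "average_time", "release_date"]

# Hypothetical row; every value is made up.
record = {
    "url": "/books/example-title",
    "title": "Example Title",
    "start_date": "2022-01-01",
    "end_date": "2022-01-31",
    "average_time": 42.5,
    "unique_views": {
        "country": [{"name": "Australia", "value": 3}],
        "referrer": [],
        "social_network": [],
    },
    "page_views": {
        "country": [{"name": "Australia", "value": 5}],
        "referrer": [],
        "social_network": [],
    },
    "sessions": {
        "country": [{"name": "Australia", "value": 2}],
        "source": [{"name": "(direct)", "value": 2}],
    },
    "release_date": "2022-01-31",
}


def missing_required(row):
    """Return the REQUIRED field names that are absent or None in the row."""
    return [field for field in REQUIRED_FIELDS if row.get(field) is None]


print(missing_required(record))
print(json.dumps(record))
```

REPEATED fields map to lists of name/value objects, while NULLABLE records such as `unique_views` may be omitted entirely.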
Property | Value |
---|---|
Dataset Name | google |
Table Names | google_analytics |
Table Type | Partitioned |
Average Runtime | 10 min |
Average Download Size | 10-20 MB |
Harvest Type | API |
Run Schedule | Monthly |
Catch-up Missed Runs | |
Each Run Includes All Data | |