
Google Analytics Universal



Google Analytics was a web analytics service offered by Google that tracked and reported website traffic; it has since been replaced by Google Analytics 4. This telescope is deprecated and no longer supported by BAD, but its historical data is still used by the ONIX Workflow.

This telescope obtains data from Google Analytics for one view ID per publisher and for several combinations of metrics and dimensions. A regular expression can be supplied to filter on pagepaths, so that only data for relevant pagepaths is collected. Note that Google Analytics data is only available for the last 26 months; see Data retention - Analytics Help for more information.
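The 26-month limit determines which monthly runs can still be harvested. The sketch below is only an illustration of that arithmetic: the function names are hypothetical, and it assumes the retention window is counted in whole calendar months, which may not match Google's exact cut-off.

```python
from datetime import date


def earliest_available_month(today, retention_months=26):
    """First day of the oldest month still inside the retention window."""
    # Count months since year 0 so the subtraction can cross year boundaries.
    month_index = today.year * 12 + (today.month - 1) - retention_months
    return date(month_index // 12, month_index % 12 + 1, 1)


def is_still_available(release_month, today, retention_months=26):
    """True if the month starting at release_month can still be harvested."""
    return release_month >= earliest_available_month(today, retention_months)


if __name__ == "__main__":
    today = date(2023, 6, 15)
    print(earliest_available_month(today))
    print(is_still_available(date(2021, 3, 1), today))
```

Under these assumptions, a monthly run that was missed for longer than the retention window cannot be recovered from the API.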

To give the telescope access to its Google Analytics data, a publisher needs to add the relevant Google service account as a user.

Dataset Name: google
Table Names: google_analytics
Table Type: Partitioned
Average Runtime: 10 min
Average Download Size: 10-20 MB
Harvest Type: API
Run Schedule: Monthly
Catch-up Missed Runs: ✅
Each Run Includes All Data: ❌

Custom dimensions for ANU Press

ANU Press used custom dimensions in its Google Analytics data. For the telescope to process these custom dimensions, the organisation name must be set to exactly 'ANU Press'. The organisation name is used directly inside the telescope: if it matches 'ANU Press', additional dimensions are added and a different BigQuery schema is used.
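The exact-match behaviour can be pictured with a minimal sketch. This is not the telescope's actual code: the function name, the schema names and the specific `ga:dimensionN` slots are hypothetical placeholders chosen for illustration.

```python
# Dimensions requested for every publisher (assumed for this sketch).
STANDARD_DIMENSIONS = ["ga:pagePath", "ga:pageTitle"]

# Hypothetical custom dimension slots; the real slots used by ANU Press are not listed here.
ANU_PRESS_CUSTOM_DIMENSIONS = ["ga:dimension1", "ga:dimension2"]


def telescope_settings(organisation_name):
    """Pick dimensions and schema name based on an exact organisation name match."""
    if organisation_name == "ANU Press":
        return {
            "dimensions": STANDARD_DIMENSIONS + ANU_PRESS_CUSTOM_DIMENSIONS,
            "schema": "google_analytics_anu_press",  # hypothetical schema name
        }
    return {
        "dimensions": list(STANDARD_DIMENSIONS),
        "schema": "google_analytics",  # hypothetical schema name
    }


if __name__ == "__main__":
    print(telescope_settings("ANU Press")["schema"])
    # A different spelling or case does not match, so the standard schema is used.
    print(telescope_settings("Anu press")["schema"])
```

Because the comparison is an exact string match, a name such as 'ANU press' or 'ANU Press ' would silently fall back to the standard dimensions in this sketch.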

A note on the API metrics

We use the Python client for the Google Analytics API to retrieve data on several metrics (such as page views) per country. It appears that the API does not return a result for every country. We would have expected any data without a country field to be labelled with a country name of "not set", but this does not appear to be the case. At this time we have no other way of retrieving country-level data on the desired metrics, so the numbers returned by the API differ slightly from those shown on the Google Analytics web page. A support ticket has been created with Google in the hope of resolving this issue.
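One way to quantify the gap described above is to compare a total taken from the Google Analytics web page with the sum of the per-country values returned by the API. The sketch below assumes the per-country data is a list of name/value pairs, as in the table schema; the function name and the figures are made up for illustration.

```python
def country_coverage(reported_total, per_country):
    """Compare a reported total with the sum of per-country values."""
    covered = sum(entry["value"] or 0 for entry in per_country)
    missing = reported_total - covered
    fraction_missing = missing / reported_total if reported_total else 0.0
    return {"covered": covered, "missing": missing, "fraction_missing": fraction_missing}


if __name__ == "__main__":
    # Illustrative numbers only, not real measurements.
    per_country = [
        {"name": "Australia", "value": 600},
        {"name": "Germany", "value": 350},
    ]
    print(country_coverage(1000, per_country))
```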

Telescope kwargs

Organisation Name (organisation_name)

The name of the organisation as displayed on Google Analytics.

View ID (view_id)

Pagepath Regex (pagepath_regex)

Setting up service account

  • Create a service account from IAM & Admin - Service Accounts

  • Create a JSON key for the service account and download the key file

  • For each organisation/publisher of interest, ask them to add this service account as a user for the correct view ID

Getting the view ID (after access has been granted):
from googleapiclient.discovery import build
from oauth2client.service_account import ServiceAccountCredentials

scopes = ['https://www.googleapis.com/auth/analytics.readonly']
credentials_path = '/path/to/service_account_credentials.json'

# Load the service account credentials from the downloaded JSON key file.
creds = ServiceAccountCredentials.from_json_keyfile_name(credentials_path, scopes=scopes)

# Build the service object for the Google Analytics Management API (v3).
service = build('analytics', 'v3', credentials=creds)

# List every account, web property and view (profile) the service account can access.
account_summaries = service.management().accountSummaries().list().execute()
view_ids = []
for account in account_summaries.get('items', []):
    account_name = account['name']
    # An account can hold several web properties, so iterate over all of them
    # rather than only the first one.
    for web_property in account.get('webProperties', []):
        website_url = web_property.get('websiteUrl')
        for profile in web_property.get('profiles', []):
            view_id_info = {'account': account_name, 'websiteUrl': website_url, 'view_id': profile['id'],
                            'view_name': profile['name']}
            view_ids.append(view_id_info)

print(view_ids)

Airflow connections

| Name | Description |
| --- | --- |
| oaebu_service_account | The credentials for the service account that has been given access to the Google Analytics view. |

Telescope Tasks

Data Download & Transform

Big Query Load

The transformed data is loaded from the Google Cloud bucket into a partitioned BigQuery table under the google dataset (which will be created should it not exist yet). Since the data is partitioned on the release month, there will only be a single table named google_analytics3.
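Since each monthly run maps to one partition, the release date and the matching partition can be derived from the year and month alone. The sketch below is an illustration, not the telescope's code: the function names are hypothetical, and it assumes the table uses monthly partitioning, for which BigQuery addresses a partition as `table$YYYYMM`.

```python
import calendar
from datetime import date


def release_date_for(year, month):
    """Last day of the given month, the value stored in release_date."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, last_day)


def partition_decorator(table_name, release_date):
    """Table name with a monthly partition decorator (assumes monthly partitioning)."""
    return f"{table_name}${release_date:%Y%m}"


if __name__ == "__main__":
    release = release_date_for(2020, 2)
    print(release)
    print(partition_decorator("google_analytics3", release))
```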

Table Schema

| Name | Type | Mode | Description |
| --- | --- | --- | --- |
| url | STRING | REQUIRED | Base URL of the book pages. |
| title | STRING | REQUIRED | Title of the book. |
| start_date | DATE | REQUIRED | Start date for period of analytics info. |
| end_date | DATE | REQUIRED | End date for period of analytics info. |
| average_time | FLOAT | REQUIRED | Average time (in seconds) spent on each page. |
| unique_views | RECORD | NULLABLE | Unique views for several different dimensions. Unique views is the number of sessions during which the specified page was viewed at least once. A unique pageview is counted for each page URL + page title combination. |
| unique_views.country | RECORD | REPEATED | Unique views per users' country, derived from their IP addresses or Geographical IDs. |
| unique_views.country.name | STRING | NULLABLE | Country name. |
| unique_views.country.value | INTEGER | NULLABLE | Number of unique views. |
| unique_views.referrer | RECORD | REPEATED | Unique views per referrer, the full referring URL including the hostname and path. |
| unique_views.referrer.name | STRING | NULLABLE | Referrer name. |
| unique_views.referrer.value | INTEGER | NULLABLE | Number of unique views. |
| unique_views.social_network | RECORD | REPEATED | Unique views per social network. This is related to the referring social network for traffic sources; e.g., Google+, Blogger. |
| unique_views.social_network.name | STRING | NULLABLE | Social network name. |
| unique_views.social_network.value | INTEGER | NULLABLE | Number of unique views. |
| page_views | RECORD | NULLABLE | The total number of pageviews for the property. |
| page_views.country | RECORD | REPEATED | Page views per users' country, derived from their IP addresses or Geographical IDs. |
| page_views.country.name | STRING | NULLABLE | Country name. |
| page_views.country.value | INTEGER | NULLABLE | Number of page views. |
| page_views.referrer | RECORD | REPEATED | Page views per referrer, the full referring URL including the hostname and path. |
| page_views.referrer.name | STRING | NULLABLE | Referrer name. |
| page_views.referrer.value | INTEGER | NULLABLE | Number of page views. |
| page_views.social_network | RECORD | REPEATED | Page views per social network. This is related to the referring social network for traffic sources; e.g., Google+, Blogger. |
| page_views.social_network.name | STRING | NULLABLE | Social network name. |
| page_views.social_network.value | INTEGER | NULLABLE | Number of page views. |
| sessions | RECORD | NULLABLE | Total number of sessions for several different dimensions. |
| sessions.country | RECORD | REPEATED | Sessions per users' country, derived from their IP addresses or Geographical IDs. |
| sessions.country.name | STRING | NULLABLE | Country name. |
| sessions.country.value | INTEGER | NULLABLE | Number of sessions. |
| sessions.source | RECORD | REPEATED | Sessions per source of referrals. For manual campaign tracking, it is the value of the utm_source campaign tracking parameter. For AdWords autotagging, it is google. If you use neither, it is the domain of the source (e.g., document.referrer) referring the users. It may also contain a port address. If users arrived without a referrer, its value is (direct). |
| sessions.source.name | STRING | NULLABLE | Source name. |
| sessions.source.value | INTEGER | NULLABLE | Number of sessions. |
| release_date | DATE | REQUIRED | Last day of the release month. Table is partitioned on this column. |
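To show how the nested name/value records in this schema might be consumed, here is a small sketch that ranks the entries of one metric and dimension for a single row. The helper name and the sample record are hypothetical; only the field names come from the schema above.

```python
def top_values(record, metric, dimension, n=3):
    """Return the n largest (name, value) pairs for one metric/dimension pair."""
    # Both the metric record and the repeated dimension field may be missing or null.
    entries = (record.get(metric) or {}).get(dimension) or []
    ranked = sorted(entries, key=lambda entry: entry["value"] or 0, reverse=True)
    return [(entry["name"], entry["value"]) for entry in ranked[:n]]


if __name__ == "__main__":
    # Illustrative row; the values are made up.
    record = {
        "url": "example.org/books/a-title",
        "title": "A Title",
        "page_views": {
            "country": [
                {"name": "Germany", "value": 12},
                {"name": "Australia", "value": 40},
                {"name": "Kenya", "value": 7},
            ]
        },
        "sessions": None,
    }
    print(top_values(record, "page_views", "country", n=2))
    print(top_values(record, "sessions", "source"))
```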

View ID (view_id): The View ID points to the specific view on which Google Analytics data is collected. See the Google support page for more information on the hierarchy of the Analytics account.

Pagepath Regex (pagepath_regex): This is a regular expression used to filter the pagepaths for which analytics data is collected. It can be set to an empty string if no filtering is required. Note that the Google Analytics API uses 're2', so constructs such as negative lookaheads are not supported. See the Google support page and the GitHub wiki for more information.
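A pagepath filter of this kind could be passed to the Analytics Reporting API v4 as a dimension filter with the `REGEXP` operator. The sketch below only builds the request body and checks the pattern locally; the function name, the example pattern and the view ID are hypothetical, and the telescope may construct its requests differently. Python's `re` module accepts more syntax than RE2, so a pattern that works locally is not guaranteed to be valid for the API.

```python
import re


def build_report_request(view_id, start_date, end_date, pagepath_regex):
    """Build a Reporting API v4 request body, filtered on ga:pagePath if a regex is given."""
    request = {
        "viewId": view_id,
        "dateRanges": [{"startDate": start_date, "endDate": end_date}],
        "metrics": [{"expression": "ga:pageviews"}],
        "dimensions": [{"name": "ga:pagePath"}, {"name": "ga:country"}],
    }
    if pagepath_regex:  # an empty string means no filtering
        request["dimensionFilterClauses"] = [
            {
                "filters": [
                    {
                        "dimensionName": "ga:pagePath",
                        "operator": "REGEXP",
                        "expressions": [pagepath_regex],
                    }
                ]
            }
        ]
    return {"reportRequests": [request]}


if __name__ == "__main__":
    # Hypothetical pattern: book landing pages directly under /books/.
    pattern = r"^/books/[^/]+/?$"
    for path in ["/books/open-access-title", "/about", "/books/a/chapter-1"]:
        print(path, bool(re.search(pattern, path)))
    body = build_report_request("12345678", "2022-01-01", "2022-01-31", pattern)
    print(body["reportRequests"][0]["dimensionFilterClauses"])
```

Where a negative lookahead would otherwise be needed, the v4 dimension filter also has a boolean `not` field that excludes matching values, which may be an alternative if the request is built by hand.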

Data Download & Transform: This task downloads a single month of reporting data using the GA3 analytics reporting API. It transforms the data from the report structure into the schema structure and saves it to a .jsonl file.
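The transform step can be pictured as regrouping flat report rows into the nested name/value lists of the table schema. The sketch below is a simplified illustration rather than the telescope's implementation: it assumes rows shaped like those in a Reporting API v4 response (a `dimensions` list plus string metric values), handles one metric and one dimension, and uses hypothetical function names.

```python
import json
from collections import defaultdict


def rows_to_records(rows, metric_name, dimension_name):
    """Group (pagepath, dimension value) rows into one nested record per URL."""
    per_url = defaultdict(lambda: defaultdict(int))
    for row in rows:
        url, dimension_value = row["dimensions"]
        # The API returns metric values as strings, so convert before summing.
        per_url[url][dimension_value] += int(row["metrics"][0]["values"][0])
    return [
        {
            "url": url,
            metric_name: {
                dimension_name: [{"name": name, "value": value} for name, value in sorted(counts.items())]
            },
        }
        for url, counts in sorted(per_url.items())
    ]


def to_jsonl(records):
    """Serialise records as JSON Lines, one record per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


if __name__ == "__main__":
    # Illustrative rows only.
    rows = [
        {"dimensions": ["/books/a-title", "Australia"], "metrics": [{"values": ["40"]}]},
        {"dimensions": ["/books/a-title", "Germany"], "metrics": [{"values": ["12"]}]},
        {"dimensions": ["/books/b-title", "Kenya"], "metrics": [{"values": ["7"]}]},
    ]
    print(to_jsonl(rows_to_records(rows, "page_views", "country")), end="")
```

A full record would also carry the title, dates, average time and the other metrics listed in the schema; they are left out here to keep the regrouping step visible.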
