
ONIX Workflow

The ONIX Workflow filters and aggregates the data ingested by the telescopes.



BAD has a single core workflow, the ONIX Workflow, which is responsible for filtering and aggregating all data for a publisher. The workflow prepares the data so that it can be easily accessed or imported by a dashboarding tool. The ONIX Workflow can be broadly broken into three parts:

  1. Aggregating and Mapping book products into works and work families

  2. Linking data from metric providers to book products

  3. Creating export tables for visualisation in dashboards

Average Runtime: 10-20 min
Run Schedule: Weekly and on the 5th of every month
Catch-up Missed Runs: ❌
Each Run Includes All Data: ✅

Workflow kwargs

Sensor DAG IDs (sensor_dag_ids)

A list of DAG IDs. Upon instantiation, the workflow will create a sensor task for each of the supplied DAGs. Each of the sensors will look back 7 days for a completed DAG. If there are no runs in the last 7 days, the sensor will be marked as a success. Otherwise, the sensor will wait for the DAG to complete before marking itself as a success.
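The sensor behaviour described above can be sketched in plain Python. This is a minimal illustration, not the workflow's actual sensor code: the `DagRun` record, the `sensor_decision` function and the state strings are hypothetical names, and the sketch assumes that only the most recent run inside the 7-day window matters.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class DagRun:
    """Hypothetical, simplified record of a single DAG run."""
    dag_id: str
    execution_date: datetime
    state: str  # assumed values: "running", "success", "failed"


def sensor_decision(runs: List[DagRun], dag_id: str, now: datetime,
                    lookback_days: int = 7) -> str:
    """Return "success" if the sensor can pass now, otherwise "wait"."""
    window_start = now - timedelta(days=lookback_days)
    recent = [
        run for run in runs
        if run.dag_id == dag_id and window_start <= run.execution_date <= now
    ]
    if not recent:
        # No runs in the look-back window: there is nothing to wait for.
        return "success"
    # Assumption: only the most recent run in the window is considered.
    latest = max(recent, key=lambda run: run.execution_date)
    return "success" if latest.state == "success" else "wait"
```

In this sketch a run older than the look-back window is ignored entirely, which matches the rule that a DAG with no runs in the last 7 days does not block the workflow.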

Data Partners (data_partners)

Workflow Tasks

Aggregate Works

Dataset Name: onix_workflow
Table Names: onix_workfamilyid_isbn, onix_workid_isbn, onix_workid_isbn_errors
Average Runtime: 1-5 min
Average Download Size: 0-1 MB
Table Type: Sharded

  1. Aggregate book product records into work records. Works are equivalence classes of products, where every product in a class is a manifestation of the same work. For example, a PDF and a paperback of the same work.

  2. Aggregate work records into work family records. A work family is an equivalence class of works where each work in the class is just a different edition.

  3. Produce intermediate lookup tables mapping ISBN13 -> WorkID and ISBN13 -> WorkFamilyID.

  4. Produce intermediate tables that append work_id and work_family_id columns to different data tables with ISBN keys.

Definitions - Product, Work and Work Families
  • Product: A product is a manifestation of a work, and will have its own ISBN. There may be several DOIs linked to a single product though (or sometimes none at all).

  • Work: A collection of products, each of which is a different manifestation of the same work. Some datasets have unique IDs assigned to the concept of a work, but these are not as clear as the usage of ISBN for a product.

  • Edition: A new work that is derived as a revision of an existing work, as opposed to being entirely new.

  • Work Family: A collection of works which are different editions of each other.

The Work ID will be an arbitrary ISBN representative from a product in the equivalence class.

The Work Family ID will be an arbitrary Work ID (ISBN) representative from a work in the equivalence class.
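The grouping into equivalence classes can be illustrated with a small union-find sketch. It is not the workflow's own implementation: `group_isbns` and the example ISBNs are hypothetical, the input is assumed to be a list of "related" ISBN pairs, and the smallest ISBN is used as the representative only to make the arbitrary choice deterministic.

```python
from typing import Dict, Iterable, Tuple


def group_isbns(isbns: Iterable[str],
                related: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each ISBN to a representative ISBN of its equivalence class."""
    parent = {isbn: isbn for isbn in isbns}

    def find(isbn: str) -> str:
        # Follow parent links to the root, shortening the path as we go.
        while parent[isbn] != isbn:
            parent[isbn] = parent[parent[isbn]]
            isbn = parent[isbn]
        return isbn

    for left, right in related:
        if left not in parent or right not in parent:
            continue  # ignore relations that point outside the known ISBNs
        root_a, root_b = find(left), find(right)
        if root_a != root_b:
            # Keep the smaller ISBN as the root so the result is deterministic.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    return {isbn: find(isbn) for isbn in parent}


# Products -> works: the PDF and paperback are manifestations of one work.
products = ["9780000000001", "9780000000002", "9780000000003"]
work_ids = group_isbns(products, [("9780000000002", "9780000000001")])

# Works -> work families: the third ISBN is a later edition of the first work.
family_of_work = group_isbns(set(work_ids.values()),
                             [("9780000000003", "9780000000001")])
work_family_ids = {isbn: family_of_work[work] for isbn, work in work_ids.items()}
```

Applying the same function twice mirrors the two aggregation levels: first products into works, then works into work families, giving the two ISBN13 lookup mappings.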

Create Crossref Metadata table

Create Book table

Dataset Name: oaebu
Table Name: book
Average Runtime: ~1 min
Average Download Size: 5-50 MB
Table Type: Sharded

Create intermediate tables

For each data partner's tables containing ISBN, create new matched tables which extend the original data with new work_id and work_family_id columns.

The schemas for these tables are identical to the raw telescope schemas, with the addition of the work_id and work_family_id columns.

Dataset Name: oaebu_intermediate
Table Names: {data_source}_matched
Average Runtime: 1-5 min
Average Download Size: 0-1 MB
Table Type: Sharded
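As a rough illustration of how a matched table extends the original rows, the sketch below appends the two columns using the ISBN lookup mappings. The function name `match_rows`, the `ISBN13` key and the sample row are assumptions made for the example. Leaving unmatched rows with `None` values is also an assumption, not confirmed behaviour of the workflow.

```python
from typing import Dict, List, Optional


def match_rows(rows: List[dict],
               work_ids: Dict[str, str],
               work_family_ids: Dict[str, str],
               isbn_key: str = "ISBN13") -> List[dict]:
    """Return copies of the rows with work_id and work_family_id appended."""
    matched = []
    for row in rows:
        isbn: Optional[str] = row.get(isbn_key)
        matched.append({
            **row,  # original columns are kept unchanged
            "work_id": work_ids.get(isbn),
            "work_family_id": work_family_ids.get(isbn),
        })
    return matched


# Hypothetical row from a data partner table keyed by ISBN.
rows = [{"ISBN13": "9780000000002", "downloads": 12}]
matched = match_rows(
    rows,
    work_ids={"9780000000002": "9780000000001"},
    work_family_ids={"9780000000002": "9780000000001"},
)
```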

Create Book Product table

The ONIX workflow takes the metrics fetched through various telescopes, then aggregates and joins them to the book records in the publisher's ONIX feed.

Dataset Name: oaebu
Table Names: book_product
Average Runtime: 1 min
Average Download Size: 100-2000 MB
Table Type: Sharded
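The join between book records and metrics can be pictured as grouping every metric by book and calendar month. The sketch below is illustrative only: `build_book_product`, the field names (`ISBN13`, `month`, `source`, `count`, `months`) and the sample records are assumptions, and the real table is built from the partner tables rather than Python dictionaries.

```python
from collections import defaultdict
from typing import Dict, List


def build_book_product(books: List[dict], metrics: List[dict]) -> List[dict]:
    """Return one record per book with its metrics nested by month."""
    # by_book[isbn][month][source] accumulates the count for that source.
    by_book: Dict[str, Dict[str, Dict[str, int]]] = defaultdict(
        lambda: defaultdict(dict)
    )
    for metric in metrics:
        month = by_book[metric["ISBN13"]][metric["month"]]
        month[metric["source"]] = month.get(metric["source"], 0) + metric["count"]

    result = []
    for book in books:
        months = [
            {"month": month, **counts}
            for month, counts in sorted(by_book.get(book["ISBN13"], {}).items())
        ]
        result.append({**book, "months": months})
    return result


books = [{"ISBN13": "9780000000001", "title": "Example Title"}]
metrics = [
    {"ISBN13": "9780000000001", "month": "2024-01", "source": "jstor", "count": 2},
    {"ISBN13": "9780000000001", "month": "2024-01", "source": "jstor", "count": 3},
    {"ISBN13": "9780000000001", "month": "2024-02", "source": "google_books", "count": 1},
]
book_product = build_book_product(books, metrics)
```

Each output record keeps the book's own fields and adds a list of per-month entries, so a dashboard can read all metrics for a book without further joins.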

Create Export tables

Step three of the ONIX workflow is to export the book_product table to a sequence of flattened data export tables. The data in these tables is not materially different to the book product table, just organised in a way better suited for dashboards in Looker Studio.

Since these are date-sharded tables, their names change each time the workflow is run. When using Looker Studio (previously Data Studio), it is preferable to use a static naming scheme. For this reason, after creating the (sharded) export and quality analysis tables, we also create or update a view for each table. These views have a static name. By referencing the view, we can keep the Looker Studio dashboards up to date without manual intervention.

Dataset Name: data_export
Table Names*: metrics, metrics_author, metrics_city, metrics_country, metrics_institution, metrics_subject_bic, metrics_subject_bisac, metrics_subject_thema
Table Names: institution_list, unmatched_book_metrics
Average Runtime: ~1 min
Average Download Size: 1 MB
Table Type: Sharded

*Table names prefixed with oaebu_{publisher}_book_

The metrics table contains metrics, organised by month, that are linked to each book. The country, city, institution and referrals expand on this to provide further useful breakdowns of metrics.

The metrics_author table contains metrics, organised by month and author, that are linked to each author.

The metrics_city table contains metrics, organised by month and city of measured usage, that are linked to each book.

The metrics_country table contains metrics, organised by month and country of measured usage, that are linked to each book.

The metrics_institution table contains metrics, organised by month and institution for which there is measured activity linked to each book.

The metrics_subject_bic table contains metrics, organised by month and BIC subject type, that are linked to each book.

The metrics_subject_bisac table contains metrics, organised by month and BISAC subject type, that are linked to each book.

The metrics_subject_thema table contains metrics, organised by month and Thema subject type, that are linked to each book.

The institution_list table is a list of each unique institution that metrics are linked to. It is primarily used for drop-down fields, or where a list of all the institutions independent of metrics is desired.

Create Latest Views

Because the export tables are all sharded by date, the export table names change every time the workflow runs. This is an issue for Looker Studio, which looks for a specific table name to pull data from. For this reason, the final step of the workflow is to create or update a set of views for both the export tables and their QA counterparts. The first run creates the views; subsequent runs update each view to point to the appropriate (latest) table.

Dataset Name: data_export_latest, oaebu_data_qa_latest
Table Names*: the source table names with the date shard removed
Average Runtime: ~1 min
Average Download Size: 0 MB
Table Type: View
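The create-or-update step can be sketched as: find the latest shard of each table, strip the date suffix to get the static view name, and point a view at that shard. The sketch below makes several assumptions that are not confirmed by this page: shards end in a `YYYYMMDD` suffix, the warehouse accepts BigQuery-style `CREATE OR REPLACE VIEW` statements, and the helper names, project name and table names are hypothetical.

```python
import re
from typing import Dict, Iterable

# Assumed shard naming: a table name followed by an 8-digit date (YYYYMMDD).
SHARD_RE = re.compile(r"^(?P<name>.+?)(?P<date>\d{8})$")


def latest_shards(table_ids: Iterable[str]) -> Dict[str, str]:
    """Map each static name to the table ID of its most recent shard."""
    latest: Dict[str, tuple] = {}
    for table_id in table_ids:
        match = SHARD_RE.match(table_id)
        if not match:
            continue  # not a date-sharded table
        name = match.group("name").rstrip("_")
        date = match.group("date")
        # YYYYMMDD strings sort chronologically, so string comparison works.
        if name not in latest or date > latest[name][0]:
            latest[name] = (date, table_id)
    return {name: table_id for name, (_, table_id) in latest.items()}


def view_sql(project: str, source_dataset: str, view_dataset: str,
             view_name: str, table_id: str) -> str:
    """Build a statement that creates the view or repoints an existing one."""
    return (
        f"CREATE OR REPLACE VIEW `{project}.{view_dataset}.{view_name}` AS "
        f"SELECT * FROM `{project}.{source_dataset}.{table_id}`"
    )


shards = latest_shards(["metrics20240101", "metrics20240108", "metrics_city20240108"])
statements = [
    view_sql("my-project", "data_export", "data_export_latest", name, table_id)
    for name, table_id in shards.items()
]
```

Because the statement replaces the view if it already exists, the same code covers both the first run and later runs.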

Data Partners (data_partners): A list of data partners that the workflow will use to aggregate and filter the data. The corresponding sensors for each data partner should be present. The list should contain one ONIX type partner and at least one data type partner.

Aggregate Works: The ONIX workflow uses the ONIX table created by an ONIX telescope (ONIX Telescope, Thoth or OAPEN Metadata) to carry out the four numbered steps listed under Aggregate Works.

Create Crossref Metadata table: Crossref Metadata is required to proceed. The ISBNs for each work are obtained from the publisher's ONIX table. For each of these ISBNs, the Crossref Metadata table produced by the Academic Observatory workflows is queried. Refer to the Crossref Metadata task.

Create Book table: The book list table (book) is a collection of works and their relevant details for the given publisher. The table accommodates a title's Crossref metadata and separate chapters.

Create Book Product table: The output is the book product table (book_product), containing one row per unique book, with a nested month field that groups all the metrics relating to that book for each calendar month. This table is the main output of the workflow and contains all of the aggregated and filtered data from all of a publisher's data partners and sources.

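*For Create Latest Views, the view names are copies of the names of the tables created in the Create Export Tables task (data_export_latest dataset) and in the Create QA ISBN and Create QA Aggregate tasks (oaebu_data_qa_latest dataset), with the date shard removed.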
