Book Analytics Service

How the Dashboard works


Last updated 1 month ago

How does the Dashboard collect and process book usage data?

At the heart of the Dashboard’s technology stack is a valid ONIX feed containing the metadata for each book 'work' that a partner wishes to have represented in the Dashboard. We use ONIX because it is the book industry’s standard metadata interchange format, which publishers use to share information about the books they have published.

Our workflows collect book usage data from multiple sources (learn more about our data sources) and public bibliographic metadata from Crossref. Data from these sources is integrated with the ONIX feed, using the ISBN-13 identifier to identify works and combine usage data from multiple sources. The partner’s ONIX feed serves as the source of truth for a work’s metadata, such as book title, authors, and related works. Where a source identifies a book by DOI, Crossref bibliographic metadata can be used to match the DOI to the book's ISBNs, which are then matched with the ISBNs in the partner’s ONIX feed.
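The matching described above can be illustrated with a minimal sketch. It is not the Dashboard's actual code: the record layouts, source names, ISBNs, and DOI below are all hypothetical, and the real workflows run against BigQuery rather than in-memory dictionaries. The sketch only shows the idea of resolving each usage row to an ISBN-13 (via a Crossref-style DOI mapping where needed) and keeping the rows whose ISBN appears in the partner's ONIX feed.

```python
from collections import defaultdict

# Hypothetical ONIX-derived records: ISBN-13 -> title (the source of truth).
onix_works = {
    "9780000000001": "Example Book A",
    "9780000000002": "Example Book B",
}

# Hypothetical Crossref-derived mapping from DOI to ISBN-13.
doi_to_isbn = {"10.0000/example.a": "9780000000001"}

# Hypothetical usage rows; some are keyed by ISBN-13, others only by DOI.
usage_rows = [
    {"source": "jstor", "isbn13": "9780000000001", "count": 10},
    {"source": "irus_oapen", "doi": "10.0000/example.a", "count": 5},
    {"source": "google_books", "isbn13": "9780000000002", "count": 7},
    {"source": "jstor", "isbn13": "9789999999999", "count": 3},  # not in the ONIX feed
]


def resolve_isbn(row):
    """Return the row's ISBN-13, falling back to the DOI mapping if needed."""
    return row.get("isbn13") or doi_to_isbn.get(row.get("doi"))


def combine_usage(rows, works):
    """Sum usage per work and per source, keeping only works in the ONIX feed."""
    combined = defaultdict(dict)
    for row in rows:
        isbn = resolve_isbn(row)
        if isbn in works:
            per_source = combined[isbn]
            per_source[row["source"]] = per_source.get(row["source"], 0) + row["count"]
    return dict(combined)


combined = combine_usage(usage_rows, onix_works)
for isbn, per_source in combined.items():
    print(onix_works[isbn], per_source)
```

Counts are kept per source rather than summed across sources, since different platforms may report different kinds of usage.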

'Workflows' refers to the code that controls this data integration, all of which is built on an open-source workflow system. These workflows fetch, process, disambiguate, and analyse data about books from multiple sources, and save the results to Google Cloud’s BigQuery data warehouse. The data processing performed by our workflows includes the following steps:

  1. Ingesting data via individual data workflows (called 'telescopes') from Crossref metadata, Google Analytics, Google Books, JSTOR, IRUS Fulcrum, IRUS OAPEN, a publisher’s ONIX feed (obtained via SFTP, or from the OAPEN Library, or from Thoth), UCL Discovery, and other data sources as required by the partner.

  2. Running a series of analytic workflows that process and combine the data ingested by the telescope workflows.

  3. Visualising the processed data from the Google Cloud BigQuery data warehouse in dashboards provided by Looker Studio, a dashboarding solution offered by Google.
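As a rough illustration of how the three steps fit together, the sketch below models each telescope as a function that returns usage rows, followed by an analytic step that reshapes them into one row per book and month. The function names, row layout, and figures are hypothetical, and plain Python stands in for the real workflow system and BigQuery.

```python
from collections import defaultdict


# Step 1: hypothetical telescopes, each returning (isbn13, month, count) rows.
def jstor_telescope():
    return [("9780000000001", "2024-01", 4), ("9780000000001", "2024-02", 6)]


def google_books_telescope():
    return [("9780000000001", "2024-01", 3)]


TELESCOPES = {"jstor": jstor_telescope, "google_books": google_books_telescope}


def ingest():
    """Run every telescope and tag each row with the source it came from."""
    return [(source, *row) for source, fetch in TELESCOPES.items() for row in fetch()]


# Step 2: an analytic step that builds one entry per (book, month),
# with a separate column for each source.
def build_monthly_table(rows):
    table = defaultdict(dict)
    for source, isbn, month, count in rows:
        table[(isbn, month)][source] = table[(isbn, month)].get(source, 0) + count
    return dict(table)


# Step 3: the resulting table is what a dashboarding tool would read.
monthly_table = build_monthly_table(ingest())
for key in sorted(monthly_table):
    print(key, monthly_table[key])
```

Keeping one column per source means the dashboard layer can decide how to present each platform's figures, instead of having them merged upstream.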

The information from our data sources is refreshed on a regular basis to obtain new data and keep the Dashboards up to date. Updated usage data for all sources is typically available on the Dashboard on the first Monday after the fourth of each month.
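To work out when new data can be expected in a given month, the rule can be expressed as a small date calculation. The sketch below assumes that "after the fourth" means strictly after, so a Monday falling on the 4th itself does not count; the function name is our own, and the result is the typical date rather than a guaranteed one.

```python
from datetime import date, timedelta


def expected_refresh_date(year, month):
    """First Monday strictly after the 4th of the given month (assumed reading)."""
    earliest = date(year, month, 5)  # the first day after the 4th
    # date.weekday() returns 0 for Monday, so this is the number of days
    # to move forward from the 5th to reach the next Monday (0 if already Monday).
    days_ahead = (0 - earliest.weekday()) % 7
    return earliest + timedelta(days=days_ahead)


print(expected_refresh_date(2024, 2))
print(expected_refresh_date(2024, 3))
```

Under this reading the date always falls between the 5th and the 11th of the month.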

Is the Dashboard data COUNTER-conformant?

Please see our data sources overview, which gives a detailed description of each data source.

How do we deal with bot activity?

Bot identification is the responsibility of the platforms themselves, as they have access to the individual usage data, which we do not. Platforms that are using COUNTER-conformant standards (such as IRUS OAPEN usage statistics) should only include genuine, user-driven usage, as activity generated by internet robots and crawlers must be excluded from all COUNTER usage reports.

How is the Dashboard data protected?

We receive usage data from platform providers in an aggregated and anonymised format: individual usage data has been stripped out, so the data we receive is an aggregation and can’t be traced back to individuals. If a platform's usage reports contain location information such as individual IP addresses, this information is anonymised by our workflows before it is provided to the Dashboard. For example, IRUS OAPEN usage reports do contain IP addresses, so this data is downloaded and anonymised within an OAPEN Google Cloud project located in Europe. The transformed data, with IP addresses removed and replaced with city or country information, is then sent to the Dashboard.
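The kind of transformation described above can be sketched as follows. This is an illustration only, not the workflow's actual implementation: the lookup table, the record fields, and the locations assigned to each network are hypothetical (the networks are reserved documentation ranges), and a real workflow would rely on a proper geolocation database.

```python
import ipaddress

# Hypothetical network-to-location table using reserved documentation ranges.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), {"city": "Amsterdam", "country": "NL"}),
    (ipaddress.ip_network("198.51.100.0/24"), {"city": "London", "country": "GB"}),
]


def lookup_location(ip):
    """Return coarse location for an IP address, or empty values if unknown."""
    address = ipaddress.ip_address(ip)
    for network, location in GEO_TABLE:
        if address in network:
            return dict(location)
    return {"city": None, "country": None}


def anonymise(record):
    """Return a copy of the record with the IP address replaced by city/country."""
    cleaned = {key: value for key, value in record.items() if key != "ip"}
    cleaned.update(lookup_location(record["ip"]))
    return cleaned


raw = {"isbn13": "9780000000001", "ip": "192.0.2.17", "downloads": 1}
safe = anonymise(raw)
print(safe)
```

The original record is left unmodified and the returned copy never contains the `ip` field, so only the coarse location travels onward.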

Since we do not collect any personally identifiable information, GDPR does not apply to our data.

Each partner’s data is kept in a separate Google Cloud project (located in the USA). Access to each partner's Google Cloud project is controlled with user access permissions (username and password credentials), providing strong security and privacy. Only Dashboard staff have access to this partner data.

Provided data is used only for the purposes of the Dashboards: it is not sold on or made available to any other parties for any reason.
