How the Dashboard works
Last updated
At the heart of the Dashboard’s technology stack is a valid ONIX feed containing the metadata for each book 'work' that a partner wishes to have represented in the Dashboard. We use ONIX because it is the book industry’s standard metadata interchange format, which publishers use to share information about the books they have published.
Our workflows collect book usage data from multiple sources (learn more about our data sources) and public bibliographic metadata from Crossref. Data from these sources is integrated with the ONIX feed, using the ISBN-13 identifier to identify works and to combine usage data from multiple sources. The partner’s ONIX feed serves as the source of truth for a work’s metadata, such as book title, authors, and related works. Crossref bibliographic metadata can be used to match DOIs with book ISBNs, which are then matched against the ISBNs in the partner’s ONIX feed.
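The ISBN-13 matching described above can be sketched roughly as follows. This is a minimal illustration, not the Dashboard's actual code: the function names, the `(source, identifier, count)` record shape and the example DOI are all hypothetical. It validates ISBN-13 check digits, resolves DOIs to ISBNs through a Crossref-derived mapping, and keeps only works that appear in the partner's ONIX feed.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def normalise_isbn13(raw: str) -> Optional[str]:
    """Strip hyphens and spaces; return the 13-digit ISBN if its check digit is valid."""
    digits = raw.replace("-", "").replace(" ", "")
    if not re.fullmatch(r"[0-9]{13}", digits):
        return None
    # ISBN-13 check digit: weights alternate 1, 3 over the first 12 digits
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return digits if (10 - total % 10) % 10 == int(digits[12]) else None


def integrate_usage(
    onix_isbns: Iterable[str],
    doi_to_isbn: Dict[str, str],
    usage: Iterable[Tuple[str, str, int]],
) -> Dict[str, Dict[str, int]]:
    """Sum usage per (ISBN-13, source), keeping only works listed in the ONIX feed.

    onix_isbns:  ISBNs from the partner's ONIX feed (the source of truth)
    doi_to_isbn: lowercase DOI -> ISBN mapping derived from Crossref metadata
    usage:       hypothetical (source, identifier, count) records; identifier is an ISBN or DOI
    """
    works = {n for n in map(normalise_isbn13, onix_isbns) if n}
    totals: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for source, identifier, count in usage:
        # DOIs are case-insensitive, so look them up in lowercase; ISBNs fall through unchanged
        isbn = normalise_isbn13(doi_to_isbn.get(identifier.lower(), identifier))
        if isbn in works:
            totals[isbn][source] += count
    return {isbn: dict(by_source) for isbn, by_source in totals.items()}


# Usage with made-up data: the DOI below is hypothetical
onix = ["978-0-306-40615-7"]
doi_map = {"10.9999/example-book": "9780306406157"}
records = [
    ("Google Books", "9780306406157", 12),
    ("JSTOR", "10.9999/Example-Book", 5),
    ("JSTOR", "9781234567897", 3),  # valid ISBN, but not in the ONIX feed, so dropped
]
print(integrate_usage(onix, doi_map, records))
# {'9780306406157': {'Google Books': 12, 'JSTOR': 5}}
```

Filtering on the ONIX ISBN set is what makes the partner's feed the source of truth: usage for any identifier that does not resolve to one of its works is simply not counted.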
'Workflows' refers to the code that controls this data integration, all of which is built on an open-source workflow system. These workflows fetch, process, disambiguate, and analyse data about books from multiple sources, and save the results to Google Cloud’s BigQuery data warehouse. The data processing performed by our workflows includes the following steps:
Ingesting data via individual data workflows (called 'telescopes') from Crossref metadata, Google Analytics, Google Books, JSTOR, IRUS Fulcrum, IRUS OAPEN, a publisher’s ONIX feed (obtained via SFTP, or from the OAPEN Library, or from Thoth), UCL Discovery, and other data sources as required by the partner.
Running a series of analytic workflows that process and combine the data ingested by the telescope workflows.
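The two stages above (per-source telescopes, then analytic workflows over their output) can be pictured with the sketch below. It is an illustrative assumption rather than the Dashboard's implementation: the `Telescope` class, the in-memory `warehouse` dict standing in for BigQuery, and the `book_usage` table name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Telescope:
    """Hypothetical stand-in for one ingest workflow ('telescope')."""

    name: str
    fetch: Callable[[], List[Row]]  # download raw records from the data source
    transform: Callable[[Row], Row]  # clean and normalise a single record

    def run(self, warehouse: Dict[str, List[Row]]) -> None:
        # Load this source's rows into its own table (a dict here, BigQuery in the real system)
        warehouse[self.name] = [self.transform(row) for row in self.fetch()]


def analytic_workflow(warehouse: Dict[str, List[Row]], sources: List[str]) -> List[Row]:
    """Combine the per-source tables into a single table of (isbn, source, count) rows."""
    return [
        {"isbn": row["isbn"], "source": source, "count": row["count"]}
        for source in sources
        for row in warehouse.get(source, [])
    ]


def run_pipeline(telescopes: List[Telescope]) -> Dict[str, List[Row]]:
    warehouse: Dict[str, List[Row]] = {}
    for telescope in telescopes:  # ingest stage: one telescope per data source
        telescope.run(warehouse)
    # Analytic stage runs only once every telescope has loaded its table
    warehouse["book_usage"] = analytic_workflow(warehouse, [t.name for t in telescopes])
    return warehouse


# Usage with made-up records from two sources that report usage in different shapes
google_books = Telescope(
    "google_books",
    fetch=lambda: [{"ISBN": "978-0-306-40615-7", "views": "12"}],
    transform=lambda r: {"isbn": str(r["ISBN"]).replace("-", ""), "count": int(str(r["views"]))},
)
jstor = Telescope("jstor", fetch=lambda: [{"isbn": "9780306406157", "count": 5}], transform=lambda r: r)
print(run_pipeline([google_books, jstor])["book_usage"])
```

Keeping each source in its own table means a single telescope can be re-run or fixed without touching the others; the analytic stage only depends on the normalised `isbn` and `count` columns.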
The processed data in the Google Cloud BigQuery data warehouse is then visualised in dashboards provided by Looker Studio, a dashboarding solution offered by Google.
The data from our sources is refreshed regularly to pick up new data and keep the Dashboards up to date. Updated usage data for all sources is typically available on the Dashboard on the first Monday after the fourth of each month.
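To make the release schedule concrete, the sketch below computes the expected refresh date for a given month. It assumes "after the fourth" means strictly after, so when the 4th is itself a Monday the refresh lands on the following Monday; the function name is hypothetical, and because the schedule is only "typical", real release dates may differ.

```python
import datetime as dt


def expected_refresh_date(year: int, month: int) -> dt.date:
    """First Monday strictly after the 4th of the month (assumed reading of the schedule)."""
    start = dt.date(year, month, 5)  # begin searching the day after the 4th
    # date.weekday() returns 0 for Monday, so this is the number of days to the next Monday
    days_to_monday = (0 - start.weekday()) % 7
    return start + dt.timedelta(days=days_to_monday)


for m in (1, 2, 3):
    print(expected_refresh_date(2024, m))
# 2024-01-08, 2024-02-05 (the 5th is a Monday), 2024-03-11 (the 4th is a Monday, so the next one)
```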
Please see our data sources overview for a detailed description of each data source.
Bot identification is the responsibility of the platforms themselves, as they have access to the individual usage data and we do not. Platforms whose usage statistics conform to the COUNTER standard (such as IRUS OAPEN) should include only genuine, user-driven usage, because activity generated by internet robots and crawlers must be excluded from all COUNTER usage reports.
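For illustration only, the sketch below shows the kind of user-agent filtering a platform might apply before producing COUNTER reports; the Dashboard itself does not run this step. The pattern list is a short hypothetical sample (COUNTER-conformant platforms match against a much longer, community-maintained list of robot user agents), and the request record shape is assumed.

```python
import re
from typing import Dict, List

# Abbreviated, hypothetical patterns; real robot lists are far longer and more precise
ROBOT_PATTERNS = [r"bot", r"crawler", r"spider", r"curl", r"python-requests"]
ROBOT_RE = re.compile("|".join(ROBOT_PATTERNS), re.IGNORECASE)


def exclude_robots(requests: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep only requests whose user agent matches none of the robot patterns."""
    return [r for r in requests if not ROBOT_RE.search(r.get("user_agent", ""))]


requests = [
    {"isbn": "9780306406157", "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0 Safari/537.36"},
    {"isbn": "9780306406157", "user_agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"},
    {"isbn": "9780306406157", "user_agent": "curl/8.4.0"},
]
print(len(exclude_robots(requests)))  # 1: only the browser request is kept
```

Bare substrings such as `bot` can over-match (they would also catch a user agent containing "Abbott"), which is one reason this work is best left to the platforms that see the raw request logs.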
We receive usage data from platform providers in an aggregated and anonymised format: individual usage data has been stripped out, so the data we receive is an aggregation that can’t be traced back to individuals. If platform usage reports contain location information such as individual IP addresses, this information is anonymised by our workflows before it is provided to the Dashboard. For example, IRUS OAPEN usage reports do contain IP addresses, so this data is downloaded and anonymised within an OAPEN Google Cloud project located in Europe. The transformed data, with IP addresses removed and replaced by city or country information, is then sent to the Dashboard.
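The anonymisation step described above can be sketched as: look up each IP address's city and country, then aggregate counts so that only per-work, per-location totals remain. This is a minimal sketch under stated assumptions: the row schema and function names are hypothetical, and the `LOOKUP` table stands in for a real IP geolocation database (its addresses come from the RFC 5737 documentation ranges).

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Location = Tuple[str, str]  # (city, country)


def anonymise_usage(
    rows: List[Dict[str, object]], geolocate: Callable[[str], Location]
) -> List[Dict[str, object]]:
    """Replace each IP address with a (city, country) pair and aggregate the counts.

    The output contains no IP addresses, only usage totals per work and location.
    """
    totals: Counter = Counter()
    for row in rows:
        city, country = geolocate(str(row["ip"]))
        totals[(str(row["isbn"]), city, country)] += int(str(row["count"]))
    return [
        {"isbn": isbn, "city": city, "country": country, "count": count}
        for (isbn, city, country), count in sorted(totals.items())
    ]


# Hypothetical lookup table standing in for a real IP geolocation database
LOOKUP: Dict[str, Location] = {
    "192.0.2.10": ("Perth", "Australia"),
    "192.0.2.11": ("Perth", "Australia"),
    "198.51.100.7": ("Berlin", "Germany"),
}


def geolocate(ip: str) -> Location:
    return LOOKUP.get(ip, ("Unknown", "Unknown"))


rows = [
    {"isbn": "9780306406157", "ip": "192.0.2.10", "count": 2},
    {"isbn": "9780306406157", "ip": "192.0.2.11", "count": 1},
    {"isbn": "9780306406157", "ip": "198.51.100.7", "count": 4},
]
print(anonymise_usage(rows, geolocate))
# Berlin/Germany: 4 and Perth/Australia: 3; the two Perth addresses are merged into one row
```

Aggregating after the lookup, rather than just deleting the IP column, matters: it merges distinct users in the same city into a single total, so the output no longer distinguishes one visitor from another.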
Since we do not collect any personally identifiable information, GDPR does not apply to our data.
Each partner’s data is kept in a separate Google Cloud project (located in the USA). Access to each partner's Google Cloud project is controlled with user access permissions (username and password credentials), providing strong security and privacy. Only Dashboard staff have access to this partner data.
Provided data is used only for the purposes of the Dashboards: it is not sold on or made available to any other parties for any reason.