How the Dashboard works
How does the Dashboard collect and process book usage data?
At the heart of the Dashboard’s technology stack is a valid ONIX feed containing the metadata of the works a partner wishes to represent in the Dashboard. We use ONIX because it is the book industry’s standard metadata interchange format, which publishers use to share information about the books they have published.
Our workflows collect book usage data from multiple sources (learn more about our data sources) and public bibliographic metadata from Crossref. Data from these sources is integrated with the ONIX feed, using the ISBN-13 identifier to identify works and combine usage data from multiple sources. The partner’s ONIX feed serves as the source of truth for a work’s metadata, such as book title, authors, and related works. Crossref bibliographic metadata can be used to match DOIs with book ISBNs, which are then matched with the ISBNs in the partner’s ONIX feed.
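As an illustration of the matching described above, here is a minimal Python sketch. It is not the Dashboard's actual workflow code. The record layouts, the field names (isbn13, doi, count), the source labels, and the function combine_usage are all hypothetical. The sketch assumes each usage record carries either an ISBN-13 or a DOI, and that a Crossref-derived table maps DOIs to ISBN-13s.

```python
from collections import defaultdict

# Hypothetical ONIX-derived records: the partner's feed is the source of truth.
onix_feed = [
    {"isbn13": "9780000000002", "title": "Example Book A"},
    {"isbn13": "9780000000019", "title": "Example Book B"},
]

# Hypothetical Crossref-derived mapping from DOI to ISBN-13.
crossref_doi_to_isbn = {"10.9999/example.a": "9780000000002"}

# Hypothetical usage records from three sources; one is keyed by DOI only,
# and one refers to an ISBN that is not in the partner's feed.
usage_records = [
    {"source": "source_a", "isbn13": "9780000000002", "count": 10},
    {"source": "source_b", "isbn13": "9780000000002", "count": 5},
    {"source": "source_c", "doi": "10.9999/example.a", "count": 3},
    {"source": "source_a", "isbn13": "9780000000019", "count": 7},
    {"source": "source_a", "isbn13": "9781111111111", "count": 99},
]


def combine_usage(onix_feed, usage_records, doi_to_isbn):
    """Group usage counts by ISBN-13 and source, keeping only works in the feed."""
    titles = {work["isbn13"]: work["title"] for work in onix_feed}
    totals = defaultdict(lambda: defaultdict(int))
    for record in usage_records:
        # Prefer the ISBN-13 on the record; otherwise resolve the DOI via Crossref.
        isbn = record.get("isbn13") or doi_to_isbn.get(record.get("doi"))
        if isbn not in titles:
            continue  # skip usage that cannot be matched to the partner's feed
        totals[isbn][record["source"]] += record["count"]
    return {
        isbn: {"title": titles[isbn], "usage": dict(by_source)}
        for isbn, by_source in totals.items()
    }


combined = combine_usage(onix_feed, usage_records, crossref_doi_to_isbn)
```

In this sketch the title always comes from the ONIX feed, never from a usage source, which mirrors the feed's role as the source of truth for a work's metadata.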
Our workflows are the code that controls the data integration, and they are built on an open-source workflow system. The workflows fetch, process, disambiguate, and analyse data about books from multiple sources, and save this data to Google Cloud’s BigQuery data warehouse. The next steps of data processing include:
Ingesting data via telescope workflows from Crossref metadata, Google Analytics, Google Books, JSTOR, IRUS Fulcrum, IRUS OAPEN, a publisher’s ONIX feed (obtained via SFTP, or from the OAPEN Library, or from Thoth), UCL Discovery, and
Running a series of analytic workflows to process and combine the data ingested by the telescope workflows.
The processed data in the BigQuery data warehouse is then visualised in dashboards built with Looker Studio, Google’s dashboarding solution.
The information from our data sources is refreshed regularly to keep the Dashboard up to date. Updated usage data for all sources is typically available on the Dashboard on the first Monday after the fourth of the month.
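To work out when a given month's update is likely to appear, the date rule can be sketched in Python as below. The function name first_monday_after_fourth is hypothetical. The sketch assumes "after the fourth" means strictly after, so a Monday that falls on the fourth itself does not count. Because the release timing is described as typical, treat the result as an estimate rather than a guarantee.

```python
from datetime import date, timedelta


def first_monday_after_fourth(year, month):
    """Return the first Monday falling strictly after the 4th of the month."""
    earliest = date(year, month, 5)  # first day strictly after the fourth
    # date.weekday() returns 0 for Monday, so this is the gap to the next Monday.
    days_until_monday = (0 - earliest.weekday()) % 7
    return earliest + timedelta(days=days_until_monday)


# 4 March 2024 was itself a Monday, so under the assumption above
# the estimated date moves to the following Monday.
march_update = first_monday_after_fourth(2024, 3)
january_update = first_monday_after_fourth(2024, 1)
```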
Is the Dashboard data COUNTER-conformant?
Please see our Dashboard data sources overview, which describes each source in detail.
How do we deal with bot activity?
Bot identification is the responsibility of the platforms themselves, as they have access to the individual usage data, which we do not. Platforms that are using COUNTER-conformant standards (such as IRUS OAPEN usage statistics) should only include genuine, user-driven usage, as activity generated by internet robots and crawlers must be excluded from all COUNTER usage reports.
How is the Dashboard data protected?
We receive usage data from platform providers in aggregated and anonymised form: individual usage data is stripped out, so the data we receive is an aggregation that cannot be traced back to individuals. If platform usage reports contain location information such as individual IP addresses, this information is anonymised before it is provided to the Dashboard. For example, IRUS OAPEN usage reports do contain IP addresses, so this data is downloaded and anonymised within an OAPEN Google Cloud project located in Europe. The transformed data, with IP addresses removed and replaced with city or country information, is then sent to the Dashboard.
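The kind of transformation described above can be sketched in Python as follows. This is an illustration only, not the code that runs in the OAPEN project. The lookup table GEO_TABLE, the place names, the field names, and the functions locate and anonymise are hypothetical. The IP ranges are reserved documentation ranges, and a real pipeline would presumably rely on a proper geolocation database.

```python
import ipaddress
from collections import Counter

# Hypothetical lookup table mapping networks to (city, country).
# The networks are reserved documentation ranges, not real locations.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), ("Example City", "NL")),
    (ipaddress.ip_network("198.51.100.0/24"), ("Sample Town", "AU")),
]


def locate(ip_string):
    """Return the (city, country) for an IP address, or a placeholder if unknown."""
    ip = ipaddress.ip_address(ip_string)
    for network, place in GEO_TABLE:
        if ip in network:
            return place
    return ("Unknown", "Unknown")


def anonymise(raw_events):
    """Replace IP addresses with locations and aggregate into counts."""
    counts = Counter()
    for event in raw_events:
        city, country = locate(event["ip"])
        counts[(event["isbn13"], city, country)] += 1
    # The output rows contain no IP addresses, only aggregated counts.
    return [
        {"isbn13": isbn, "city": city, "country": country, "downloads": n}
        for (isbn, city, country), n in sorted(counts.items())
    ]


raw_events = [
    {"isbn13": "9780000000002", "ip": "192.0.2.10"},
    {"isbn13": "9780000000002", "ip": "192.0.2.77"},
    {"isbn13": "9780000000002", "ip": "198.51.100.5"},
]
anonymised_rows = anonymise(raw_events)
```

Two requests from different addresses in the same network collapse into a single row with a count of two, so the output records where usage happened without retaining which address made each request.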
Since we do not collect any personally identifiable information, GDPR does not apply to our data.
Each partner’s data is kept in a separate Google Cloud project (located in the USA). Access to each project is controlled with user access permissions (username and password credentials), providing strong security and privacy. Only Dashboard staff have access to this partner data.
Data provided is used only for the purposes of the Dashboards: it is not sold on or made available to any other parties for any reason.