📚
Book Analytics Service
  • 📚Dashboard overview
    • Book Analytics Service
    • How the Dashboard works
    • Dashboard data sources
    • How to use your Dashboard
    • More information and contact us
      • Glossary
      • License
      • Contributing Guide
  • 🖱️Installing BAD Workflows
  • 🔭Workflows & Telescopes
    • Workflow Schedule
    • Data Telescopes
      • Google Analytics Universal
      • Google Books
      • IRUS Fulcrum
      • IRUS OAPEN
      • JSTOR
      • UCL Discovery
      • UCL Sales
    • Metadata Telescopes
      • OAPEN Metadata
      • ONIX
      • Thoth
    • ONIX Workflow
      • Data Partners
      • Schemas
      • Crossref Metadata
Powered by GitBook
On this page
  • The Google Sheet
  • Access
  • Telescope kwargs
  • Sheet ID (sheet_id)
  • Telescope Tasks
  • Data Download
  • Data Transform
  • BigQuery Load
  1. Workflows & Telescopes
  2. Data Telescopes

UCL Sales

UCL Sales is the sales data for all books from University College London. This telescope does not make up part of a greater workflow such as the ONIX Workflow. UCL's data is used directly by its own dashboard.

Dataset Name

ucl

Table Name

ucl_sales

Table Type

Partitioned

Average Runtime

5-10 min

Average Download Size

~1 MB

Harvest Type

Google Sheet

Run Schedule

Monthly on the 4th

Catch-up Missed Runs

Each Run Includes All Data

The Google Sheet

UCL Sales is UCL's private record of all of their book sales. As such, there is no reason for an API to exist for it. UCL instead keeps the project up to date with their sales data by adding a new tab containing data to a Google Sheet each month. We pull the following fields from the file and append the sheet_month and release_date fields.

Unlike UCL Discoverydata, UCL Sales data stores the ISBN13 instead of the eprint ID, which makes our job much easier.

Access

Access to the Google Sheet can be granted using the sheet user interface (click on Share at the top right of the page). The telescope will access the Google Sheet via a service account, which will need to be given read access (Viewer) by supplying the service account's email address.

Telescope kwargs

Sheet ID (sheet_id)

The ID of the Google Sheet. The ID can be found in its URI, which will have the form of https://docs.google.com/spreadsheets/d/[SHEET_ID].

Telescope Tasks

Data Download

For the given release month, checks that the relevant sheet (i.e. the tab on the Google Sheet) exists. Downloads the data from the Google Sheet and writes it to a .csvfile. This file is uploaded to a Google Cloud Storage bucket

Data Transform

Some data cleaning is necessary as the data in the Google Sheet is copy-pasted there manually. Some months have slightly different formats for headings or duplicated headings, empty rows etc. These are all corrected in the Data Transform stage, and the bad data is dropped. We then pull all of the relevant fields and upload this as a .jsonl file to cloud storage.

BigQuery Load

The transformed data is loaded from the Google Cloud bucket into a partitioned BigQuery table. The table is in the ucl dataset (which will be created should it not exist yet). Since the data is partitioned on the release month, there will only be a single table named ucl_sales.

Name
Type
Mode
Description

ISBN13

STRING

NULLABLE

ISBN13 of the book.

Title

STRING

NULLABLE

Title of the book.

Country

STRING

NULLABLE

Country of sale

Quantity

INTEGER

NULLABLE

Number of items sold

Sale_Type

STRING

NULLABLE

Sale type - free/paid/return

Year

INTEGER

NULLABLE

Year of sale

Month

INTEGER

NULLABLE

Month of sale

release_date

DATE

REQUIRED

Last day of the release month. Table is partitioned on this column.

sheet_month

STRING

REQUIRED

The sheet date from which this record is attributed to

PreviousUCL DiscoveryNextMetadata Telescopes

Last updated 1 month ago

🔭
✅
❌