UCL Sales
UCL Sales is the sales data for all UCL books. This telescope does not make up part of a greater workflow such as the ONIX Workflow. Its data is used directly by its own dashboard.
Dataset Name
ucl
Table Name
ucl_sales
Table Type
Partitioned
Average Runtime
5-10 min
Average Download Size
~1 MB
Harvest Type
Google Sheet
Run Schedule
Monthly on the 4th
Catch-up Missed Runs
Each Run Includes All Data
The Google Sheet
UCL Sales is UCL's private record of all of their sales. As such, there is no reason for an API to exist for it. UCL instead keeps us up to date with their sales data by adding a new tab to a Google Sheet each month. We pull the following fields from the file and append the sheet_month and release_date fields.
Unlike UCL Discovery, UCL Sales stores the ISBN13 instead of the eprint ID, which makes our job much easier.
Access
Access to the sheet can be granted using the sheet UI (Share at the top right of the page). The telescope will access the sheet via a service account, which will need to be given read access (Viewer) by supplying the account's email address.
Telescope kwargs
Sheet ID (sheet_id)
The ID of the google sheet. The ID can be found in its URI, which will have the form of https://docs.google.com/spreadsheets/d/[SHEET_ID]
.
Telescope Tasks
Data Download
For the given release month, checks that the relevant sheet (tab on the Google Sheet) exists. Downloads the data from the Google Sheet and writes it to a .csv
file. This file is uploaded to a Google Cloud Storage.
Data Transform
Some data cleaning is necessary as the data in the Google Sheet is copy-pasted there manually. Some months have slightly different formats for headings or duplicated headings, empty rows etc. These are all corrected and the bad data is dropped. We then pull all of the relevant fields and upload this as a jsonl file to cloud storage.
BigQuery Load
The transformed data is loaded from the Google Cloud bucket into a partitioned BigQuery table. The table is in the ucl dataset (which will be created should it not exist yet). Since the data is partitioned on the release month, there will only be a single table named ucl_sales.
ISBN13
STRING
NULLABLE
ISBN13 of the book.
Title
STRING
NULLABLE
Title of the book.
Country
STRING
NULLABLE
Country of sale
Quantity
INTEGER
NULLABLE
Number of items sold
Sale_Type
STRING
NULLABLE
Sale type - free/paid/return
Year
INTEGER
NULLABLE
Year of sale
Month
INTEGER
NULLABLE
Month of sale
release_date
DATE
REQUIRED
Last day of the release month. Table is partitioned on this column.
sheet_month
STRING
REQUIRED
The sheet date from which this record is attributed to
Last updated