# UCL Sales

UCL Sales contains the sales data for all books published by University College London. This telescope is not part of a larger workflow such as the ONIX Workflow; UCL's data is used directly by its own dashboard.

| Dataset Name               | ucl                  |
| -------------------------- | -------------------- |
| Table Name                 | ucl\_sales           |
| Table Type                 | Partitioned          |
| Average Runtime            | 5-10 min             |
| Average Download Size      | \~1 MB               |
| Harvest Type               | Google Sheet         |
| Run Schedule               | Monthly on the 4th   |
| Catch-up Missed Runs       | :white\_check\_mark: |
| Each Run Includes All Data | :x:                  |

## The Google Sheet

UCL Sales is UCL's private record of all of their book sales, so there is no public API for it. Instead, UCL keeps the project up to date by adding a new tab of sales data to a Google Sheet each month. We pull the fields listed in the schema below from the sheet and append the sheet\_month and release\_date fields.

Unlike [ucl-discovery](https://the-academic-observatory.gitbook.io/bas/workflows-and-telescopes/data-telescopes/ucl-discovery "mention") data, UCL Sales data stores the ISBN13 rather than the eprint ID, which makes our job much easier.

### Access

Access to the Google Sheet can be granted using the sheet user interface (click on *Share* at the top right of the page). The telescope will access the Google Sheet via a service account, which will need to be given read access (*Viewer*) by supplying the service account's email address.

## Telescope kwargs

### Sheet ID (sheet\_id)

The ID of the Google Sheet. The ID can be found in its URI, which will have the form of `https://docs.google.com/spreadsheets/d/[SHEET_ID]`.
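Pulling the ID out of a pasted URI can be sketched with a small helper. This is illustrative only; the function name and regex are not part of the telescope's code:

```python
import re

def extract_sheet_id(uri: str) -> str:
    """Extract [SHEET_ID] from a URI of the form
    https://docs.google.com/spreadsheets/d/[SHEET_ID] (illustrative helper)."""
    # Sheet IDs are made up of letters, digits, underscores and hyphens
    match = re.search(r"/spreadsheets/d/([A-Za-z0-9_-]+)", uri)
    if match is None:
        raise ValueError(f"Not a Google Sheets URI: {uri}")
    return match.group(1)
```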

## Telescope Tasks

### Data Download

For the given release month, this task checks that the relevant sheet (i.e. the tab on the Google Sheet) exists, downloads its data, and writes it to a `.csv` file. This file is then uploaded to a Google Cloud Storage bucket.
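Fetching the sheet itself requires Google API credentials, but the serialisation step can be sketched with the standard library. The Sheets API returns the tab contents as a grid (a list of rows, each a list of strings); `sheet_values_to_csv` is an illustrative helper, not the telescope's actual code:

```python
import csv
import io

def sheet_values_to_csv(values: list[list[str]]) -> str:
    """Serialise the raw cell grid returned for one sheet tab to CSV text,
    ready to be written to the `.csv` file that is uploaded to the bucket."""
    buf = io.StringIO()
    csv.writer(buf).writerows(values)
    return buf.getvalue()
```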

### Data Transform

Some data cleaning is necessary because the data in the Google Sheet is copy-pasted there manually: some months have slightly different heading formats, duplicated headings, empty rows, and so on. These are all corrected in the Data Transform stage, and unrecoverable rows are dropped. We then pull all of the relevant fields and upload the result as a `.jsonl` file to Cloud Storage.
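The kind of cleaning described above can be sketched as follows. The normalisation rules here (trim, lowercase, spaces to underscores) and the helper name are assumptions for illustration, not the telescope's exact rules:

```python
def clean_rows(rows: list[list[str]]) -> list[dict[str, str]]:
    """Turn a raw cell grid into records: normalise the header row,
    drop empty rows, and drop duplicated header rows (illustrative sketch)."""
    normalise = lambda cell: cell.strip().lower().replace(" ", "_")
    header = [normalise(h) for h in rows[0]]
    records = []
    for row in rows[1:]:
        # Drop rows where every cell is blank
        if not any(cell.strip() for cell in row):
            continue
        # Drop rows that repeat the header (a common copy-paste artefact)
        if [normalise(cell) for cell in row] == header:
            continue
        records.append(dict(zip(header, row)))
    return records
```

Each record could then be serialised with `json.dumps` per line to produce the `.jsonl` file.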

### BigQuery Load

The transformed data is loaded from the Google Cloud Storage bucket into a partitioned BigQuery table in the *ucl* dataset (which is created if it does not yet exist). Since the data is partitioned on the release month, there is only ever a single table, named *ucl\_sales*.
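Since the partitioning column is the last day of the release month, its value can be derived from the release year and month with the standard library; a minimal sketch (the helper name is illustrative):

```python
import calendar
import datetime

def release_date(year: int, month: int) -> datetime.date:
    """Last day of the release month -- the value stored in the
    release_date column that the table is partitioned on."""
    # monthrange returns (weekday of first day, number of days in month)
    last_day = calendar.monthrange(year, month)[1]
    return datetime.date(year, month, last_day)
```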

<table><thead><tr><th width="147.96875">Name</th><th width="119.64453125">Type</th><th width="115.89453125">Mode</th><th>Description</th></tr></thead><tbody><tr><td>ISBN13</td><td>STRING</td><td>NULLABLE</td><td>ISBN13 of the book.</td></tr><tr><td>Title</td><td>STRING</td><td>NULLABLE</td><td>Title of the book.</td></tr><tr><td>Country</td><td>STRING</td><td>NULLABLE</td><td>Country of sale.</td></tr><tr><td>Quantity</td><td>INTEGER</td><td>NULLABLE</td><td>Number of items sold.</td></tr><tr><td>Sale_Type</td><td>STRING</td><td>NULLABLE</td><td>Sale type: free, paid or return.</td></tr><tr><td>Year</td><td>INTEGER</td><td>NULLABLE</td><td>Year of sale.</td></tr><tr><td>Month</td><td>INTEGER</td><td>NULLABLE</td><td>Month of sale.</td></tr><tr><td>release_date</td><td>DATE</td><td>REQUIRED</td><td>Last day of the release month. The table is partitioned on this column.</td></tr><tr><td>sheet_month</td><td>STRING</td><td>REQUIRED</td><td>The sheet month to which this record is attributed.</td></tr></tbody></table>
