Google Books

Documentation for the Google Books telescope

The Google Books Partner program enables selling books through the Google Play store and offering a preview on Google books. The program makes books discoverable to Google users around the world on Google books. When readers find a book on Google Books, they can preview a limited number of pages to decide if they're interested in it. Readers can also follow links to buy the book or borrow or download it when applicable.

As a publisher Google Books download reports are available at https://play.google.com/books/publish/

Currently there are 3 report types available:

  • Google Play sales summary report

  • Google Play sales transaction report

  • Google Books Traffic Report

The telescope collects data from the last 2 reports.

Dataset Name

google

Table Names

google_books_sales, google_books_traffic

Table Type

Partitioned

Average Runtime

10 min

Average Download Size

1-100 MB

Harvest Type

SFTP

Run Schedule

Weekly

Catch-up Missed Runs

Each Run Includes All Data

Airflow connections

Name Description

sftp_service

The username, password and host name used to connect to the SFTP server

Authentication

The reports are downloaded from https://play.google.com/books/publish/. To get access to the reports the publisher needs to give access to a google service account. This service account can then be used to login on this web page and download each report manually.

Downloading Reports Manually

There is no API available to download the Google Books report and it is quite challenging to automate the Google login process through tools such as Selenium, because of Google's bot detection triggering a reCAPTCHA. Until this step can be automated, the reports need to be downloaded manually each month. For each publisher and for both the sales transaction report and the traffic report:

  • A report should be created for exactly 1 month (e.g. starting 2021-01-01 and ending 2021-01-31).

  • All titles should be selected.

  • All countries should be selected.

  • The traffic report is organised by 'Book'.

  • It is important to save the file with the right name, this should be in the following format (<file_suffix> is optional):

    • GoogleSalesTransactionReport_<file_suffix>YYYY_MM.csv or

    • GoogleBooksTrafficReport_<file_suffix>YYYY_MM.csv

  • Upload each report to the SFTP server.

    • Add it to the folder /google_books_<publisher>/upload

    • Files are automatically moved between folders; do not move files between folders manually

Telescope Tasks

Data Download & Transform

The download step connects to the SFTP server. The telescope looks in the relevant publisher's upload folder for the file format specified above. Any telescope DAG run will harvest all instances of the matching files (regardless of the date associated). Before downloading, the files on the SFTP server are moved to the in_progress folder.

Once downloaded, each report is transformed. The transform process re-formats headings and dates such that they are consistent. It also performs an integrity check on the reported dates. None of the raw data is modified in any way. The partition date (the report's associated month) is appended to each row at the end of the transform step.

Big Query Load

The transformed data is loaded from the Google Cloud bucket. There are two resulting datasets from each telescope run, both of which will be loaded into their own partitioned BigQuery table under the google dataset (which will be created should it not exist yet). Then, the google_books_sales and google_books_traffic table partitions are loaded. Since the data is partitioned on the release month, there will only be a single table for each of these report types.

Table Schema - Google Books Sales

nametypemodedescription

Transaction_Date

DATE

REQUIRED

The date of the transaction.

Id

STRING

REQUIRED

A unique identifier for this transaction.

Product

STRING

NULLABLE

In UCL Press case "Single Purchase" (a normal sale). Can also be "Rental".

Type

STRING

NULLABLE

Type of transaction (can be 'sale' or 'refund').

Preorder

STRING

NULLABLE

Whether this transaction applied to a preorder. In UCL Press case 'None': The transaction didn't involve a preorder.

Qty

INTEGER

NULLABLE

The number of units in the transaction. Negative for refunds.

Primary_ISBN

STRING

NULLABLE

The primary ISBN or other identifier the book, prefixed by a single quotation mark so spreadsheet programs will display the entire ISBN.

Imprint_Name

STRING

REQUIRED

The template used for the book.

Title

STRING

REQUIRED

The title of the book.

Author

STRING

NULLABLE

The author of the book.

Original_List_Price_Currency

STRING

NULLABLE

The original currency of the book's list price.

Original_List_Price

FLOAT

NULLABLE

The original list price of the book.

List_Price_Currency

STRING

NULLABLE

The currency of the book's list price. If currency conversion was enabled, this is the currency of purchase as seen by the buyer.

List_Price_tax_inclusive_

FLOAT

NULLABLE

The book's list prince including tax.

List_Price_tax_exclusive_

FLOAT

NULLABLE

The book's list price excluding tax.

Country_of_Sale

STRING

NULLABLE

The country where the buyer bought the book.

Publisher_Revenue_Perc

FLOAT

NULLABLE

The publisher's percentage of the list price.

Publisher_Revenue

FLOAT

NULLABLE

The amount of revenue earned by the publisher. This will be negative if the transaction was a refund. Negative for refunds. The currency is the same as the payment currency.

Payment_Currency

STRING

NULLABLE

The currency of the publisher's earnings.

Payment_Amount

FLOAT

NULLABLE

The amount earned by the publisher for this transaction. Negative for refunds.

Currency_Conversion_Rate

FLOAT

NULLABLE

If the list price and payment amount are in different currencies, the rate of exchange between the two currencies.

Line_of_Business

STRING

NULLABLE

This field is not present for some publishers (UCL Press). For ANU Press the field value is “E-Book”.

release_date

DATE

REQUIRED

Last day of the release month. Table is partitioned on this column.

Table Schema - Google Books Traffic

nametypemodedescription

Primary_ISBN

STRING

NULLABLE

The primary identifier (e.g., ISBN) of the book. This column appears in the report if data is organized by book.

Title

STRING

REQUIRED

The title of the book.

Book_Visits_BV_

INTEGER

NULLABLE

A Book Visit is registered each time a unique user views one of your books on Google Books. This number includes informational page views (such as the “About this book” page) as well as preview content page views.

BV_with_Pages_Viewed

INTEGER

NULLABLE

The number of Book Visits in which users accessed preview pages of your book. This doesn’t include visits where a user accessed only informational pages for your books.

Non_Unique_Buy_Clicks

INTEGER

NULLABLE

The number of clicks on links for purchasing the book on retailer websites (including your website, if you provided a buy link).

BV_with_Buy_Clicks

INTEGER

NULLABLE

The number of visits which included a click on a purchase link.

Buy_Link_CTR

FLOAT

NULLABLE

The clickthrough rate for purchase links. The values are percentages.

Pages_Viewed

INTEGER

NULLABLE

The total number of unique preview content pages that a user viewed in a given session (counted as a 24-hour period). If a user views the same page of your book twice during a session, only a single page view is registered.

release_date

DATE

REQUIRED

Last day of the release month. Table is partitioned on this column.

Last updated