Google Books
Documentation for the Google Books telescope
The Google Books Partner program enables selling books through the Google Play store and offering a preview on Google books. The program makes books discoverable to Google users around the world on Google books. When readers find a book on Google Books, they can preview a limited number of pages to decide if they're interested in it. Readers can also follow links to buy the book or borrow or download it when applicable.
As a publisher Google Books download reports are available at https://play.google.com/books/publish/
Currently there are 3 report types available:
Google Play sales summary report
Google Play sales transaction report
Google Books Traffic Report
The telescope collects data from the last 2 reports.
Dataset Name
Table Names
google_books_sales, google_books_traffic
Table Type
Partitioned
Average Runtime
10 min
Average Download Size
1-100 MB
Harvest Type
SFTP
Run Schedule
Daily
Catch-up Missed Runs
Each Run Includes All Data
Airflow connections
sftp_service
The username, password and host name used to connect to the SFTP server
Authentication
The reports are downloaded from https://play.google.com/books/publish/
. To get access to the reports the publisher needs to give access to a google service account. This service account can then be used to login on this web page and download each report manually.
Downloading Reports Manually
There is no API available to download the Google Books report and it is quite challenging to automate the Google login process through tools such as Selenium, because of Google's bot detection triggering a reCAPTCHA. Until this step can be automated, the reports need to be downloaded manually each month. For each publisher and for both the sales transaction report and the traffic report:
A report should be created for exactly 1 month (e.g. starting 2021-01-01 and ending 2021-01-31).
All titles should be selected.
All countries should be selected.
The traffic report is organised by 'Book'.
It is important to save the file with the right name, this should be in the following format (<file_suffix> is optional):
GoogleSalesTransactionReport_<file_suffix>YYYY_MM.csv
orGoogleBooksTrafficReport_<file_suffix>YYYY_MM.csv
Upload each report to the SFTP server.
Add it to the folder
/google_books_<publisher>/upload
Files are automatically moved between folders; do not move files between folders manually
Telescope Tasks
Data Download & Transform
The download step connects to the SFTP server. The telescope looks in the relevant publisher's upload folder for the file format specified above. Any telescope DAG run will harvest all instances of the matching files (regardless of the date associated). Before downloading, the files on the SFTP server are moved to the in_progress folder.
Once downloaded, each report is transformed. The transform process re-formats headings and dates such that they are consistent. It also performs an integrity check on the reported dates. None of the raw data is modified in any way. The partition date (the report's associated month) is appended to each row at the end of the transform step.
Big Query Load
The transformed data is loaded from the Google Cloud bucket. There are two resulting datasets from each telescope run, both of which will be loaded into their own partitioned BigQuery table under the google dataset (which will be created should it not exist yet). Then, the google_books_sales and google_books_traffic table partitions are loaded. Since the data is partitioned on the release month, there will only be a single table for each of these report types.
Table Schema - Google Books Sales
Transaction_Date
DATE
REQUIRED
The date of the transaction.
Id
STRING
REQUIRED
A unique identifier for this transaction.
Product
STRING
NULLABLE
In UCL Press case "Single Purchase" (a normal sale). Can also be "Rental".
Type
STRING
NULLABLE
Type of transaction (can be 'sale' or 'refund').
Preorder
STRING
NULLABLE
Whether this transaction applied to a preorder. In UCL Press case 'None': The transaction didn't involve a preorder.
Qty
INTEGER
NULLABLE
The number of units in the transaction. Negative for refunds.
Primary_ISBN
STRING
NULLABLE
The primary ISBN or other identifier the book, prefixed by a single quotation mark so spreadsheet programs will display the entire ISBN.
Imprint_Name
STRING
REQUIRED
The template used for the book.
Title
STRING
REQUIRED
The title of the book.
Author
STRING
NULLABLE
The author of the book.
Original_List_Price_Currency
STRING
NULLABLE
The original currency of the book's list price.
Original_List_Price
FLOAT
NULLABLE
The original list price of the book.
List_Price_Currency
STRING
NULLABLE
The currency of the book's list price. If currency conversion was enabled, this is the currency of purchase as seen by the buyer.
List_Price_tax_inclusive_
FLOAT
NULLABLE
The book's list prince including tax.
List_Price_tax_exclusive_
FLOAT
NULLABLE
The book's list price excluding tax.
Country_of_Sale
STRING
NULLABLE
The country where the buyer bought the book.
Publisher_Revenue_Perc
FLOAT
NULLABLE
The publisher's percentage of the list price.
Publisher_Revenue
FLOAT
NULLABLE
The amount of revenue earned by the publisher. This will be negative if the transaction was a refund. Negative for refunds. The currency is the same as the payment currency.
Payment_Currency
STRING
NULLABLE
The currency of the publisher's earnings.
Payment_Amount
FLOAT
NULLABLE
The amount earned by the publisher for this transaction. Negative for refunds.
Currency_Conversion_Rate
FLOAT
NULLABLE
If the list price and payment amount are in different currencies, the rate of exchange between the two currencies.
Line_of_Business
STRING
NULLABLE
This field is not present for some publishers (UCL Press). For ANU Press the field value is “E-Book”.
release_date
DATE
REQUIRED
Last day of the release month. Table is partitioned on this column.
Table Schema - Google Books Traffic
Primary_ISBN
STRING
NULLABLE
The primary identifier (e.g., ISBN) of the book. This column appears in the report if data is organized by book.
Title
STRING
REQUIRED
The title of the book.
Book_Visits_BV_
INTEGER
NULLABLE
A Book Visit is registered each time a unique user views one of your books on Google Books. This number includes informational page views (such as the “About this book” page) as well as preview content page views.
BV_with_Pages_Viewed
INTEGER
NULLABLE
The number of Book Visits in which users accessed preview pages of your book. This doesn’t include visits where a user accessed only informational pages for your books.
Non_Unique_Buy_Clicks
INTEGER
NULLABLE
The number of clicks on links for purchasing the book on retailer websites (including your website, if you provided a buy link).
BV_with_Buy_Clicks
INTEGER
NULLABLE
The number of visits which included a click on a purchase link.
Buy_Link_CTR
FLOAT
NULLABLE
The clickthrough rate for purchase links. The values are percentages.
Pages_Viewed
INTEGER
NULLABLE
The total number of unique preview content pages that a user viewed in a given session (counted as a 24-hour period). If a user views the same page of your book twice during a session, only a single page view is registered.
release_date
DATE
REQUIRED
Last day of the release month. Table is partitioned on this column.
Last updated