IRUS Fulcrum
Documentation for the IRUS Fulcrum telescope
The IRUS Fulcrum telescope collects usage statistics for titles accessed via the Fulcrum platform. Usage data is accessed through IRUS in much the same way as for the IRUS OAPEN telescope. Unlike IRUS OAPEN, IRUS Fulcrum does not record sensitive IP address information, which makes the data much simpler to handle.
The earliest available data for the Fulcrum platform is from April 2022, so all data follows the COUNTER 5 standard.
The following Airflow connections are required:
Name | Description |
---|---|
irus_api | The IRUS requestor_id/api_key, required to access the IRUS platform. |
Fields passed as keyword arguments to the telescope upon instantiation.
A list of publisher names. Usage stats from Fulcrum will be filtered on these publisher names. An institution may have several publisher names associated with it, so it is important that all related names are provided.
The download is done via an API call to IRUS:
Here, the requestor ID is the API key for the IRUS API, stored in the irus_api Airflow connection. The telescope uses the same month (YYYY-MM) for both the begin and end dates, so that data is retrieved one month at a time.
A second call to the API is made with the following appended to the above URL:
This second call splits the data by country, leaving two datasets. These are referred to as the total and country datasets.
Before any changes are made to the data, both datasets are uploaded to a Google Cloud Storage bucket.
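The per-month request pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the telescope's implementation: the base URL is a placeholder, the query parameter names `requestor_id`, `begin_date` and `end_date` are assumed, and `country_suffix` stands in for the appended query string, which is not reproduced in this document.

```python
from datetime import date
from urllib.parse import urlencode

# Placeholder only: the real IRUS endpoint is not given in this document.
BASE_URL = "https://example.invalid/irus/report"


def month_string(release_month: date) -> str:
    """Format a date as YYYY-MM, the granularity used for each request."""
    return release_month.strftime("%Y-%m")


def build_urls(requestor_id: str, release_month: date, country_suffix: str):
    """Return (total_url, country_url) for a single month.

    The parameter names are assumptions made for illustration.
    `country_suffix` is a hypothetical stand-in for whatever is appended
    to the first URL to split the data by country.
    """
    month = month_string(release_month)
    # Begin and end dates are identical, so one call covers exactly one month.
    params = {"requestor_id": requestor_id, "begin_date": month, "end_date": month}
    total_url = f"{BASE_URL}?{urlencode(params)}"
    country_url = total_url + country_suffix
    return total_url, country_url
```

Calling `build_urls` once per release month yields the two URLs whose responses become the total and country datasets.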
The transform step has a few things to achieve:
1. Collate the total and country datasets into a single object
2. Remove columns that are not of interest to us
3. Add the release month to each row as a partitioning column
4. Remove rows from the data that do not relate to the publisher of interest
The results of steps 1 to 3 are evident in the schema. The final step requires some communication with the publisher, because a single publisher may have published titles under more than one name. For example, University of Michigan has 10 associated publishing names. These names are listed as part of a dictionary in the telescope.
The resulting transformed file is uploaded to a Google Cloud bucket.
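The transform steps above could look something like the sketch below. It assumes a simplified, hypothetical input shape: both datasets are lists of flat dictionaries, and each country row carries a `proprietary_id` plus `name` and `code` keys. The real IRUS response layout and the telescope's own function names may differ. Output field names follow the schema in this document.

```python
import calendar
from datetime import date

# Title-level columns to keep (names taken from the schema in this document).
KEEP = (
    "proprietary_id", "ISBN", "book_title", "publisher", "authors", "event_month",
    "total_item_investigations", "total_item_requests",
    "unique_item_investigations", "unique_item_requests",
)
# Country-level columns to keep inside the repeated `country` record.
COUNTRY_KEEP = (
    "name", "code",
    "total_item_investigations", "total_item_requests",
    "unique_item_investigations", "unique_item_requests",
)


def last_day_of_month(year: int, month: int) -> date:
    """Return the last calendar day of the given month."""
    return date(year, month, calendar.monthrange(year, month)[1])


def transform(total_rows, country_rows, publishers, release_date):
    """Collate, trim, stamp and filter rows (hypothetical input shape)."""
    # Step 1: group the country rows by title so they can be nested.
    countries_by_id = {}
    for row in country_rows:
        trimmed = {key: row.get(key) for key in COUNTRY_KEEP}
        countries_by_id.setdefault(row["proprietary_id"], []).append(trimmed)

    transformed = []
    for row in total_rows:
        # Step 4: drop rows that do not belong to the publisher of interest.
        if row.get("publisher") not in publishers:
            continue
        # Step 2: keep only the columns of interest.
        new_row = {key: row.get(key) for key in KEEP}
        new_row["country"] = countries_by_id.get(row["proprietary_id"], [])
        # Step 3: add the partitioning column.
        new_row["release_date"] = release_date.isoformat()
        transformed.append(new_row)
    return transformed
```

Passing the full set of names a publisher has used as `publishers` is what keeps titles released under an alternative imprint from being filtered out.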
The transformed data is loaded from the Google Cloud bucket into a partitioned BigQuery table in the irus dataset, which will be created if it does not yet exist. Since the data is partitioned on the release month, there will only be a single table named irus_fulcrum.
name | type | mode | description |
---|---|---|---|
proprietary_id | STRING | NULLABLE | Proprietary identifier of the book. |
ISBN | STRING | NULLABLE | ISBN of the book. |
book_title | STRING | NULLABLE | Title of the book. |
publisher | STRING | NULLABLE | The publisher. |
authors | STRING | NULLABLE | The names of the authors. |
event_month | STRING | NULLABLE | The investigated month. |
total_item_investigations | INTEGER | NULLABLE | The total number of item investigations. |
total_item_requests | INTEGER | NULLABLE | The total number of item requests. |
unique_item_investigations | INTEGER | NULLABLE | The number of unique item investigations. |
unique_item_requests | INTEGER | NULLABLE | The number of unique item requests. |
country | RECORD | REPEATED | Record to store statistics on the country level. |
country.name | STRING | NULLABLE | The country name of the client registered by IRUS. |
country.code | STRING | NULLABLE | The country code of the client registered by IRUS. |
country.total_item_investigations | INTEGER | NULLABLE | The total number of item investigations. |
country.total_item_requests | INTEGER | NULLABLE | The total number of item requests. |
country.unique_item_investigations | INTEGER | NULLABLE | The number of unique item investigations. |
country.unique_item_requests | INTEGER | NULLABLE | The number of unique item requests. |
release_date | DATE | REQUIRED | Last day of the release month. Table is partitioned on this column. |
Property | Value |
---|---|
Dataset Name | irus |
Table Name | irus_fulcrum |
Table Type | Partitioned |
Average Runtime | 10 min |
Average Download Size | 1-10 MB |
Harvest Type | API |
Run Schedule | Monthly on the 4th |
Catch-up Missed Runs | |
Each Run Includes All Data | |
irus_api
The IRUS requestor_id/api_key - required to access the IRUS platform