This repository has been archived by the owner on Dec 2, 2021. It is now read-only.
RIALTO Combine Load Procedure
Michael J. Giarlo edited this page Oct 18, 2018 · 1 revision
Use these steps when writing to an empty store.
- cd ~/workspace/rialto-etl
- Make sure you have the latest ETL code
- Get the SPARQL Proxy URL and API key from shared_configs. Put these values into config/settings.local.yml or the corresponding environment variables.
- Connect to the Stanford VPN using full-tunnel mode
- Test the connection by sending a simple count query to the SPARQL Proxy
- Extract, Transform, Load - Organizations from Profiles
  - Ensure you have the CAP/Profiles API key in either config/settings.local.yml or an environment variable. See shared_configs.
  - Run the organization ETL steps
- Extract, Transform, Load - Researchers from Profiles
- Extract, Transform, Load - Grants from SeRA
  - Using the researchers.ndj file from the researchers extract step above, run the grant ETL steps. Note that researchers without SUNet IDs will not have their grants imported.
- Extract, Transform, Load - Publications from Web of Science
  - Using the researchers.ndj file from the researchers extract step above, run the publications ETL steps. This process will create new co-authors, link publications to authors, create new topics, link topics to publications, and link publications to grants.
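
Because researchers without SUNet IDs are skipped by the grant load, it can be useful to check researchers.ndj before running it. The following is a minimal sketch, assuming the file is newline-delimited JSON with one researcher object per line and that the SUNet ID sits under a top-level key named `sunetid`. That key name and the helper `split_by_sunetid` are hypothetical, chosen for illustration rather than taken from the rialto-etl code.

```python
import json
from typing import Iterable, List, Tuple


def split_by_sunetid(
    lines: Iterable[str], key: str = "sunetid"
) -> Tuple[List[dict], List[dict]]:
    """Split NDJSON researcher records into (with_id, without_id).

    `key` is a hypothetical field name; adjust it to match the actual
    researcher records produced by the extract step.
    """
    with_id: List[dict] = []
    without_id: List[dict] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # Treat a missing, null, or empty value as "no SUNet ID".
        if record.get(key):
            with_id.append(record)
        else:
            without_id.append(record)
    return with_id, without_id
```

Passing an open file handle for researchers.ndj as `lines` returns the two lists; the length of the second list is the number of researchers whose grants would not be imported under these assumptions.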
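
The connection test from the setup steps above (a simple count query sent to the SPARQL Proxy) could look roughly like the sketch below. It is an illustration only: the environment variable names `SPARQL_PROXY_URL` and `SPARQL_PROXY_API_KEY`, the `x-api-key` header, and the choice of a form-encoded POST are assumptions, not taken from rialto-etl or shared_configs. Substitute whatever the proxy actually expects.

```python
import json
import os
import urllib.parse
import urllib.request

# Counts every triple in the store.
COUNT_QUERY = "SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o }"


def build_count_request(endpoint: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a SPARQL 1.1 Protocol POST for COUNT_QUERY."""
    body = urllib.parse.urlencode({"query": COUNT_QUERY}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
            "x-api-key": api_key,  # assumed header name for the API key
        },
    )


def parse_count(response_body: bytes) -> int:
    """Extract the count from a SPARQL JSON results document."""
    results = json.loads(response_body)
    return int(results["results"]["bindings"][0]["count"]["value"])


def run_count_query() -> int:
    """Send the count query using settings from hypothetical env vars."""
    request = build_count_request(
        os.environ["SPARQL_PROXY_URL"],
        os.environ["SPARQL_PROXY_API_KEY"],
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_count(response.read())
```

Calling `run_count_query()` while connected to the VPN should return an integer if the URL and key are valid. For a store that is truly empty, one would expect that integer to be 0.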
Use these steps when loading data into a store that already has data.
- Querying the data store for people returns everyone who has ever been affiliated, which is more people than we want to update given how long the load takes. We may want to re-query Profiles for "current people" instead, or we could mark people as "inactive".