Ingesting data from an authenticated REST API using Job Secrets
In this tutorial you are going to learn how to use Secrets in a data job.
You like to read news daily and are a huge Taylor Swift fan. Let's combine these passions into a single data job, which searches for Taylor Swift news and stores them in a database.
As the source of the data, you are going to use the free, key-protected API of newsapi.org.
This tutorial is for users who want to learn how to use Secrets in a data job. Before starting, you should be familiar with the basic concepts explained in Hello World Data Job and Ingesting data from REST API into Database.
If you have all the prerequisites in place, the completion of this tutorial should take 10 to 15 minutes.
Since Job Secrets are stored securely, you'll need a pre-configured installation of the VDK Control Service and HashiCorp Vault:
- A VDK Control Service installation (see Install VDK Control Service with custom SDK) and a local VDK SDK installation configured to use it
- A configured VDK Control Service/HashiCorp Vault integration (see Configuring Hashicorp Vault Instance for storing Secrets)
In the first part of the tutorial, you are going to obtain the API key and store it as a Job Secret.
NOTE: in this tutorial you can use a pre-existing job, or create a new one by following the commands below:
Create a data job
Create a data job by executing the following command:
vdk create -n taylor-swift-news -t my-team
This will create a taylor-swift-news directory with some sample data job files inside. Delete the files so that only the empty directory remains.
Go to newsapi.org and click the "Get API Key" button. Fill in the form and copy the API Key.
You can use the "vdk secrets" command to store and retrieve secrets from the command line. If you are using the vdk CLI on a private, secure console, you can set a secret directly with the following command:
vdk secrets -n taylor-swift-news -t my-team --set "api_key" "<your API Key goes here>"
Alternatively, you can use the "--set-prompt" option. You will be prompted to enter the value, so it won't be kept in your console's history:
vdk secrets -n taylor-swift-news -t my-team --set-prompt "api_key"
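Inside a data job, the stored value is read back with job_input.get_secret, as the step later in this tutorial does. The snippet below is a minimal sketch of a fail-fast wrapper around that call. `require_secret` and `FakeJobInput` are hypothetical names introduced here for illustration; the stub only mimics the `get_secret` call so the snippet runs without VDK or a Control Service, and it assumes that a missing secret comes back as `None`.

```python
class FakeJobInput:
    """Hypothetical stand-in for VDK's job_input; mimics only get_secret."""

    def __init__(self, secrets):
        self._secrets = dict(secrets)

    def get_secret(self, key, default_value=None):
        return self._secrets.get(key, default_value)


def require_secret(job_input, key):
    """Return the secret stored under `key`, or raise if it is missing or empty."""
    value = job_input.get_secret(key)
    if not value:
        raise ValueError(
            f"Job secret '{key}' is not set; store it with 'vdk secrets' first."
        )
    return value


# Usage with the stub; in a real step, pass the job_input given to run().
api_key = require_secret(FakeJobInput({"api_key": "dummy-key"}), "api_key")
```

Failing early with a clear message can be easier to diagnose than an authentication error returned by the API later in the step.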
Now, let's create a data job step which uses the API key to retrieve the news you are interested in.
Create a new Python file named 10_get_data.py in the data job directory. You should have the following file structure:
taylor-swift-news/
├── 10_get_data.py
Now that you've created the Python file you need, let's fill in the code. This Python data job does the following:
- Gets the API key from the job secrets
- Prepares and executes the request to newsapi.org
- Sends the received data to the database
10_get_data.py
import requests
from datetime import date, timedelta
from vdk.api.job_input import IJobInput


def run(job_input: IJobInput):
    # Get the API Key from the Job Secrets
    api_key = job_input.get_secret('api_key')

    # Get yesterday's date
    yesterday_date = date.today() - timedelta(days=1)

    # Get the data
    url = "https://newsapi.org/v2/everything"
    params = {
        "q": "Taylor Swift",
        "from": yesterday_date.strftime("%Y-%m-%d"),
        "sortBy": "popularity",
        "language": "en",
        "apiKey": api_key,
    }
    response = requests.get(url, params=params)
    response.raise_for_status()
    data = response.json()

    # Send the data to the DB
    payload = {'articles': data['articles']}
    job_input.send_object_for_ingestion(
        payload=payload,
        destination_table="taylor_swift_news"
    )
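One detail worth noting in the step above: the key travels in the `apiKey` query parameter, so it becomes part of the request URL, and `requests` includes that URL in the error message raised by `raise_for_status()`. NewsAPI's documentation also describes an `X-Api-Key` request header as an alternative. The snippet below is a minimal sketch of that variant, assuming the header behaves as documented. `build_news_request` is a hypothetical helper introduced here, not part of VDK or NewsAPI.

```python
from datetime import date, timedelta


def build_news_request(api_key, today=None):
    """Return (url, params, headers) for yesterday's Taylor Swift news.

    The key is placed in a header, so it does not appear in the URL.
    """
    today = today or date.today()
    yesterday = today - timedelta(days=1)
    url = "https://newsapi.org/v2/everything"
    params = {
        "q": "Taylor Swift",
        "from": yesterday.strftime("%Y-%m-%d"),
        "sortBy": "popularity",
        "language": "en",
    }
    headers = {"X-Api-Key": api_key}
    return url, params, headers


# A fixed date keeps the example deterministic.
url, params, headers = build_news_request("dummy-key", today=date(2024, 1, 2))
# In the job step this would become:
#   response = requests.get(url, params=params, headers=headers)
```

Keeping the key out of the URL reduces the chance that it ends up in job logs when a request fails.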
Congratulations! You've completed this tutorial and learned how to set and use secrets in a data job.