DeepLynx-Airflow-Provider

This custom Airflow provider package lets you create Airflow tasks that interact with the DeepLynx data warehouse. Its DeepLynx-specific operators connect Airflow and DeepLynx, so you can build data management and processing workflows that span both.

Installation

Install from PyPI

To install the provider package from PyPI, simply run:

pip install airflow-provider-deeplynx

Install locally

  • Clone the repository to your local machine.
  • Navigate to the cloned repository directory: cd airflow-provider-deeplynx
  • Install the package using pip: pip install .
    • For development, you can install in editable mode with pip install -e .

Environment Variables

This package reads the following environment variables for configuration. Set them in the environment where your Airflow instance runs.

  • SSL_CERT_FILE: The Airflow-accessible path to the file containing the INL SSL certificate authority. Whether you need it depends on how your DeepLynx instance is set up (see DeepLynx Authentication Methods).
  • DEEPLYNX_DATA_TEMP_FOLDER: The path in the Airflow environment where downloaded data is written. If no value is set, this defaults to AIRFLOW_HOME/logs/data.
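The default for DEEPLYNX_DATA_TEMP_FOLDER can be sketched as below. This is only an illustration of the documented rule, not this package's code: the function name resolve_data_temp_folder is hypothetical, and it assumes AIRFLOW_HOME falls back to Airflow's standard default of ~/airflow when unset.

```python
import os
from pathlib import Path


def resolve_data_temp_folder(env=None):
    """Return the folder where DeepLynx data would be downloaded.

    Hypothetical helper mirroring the documented rule: an explicit
    DEEPLYNX_DATA_TEMP_FOLDER wins, otherwise AIRFLOW_HOME/logs/data.
    """
    env = os.environ if env is None else env
    explicit = env.get("DEEPLYNX_DATA_TEMP_FOLDER")
    if explicit:  # an empty string is treated the same as "unset"
        return Path(explicit)
    # Assumption: AIRFLOW_HOME defaults to ~/airflow, as in a stock Airflow install.
    airflow_home = Path(env.get("AIRFLOW_HOME", "~/airflow")).expanduser()
    return airflow_home / "logs" / "data"


if __name__ == "__main__":
    print(resolve_data_temp_folder())
```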

Usage

Typical communication with DeepLynx requires a bearer token, so the first task of a DeepLynx DAG usually generates one with GetOauthTokenOperator. Downstream tasks retrieve this token from XCom using the token-generation task's task_id and the key token. GetOauthTokenOperator requires either the conn_id of an Airflow Connection of type DeepLynx, or the parameters host, api_key, and api_secret. We recommend creating an Airflow Connection of type DeepLynx through the Airflow UI and entering values for DeepLynx URL, API Key, and API Secret. You can then use this connection's id as the conn_id for any operator in this package (alternatively, you can supply the host parameter).
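A minimal sketch of the XCom hand-off, assuming the downstream operator exposes a templated argument that accepts the token. Airflow renders Jinja in an operator's template_fields, so such an argument can pull the token with ti.xcom_pull. The helper token_xcom_template and the task id get_token are hypothetical; check the class documentation for each operator's actual parameter names and templated fields.

```python
def token_xcom_template(token_task_id: str, key: str = "token") -> str:
    """Build a Jinja expression that pulls the DeepLynx token from XCom.

    Airflow renders this string at runtime, but only for arguments listed
    in the receiving operator's template_fields (verify this per operator).
    """
    return "{{ ti.xcom_pull(task_ids='%s', key='%s') }}" % (token_task_id, key)
```

For example, if the GetOauthTokenOperator task were created with task_id="get_token", a downstream task could receive token_xcom_template("get_token") through its (assumed) token argument, and the two tasks would be chained with get_token >> downstream_task.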

Navigate to the Connections page via Admin -> Connections.

Most functionality can be understood from the provided example DAGs. Class-level documentation is also provided.

Example DAGs

Example DAGs are provided in deeplynx_provider/example_dags. Copy the full directory into your Airflow DAG_FOLDER to have them loaded into your Airflow environment.
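If you prefer to script the copy, a sketch using only the standard library follows. It assumes you are working from a clone of this repository and know your Airflow DAG folder path; copy_example_dags is a hypothetical helper, not part of the package. Copying the whole directory (rather than individual .py files) keeps any supporting files the DAGs may read alongside them.

```python
import shutil
from pathlib import Path


def copy_example_dags(repo_root, dags_folder):
    """Copy deeplynx_provider/example_dags into an Airflow DAG folder.

    repo_root: path to a clone of this repository (assumption).
    dags_folder: your Airflow DAG folder.
    Returns the destination directory.
    """
    src = Path(repo_root) / "deeplynx_provider" / "example_dags"
    if not src.is_dir():
        raise FileNotFoundError(f"example DAG directory not found: {src}")
    dest = Path(dags_folder) / "example_dags"
    # dirs_exist_ok (Python 3.8+) lets a re-run overwrite an earlier copy.
    shutil.copytree(src, dest, dirs_exist_ok=True)
    return dest
```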

A functional test DAG for the airflow-provider-deeplynx package. Create a DeepLynx connection in Airflow with URL, API Key, and API Secret. To run the DAG, supply the DeepLynx connection_id, optionally supply a new container_name, and keep data_source_name as TC-201. This DAG will:

  • check if the supplied container_name exists and retrieve the container_id if so; if that container name does not exist, it will create a new container with the supplied name.
  • import the container ontology and type mappings from Container_Export.json
  • set the data source active (named TC-201)
  • import timeseries data
  • query timeseries data using two different methods
  • upload the timeseries data result
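The first step above (reuse a container with the given name, otherwise create one) is a get-or-create pattern. The sketch below shows only that logic: list_containers and create_container are hypothetical stand-ins for the real DeepLynx calls, and each container is assumed to be a dict with "id" and "name" keys.

```python
from typing import Callable, Dict, List


def get_or_create_container_id(
    name: str,
    list_containers: Callable[[], List[Dict]],
    create_container: Callable[[str], Dict],
) -> str:
    """Return the id of the container called `name`, creating it if missing.

    Both callables are hypothetical placeholders for DeepLynx API calls.
    """
    for container in list_containers():
        if container.get("name") == name:
            return container["id"]
    # No match: create the container and return the new id.
    return create_container(name)["id"]
```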

This DAG shows how you can use the DeepLynxConfigurationOperator to create a custom configuration for DeepLynx communication. It requires that you already have a DeepLynx container and data source created, and that you input your connection_id, container_id, and data_source_id.

This DAG shows every way this package supports querying metatypes and relationships, and how to perform a graph query. This example requires you to create a graph in DeepLynx and then edit the DAG file itself so that the query bodies, parameters, and properties match your graph data.

This DAG shows how to get a DeepLynx token with GetOauthTokenOperator by directly specifying host, api_key, and api_secret (instead of using conn_id).

Class Documentation

Class documentation is available here. It was generated with pdoc by running pdoc --output-dir=docs deeplynx_provider from the root of this project.

DeepLynx Config

Communication with DeepLynx through this package can be configured with options such as the SSL certificate and local file-writing locations. The default DeepLynx config works in most cases; read on for details.

The operators in this provider package use the Deep Lynx Python SDK to communicate with DeepLynx. You can use DeepLynxConfigurationOperator to build a custom configuration; it pushes that configuration to its task instance's XCom so that downstream tasks derived from DeepLynxBaseOperator can use it.
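The sketch below illustrates the general idea of passing a configuration between tasks through XCom. It is not this package's implementation: DeepLynxConfigSketch and its field names are hypothetical. The design point is that a plain, JSON-serializable dict is a safe shape for an XCom value, and a downstream task can rebuild the object from it.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DeepLynxConfigSketch:
    """Hypothetical stand-in for a configuration shared via XCom."""
    host: str
    ssl_ca_cert: Optional[str] = None   # e.g. the SSL_CERT_FILE path
    verify_ssl: bool = True
    temp_folder: Optional[str] = None   # e.g. DEEPLYNX_DATA_TEMP_FOLDER

    def to_xcom(self) -> dict:
        # Strings, bools and None serialize cleanly to JSON.
        return asdict(self)

    @classmethod
    def from_xcom(cls, value: dict) -> "DeepLynxConfigSketch":
        # A downstream task rebuilds the config from the pulled dict.
        return cls(**value)
```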

DeepLynx Authentication

This package is set up to use token authentication with DeepLynx, but other authentication methods are supported through the DeepLynx Config.
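Token authentication here means the standard HTTP bearer scheme: each request carries an Authorization header built from the token. A minimal sketch of that header construction follows; bearer_headers is a hypothetical helper, not part of this package or the Deep Lynx Python SDK.

```python
from typing import Dict, Mapping, Optional


def bearer_headers(token: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a new header dict with a bearer Authorization header added."""
    if not token:
        raise ValueError("token must be a non-empty string")
    headers = dict(base or {})  # copy so the caller's mapping is not mutated
    headers["Authorization"] = f"Bearer {token}"
    return headers
```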

Notes

Gotchas

  • If you use this Airflow package in a Docker environment to talk to a Dockerized DeepLynx, you will likely need to set your DeepLynx host/URL to http://host.docker.internal:8090.
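Inside a container, localhost refers to the container itself rather than the Docker host where DeepLynx's port is published, which is why host.docker.internal is needed. The hypothetical helper below sketches that rewrite for a localhost URL. Note that host.docker.internal is available by default on Docker Desktop; on Linux it typically has to be mapped explicitly (for example with --add-host=host.docker.internal:host-gateway).

```python
from urllib.parse import urlsplit, urlunsplit

LOCAL_HOSTS = {"localhost", "127.0.0.1"}


def docker_host_url(url: str, docker_host: str = "host.docker.internal") -> str:
    """Point a localhost DeepLynx URL at the Docker host instead.

    Non-local URLs are returned unchanged. Any user:password part of a
    local URL is dropped, which is fine for a plain DeepLynx base URL.
    """
    parts = urlsplit(url)
    if parts.hostname not in LOCAL_HOSTS:
        return url
    netloc = docker_host if parts.port is None else f"{docker_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))
```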

Other Documentation

  • Airflow documentation on creating a custom provider here
  • airflow-provider-sample project here