Workday employee daily history model + monthly summary model + updates to employee surrogate key #5

Merged 21 commits on Apr 3, 2024.

4 changes: 2 additions & 2 deletions .buildkite/scripts/run_models.sh
@@ -18,8 +18,8 @@ cd integration_tests
dbt deps
dbt seed --target "$db" --full-refresh
dbt run --target "$db" --full-refresh
dbt test --target "$db"
dbt run --target "$db"
dbt test --target "$db"
dbt run --vars '{employee_history_enabled: true}' --target "$db" --full-refresh
dbt test --target "$db"

dbt run-operation fivetran_utils.drop_schemas_automation --target "$db"
47 changes: 47 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,50 @@
# dbt_workday v0.2.0
## 🚨 Breaking Changes 🚨
- Created a surrogate key `employee_id` in `workday__employee_overview` that combines `worker_id`, `position_id`, and `position_start_date`. This accounts for edge cases such as:
  - A worker holding multiple positions concurrently.
  - A position being held by multiple workers concurrently.
  - A worker being rehired for the same position.

## 🚀 Feature Updates 🚀
- We have added three end models in the [`models/workday_history`](https://github.com/fivetran/dbt_workday/tree/main/models/workday_history) folder [thanks to support from Fivetran's history mode feature](https://fivetran.com/docs/core-concepts/sync-modes/history-mode). These models provide a daily historical view of key worker/employee Workday data and let users assess monthly summary metrics. These end models include:

  - `workday__employee_daily_history`: Each record is a daily record for an employee, starting with the employee's first active date and continuing up to either the current date (if still active) or the last active date. This allows customers to track the daily history of their employees from when they started.

  - `workday__monthly_summary`: Each record is a month, aggregated from the last day of each month of the employee daily history. This captures monthly metrics of workers, such as average salary and churned and retained employees.

  - `workday__worker_position_org_daily_history`: Each record is a daily record for a worker/position/organization combination, starting with its first active date and continuing up to either the current date (if still active) or its last active date. This allows customers to tie organizations to employees more easily in their warehouses via other organization models (such as `workday__organization_overview`).

- We have added staging history mode models in the [`models/staging/workday_history`](https://github.com/fivetran/dbt_workday/tree/main/models/staging/workday_history) folder. These allow customers to use the Fivetran history mode feature, which records every version of each record in the source table from the moment history mode is activated for that table. These staging models include:

  - `stg_workday__personal_information_history`: Contains historical records of a worker's personal information.
  - `stg_workday__worker_history`: Contains historical records of a worker.
  - `stg_workday__worker_position_history`: Contains historical records of a worker's positions.
  - `stg_workday__worker_position_organization_history`: Contains historical records of a worker's position and organization data.

- The `workday__employee_daily_history` model in the [`models/workday_history`](https://github.com/fivetran/dbt_workday/tree/main/models/workday_history) folder is [built on Fivetran's history mode feature](https://fivetran.com/docs/core-concepts/sync-modes/history-mode) and pulls from the Workday HCM staging history models in the [`models/staging/workday_history`](https://github.com/fivetran/dbt_workday/tree/main/models/staging/) folder.

- We have kept the `stg_workday__worker_position_organization_history` model separate from the employee daily history, because organizational data in Workday is too flexible to join reliably with the rest of the employee data. We leave it to customers to use their best judgement when joining this data into other end models in their own warehouse. [See the DECISIONLOG for more details](https://github.com/fivetran/dbt_workday/blob/main/DECISIONLOG.md).

- These models are disabled by default due to their size, so you will need to set the following variable in your `dbt_project.yml` to enable them.

```yml
vars:
  employee_history_enabled: true # false by default
```

- We have also added the `workday__monthly_summary` model in the [`models/workday_history`](https://github.com/fivetran/dbt_workday/tree/main/models/workday_history) folder. This table aggregates high-level monthly metrics to track changes over time to overall employee data for a customer.

- We have chosen not to implement incremental logic in the history models because Workday HCM transactions can be future-dated or updated retroactively, outside a typical incremental window. [See the DECISIONLOG](https://github.com/fivetran/dbt_workday/blob/main/DECISIONLOG.md) for more details.

- We support pulling from both your Workday HCM and History Mode connectors simultaneously, each from its own database and schema. We also support pulling from only your History Mode connector, bypassing the standard connector. [See more detailed instructions in the README](https://github.com/fivetran/dbt_workday/blob/main/README.md#configuring-your-workday-history-mode-database-and-schema-variables).

- Workday HCM History Mode models can contain a very large number of rows if you bring in all historical data, so we've added a start date filter that lets you bring in only the historical data you need. [More details can be found in the README](https://github.com/fivetran/dbt_workday/blob/main/README.md#filter-your-workday-hcm-history-mode-models).

## 🚘 Under the Hood 🚘
- Created the `int_workday__worker_employee_enhanced` model to simplify end-model processing in `workday__employee_overview`, which now focuses on generating the surrogate key.

# dbt_workday v0.1.1

[PR #4](https://github.com/fivetran/dbt_workday/pull/4) contains the following updates:
18 changes: 18 additions & 0 deletions DECISIONLOG.md
@@ -0,0 +1,18 @@
## On not adding incremental logic into the Workday HCM History models
Generally, when working with large volume models like the ones created by Fivetran History Mode, we tend to implement incremental models. [See Salesforce](https://github.com/fivetran/dbt_salesforce?tab=readme-ov-file#optional-step-4-utilizing-salesforce-history-mode-records) for a particular example of that implementation.

However, in the Workday HCM case, we have found that History Mode data does not fit incremental logic, for the following reasons:
* Transactions can be future-dated. The most common case is an employee being hired for a date beyond the current date, so an incremental run will pick up numerous future-dated records, potentially leading to duplicated employee records down the road.
* An employee's record can also be updated retroactively, further back than a typical incremental window.

For this reason, we recommend that users run these models with `--full-refresh` to maintain accuracy. Because of this, we also recommend optimizing your refresh cadence when using this package to reduce warehouse load and minimize costs.

We welcome all attempts to optimize this strategy though, and would be open to enhancements to the package!

## Why we kept the worker position organization history model separate from the employee daily history model

The `workday__employee_daily_history` model is intended to combine historical data from all relevant worker history models into a daily view per employee and worker.

However, with `stg_workday__worker_position_organization_history`, the organization values are too customizable to join into an `employee_daily_history` model with any clear definitions.

Instead, we have decided to keep the model separate in `workday__worker_position_org_daily_history`, leaving end customers free to configure which organizations they join into the employee daily history within their warehouses. The `int_workday__employee_history` model provides a useful guide for building your own custom history mode model.
89 changes: 80 additions & 9 deletions README.md
@@ -24,20 +24,24 @@ The main focus of the package is to transform the core object tables into analyt
- Adds column-level testing where applicable. For example, all primary keys are tested for uniqueness and non-null values.
- Provides insight into your Workday HCM data across the following grains:
- Employee, job, organization, position.
- Gathers daily historical records of employees.

This package generates a comprehensive data dictionary of your Workday HCM data through the [dbt docs site](https://fivetran.github.io/dbt_workday/).

> This package does not apply freshness tests to source data due to the variability of survey cadences.

<!--section="workday_model"-->
The following table provides a detailed list of all models materialized within this package by default.
> TIP: See more details about these models in the package's [dbt docs site](https://fivetran.github.io/dbt_workday/#!/overview/workday).

| **model** | **description** | **available in Quickstart?** |
| ------------------------- | ------------------------------------------------------------------------------------------------------------------ | ------------------------------ |
| [workday__employee_overview](https://fivetran.github.io/dbt_workday/#!/model/model.workday.workday__employee_overview) | Each record represents an employee with enriched personal information and the positions they hold. This helps measure employee demographic and geographical distribution, overall retention and turnover, and compensation. | Yes |
| [workday__job_overview](https://fivetran.github.io/dbt_workday/#!/model/model.workday.workday__job_overview) | Each record represents a job with enriched details on job profiles and job families. This allows users to understand recruitment patterns and details within a job and job groupings. | Yes |
| [workday__organization_overview](https://fivetran.github.io/dbt_workday/#!/model/model.workday.workday__organization_overview) | Each record represents an organization, its organization roles, and the positions and workers tied to it. This allows end users to slice organizational data at any grain to better analyze organizational structures. | Yes |
| [workday__position_overview](https://fivetran.github.io/dbt_workday/#!/model/model.workday.workday__position_overview) | Each record represents a position with enriched position data. This allows end users to understand position availability, vacancies, and cost to optimize hiring efforts. | Yes |
| [workday__employee_daily_history](https://fivetran.github.io/dbt_workday/#!/model/model.workday.workday__employee_daily_history) | Each record represents a daily record for an employee, employee position, and employee personal information within Workday HCM, to help customers gather the most historically accurate data regarding their employees. | No |
| [workday__monthly_summary](https://fivetran.github.io/dbt_workday/#!/model/model.workday.workday__monthly_summary) | Each record is a month, aggregated from the last day of each month of the employee daily history. This captures monthly aggregated metrics to track trends like employee additions and churns, salary movements, demographic changes, etc. | No |
<!--section-end-->

# 🎯 How do I use the dbt package?
@@ -57,12 +61,12 @@ dispatch:
```

## Step 2: Install the package
Include the following Workday HCM package version in your `packages.yml` file:
> TIP: Check [dbt Hub](https://hub.getdbt.com/) for the latest installation instructions or [read the dbt docs](https://docs.getdbt.com/docs/package-management) for more information on installing packages.
```yml
packages:
  - package: fivetran/workday
    version: [">=0.2.0", "<0.3.0"] # we recommend using ranges to capture non-breaking changes automatically
```

## Step 3: Define database and schema variables
@@ -91,6 +95,73 @@ Please be aware that the native `source.yml` connection set up in the package wi

To connect your multiple schema/database sources to the package models, follow the steps outlined in the [Union Data Defined Sources Configuration](https://github.com/fivetran/dbt_fivetran_utils/tree/releases/v0.4.latest#union_data-source) section of the Fivetran Utils documentation for the union_data macro. This will ensure a proper configuration and correct visualization of connections in the DAG.

## (Optional) Step 4: Utilizing Workday HCM History Mode records

If you have History Mode enabled for your Workday HCM connector, the package now directly supports the worker, worker position, worker position organization, and personal information tables. You can view these models in the [`staging/workday_history`](https://github.com/fivetran/dbt_workday/blob/main/models/staging/workday_history) folder. This staging data then flows into the employee daily history model, which in turn populates the monthly summary model, giving you access to the historical data in these tables for the most accurate record of your data over time.

### IMPORTANT: How To Update Your History Models
To ensure maximum value from these history mode models and avoid the messy historical data that can come from picking and choosing which fields you bring in, **all fields in your Workday HCM history mode connector are synced into the workday history staging models**.


To update the history mode models, you must follow these steps:
1) Go to your Fivetran Workday HCM History Mode connector page.
2) Update the fields that you are syncing into the model.
3) Run `dbt run --full-refresh` on the specific staging models you've updated to bring in these fields and all of their available historical data.

Bringing in additional fields can be very processing-heavy, so we advise caution when making changes to your history mode connector. Batch as many field changes as possible before running a `--full-refresh` to save on processing.
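
As a sketch of step 3, a YAML selector in your root project's `selectors.yml` could group the four staging history models listed in this PR so they can be full-refreshed in one command. The selector name `workday_history_staging` is hypothetical; `union`, `method: fqn`, and `children` are standard dbt YAML selector keys, but confirm them against the selector documentation for your dbt version.

```yml
# selectors.yml (root project); the selector name is hypothetical
selectors:
  - name: workday_history_staging
    description: "Workday HCM staging history models plus their downstream models"
    definition:
      union:
        # Keep only the entries for tables whose fields you changed
        - method: fqn
          value: stg_workday__personal_information_history
          children: true
        - method: fqn
          value: stg_workday__worker_history
          children: true
        - method: fqn
          value: stg_workday__worker_position_history
          children: true
        - method: fqn
          value: stg_workday__worker_position_organization_history
          children: true
```

You could then run `dbt run --selector workday_history_staging --full-refresh`. Because `children: true` also selects downstream models such as `workday__employee_daily_history` and `workday__monthly_summary`, the new fields reach the end models in the same run; remove it to refresh only the staging models, as step 3 describes.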


### Configuring Your Workday HCM History Mode Database and Schema Variables
Customers leveraging the Workday HCM connector generally fall into one of two categories when using History Mode. They either have one connector syncing non-historical records and a separate connector syncing historical records, **or** they have a single connector syncing historical records. We have designed this feature to support both scenarios.

#### Option 1: Two connectors, one with non-historical data and another with historical data
If you are gathering data from both standard Workday HCM and Workday HCM History Mode, and the two connectors write to different databases and schemas, you will need to add the history database and schema variables to your `dbt_project.yml`.

```yml
vars:
  workday_database: your_database_name # workday by default
  workday_schema: your_schema_name

  workday_history_database: your_history_database_name # workday_history by default
  workday_history_schema: your_history_schema_name
```

#### Option 2: One connector being used to sync historical data
You may want to use only Workday HCM History Mode to bring in your data. Because the standard Workday HCM variables point to the default `workday` database and schema, add the following variables to your `dbt_project.yml` to point them to your history mode equivalents.

```yml
vars:
  workday_database: your_history_database_name # workday by default
  workday_schema: your_history_schema_name

  workday_history_database: your_history_database_name # workday_history by default
  workday_history_schema: your_history_schema_name
```

**IMPORTANT**: If you use Option 2, your history mode connector must sync the same tables and fields that your enabled end models rely on. Examine your data lineage and the model fields within the `workday` folder to determine which tables and fields you need to sync in the history mode connector.

### Enabling Workday HCM History Mode Models
The History Mode models can grow very large because they take in **ALL** historical records, so we've disabled them by default. You can enable them by adding the following variable to your `dbt_project.yml` file.

```yml
# dbt_project.yml

...
vars:
  employee_history_enabled: true # false by default. Only enable if you have History Mode enabled and wish to view the full historical record.
```

### Filter your Workday HCM History Mode models
These history models can bring in all of your Workday HCM History Mode data, but you may want only a smaller sample of historical records, given the size of the Workday HCM history source tables. To limit the amount of historical data processed on the first run, the package uses `2020-01-01` as the default minimum date for the historical end models. You can override this default with the variable below.

Our staging history models apply a `where` condition so that you bring in only the data you need. You can set a global history filter that applies to all of our staging history models in your `dbt_project.yml`:


```yml
vars:
  employee_history_start_date: 'YYYY-MM-DD' # The first `_fivetran_start` date you'd like to filter data on in all your history models.
```
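
Putting these variables together, a `dbt_project.yml` for a history-only setup (Option 2) with the history models enabled and a later start date might look like the sketch below. The database and schema names and the `2022-01-01` date are placeholders, not recommended values.

```yml
# dbt_project.yml: placeholder names and dates
vars:
  # Option 2: point both the standard and history variables at the history mode connector
  workday_database: your_history_database_name
  workday_schema: your_history_schema_name
  workday_history_database: your_history_database_name
  workday_history_schema: your_history_schema_name

  # Enable the history models (disabled by default)
  employee_history_enabled: true

  # First `_fivetran_start` date to keep in the history models (package default: 2020-01-01)
  employee_history_start_date: '2022-01-01'
```

If you use Option 1 instead, keep `workday_database` and `workday_schema` pointed at your standard connector.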


## (Optional) Step 4: Additional configurations
