Workday employee daily history model + monthly summary model + updates to employee surrogate key #5

Merged 21 commits on Apr 3, 2024
5 changes: 2 additions & 3 deletions .buildkite/scripts/run_models.sh
@@ -19,7 +19,6 @@ dbt deps
dbt seed --target "$db" --full-refresh
dbt run --target "$db" --full-refresh
dbt test --target "$db"
dbt run --target "$db"
dbt test --target "$db"

dbt run --vars '{employee_history_enabled: true}' --target "$db"
dbt test --vars '{employee_history_enabled: true}' --target "$db"
dbt run-operation fivetran_utils.drop_schemas_automation --target "$db"
57 changes: 57 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,60 @@
# dbt_workday v0.2.0
Lots of major updates! [PR #5](https://github.com/fivetran/dbt_workday/pull/5) includes the following changes:

## 🚨 Breaking Changes 🚨
- We are now materializing staging models as ephemeral rather than views, as they are mostly redundant with the source tables and are primarily designed for preparing models for final transformation. Previous staging views will no longer be used and will be considered stale.

## 🔑 New Primary Key 🔑
- Created a surrogate key `employee_id` in `workday__employee_overview` that combines `worker_id`, `source_relation`, `position_id`, and `position_start_date`. This accounts for edge cases such as:
- A worker holding multiple positions concurrently.
- A position being held by multiple workers concurrently.
- A worker being rehired for the same position.
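
As a rough illustration of how a composite key like this behaves, here is a minimal Python sketch. It assumes an MD5 hash over the four fields joined with a `-` separator, which resembles the general approach of `dbt_utils.generate_surrogate_key`; the function name `employee_surrogate_key`, the null placeholder, and the sample values are all hypothetical, and the package's actual SQL may differ.

```python
import hashlib

NULL_PLACEHOLDER = "_null_"  # hypothetical stand-in for missing values


def employee_surrogate_key(worker_id, source_relation, position_id, position_start_date):
    """Hash the four identifying fields into one key (illustrative only)."""
    parts = [worker_id, source_relation, position_id, position_start_date]
    normalized = [NULL_PLACEHOLDER if p is None else str(p) for p in parts]
    return hashlib.md5("-".join(normalized).encode("utf-8")).hexdigest()


# Same worker holding two concurrent positions: two distinct keys
key_a = employee_surrogate_key("w1", "workday", "p1", "2023-01-01")
key_b = employee_surrogate_key("w1", "workday", "p2", "2023-01-01")
# Same worker rehired into the same position on a later date: a third key
key_c = employee_surrogate_key("w1", "workday", "p1", "2024-02-01")
```

Because every field feeds the hash, each of the edge cases listed above yields its own `employee_id`, while repeated runs over the same inputs produce the same key.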

## 🚀 Feature Updates 🚀
- We have added three end models in the [`models/workday_history`](https://github.com/fivetran/dbt_workday/tree/main/models/workday_history) folder [thanks to support from Fivetran's history mode feature](https://fivetran.com/docs/core-concepts/sync-modes/history-mode). These models provide historical daily data looks into crucial worker/employee Workday models, as well as allowing users to assess monthly summary metrics. These end models include:

- `workday__employee_daily_history`: Each record is a daily record for an employee, starting with its first active date and updating up toward either the current date (if still active) or its last active date. This will allow customers to track the daily history of their employees from when they started.

- `workday__monthly_summary`: Each record is a month, aggregated from the last day of each month of the employee daily history. This captures monthly metrics of workers, such as average salary, churned and retained employees, etc.

- `workday__worker_position_org_daily_history`: Each record is a daily record for a worker/position/organization combination, starting with its first active date and updating up toward either the current date (if still active) or its last active date. This will allow customers to tie in organizations to employees via other organization models (such as `workday__organization_overview`) more easily in their warehouses.

- We have added staging history mode models in the [`models/workday_history/staging`](https://github.com/fivetran/dbt_workday/tree/main/models/workday_history/staging) folder. This allows customers to utilize the Fivetran history mode feature, which records every version of each record in the source table from the moment this mode is activated in the equivalent tables.

- These staging models include:

- `stg_workday__personal_information_history`: Containing historical records of a worker's personal information.
- `stg_workday__worker_history`: Containing historical records of a worker's history.
- `stg_workday__worker_position_history`: Containing historical records of a worker's position history.
- `stg_workday__worker_position_organization_history`: Containing historical records of a worker's position and organization history.

- We have then utilized the `workday__employee_daily_history` model in the [`models/workday_history`](https://github.com/fivetran/dbt_workday/tree/main/models/workday_history) folder [based off of Fivetran's history mode feature](https://fivetran.com/docs/core-concepts/sync-modes/history-mode), pulling from Workday HCM source models you can view in the [`models/workday_history/staging`](https://github.com/fivetran/dbt_workday/tree/main/models/workday_history/staging) folder.

- We have kept the `stg_workday__worker_position_organization_history` model separate, as organizational data is too flexible in Workday to effectively join in the majority of data. We leave it to the customer to use their best judgement in joining this data into other end models in their own warehouse. [See the DECISIONLOG for more details](https://github.com/fivetran/dbt_workday/blob/main/DECISIONLOG.md).

- These models are disabled by default due to their size, so you will need to set the below variable configurations for each of the individual models you want to utilize in your `dbt_project.yml`.

```yml
vars:
  employee_history_enabled: true
```

- Users can set a custom `employee_history_start_date` filter to narrow down the number of historical records brought into the staging and end models. By default, the package will use the minimum `_fivetran_start` date to generate the start date for the final daily history models. You can override this default with the variable below.

```yml
vars:
  employee_history_start_date: 'YYYY-MM-DD' # The first `_fivetran_start` date you'd like to filter data on in all your history models.
```

- We have also added the `workday__monthly_summary` model in the [`models/workday_history`](https://github.com/fivetran/dbt_workday/tree/main/models/workday_history) folder. This table aggregates high-level monthly metrics to track changes over time to overall employee data for a customer.

- We have chosen not to implement incremental logic in the history models due to the future-facing updating of Workday HCM transactions beyond current daily updates. [See the DECISIONLOG](https://github.com/fivetran/dbt_workday/blob/main/DECISIONLOG.md) for more details.

- Workday HCM History Mode models can contain a multitude of rows if you bring in all historical data, so we've introduced the flexibility to set first date filters to bring in only the historical data you need. [More details can be found in the README](https://github.com/fivetran/dbt_workday/blob/main/README.md#filter-your-workday-hcm-history-mode-models).
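
To make the grain of the daily history and monthly summary models described above more concrete, the sketch below expands one employee's active window into daily rows and then keeps only the rows that fall on the last day of a month. The names `daily_rows` and `month_end_rows` and the sample dates are hypothetical; the package implements this logic in SQL, and this is only an approximation of the described behavior.

```python
import calendar
from datetime import date, timedelta


def daily_rows(first_active, last_active, today):
    """One date per day from first_active through last_active (or today if still active)."""
    end = last_active if last_active is not None else today
    return [first_active + timedelta(days=i) for i in range((end - first_active).days + 1)]


def month_end_rows(days):
    """Keep only the dates that are the last day of their month."""
    return [d for d in days if d.day == calendar.monthrange(d.year, d.month)[1]]


# A still-active employee whose first active date is 2024-01-30
days = daily_rows(date(2024, 1, 30), None, today=date(2024, 3, 2))
month_ends = month_end_rows(days)
```

In this sketch the monthly summary would be built from the rows in `month_ends`, so an employee contributes to a month only if they have a daily record on that month's final day.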

## 🚘 Under the Hood 🚘
- Created `int_workday__worker_employee_enhanced` model to simplify end model processing in the `workday__employee_overview`, which is now focused on generating the surrogate key.

# dbt_workday v0.1.1

[PR #4](https://github.com/fivetran/dbt_workday/pull/4) contains the following updates:
16 changes: 16 additions & 0 deletions DECISIONLOG.md
@@ -0,0 +1,16 @@
## On not adding incremental logic into the Workday HCM History models
Generally, when working with large volume models like the ones created by Fivetran History Mode, we tend to implement incremental models. [See Salesforce](https://github.com/fivetran/dbt_salesforce?tab=readme-ov-file#optional-step-4-utilizing-salesforce-history-mode-records) for a particular example of that implementation.

However, in the Workday HCM case, we have found that History Mode does not fit the use case for incremental logic due to the following reasons.
* Transactions can be future-dated. The most common case is an employee being hired for a future date beyond the current date, so an incremental run will pick up numerous records in the future, leading to potential duplications down the road for an employee's records.
* There are additional cases where an employee's record can be updated in the past beyond a common incremental window.

We welcome all attempts to optimize this strategy though, and would be open to enhancements to the package!

## Why we kept the worker position organization history model separate from the employee daily history model

The intent of the `workday__employee_daily_history` model was to combine historical data from all relevant worker history models and gather a daily look at that data based on employee and worker.

However, with `stg_workday__worker_position_organization_history`, the values for organization are too customizable, and thus impossible to join into an `employee_daily_history` model with any clear definitions.

Instead, we have decided to keep the model separate in `workday__worker_position_org_daily_history`, leaving end customers the ability to configure which organizations they join into the employee daily history within their warehouses. The `int_workday__employee_history` model provides a solid guide to configuring your own custom history mode model.
54 changes: 43 additions & 11 deletions README.md
@@ -24,20 +24,26 @@ The main focus of the package is to transform the core object tables into analyt
- Adds column-level testing where applicable. For example, all primary keys are tested for uniqueness and non-null values.
- Provides insight into your Workday HCM data across the following grains:
- Employee, job, organization, position.
- Generates a comprehensive data dictionary of your Workday HCM data through the [dbt docs site](https://fivetran.github.io/dbt_workday/).
- Gather daily historical records of employees.

This package generates a comprehensive data dictionary of your Workday HCM data through the [dbt docs site](https://fivetran.github.io/dbt_workday/).

> This package does not apply freshness tests to source data due to the variability of survey cadences.

<!--section="workday_model"-->
The following table provides a detailed list of all models materialized within this package by default.
> TIP: See more details about these models in the package's [dbt docs site](https://fivetran.github.io/dbt_workday/#!/overview/workday).

| **model** | **description** |
| ------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| [workday__employee_overview](https://fivetran.github.io/dbt_workday/#!/model/model.workday.workday__employee_overview) | Each record represents an employee with enriched personal information and the positions they hold. This helps measure employee demographic and geographical distribution, overall retention and turnover, and compensation analysis of their employees. |
| [workday__job_overview](https://fivetran.github.io/dbt_workday/#!/model/model.workday.workday__job_overview) | Each record represents a job with enriched details on job profiles and job families. This allows users to understand recruitment patterns and details within a job and job groupings. |
| [workday__organization_overview](https://fivetran.github.io/dbt_workday/#!/model/model.workday.workday__organization_overview) | Each record represents organization, organization roles, as well as positions and workers tied to these organizations. This allows end users to slice organizational data at any grain to better analyze organizational structures. |
| [workday__position_overview](https://fivetran.github.io/dbt_workday/#!/model/model.workday.workday__position_overview) | Each record represents a position with enriched data on positions. This allows end users to understand position availabilities, vacancies, cost to optimize hiring efforts. |
| **model** | **description** | Available in Quickstart?
| ------------------------- | ------------------------------------------------------------------------------------------------------------------|------------------------------
| [workday__employee_overview](https://fivetran.github.io/dbt_workday/#!/model/model.workday.workday__employee_overview) | Each record represents an employee with enriched personal information and the positions they hold. This helps measure employee demographic and geographical distribution, overall retention and turnover, and compensation analysis of their employees. | Yes
| [workday__job_overview](https://fivetran.github.io/dbt_workday/#!/model/model.workday.workday__job_overview) | Each record represents a job with enriched details on job profiles and job families. This allows users to understand recruitment patterns and details within a job and job groupings. | Yes
| [workday__organization_overview](https://fivetran.github.io/dbt_workday/#!/model/model.workday.workday__organization_overview) | Each record represents organization, organization roles, as well as positions and workers tied to these organizations. This allows end users to slice organizational data at any grain to better analyze organizational structures. | Yes
| [workday__position_overview](https://fivetran.github.io/dbt_workday/#!/model/model.workday.workday__position_overview) | Each record represents a position with enriched data on positions. This allows end users to understand position availabilities, vacancies, cost to optimize hiring efforts. | Yes
| [workday__employee_daily_history](https://fivetran.github.io/dbt_workday/#!/model/model.workday.workday__employee_daily_history) | Each record represents a daily record for an employee, employee position, and employee personal information within Workday HCM, to help customers gather the most historically accurate data regarding their employees. | No
| [workday__monthly_summary](https://fivetran.github.io/dbt_workday/#!/model/model.workday.workday__monthly_summary) | Each record is a month, aggregated from the last day of each month of the employee daily history. This captures monthly aggregated metrics to track trends like employee additions and churns, salary movements, demographic changes, etc. | No
| [workday__worker_position_org_daily_history](https://fivetran.github.io/dbt_workday/#!/model/model.workday.workday__worker_position_org_daily_history) | Each record is a daily record for a worker/position/organization combination, starting with its first active date and updating up toward either the current date (if still active) or its last active date. This will allow customers to tie in organizations to employees via other organization models (such as `workday__organization_overview`) more easily in their warehouses. | No

<!--section-end-->

# 🎯 How do I use the dbt package?
@@ -57,12 +63,12 @@ dispatch:
```

## Step 2: Install the package
Include the following Workday package version in your `packages.yml` file:
Include the following Workday HCM package version in your `packages.yml` file:
> TIP: Check [dbt Hub](https://hub.getdbt.com/) for the latest installation instructions or [read the dbt docs](https://docs.getdbt.com/docs/package-management) for more information on installing packages.
```yml
packages:
- package: fivetran/workday
version: [">=0.1.0", "<0.2.0"] # we recommend using ranges to capture non-breaking changes automatically
version: [">=0.2.0", "<0.3.0"] # we recommend using ranges to capture non-breaking changes automatically
```

## Step 3: Define database and schema variables
@@ -91,8 +97,34 @@ Please be aware that the native `source.yml` connection set up in the package wi

To connect your multiple schema/database sources to the package models, follow the steps outlined in the [Union Data Defined Sources Configuration](https://github.com/fivetran/dbt_fivetran_utils/tree/releases/v0.4.latest#union_data-source) section of the Fivetran Utils documentation for the union_data macro. This will ensure a proper configuration and correct visualization of connections in the DAG.

## (Optional) Step 4: Utilizing Workday HCM History Mode

If you have History Mode enabled for your Workday HCM connector, we now include support for the worker, worker position, worker position organization, and personal information tables directly. You can view these files in the [`staging`](https://github.com/fivetran/dbt_workday/blob/main/models/workday_history/staging) folder. This staging data then flows into the employee daily history model, which in turn populates the monthly summary model. This will allow you access to your historical data for these tables for the most accurate record of your data over time.

### Enabling Workday HCM History Mode Models
The History Mode models can get quite expansive since they will take in **ALL** historical records, so we've disabled them by default. You can enable the history models you'd like to utilize by adding the below variable configurations within your `dbt_project.yml` file for the equivalent models.

```yml
# dbt_project.yml

...
vars:
  employee_history_enabled: true # False by default. Only use if you have history mode enabled and wish to view the full historical record.
```

### Filter your Workday HCM History Mode models
By default, these history models are set to bring in all your data from Workday HCM History, but you may be interested in bringing in only a smaller sample of historical records, given the relative size of the Workday HCM history source tables. By default, the package will use the minimum `_fivetran_start` date for the historical end models. This default may be overwritten to your liking by leveraging the below variable.

We have set up `where` conditions in our staging models so that you bring in only the data you need. You can set a global history filter that applies to all of our staging history models in your `dbt_project.yml`:

```yml
vars:
  employee_history_start_date: 'YYYY-MM-DD' # The first `_fivetran_start` date you'd like to filter data on in all your history models.
```

The default date value in our models is `2005-03-01` (the month Workday was founded), so that all available data is captured by default. If you set a custom date value as outlined above, these models will take the greater of that value and the minimum `_fivetran_start` date in the source data. The resulting date is then used as the first date with historical data in your daily history models.
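
The start-date rule described above can be sketched as follows. This is a hedged illustration rather than the package's SQL: the function `history_start_date` and its arguments are hypothetical, and it assumes the configured value falls back to `2005-03-01` when `employee_history_start_date` is not set.

```python
from datetime import date

DEFAULT_START = date(2005, 3, 1)  # default date named in the README


def history_start_date(min_fivetran_start, custom_start=None):
    """Return the later of the configured start date and the earliest source record."""
    configured = custom_start if custom_start is not None else DEFAULT_START
    return max(configured, min_fivetran_start)


# No custom value: the earliest `_fivetran_start` wins over the 2005 default
default_case = history_start_date(date(2019, 6, 15))
# Custom value later than the source minimum: the custom value wins
filtered_case = history_start_date(date(2019, 6, 15), date(2022, 1, 1))
```

Under these assumptions, a custom date earlier than the first `_fivetran_start` has no effect, since the source data does not reach back that far.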

## (Optional) Step 4: Additional configurations
## (Optional) Step 5: Additional configurations

### Changing the Build Schema
By default this package will build the Workday HCM staging models within a schema titled (<target_schema> + `_stg_workday`) and the Workday HCM final models within a schema titled (<target_schema> + `_workday`) in your target database. If this is not where you would like your modeled Workday HCM data to be written to, add the following configuration to your `dbt_project.yml` file:
@@ -121,7 +153,7 @@ vars:
</details>


## (Optional) Step 5: Orchestrate your models with Fivetran Transformations for dbt Core™
## (Optional) Step 6: Orchestrate your models with Fivetran Transformations for dbt Core™
<details><summary>Expand for details</summary>
<br>

14 changes: 11 additions & 3 deletions dbt_project.yml
@@ -1,6 +1,6 @@
config-version: 2
name: 'workday'
version: '0.1.1'
version: '0.2.0'
require-dbt-version: [">=1.3.0", "<2.0.0"]

models:
@@ -10,8 +10,16 @@ models:
intermediate:
+materialized: ephemeral
staging:
+materialized: view
+materialized: ephemeral
Collaborator:
This is technically a breaking change as the old views are now going to be stale. I would call this out in the CHANGELOG as a breaking change and inform customers that the previous staging views will no longer be used.

Collaborator:
Also, don't forget to add the link to the PR in the CHANGELOG for all these changes.

Contributor Author (@fivetran-avinash, Apr 3, 2024):
Added to the CHANGELOG.


+schema: stg_workday
base:
+materialized: view
workday_history:
+materialized: table
fivetran-joemarkiewicz marked this conversation as resolved.
intermediate:
+materialized: view
staging:
+materialized: ephemeral

vars:
job_profile: "{{ source('workday','job_profile') }}"
@@ -34,4 +42,4 @@ vars:
person_contact_email_address: "{{ source('workday','person_contact_email_address') }}"
worker_position_history: "{{ source('workday','worker_position_history') }}"
worker_leave_status: "{{ source('workday','worker_leave_status') }}"
worker_position_organization_history: "{{ source('workday','worker_position_organization_history') }}"
worker_position_organization_history: "{{ source('workday','worker_position_organization_history') }}"
2 changes: 1 addition & 1 deletion docs/catalog.json

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions docs/index.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/manifest.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/run_results.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion integration_tests/dbt_project.yml
@@ -1,5 +1,5 @@
name: 'workday_integration_tests'
version: '0.1.1'
version: '0.2.0'
config-version: 2

profile: 'integration_tests'