Observability (#15)
fmind authored Jul 23, 2024
1 parent 5598082 commit be5bf88
Showing 66 changed files with 13,734 additions and 707 deletions.
26 changes: 26 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,29 @@
## v1.1.0 (2024-07-21)

### Feat

- **kpi**: add key performance indicators
- **mlproject**: add mlflow project and tasks
- **monitoring**: add mlflow.evaluate API
- **lineage**: add lineage features through mlflow data api
- **explanations**: add explainability features and tooling
- **data**: add train, test, and sample data
- **notification**: add service and alerts with plyer
- **observability**: add alerting with plyer notifications
- **observability**: add infrastructure through mlflow system metrics

### Fix

- **kpi**: add key performance indicators
- **projects**: change naming convention
- **evaluation**: add evaluation files
- **loading**: use version or alias for loading models
- **warnings**: improve styles and remove warnings
- **mlflow**: remove input examples following the addition of lineage
- **paths**: fix path for explanation job
- **data**: fix models explanations name
- **data**: add parquet data

## v1.0.1 (2024-06-28)

### Fix
9 changes: 9 additions & 0 deletions MLproject
@@ -0,0 +1,9 @@
# https://mlflow.org/docs/latest/projects.html

name: bikes
python_env: python_env.yaml
entry_points:
main:
parameters:
conf_file: path
command: "PYTHONPATH=src python -m bikes {conf_file}"
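Assuming MLflow is installed, a project defined this way can be launched through MLflow's project runner; a minimal sketch, passing one of the repo's config files to the `conf_file` parameter:

```shell
# run the "main" entry point of the MLproject in the current directory
mlflow run . -P conf_file=confs/training.yaml
```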
104 changes: 97 additions & 7 deletions README.md
@@ -72,6 +72,13 @@ You can use this package as part of your MLOps toolkit or platform (e.g., Model
- [Programming](#programming)
- [Language: Python](#language-python)
- [Version: Pyenv](#version-pyenv)
- [Observability](#observability)
- [Reproducibility: Mlflow Project](#reproducibility-mlflow-project)
  - [Monitoring: Mlflow Evaluate](#monitoring-mlflow-evaluate)
- [Alerting: Plyer](#alerting-plyer)
- [Lineage: Mlflow Dataset](#lineage-mlflow-dataset)
- [Explainability: SHAP](#explainability-shap)
- [Infrastructure: Mlflow System Metrics](#infrastructure-mlflow-system-metrics)
- [Tips](#tips)
- [AI/ML Practices](#aiml-practices)
- [Data Catalog](#data-catalog)
@@ -150,10 +157,10 @@ job:
KIND: TrainingJob
inputs:
KIND: ParquetReader
path: data/inputs.parquet
path: data/inputs_train.parquet
targets:
KIND: ParquetReader
path: data/targets.parquet
path: data/targets_train.parquet
```
This config file instructs the program to start a `TrainingJob` with 2 parameters:
@@ -173,6 +180,8 @@ $ poetry run [package] confs/tuning.yaml
$ poetry run [package] confs/training.yaml
$ poetry run [package] confs/promotion.yaml
$ poetry run [package] confs/inference.yaml
$ poetry run [package] confs/evaluations.yaml
$ poetry run [package] confs/explanations.yaml
```

In production, you can build, ship, and run the project as a Python package:
@@ -210,7 +219,7 @@ You can invoke the actions from the [command-line](https://www.pyinvoke.org/) or

```bash
# execute the project DAG
$ inv dags
$ inv projects
# create a code archive
$ inv packages
# list other actions
@@ -231,13 +240,16 @@ $ inv --list
- **cleans.coverage** - Clean the coverage tool.
- **cleans.dist** - Clean the dist folder.
- **cleans.docs** - Clean the docs folder.
- **cleans.environment** - Clean the project environment file.
- **cleans.folders** - Run all folders tasks.
- **cleans.mlruns** - Clean the mlruns folder.
- **cleans.mypy** - Clean the mypy tool.
- **cleans.outputs** - Clean the outputs folder.
- **cleans.poetry** - Clean poetry lock file.
- **cleans.pytest** - Clean the pytest tool.
- **cleans.projects** - Run all projects tasks.
- **cleans.python** - Clean python caches and bytecodes.
- **cleans.requirements** - Clean the project requirements file.
- **cleans.reset** - Run all tools, folders, and sources tasks.
- **cleans.ruff** - Clean the ruff tool.
- **cleans.sources** - Run all sources tasks.
@@ -251,8 +263,6 @@ $ inv --list
- **containers.build** - Build the container image with the given tag.
- **containers.compose** - Start up docker compose.
- **containers.run** - Run the container image with the given tag.
- **dags.all (dags)** - Run all DAG tasks.
- **dags.job** - Run the project for the given job name.
- **docs.all (docs)** - Run all docs tasks.
- **docs.api** - Document the API with pdoc using the given format and output directory.
- **docs.serve** - Serve the API docs with pdoc using the given format and computer port.
@@ -267,6 +277,10 @@ $ inv --list
- **mlflow.serve** - Start mlflow server with the given host, port, and backend uri.
- **packages.all (packages)** - Run all package tasks.
- **packages.build** - Build a python package with the given format.
- **projects.all (projects)** - Run all project tasks.
- **projects.environment** - Export the project environment file.
- **projects.requirements** - Export the project requirements file.
- **projects.run** - Run an mlflow project from MLproject file.

## Workflows

@@ -719,6 +733,82 @@ Select your programming environment.
- **Alternatives**:
  - Manual installation: time-consuming

## Observability

### Reproducibility: [Mlflow Project](https://mlflow.org/docs/latest/projects.html)

- **Motivations**:
- Share common project formats.
- Ensure the project can be reused.
- Avoid randomness in project execution.
- **Limitations**:
- Mlflow Project is best suited for small projects.
- **Alternatives**:
- [DVC](https://dvc.org/): both data and models.
- [Metaflow](https://metaflow.org/): focus on machine learning.
  - **[Apache Airflow](https://airflow.apache.org/)**: for large-scale projects.

### Monitoring: [Mlflow Evaluate](https://mlflow.org/docs/latest/model-evaluation/index.html)

- **Motivations**:
- Compute the model metrics.
- Validate model with thresholds.
- Perform post-training evaluations.
- **Limitations**:
  - Mlflow Evaluate is less feature-rich than alternatives.
- **Alternatives**:
- **[Giskard](https://www.giskard.ai/)**: open-core and super complete.
- **[Evidently](https://www.evidentlyai.com/)**: open-source with more metrics.
- [Arize AI](https://arize.com/): more feature-rich but less flexible.
  - [Grafana](https://grafana.com/): you must do everything yourself.

### Alerting: [Plyer](https://github.com/kivy/plyer)

- **Motivations**:
- Simple solution.
  - Send notifications on the system.
  - Cross-platform: Mac, Linux, Windows.
- **Limitations**:
  - Should not be used for large-scale projects.
- **Alternatives**:
- [Slack](https://slack.com/): for chat-oriented solutions.
- [Datadog](https://www.datadoghq.com/): for infrastructure oriented solutions.

### Lineage: [Mlflow Dataset](https://mlflow.org/docs/latest/tracking/data-api.html)

- **Motivations**:
- Store information in Mlflow.
- Track metadata about run datasets.
  - Keep the URI of the dataset source (e.g., website).
- **Limitations**:
- Not as feature-rich as alternative solutions.
- **Alternatives**:
- [Databricks Lineage](https://docs.databricks.com/en/admin/system-tables/lineage.html): limited to Databricks.
- [OpenLineage and Marquez](https://marquezproject.github.io/): open-source and flexible.

### Explainability: [SHAP](https://shap.readthedocs.io/en/latest/)

- **Motivations**:
- Most popular toolkit.
  - Support various models (linear, tree-based, ...).
- Integration with Mlflow through the [SHAP module](https://mlflow.org/docs/latest/python_api/mlflow.shap.html).
- **Limitations**:
  - Super slow on large datasets.
- Mlflow SHAP module is not mature enough.
- **Alternatives**:
  - [LIME](https://github.com/marcotcr/lime): no longer maintained.

### Infrastructure: [Mlflow System Metrics](https://mlflow.org/docs/latest/system-metrics/index.html)

- **Motivations**:
- Track infrastructure information (RAM, CPU, ...).
- Integrated with Mlflow tracking.
- Provide hardware insights.
- **Limitations**:
- Not as mature as alternative solutions.
- **Alternatives**:
- [Datadog](https://www.datadoghq.com/): popular and mature solution.

# Tips

This section gives some tips and tricks to enrich the development experience.
@@ -736,10 +826,10 @@ This tag can then be associated with a reader/writer implementation in a configuration file:
```yaml
inputs:
KIND: ParquetReader
path: data/inputs.parquet
path: data/inputs_train.parquet
targets:
KIND: ParquetReader
path: data/targets.parquet
path: data/targets_train.parquet
```

In this package, the implementations are described in `src/[package]/io/datasets.py` and selected by `KIND`.
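The `KIND` dispatch described above can be sketched with a plain, dependency-free registry; the actual project may implement it differently (e.g., with pydantic discriminated unions):

```python
from dataclasses import dataclass

@dataclass
class ParquetReader:
    """Reads a parquet dataset from a local path."""
    KIND = "ParquetReader"  # class-level tag, not a dataclass field
    path: str

# registry mapping each KIND tag to its implementation class
READERS = {cls.KIND: cls for cls in (ParquetReader,)}

def reader_from_config(config: dict):
    """Instantiate the reader class selected by the KIND tag of a config mapping."""
    options = dict(config)  # copy so the caller's dict is untouched
    kind = options.pop("KIND")
    return READERS[kind](**options)

reader = reader_from_config({"KIND": "ParquetReader", "path": "data/inputs_train.parquet"})
```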
8 changes: 8 additions & 0 deletions confs/evaluations.yaml
@@ -0,0 +1,8 @@
job:
KIND: EvaluationsJob
inputs:
KIND: ParquetReader
path: data/inputs_train.parquet
targets:
KIND: ParquetReader
path: data/targets_train.parquet
12 changes: 12 additions & 0 deletions confs/explanations.yaml
@@ -0,0 +1,12 @@
job:
KIND: ExplanationsJob
inputs_samples:
KIND: ParquetReader
path: data/inputs_test.parquet
limit: 100
models_explanations:
KIND: ParquetWriter
path: outputs/models_explanations.parquet
samples_explanations:
KIND: ParquetWriter
path: outputs/samples_explanations.parquet
4 changes: 2 additions & 2 deletions confs/inference.yaml
@@ -2,7 +2,7 @@ job:
KIND: InferenceJob
inputs:
KIND: ParquetReader
path: data/inputs.parquet
path: data/inputs_test.parquet
outputs:
KIND: ParquetWriter
path: outputs/predictions.parquet
path: outputs/predictions_test.parquet
4 changes: 2 additions & 2 deletions confs/training.yaml
@@ -2,7 +2,7 @@ job:
KIND: TrainingJob
inputs:
KIND: ParquetReader
path: data/inputs.parquet
path: data/inputs_train.parquet
targets:
KIND: ParquetReader
path: data/targets.parquet
path: data/targets_train.parquet
4 changes: 2 additions & 2 deletions confs/tuning.yaml
@@ -2,7 +2,7 @@ job:
KIND: TuningJob
inputs:
KIND: ParquetReader
path: data/inputs.parquet
path: data/inputs_train.parquet
targets:
KIND: ParquetReader
path: data/targets.parquet
path: data/targets_train.parquet
Binary file added data/inputs_test.parquet
Binary file renamed data/inputs.parquet → data/inputs_train.parquet
Binary file added data/targets_test.parquet
Binary file renamed data/targets.parquet → data/targets_train.parquet
