1 parent 4d91cf4, commit 9fef273
Showing 14 changed files with 607 additions and 115 deletions.
@@ -0,0 +1,104 @@
# %% [markdown]
# # Example: Conditional Parameter Grids
#
# This example shows how to use `PyExperimenter` with a conditional parameter grid. Instead of generating the entire Cartesian product of the parameters defined in the config file, we define the parameter combinations of a support vector machine programmatically.
#
# To execute this notebook you need to install:
# ```
# pip install py_experimenter
# pip install scikit-learn
# ```

# %% [markdown]
# ## Experiment Configuration File
# This notebook shows an example execution of `PyExperimenter` based on an experiment configuration file. Further explanation of the usage of `PyExperimenter` can be found in the [documentation](https://tornede.github.io/py_experimenter/usage.html). Here, we define only keyfields and resultfields and do not set the parameter values in the experiment configuration file, because we create the parameter grid programmatically.

# %%
import random
from random import randint
from time import sleep

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

from py_experimenter.experimenter import PyExperimenter
from py_experimenter.result_processor import ResultProcessor


def run_svm(parameters: dict, result_processor: ResultProcessor, custom_config: dict):
    sleep(randint(0, 5))
    seed = parameters['seed']
    random.seed(seed)
    np.random.seed(seed)

    data = load_iris()

    X = data.data
    y = data.target

    # Create Support Vector Machine with parameters dependent on the kernel
    kernel = parameters['kernel']
    if kernel == 'linear':
        svc = SVC(kernel=parameters['kernel'])
    elif kernel == 'poly':
        svc = SVC(kernel=parameters['kernel'], gamma=parameters['gamma'], coef0=parameters['coef0'], degree=parameters['degree'])
    elif kernel == 'rbf':
        svc = SVC(kernel=parameters['kernel'], gamma=parameters['gamma'])
    else:
        # Fail loudly rather than silently falling back to a default SVC
        raise ValueError(f"Unsupported kernel: {kernel}")

    model = make_pipeline(StandardScaler(), svc)

    if parameters['dataset'] != 'iris':
        raise ValueError("Example error")

    scores = cross_validate(model, X, y,
                            cv=parameters['cross_validation_splits'],
                            scoring=('accuracy', 'f1_micro'),
                            return_train_score=True
                            )

    result_processor.process_results({
        'train_f1': np.mean(scores['train_f1_micro']),
        'train_accuracy': np.mean(scores['train_accuracy'])
    })

    result_processor.process_results({
        'test_f1': np.mean(scores['test_f1_micro']),
        'test_accuracy': np.mean(scores['test_accuracy'])})


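The kernel-dependent branching in `run_svm` can also be written as a lookup of the hyperparameters each kernel uses. The following is a minimal sketch and not part of the commit: the helper `svc_kwargs` and the table `KERNEL_PARAMS` are hypothetical names, and the returned dictionary is assumed to be passed on as `SVC(**kwargs)`.

```python
# Hypothetical helper (not part of PyExperimenter or this commit):
# maps each kernel to the hyperparameters that are meaningful for it.
KERNEL_PARAMS = {
    'linear': (),
    'rbf': ('gamma',),
    'poly': ('gamma', 'coef0', 'degree'),
}


def svc_kwargs(parameters: dict) -> dict:
    """Collect the SVC keyword arguments that apply to the chosen kernel."""
    kernel = parameters['kernel']
    if kernel not in KERNEL_PARAMS:
        raise ValueError(f"Unsupported kernel: {kernel}")
    kwargs = {'kernel': kernel}
    for name in KERNEL_PARAMS[kernel]:
        kwargs[name] = parameters[name]
    return kwargs


# Unused keyfields (None for this kernel) are simply dropped.
row = {'kernel': 'rbf', 'gamma': 0.1, 'degree': None, 'coef0': None}
print(svc_kwargs(row))
```

With this layout, supporting a further kernel would only require another entry in the lookup table rather than another `elif` branch.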
experimenter = PyExperimenter(experiment_configuration_file_path='conditional_example.yml', name="SVM_experimenter_01")

combinations = [{'kernel': 'rbf', 'gamma': gamma, 'degree': None, 'coef0': None} for gamma in ['0.1', '0.3']]
combinations += [{'kernel': 'poly', 'gamma': gamma, 'degree': degree, 'coef0': coef0}
                 for gamma in ['0.1', '0.3'] for degree in ['3', '4'] for coef0 in ['0.0', '0.1']]
combinations += [{'kernel': 'linear', 'gamma': None, 'degree': None, 'coef0': None}]

# Fill experimenter
experimenter.fill_table_from_combination(parameters={'seed': ['1', '2', '3', '4', '5'],
                                                     'dataset': ['iris'],
                                                     'cross_validation_splits': ['5']},
                                         fixed_parameter_combinations=combinations)

# showing database table
experimenter.get_table()

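To see how many rows such a conditional grid produces, the combinations can be counted without a database. This is a sketch under the assumption that `fill_table_from_combination` crosses the Cartesian product of `parameters` with every entry of `fixed_parameter_combinations`; the variable names are illustrative only.

```python
from itertools import product

gammas = ['0.1', '0.3']

# Kernel-specific combinations, mirroring the lists built in the example.
combinations = [{'kernel': 'rbf', 'gamma': g, 'degree': None, 'coef0': None} for g in gammas]
combinations += [{'kernel': 'poly', 'gamma': g, 'degree': d, 'coef0': c}
                 for g, d, c in product(gammas, ['3', '4'], ['0.0', '0.1'])]
combinations += [{'kernel': 'linear', 'gamma': None, 'degree': None, 'coef0': None}]

# Parameters that are (by assumption) crossed with every fixed combination.
parameters = {'seed': ['1', '2', '3', '4', '5'],
              'dataset': ['iris'],
              'cross_validation_splits': ['5']}

rows = [dict(zip(parameters, values), **combo)
        for values in product(*parameters.values())
        for combo in combinations]

print(len(combinations), len(rows))  # prints: 11 55
```

For comparison, the unconditional product over three kernels, two `gamma`, two `degree`, and two `coef0` values would give 24 combinations per seed, most of them redundant for the `linear` and `rbf` kernels.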
# %% [markdown]
# ### Execute PyExperimenter
# All experiments are executed one after the other by the same `PyExperimenter` due to `max_experiments=-1`. If only a single experiment or a predefined number of experiments should be executed, replace the `-1` with the corresponding number.
# The first parameter, i.e. `run_svm`, is the function that is executed with the keyfield values of each table row.

# %%
experimenter.execute(run_svm, max_experiments=-1)

# showing database table
experimenter.get_table()

# %% [markdown]
# ### CodeCarbon
# Note that `CodeCarbon` is activated by default, collecting information about the carbon emissions of each experiment. Have a look at our [general usage example](https://tornede.github.io/py_experimenter/examples/example_general_usage.html) and the corresponding [documentation of CodeCarbon fields](https://tornede.github.io/py_experimenter/usage.html#codecarbon-fields) for more information.
@@ -0,0 +1,25 @@
PY_EXPERIMENTER:
  n_jobs: 1

  Database:
    provider: sqlite
    database: py_experimenter
    table:
      name: example_conditional_grid
      keyfields:
        - dataset
        - cross_validation_splits: int
        - seed: int
        - kernel
        - gamma: DECIMAL
        - degree: int
        - coef0: DECIMAL
      result_timestamps: false
      resultfields:
        - train_f1: DECIMAL
        - train_accuracy: DECIMAL
        - test_f1: DECIMAL
        - test_accuracy: DECIMAL

  CUSTOM:
    path: sample_data
@@ -0,0 +1,46 @@
PY_EXPERIMENTER:
  n_jobs: 1

  Database:
    provider: sqlite
    database: py_experimenter
    table:
      name: example_general_usage
      keyfields:
        - dataset
        - cross_validation_splits: int
        - seed: int
        - kernel
      result_timestamps: False
      resultfields:
        - pipeline: LONGTEXT
        - train_f1: DECIMAL
        - train_accuracy: DECIMAL
        - test_f1: DECIMAL
        - test_accuracy: DECIMAL

  Experiments:
    dataset:
      - iris
    cross_validation_splits:
      - 5
    seed:
      - 2
      - 4
      - 6
    kernel:
      - linear
      - poly
      - rbf
      - sigmoid

  Custom:
    datapath: sample_data

  CodeCarbon:
    offline_mode: False
    measure_power_secs: 25
    tracking_mode: process
    log_level: error
    save_to_file: True
    output_dir: output/CodeCarbon
@@ -0,0 +1,151 @@
# %% [markdown]
# # Example: General Usage
#
# This example shows the general usage of `PyExperimenter`, from creating an experiment configuration file, through the actual execution of (dummy) experiments, to the extraction of experimental results.
#
# To execute this notebook you need to install:
# ```
# pip install py_experimenter
# pip install scikit-learn
# ```

# %% [markdown]
# ## Experiment Configuration File
# This notebook shows an example execution of `PyExperimenter` based on an experiment configuration file. Further explanation of the usage of `PyExperimenter` can be found in the [documentation](https://tornede.github.io/py_experimenter/usage.html).

# %%
from py_experimenter.experimenter import PyExperimenter
from py_experimenter.result_processor import ResultProcessor
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_validate
from sklearn.datasets import load_iris
import numpy as np
import random
import os


def run_ml(parameters: dict, result_processor: ResultProcessor, custom_config: dict):
    seed = parameters['seed']
    random.seed(seed)
    np.random.seed(seed)

    data = load_iris()
    # In case you want to load a file from a path (requires `import pandas as pd`)
    # path = os.path.join(custom_config['datapath'], parameters['dataset'])
    # data = pd.read_csv(path)

    X = data.data
    y = data.target

    model = make_pipeline(StandardScaler(), SVC(kernel=parameters['kernel'], gamma='auto'))
    result_processor.process_results({
        'pipeline': str(model)
    })

    if parameters['dataset'] != 'iris':
        raise ValueError("Example error")

    scores = cross_validate(model, X, y,
                            cv=parameters['cross_validation_splits'],
                            scoring=('accuracy', 'f1_micro'),
                            return_train_score=True
                            )

    result_processor.process_results({
        'train_f1': np.mean(scores['train_f1_micro']),
        'train_accuracy': np.mean(scores['train_accuracy'])
    })

    result_processor.process_results({
        'test_f1': np.mean(scores['test_f1_micro']),
        'test_accuracy': np.mean(scores['test_accuracy'])
    })


experimenter = PyExperimenter(experiment_configuration_file_path="example.yml", name='example_notebook')


# %%
experimenter.fill_table_from_config()

experimenter.fill_table_with_rows(rows=[
    {'dataset': 'error_dataset', 'cross_validation_splits': 3, 'seed': 42, 'kernel': 'linear'}])

# showing database table
experimenter.get_table()

# %% [markdown]
# ### Execute PyExperimenter
# All experiments are executed one after the other by the same `PyExperimenter` due to `max_experiments=-1`. If only a single experiment or a predefined number of experiments should be executed, replace the `-1` with the corresponding number.
#
# The first parameter, i.e. `run_ml`, is the function that is executed with the keyfield values of each table row.

# %%
experimenter.execute(run_ml, max_experiments=-1)

# showing database table
experimenter.get_table()

# %% [markdown]
# ### Restart Failed Experiments
#
# Some experiments may fail during execution. They can be reset for another attempt with `reset_experiments()`. The status arguments determine which table rows are reset. In this example, all failed experiments, i.e. those with `status==error`, are reset. Experiments can also be reset based on multiple statuses by passing each status as a separate argument, e.g. `experimenter.reset_experiments('error', 'done')`. In that case, all experiments with status 'error' or 'done' are reset.

# %%
experimenter.reset_experiments('error')

# showing database table
experimenter.get_table()

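The effect of resetting by status can be illustrated with a toy in-memory table. This sketch is not PyExperimenter's implementation: the function `reset_rows`, the row layout, and the assumption that reset rows return to the status `created` are illustrative only.

```python
def reset_rows(rows: list, *states: str) -> int:
    """Set every row whose status is in `states` back to 'created'.

    Returns the number of rows that were reset.
    """
    count = 0
    for row in rows:
        if row['status'] in states:
            row['status'] = 'created'  # assumed status of a fresh experiment
            count += 1
    return count


table = [
    {'id': 1, 'status': 'done'},
    {'id': 2, 'status': 'error'},
    {'id': 3, 'status': 'error'},
]

# Reset only the failed rows; the finished row is left untouched.
print(reset_rows(table, 'error'))
print([row['status'] for row in table])
```

Passing several statuses, as in `reset_rows(table, 'error', 'done')`, would select rows matching any of them, mirroring the call shown in the text.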
# %% [markdown]
# After the reset of failed experiments, they can be executed again as described above.

# %%
experimenter.execute(run_ml, max_experiments=-1)

# showing database table
experimenter.get_table()

# %% [markdown]
# ### Generating Result Table
#
# The table contains the results of single experiments. These can be aggregated, e.g. to compute the mean over all seeds.

# %%
result_table_agg = experimenter.get_table().groupby(['dataset']).mean(numeric_only=True)
result_table_agg

# %% [markdown]
# ### Printing LaTeX Table
#
# As a `pandas.DataFrame` can easily be printed as a LaTeX table, here is example code for one of the above result columns.

# %%
print(result_table_agg[['test_f1']].style.to_latex())

# %% [markdown]
# ### CodeCarbon
# [CodeCarbon](https://tornede.github.io/py_experimenter/usage/experiment_configuration_file.html#codecarbon) is integrated into `PyExperimenter` to provide information about the carbon emissions of experiments. `CodeCarbon` will create a table with suffix `_codecarbon` in the database, each row containing information about the carbon emissions of a single experiment.

# %%
experimenter.get_codecarbon_table()

# %% [markdown]
# #### Aggregating CodeCarbon Results
#
# The carbon emission information collected by `CodeCarbon` can easily be aggregated via `pandas.DataFrame`.

# %%
carbon_emissions = experimenter.get_codecarbon_table().groupby(['project_name']).sum(numeric_only=True)
carbon_emissions

# %% [markdown]
# #### Printing CodeCarbon Results as LaTeX Table
#
# Furthermore, the resulting `pandas.DataFrame` can easily be printed as a LaTeX table.

# %%
print(carbon_emissions[['energy_consumed_kw', 'emissions_kg']].style.to_latex())