Commit everthing to reset pc
LukasFehring committed Nov 21, 2023
1 parent 4d91cf4 commit 9fef273
Showing 14 changed files with 607 additions and 115 deletions.
104 changes: 104 additions & 0 deletions conditional_example.py
@@ -0,0 +1,104 @@
# %% [markdown]
# # Example: Conditional Parameter Grids
#
# This example shows the usage of `PyExperimenter` with a conditional parameter grid. We will programmatically define the parameter combinations of a support vector machine, instead of generating the entire Cartesian product from the parameters defined in the config file.
#
# To execute this notebook you need to install:
# ```
# pip install py_experimenter
# pip install scikit-learn
# ```

# %% [markdown]
# ## Experiment Configuration File
# This notebook shows an example execution of `PyExperimenter` based on an experiment configuration file. Further explanation about the usage of `PyExperimenter` can be found in the [documentation](https://tornede.github.io/py_experimenter/usage.html). Here, we only define keyfields and resultfields and do not set the parameter values in the experiment configuration file as we will create the parameter grid programmatically.

# %%
import random
from random import randint
from time import sleep

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

from py_experimenter.experimenter import PyExperimenter
from py_experimenter.result_processor import ResultProcessor


def run_svm(parameters: dict, result_processor: ResultProcessor, custom_config: dict):
    # Sleep for a random number of seconds (0 to 5) before starting
    sleep(randint(0, 5))
    seed = parameters['seed']
    random.seed(seed)
    np.random.seed(seed)

    data = load_iris()

    X = data.data
    y = data.target

    # Create Support Vector Machine with parameters dependent on the kernel
    kernel = parameters['kernel']
    if kernel == 'linear':
        svc = SVC(kernel=parameters['kernel'])
    elif kernel == 'poly':
        svc = SVC(kernel=parameters['kernel'], gamma=parameters['gamma'], coef0=parameters['coef0'], degree=parameters['degree'])
    elif kernel == 'rbf':
        svc = SVC(kernel=parameters['kernel'], gamma=parameters['gamma'])
    else:
        # Fail loudly instead of silently falling back to a default SVC
        raise ValueError(f"Unsupported kernel: {kernel}")

    model = make_pipeline(StandardScaler(), svc)

    if parameters['dataset'] != 'iris':
        raise ValueError("Example error")

    scores = cross_validate(model, X, y,
                            cv=parameters['cross_validation_splits'],
                            scoring=('accuracy', 'f1_micro'),
                            return_train_score=True
                            )

    # Write the training scores to the result fields
    result_processor.process_results({
        'train_f1': np.mean(scores['train_f1_micro']),
        'train_accuracy': np.mean(scores['train_accuracy'])
    })

    # Write the test scores to the result fields
    result_processor.process_results({
        'test_f1': np.mean(scores['test_f1_micro']),
        'test_accuracy': np.mean(scores['test_accuracy'])})


experimenter = PyExperimenter(experiment_configuration_file_path='conditional_example.yml', name="SVM_experimenter_01")

combinations = [{'kernel': 'rbf', 'gamma': gamma, 'degree': None, 'coef0': None} for gamma in ['0.1', '0.3']]
combinations += [{'kernel': 'poly', 'gamma': gamma, 'degree': degree, 'coef0': coef0}
for gamma in ['0.1', '0.3'] for degree in ['3', '4'] for coef0 in ['0.0', '0.1']]
combinations += [{'kernel': 'linear', 'gamma': None, 'degree': None, 'coef0': None}]

# Fill experimenter
experimenter.fill_table_from_combination(parameters={'seed': ['1', '2', '3', '4', '5'],
'dataset': ['iris'],
'cross_validation_splits': ['5']},
fixed_parameter_combinations=combinations)

# showing database table
experimenter.get_table()
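
# %% [markdown]
# The size of the grid above can be checked without a database. This is a minimal sketch, assuming that `fill_table_from_combination` crosses the Cartesian product of `parameters` with every entry of `fixed_parameter_combinations`. The helper `expand_grid` and the `example_*` names are hypothetical and not part of `PyExperimenter`.

```python
# %%
from itertools import product

# Conditional part of the grid: only kernel-specific parameters vary.
example_combinations = [{'kernel': 'rbf', 'gamma': gamma, 'degree': None, 'coef0': None}
                        for gamma in ['0.1', '0.3']]
example_combinations += [{'kernel': 'poly', 'gamma': gamma, 'degree': degree, 'coef0': coef0}
                         for gamma in ['0.1', '0.3'] for degree in ['3', '4'] for coef0 in ['0.0', '0.1']]
example_combinations += [{'kernel': 'linear', 'gamma': None, 'degree': None, 'coef0': None}]

# Unconditional part: these parameters are crossed with every combination.
example_parameters = {'seed': ['1', '2', '3', '4', '5'],
                      'dataset': ['iris'],
                      'cross_validation_splits': ['5']}


def expand_grid(parameters: dict, fixed_parameter_combinations: list) -> list:
    """Hypothetical helper: cross the parameter product with each fixed combination."""
    names = list(parameters)
    rows = []
    for values in product(*(parameters[name] for name in names)):
        for combination in fixed_parameter_combinations:
            rows.append({**dict(zip(names, values)), **combination})
    return rows


rows = expand_grid(example_parameters, example_combinations)
print(len(example_combinations), len(rows))
```

# %% [markdown]
# Under this assumption, the 11 kernel-specific combinations and 5 seeds give 55 rows, compared to 3 × 2 × 2 × 2 × 5 = 120 rows for the full Cartesian product over kernel, gamma, degree, coef0 and seed.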

# %% [markdown]
# ### Execute PyExperimenter
# All experiments are executed one after the other by the same `PyExperimenter` due to `max_experiments=-1`. To execute only a single experiment or a predefined number of experiments, replace `-1` with the corresponding number.
# The first parameter, i.e. `run_svm`, relates to the actual method that should be executed with the given keyfields of the table.

# %%
experimenter.execute(run_svm, max_experiments=-1)

# showing database table
experimenter.get_table()

# %% [markdown]
# ### CodeCarbon
# Note that `CodeCarbon` is activated by default, collecting information about the carbon emissions of each experiment. Have a look at our [general usage example](https://tornede.github.io/py_experimenter/examples/example_general_usage.html) and the corresponding [documentation of CodeCarbon fields](https://tornede.github.io/py_experimenter/usage.html#codecarbon-fields) for more information.
25 changes: 25 additions & 0 deletions conditional_example.yml
@@ -0,0 +1,25 @@
PY_EXPERIMENTER:
  n_jobs: 1

  Database:
    provider: sqlite
    database: py_experimenter
    table:
      name: example_conditional_grid
      keyfields:
        - dataset
        - cross_validation_splits: int
        - seed: int
        - kernel
        - gamma: DECIMAL
        - degree: int
        - coef0: DECIMAL
      result_timestamps: false
      resultfields:
        - train_f1: DECIMAL
        - train_accuracy: DECIMAL
        - test_f1: DECIMAL
        - test_accuracy: DECIMAL

  CUSTOM:
    path: sample_data
46 changes: 46 additions & 0 deletions example.yml
@@ -0,0 +1,46 @@
PY_EXPERIMENTER:
  n_jobs: 1

  Database:
    provider: sqlite
    database: py_experimenter
    table:
      name: example_general_usage
      keyfields:
        - dataset
        - cross_validation_splits: int
        - seed: int
        - kernel
      result_timestamps: False
      resultfields:
        - pipeline: LONGTEXT
        - train_f1: DECIMAL
        - train_accuracy: DECIMAL
        - test_f1: DECIMAL
        - test_accuracy: DECIMAL

  Experiments:
    dataset:
      - iris
    cross_validation_splits:
      - 5
    seed:
      - 2
      - 4
      - 6
    kernel:
      - linear
      - poly
      - rbf
      - sigmoid

  Custom:
    datapath: sample_data

  CodeCarbon:
    offline_mode: False
    measure_power_secs: 25
    tracking_mode: process
    log_level: error
    save_to_file: True
    output_dir: output/CodeCarbon
151 changes: 151 additions & 0 deletions general_example.py
@@ -0,0 +1,151 @@
# %% [markdown]
# # Example: General Usage
#
# This example shows the general usage of `PyExperimenter`, from creating an experiment configuration file, through the actual execution of (dummy) experiments, to the extraction of experimental results.
#
# To execute this notebook you need to install:
# ```
# pip install py_experimenter
# pip install scikit-learn
# ```

# %% [markdown]
# ## Experiment Configuration File
# This notebook shows an example execution of `PyExperimenter` based on an experiment configuration file. Further explanation about the usage of `PyExperimenter` can be found in the [documentation](https://tornede.github.io/py_experimenter/usage.html).

# %%
from py_experimenter.experimenter import PyExperimenter
from py_experimenter.result_processor import ResultProcessor
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_validate
from sklearn.datasets import load_iris
import numpy as np
import random
import os


def run_ml(parameters: dict, result_processor: ResultProcessor, custom_config: dict):
    seed = parameters['seed']
    random.seed(seed)
    np.random.seed(seed)

    data = load_iris()
    # In case you want to load a file from a path (requires `import pandas as pd`)
    # path = os.path.join(custom_config['datapath'], parameters['dataset'])
    # data = pd.read_csv(path)

    X = data.data
    y = data.target

    model = make_pipeline(StandardScaler(), SVC(kernel=parameters['kernel'], gamma='auto'))
    # Store the pipeline description before running the evaluation
    result_processor.process_results({
        'pipeline': str(model)
    })

    if parameters['dataset'] != 'iris':
        raise ValueError("Example error")

    scores = cross_validate(model, X, y,
                            cv=parameters['cross_validation_splits'],
                            scoring=('accuracy', 'f1_micro'),
                            return_train_score=True
                            )

    # Write the training scores to the result fields
    result_processor.process_results({
        'train_f1': np.mean(scores['train_f1_micro']),
        'train_accuracy': np.mean(scores['train_accuracy'])
    })

    # Write the test scores to the result fields
    result_processor.process_results({
        'test_f1': np.mean(scores['test_f1_micro']),
        'test_accuracy': np.mean(scores['test_accuracy'])
    })


experimenter = PyExperimenter(experiment_configuration_file_path="example.yml", name='example_notebook')


# %%
experimenter.fill_table_from_config()

experimenter.fill_table_with_rows(rows=[
{'dataset': 'error_dataset', 'cross_validation_splits': 3, 'seed': 42, 'kernel': 'linear'}])

# showing database table
experimenter.get_table()

# %% [markdown]
# ### Execute PyExperimenter
# All experiments are executed one after the other by the same `PyExperimenter` due to `max_experiments=-1`. To execute only a single experiment or a predefined number of experiments, replace `-1` with the corresponding number.
#
# The first parameter, i.e. `run_ml`, relates to the actual method that should be executed with the given keyfields of the table.

# %%
experimenter.execute(run_ml, max_experiments=-1)

# showing database table
experimenter.get_table()

# %% [markdown]
# ### Restart Failed Experiments
#
# Experiments may fail for various reasons. Such experiments can be reset for another try with `reset_experiments()`. The given status determines which table rows are reset. In this example, all failed experiments, i.e. those with `status==error`, are reset. Experiments can also be reset based on multiple statuses by passing each status as a separate argument, e.g. `experimenter.reset_experiments('error', 'done')`. In that case, all experiments with status 'error' or 'done' will be reset.

# %%
experimenter.reset_experiments('error')

# showing database table
experimenter.get_table()
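
# %% [markdown]
# The effect of passing one or several statuses can be illustrated on a toy table. This is only a sketch of the selection logic, not how `PyExperimenter` implements the reset: the helper `reset_rows`, the toy rows, and the assumption that a reset row starts over with status `created` and empty results are all hypothetical.

```python
# %%
# Toy stand-in for the experiment table; the real rows live in the database.
toy_table = [
    {'id': 1, 'dataset': 'iris', 'status': 'done', 'test_f1': 0.97},
    {'id': 2, 'dataset': 'iris', 'status': 'done', 'test_f1': 0.95},
    {'id': 3, 'dataset': 'error_dataset', 'status': 'error', 'test_f1': None},
]


def reset_rows(table: list, *states: str) -> list:
    """Hypothetical helper: return a copy of the table with matching rows reset."""
    reset_table = []
    for row in table:
        if row['status'] in states:
            # Assumption: a reset row starts over as 'created' without results.
            row = {**row, 'status': 'created', 'test_f1': None}
        reset_table.append(row)
    return reset_table


only_errors = reset_rows(toy_table, 'error')
errors_and_done = reset_rows(toy_table, 'error', 'done')
print([row['status'] for row in only_errors])      # ['done', 'done', 'created']
print([row['status'] for row in errors_and_done])  # ['created', 'created', 'created']
```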

# %% [markdown]
# After the reset of failed experiments, they can be executed again as described above.

# %%
experimenter.execute(run_ml, max_experiments=-1)

# showing database table
experimenter.get_table()

# %% [markdown]
# ### Generating Result Table
#
# The table contains the results of the individual experiments. These can be aggregated, e.g. to generate the mean over all seeds.

# %%
result_table_agg = experimenter.get_table().groupby(['dataset']).mean(numeric_only=True)
result_table_agg
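
# %% [markdown]
# Grouping by `dataset` alone, as above, averages over kernels as well as seeds. To average over the seeds only, every other keyfield can be added to the grouping. This is a minimal sketch on a toy `pandas.DataFrame` standing in for `experimenter.get_table()`; the scores are made up for illustration.

```python
# %%
import pandas as pd

# Toy stand-in for experimenter.get_table(); the scores are made up.
toy_results = pd.DataFrame({
    'dataset': ['iris'] * 4,
    'kernel': ['linear', 'linear', 'rbf', 'rbf'],
    'seed': [2, 4, 2, 4],
    'test_f1': [0.96, 0.98, 0.94, 0.96],
})

# Grouping by every keyfield except the seed averages over the seeds only.
mean_over_seeds = toy_results.groupby(['dataset', 'kernel'])[['test_f1']].mean()
print(mean_over_seeds)
```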

# %% [markdown]
# ### Printing LaTeX Table
#
# As a `pandas.DataFrame` can easily be printed as a LaTeX table, here is example code for one of the above result columns.

# %%
print(result_table_agg[['test_f1']].style.to_latex())

# %% [markdown]
# ### CodeCarbon
# [CodeCarbon](https://tornede.github.io/py_experimenter/usage/experiment_configuration_file.html#codecarbon) is integrated into `PyExperimenter` to provide information about the carbon emissions of experiments. `CodeCarbon` will create a table with suffix `_codecarbon` in the database, each row containing information about the carbon emissions of a single experiment.

# %%
experimenter.get_codecarbon_table()

# %% [markdown]
# #### Aggregating CodeCarbon Results
#
# The carbon emission information of `CodeCarbon` can easily be aggregated via `pandas.DataFrame`.

# %%
carbon_emissions = experimenter.get_codecarbon_table().groupby(['project_name']).sum(numeric_only=True)
carbon_emissions

# %% [markdown]
# #### Printing CodeCarbon Results as LaTeX Table
#
# Furthermore, the resulting `pandas.DataFrame` can easily be printed as a LaTeX table.

# %%
print(carbon_emissions[['energy_consumed_kw', 'emissions_kg']].style.to_latex())