Commit everything to reset pc

tornede · Nov 21, 2023 · 9fef273
1 parent 4d91cf4

Showing 14 changed files with 607 additions and 115 deletions.
diff --git a/conditional_example.py b/conditional_example.py
@@ -0,0 +1,104 @@
+# %% [markdown]
+# # Example: Conditional Parameter Grids
+#
+# This example shows how to use `PyExperimenter` with a conditional parameter grid. We programmatically define the parameter combinations of a support vector machine instead of generating the entire Cartesian product of the parameters defined in the config file.
+#
+# To execute this notebook you need to install:
+# ```
+# pip install py_experimenter
+# pip install scikit-learn
+# ```
+
+# %% [markdown]
+# ## Experiment Configuration File
+# This notebook shows an example execution of `PyExperimenter` based on an experiment configuration file. Further explanation about the usage of `PyExperimenter` can be found in the [documentation](https://tornede.github.io/py_experimenter/usage.html). Here, we only define keyfields and resultfields and do not set the parameter values in the experiment configuration file as we will create the parameter grid programmatically.
+
+# %%
+import random
+from random import randint
+from time import sleep
+
+import numpy as np
+from sklearn.datasets import load_iris
+from sklearn.model_selection import cross_validate
+from sklearn.pipeline import make_pipeline
+from sklearn.preprocessing import StandardScaler
+from sklearn.svm import SVC
+
+from py_experimenter.experimenter import PyExperimenter
+from py_experimenter.result_processor import ResultProcessor
+
+
+def run_svm(parameters: dict, result_processor: ResultProcessor, custom_config: dict):
+    sleep(randint(0, 5))
+    seed = parameters['seed']
+    random.seed(seed)
+    np.random.seed(seed)
+
+    data = load_iris()
+
+    X = data.data
+    y = data.target
+
+    # Create Support Vector Machine with parameters dependent on the kernel
+    kernel = parameters['kernel']
+    if kernel == 'linear':
+        svc = SVC(kernel=parameters['kernel'])
+    elif kernel == 'poly':
+        svc = SVC(kernel=parameters['kernel'], gamma=parameters['gamma'], coef0=parameters['coef0'], degree=parameters['degree'])
+    elif kernel == 'rbf':
+        svc = SVC(kernel=parameters['kernel'], gamma=parameters['gamma'])
+    else:
+        raise ValueError(f"Unsupported kernel: {kernel}")
+
+    model = make_pipeline(StandardScaler(), svc)
+
+    if parameters['dataset'] != 'iris':
+        raise ValueError("Example error")
+
+    scores = cross_validate(model, X, y,
+                            cv=parameters['cross_validation_splits'],
+                            scoring=('accuracy', 'f1_micro'),
+                            return_train_score=True
+                            )
+
+    result_processor.process_results({
+        'train_f1': np.mean(scores['train_f1_micro']),
+        'train_accuracy': np.mean(scores['train_accuracy'])
+    })
+
+    result_processor.process_results({
+        'test_f1': np.mean(scores['test_f1_micro']),
+        'test_accuracy': np.mean(scores['test_accuracy'])})
+
+
+experimenter = PyExperimenter(experiment_configuration_file_path='conditional_example.yml', name="SVM_experimenter_01")
+
+combinations = [{'kernel': 'rbf', 'gamma': gamma, 'degree': None, 'coef0': None} for gamma in ['0.1', '0.3']]
+combinations += [{'kernel': 'poly', 'gamma': gamma, 'degree': degree, 'coef0': coef0}
+                 for gamma in ['0.1', '0.3'] for degree in ['3', '4'] for coef0 in ['0.0', '0.1']]
+combinations += [{'kernel': 'linear', 'gamma': None, 'degree': None, 'coef0': None}]
+
+# Fill experimenter
+experimenter.fill_table_from_combination(parameters={'seed': ['1', '2', '3', '4', '5'],
+                                                     'dataset': ['iris'],
+                                                     'cross_validation_splits': ['5']},
+                                         fixed_parameter_combinations=combinations)
+
+# showing database table
+experimenter.get_table()
+
+# %% [markdown]
+# ### Execute PyExperimenter
+# All experiments are executed one after the other by the same `PyExperimenter` due to `max_experiments=-1`. To execute only a single experiment or a predefined number of experiments, replace `-1` with the corresponding number.
+# The first parameter, i.e. `run_svm`, relates to the actual method that should be executed with the given keyfields of the table.
+
+# %%
+experimenter.execute(run_svm, max_experiments=-1)
+
+# showing database table
+experimenter.get_table()
+
+# %% [markdown]
+# ### CodeCarbon
+# Note that `CodeCarbon` is activated by default, collecting information about the carbon emissions of each experiment. Have a look at our [general usage example](https://tornede.github.io/py_experimenter/examples/example_general_usage.html) and the corresponding [documentation of CodeCarbon fields](https://tornede.github.io/py_experimenter/usage.html#codecarbon-fields) for more information.
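To make the size of the conditional grid concrete, here is a minimal sketch in plain Python. It assumes that `fill_table_from_combination` pairs every point of the `parameters` grid with every entry of `fixed_parameter_combinations`; `expand_rows` is a hypothetical helper written for illustration and is not part of `PyExperimenter`.

```python
from itertools import product

# Kernel-specific combinations, copied from conditional_example.py.
combinations = [{'kernel': 'rbf', 'gamma': gamma, 'degree': None, 'coef0': None}
                for gamma in ['0.1', '0.3']]
combinations += [{'kernel': 'poly', 'gamma': gamma, 'degree': degree, 'coef0': coef0}
                 for gamma in ['0.1', '0.3'] for degree in ['3', '4'] for coef0 in ['0.0', '0.1']]
combinations += [{'kernel': 'linear', 'gamma': None, 'degree': None, 'coef0': None}]

parameters = {'seed': ['1', '2', '3', '4', '5'],
              'dataset': ['iris'],
              'cross_validation_splits': ['5']}


def expand_rows(parameters, fixed_combinations):
    """Hypothetical helper: pair every grid point with every fixed combination."""
    names = list(parameters)
    rows = []
    for values in product(*(parameters[name] for name in names)):
        grid_point = dict(zip(names, values))
        for combination in fixed_combinations:
            rows.append({**grid_point, **combination})
    return rows


rows = expand_rows(parameters, combinations)
print(len(combinations), len(rows))
```

Under this assumption, the 11 kernel-specific combinations (2 for `rbf`, 8 for `poly`, 1 for `linear`) and the 5 seeds give 55 rows, whereas the full Cartesian product of 3 kernels, 2 `gamma` values, 2 degrees, 2 `coef0` values and 5 seeds would give 120.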
diff --git a/conditional_example.yml b/conditional_example.yml
@@ -0,0 +1,25 @@
+PY_EXPERIMENTER:
+  n_jobs: 1
+
+  Database:
+    provider: sqlite
+    database: py_experimenter
+    table: 
+      name: example_conditional_grid
+      keyfields:
+        - dataset
+        - cross_validation_splits: int
+        - seed: int
+        - kernel
+        - gamma: DECIMAL
+        - degree: int
+        - coef0: DECIMAL
+      result_timestamps: false
+      resultfields:
+        - train_f1: DECIMAL
+        - train_accuracy: DECIMAL
+        - test_f1: DECIMAL
+        - test_accuracy: DECIMAL
+
+  CUSTOM:
+    path: sample_data
diff --git a/example.yml b/example.yml
@@ -0,0 +1,46 @@
+PY_EXPERIMENTER:
+  n_jobs: 1
+
+  Database:
+    provider: sqlite
+    database: py_experimenter
+    table: 
+      name: example_general_usage
+      keyfields:
+        - dataset
+        - cross_validation_splits: int
+        - seed: int 
+        - kernel
+      result_timestamps: False
+      resultfields:
+        - pipeline: LONGTEXT
+        - train_f1: DECIMAL
+        - train_accuracy: DECIMAL
+        - test_f1: DECIMAL
+        - test_accuracy: DECIMAL
+
+  Experiments:
+    dataset:
+      - iris
+    cross_validation_splits:
+      - 5
+    seed:
+      - 2
+      - 4
+      - 6
+    kernel:
+      - linear
+      - poly
+      - rbf
+      - sigmoid
+
+  Custom:
+    datapath: sample_data
+
+  CodeCarbon:
+    offline_mode: False
+    measure_power_secs: 25
+    tracking_mode: process
+    log_level: error
+    save_to_file: True
+    output_dir: output/CodeCarbon
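As a rough illustration of what the `Experiments` section of `example.yml` describes, the sketch below expands its values into a full parameter grid with `itertools.product`. It assumes that `fill_table_from_config()` creates one row per element of this Cartesian product; the dictionary is copied from the YAML by hand rather than parsed from the file.

```python
from itertools import product

# Values copied from the Experiments section of example.yml.
experiments = {
    'dataset': ['iris'],
    'cross_validation_splits': [5],
    'seed': [2, 4, 6],
    'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
}

# One dictionary per point of the Cartesian product of all value lists.
grid = [dict(zip(experiments, values)) for values in product(*experiments.values())]

print(len(grid))
print(grid[0])
```

With one dataset, one split setting, three seeds and four kernels, the grid has 12 entries.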
diff --git a/general_example.py b/general_example.py
@@ -0,0 +1,151 @@
+# %% [markdown]
+# # Example: General Usage
+#
+# This example shows the general usage of `PyExperimenter`, from creating an experiment configuration file, through the actual execution of (dummy) experiments, to the extraction of experimental results.
+#
+# To execute this notebook you need to install:
+# ```
+# pip install py_experimenter
+# pip install scikit-learn
+# ```
+
+# %% [markdown]
+# ## Experiment Configuration File
+# This notebook shows an example execution of `PyExperimenter` based on an experiment configuration file. Further explanation about the usage of `PyExperimenter` can be found in the [documentation](https://tornede.github.io/py_experimenter/usage.html).
+
+# %%
+from py_experimenter.experimenter import PyExperimenter
+from py_experimenter.result_processor import ResultProcessor
+from sklearn.svm import SVC
+from sklearn.preprocessing import StandardScaler
+from sklearn.pipeline import make_pipeline
+from sklearn.model_selection import cross_validate
+from sklearn.datasets import load_iris
+import numpy as np
+import random
+import os
+
+
+def run_ml(parameters: dict, result_processor: ResultProcessor, custom_config: dict):
+    seed = parameters['seed']
+    random.seed(seed)
+    np.random.seed(seed)
+
+    data = load_iris()
+    # In case you want to load a file from a path (requires `import pandas as pd`)
+    # path = os.path.join(custom_config['datapath'], parameters['dataset'])
+    # data = pd.read_csv(path)
+
+    X = data.data
+    y = data.target
+
+    model = make_pipeline(StandardScaler(), SVC(kernel=parameters['kernel'], gamma='auto'))
+    result_processor.process_results({
+        'pipeline': str(model)
+    })
+
+    if parameters['dataset'] != 'iris':
+        raise ValueError("Example error")
+
+    scores = cross_validate(model, X, y,
+                            cv=parameters['cross_validation_splits'],
+                            scoring=('accuracy', 'f1_micro'),
+                            return_train_score=True
+                            )
+
+    result_processor.process_results({
+        'train_f1': np.mean(scores['train_f1_micro']),
+        'train_accuracy': np.mean(scores['train_accuracy'])
+    })
+
+    result_processor.process_results({
+        'test_f1': np.mean(scores['test_f1_micro']),
+        'test_accuracy': np.mean(scores['test_accuracy'])
+    })
+
+
+experimenter = PyExperimenter(experiment_configuration_file_path="example.yml", name='example_notebook')
+
+
+# %%
+experimenter.fill_table_from_config()
+
+experimenter.fill_table_with_rows(rows=[
+    {'dataset': 'error_dataset', 'cross_validation_splits': 3, 'seed': 42, 'kernel': 'linear'}])
+
+# showing database table
+experimenter.get_table()
+
+# %% [markdown]
+# ### Execute PyExperimenter
+# All experiments are executed one after the other by the same `PyExperimenter` due to `max_experiments=-1`. To execute only a single experiment or a predefined number of experiments, replace `-1` with the corresponding number.
+#
+# The first parameter, i.e. `run_ml`, relates to the actual method that should be executed with the given keyfields of the table.
+
+# %%
+experimenter.execute(run_ml, max_experiments=-1)
+
+# showing database table
+experimenter.get_table()
+
+# %% [markdown]
+# ### Restart Failed Experiments
+#
+# Experiments may fail. Failed experiments can be reset for another attempt with `reset_experiments()`. The given status determines which table rows are reset. In this example, all failed experiments, i.e. those with `status==error`, are reset. Experiments can also be reset based on several statuses by passing each status as an argument, e.g. `experimenter.reset_experiments('error', 'done')`. In that case, all experiments with status 'error' or 'done' are reset.
+
+# %%
+experimenter.reset_experiments('error')
+
+# showing database table
+experimenter.get_table()
+
+# %% [markdown]
+# After the reset of failed experiments, they can be executed again as described above.
+
+# %%
+experimenter.execute(run_ml, max_experiments=-1)
+
+# showing database table
+experimenter.get_table()
+
+# %% [markdown]
+# ### Generating Result Table
+#
+#
+# The table contains the results of single experiments. These can be aggregated, e.g. to compute the mean over all seeds.
+
+# %%
+result_table_agg = experimenter.get_table().groupby(['dataset']).mean(numeric_only=True)
+result_table_agg
+
+# %% [markdown]
+# ### Printing LaTeX Table
+#
+# As a `pandas.DataFrame` can easily be printed as a LaTeX table, here is example code for one of the above result columns.
+
+# %%
+print(result_table_agg[['test_f1']].style.to_latex())
+
+# %% [markdown]
+# ### CodeCarbon
+# [CodeCarbon](https://tornede.github.io/py_experimenter/usage/experiment_configuration_file.html#codecarbon) is integrated into `PyExperimenter` to provide information about the carbon emissions of experiments. `CodeCarbon` will create a table with suffix `_codecarbon` in the database, each row containing information about the carbon emissions of a single experiment.
+
+# %%
+experimenter.get_codecarbon_table()
+
+# %% [markdown]
+# #### Aggregating CodeCarbon Results
+#
+# The carbon emission information of `CodeCarbon` can easily be aggregated via `pandas.DataFrame`.
+
+# %%
+carbon_emissions = experimenter.get_codecarbon_table().groupby(['project_name']).sum(numeric_only=True)
+carbon_emissions
+
+# %% [markdown]
+# #### Printing CodeCarbon Results as LaTeX Table
+#
+# Furthermore, the resulting `pandas.DataFrame` can easily be printed as a LaTeX table.
+
+# %%
+print(carbon_emissions[['energy_consumed_kw', 'emissions_kg']].style.to_latex())
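The aggregation step from the "Generating Result Table" cell can be tried without a database. The sketch below applies the same `groupby(...).mean(numeric_only=True)` call to a small hand-written frame; the frame is a hypothetical stand-in for `experimenter.get_table()` and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for experimenter.get_table(); the values are made up.
table = pd.DataFrame({
    'dataset': ['iris', 'iris', 'iris'],
    'seed': [2, 4, 6],
    'kernel': ['linear', 'linear', 'linear'],
    'test_f1': [0.90, 0.94, 0.98],
})

# Same aggregation as in general_example.py: mean per dataset over numeric columns.
result_table_agg = table.groupby(['dataset']).mean(numeric_only=True)
print(result_table_agg[['test_f1']])
```

Note that `numeric_only=True` drops text columns such as `kernel` but still averages numeric keyfields such as `seed`, which is why the example selects `test_f1` explicitly before printing.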