Add SSH connection as option to the database credentials file (#169)
LukasFehring authored Feb 20, 2024
1 parent acf09a6 commit 8eca12b
Showing 21 changed files with 1,708 additions and 1,185 deletions.
12 changes: 8 additions & 4 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -137,14 +137,18 @@ dmypy.json

# todo
todo.md
config/database_credentials.cfg
config/example*.cfg

# Configs
config/database_credentials.yml
config/example_conditional_grid.yml
config/example_general_usage.yml
config/example_logtables.yml
config/example_pause_and_continue.yml
output/

# codecarbon
.codecarbon.config
emissions.csv
config/*.yml

# development folder
development/
development/
4 changes: 3 additions & 1 deletion CHANGELOG.rst
Expand Up @@ -7,7 +7,9 @@ v1.4.0 (??.??.2024)

Feature
-------
- Changed supported experiment configuration file type to YAML.
- Change the supported database configuration file type to YAML.
- Change the supported credentials file type to YAML.
- Add support for ssh jump hosts in the database connection.


v1.3.2 (23.01.2024)
Expand Down
4 changes: 4 additions & 0 deletions config/database_credentials.cfg
@@ -0,0 +1,4 @@
[CREDENTIALS]
host=apollo.ai.uni-hannover.de
user=testuser_pyexperimenter
password=c2ncKK3siSBkCuGE
4 changes: 0 additions & 4 deletions config/example_database_credentials.cfg

This file was deleted.

16 changes: 16 additions & 0 deletions config/example_database_credentials.yml
@@ -0,0 +1,16 @@
CREDENTIALS:
Database:
user: example_user
password: example_password
Connection:
Standard:
server: example.mysqlserver.com
Ssh:
server: example.sshmysqlserver.com (address from ssh server)
address: example.sslserver.com
port: optional_ssh_port
remote_address: optional_mysql_server_address
remote_port: optional_mysql_server_port
local_address: optional_local_address
local_port: optional_local_port
passphrase: optional_ssh_passphrase
43 changes: 36 additions & 7 deletions docs/source/usage/database_credential_file.rst
Expand Up @@ -4,13 +4,42 @@
Database Credential File
------------------------

When working with ``MySQL`` as a database provider, an additional database credential file is needed, containing the credentials for accessing the database:
When working with ``MySQL`` as a database provider, an additional database credential file is needed, containing the credentials for accessing the database.
By default, this file is located at ``config/database_credentials.yml``. If this is not the case, the corresponding path has to be explicitly given when :ref:`executing <execution>` ``PyExperimenter``.
Below is an example of a database credential file that connects to the server at ``example.mysqlserver.com`` with the user ``example_user`` and the password ``example_password``.

.. code-block::
.. code-block:: yaml
[CREDENTIALS]
host = <host>
user = <user>
password = <password>
CREDENTIALS:
Database:
user: example_user
password: example_password
Connection:
Standard:
server: example.mysqlserver.com
By default, this file is located at ``config/database_credentials.cfg``. If this is not the case, the corresponding path has to be explicitly given when :ref:`executing <execution>` ``PyExperimenter``.
For security reasons, however, a database might only be accessible from specific IP addresses. In these cases, an SSH jump host can be used: ``PyExperimenter`` first connects to an SSH server that has access to the database, and then connects to the database server from there. To enable this, add an ``Ssh`` section to the database credential file.
The following example shows how to connect to a database server via an SSH server with the address ``ssh_hostname`` and the port ``optional_ssh_port``.

.. code-block:: yaml
CREDENTIALS:
Database:
user: example_user
password: example_password
Connection:
Standard:
server: example.sshmysqlserver.com
Ssh:
server: example.mysqlserver.com (address from ssh server)
address: ssh_hostname (either name/ip address of the ssh server or a name from your local ssh config file)
port: optional_ssh_port (default: 22)
passphrase: passphrase
remote_address: optional_mysql_server_address (default: 127.0.0.1)
remote_port: optional_mysql_server_port (default: 3306)
local_address: optional_local_address (default: 127.0.0.1)
local_port: optional_local_port (default: 3306)
.. note::
Further parameters for the SSH connection, such as an explicitly set private key file, are not supported. To use them, adapt your local SSH config file.
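The defaults listed in the ``Ssh`` example above can be made concrete with a small sketch. It mirrors the YAML structure as a plain Python dict and fills in the documented default values for every optional entry that is missing. The helper ``resolve_ssh_settings`` and the constant ``SSH_DEFAULTS`` are hypothetical names used only for illustration; they are not part of ``PyExperimenter``, and how the library resolves these values internally may differ.

```python
from copy import deepcopy

# Default values as documented in the example credential file.
SSH_DEFAULTS = {
    "port": 22,
    "remote_address": "127.0.0.1",
    "remote_port": 3306,
    "local_address": "127.0.0.1",
    "local_port": 3306,
}


def resolve_ssh_settings(credentials: dict) -> dict:
    """Return a copy of the Ssh section with documented defaults filled in.

    Hypothetical helper for illustration; not part of PyExperimenter.
    """
    ssh = deepcopy(credentials["CREDENTIALS"]["Connection"]["Ssh"])
    for key, default in SSH_DEFAULTS.items():
        # Only fill in entries that the credential file leaves out.
        ssh.setdefault(key, default)
    return ssh


# Same layout as the YAML example, written as a Python dict.
credentials = {
    "CREDENTIALS": {
        "Database": {"user": "example_user", "password": "example_password"},
        "Connection": {
            "Standard": {"server": "example.sshmysqlserver.com"},
            "Ssh": {
                "server": "example.mysqlserver.com",
                "address": "ssh_hostname",
                "passphrase": "passphrase",
            },
        },
    }
}

ssh_settings = resolve_ssh_settings(credentials)
print(ssh_settings)
```

Entries that are given explicitly, such as ``address`` here, are left untouched; only the omitted optional entries receive their default.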
38 changes: 26 additions & 12 deletions docs/source/usage/execution.rst
Expand Up @@ -6,7 +6,7 @@ Executing PyExperimenter

The actual execution of ``PyExperimenter`` only needs a few lines of code. Please make sure that you have created the :ref:`experiment configuration file <experiment_configuration_file>` and defined the :ref:`experiment function <experiment_function>` beforehand.

.. code-block::
.. code-block:: python
from py_experimenter.experimenter import PyExperimenter
Expand All @@ -24,22 +24,22 @@ Creating a PyExperimenter

A ``PyExperimenter`` can be created without any further information, assuming the :ref:`experiment configuration file <experiment_configuration_file>` can be accessed at its default location.

.. code-block::
.. code-block:: python
experimenter = PyExperimenter()
Additionally, further information can be given to ``PyExperimenter``:

- ``experiment_configuration_file_path``: The path of the :ref:`experiment configuration file <experiment_configuration_file>`. Default: ``config/experiment_configuration.cfg``.
- ``database_credential_file_path``: The path of the :ref:`database credential file <database_credential_file>`. Default: ``config/database_credentials.cfg``
- ``use_ssh_tunnel``: Specifies if an SSH tunnel will be used to connect to the database. Default: ``False``. If ``use_ssh_tunnel`` is set to ``True``, creating a ``PyExperimenter`` will also open an SSH tunnel, which should be :ref:`closed manually <close_ssh_tunnel>`. The details of the SSH connection have to be specified in the :ref:`database credential file <database_credential_file>`.
- ``database_name``: The name of the database to manage the experiments. If given, it will overwrite the database name given in the `experiment_configuration_file_path`.
- ``table_name``: The name of the database table to manage the experiments. If given, it will overwrite the table name given in the `experiment_configuration_file_path`.
- ``use_codecarbon``: Specifies if :ref:`CodeCarbon <experiment_configuration_file_codecarbon>` will be used to track experiment emissions. Default: ``True``.
- ``name``: The name of the experimenter, which will be added to the database table of each executed experiment. If using the PyExperimenter on an HPC system, this can be used for the job ID, so that the according log file can easily be found. Default: ``PyExperimenter``.
- ``logger_name``: The name of the logger, which will be used to log information about the execution of the PyExperimenter. If there already exists a logger with the given ``logger_name``, it will be used instead. However, the ``log_file`` will be ignored in this case. The logger will then be passed to every component of ``PyExperimenter``, so that all information is logged to the same file. Default: ``py-experimenter``.
- ``log_level``: The log level of the logger. Default: ``INFO``.
- ``log_file``: The path of the log file. Default: ``py-experimenter.log``.

- ``log_file``: The path of the log file. Default: ``py-experimenter.log``.

-------------------
Fill Database Table
Expand All @@ -59,7 +59,7 @@ Fill Table From Experiment Configuration File

The database table can be filled with the cartesian product of the keyfields defined in the :ref:`experiment configuration file <experiment_configuration_file>`.

.. code-block::
.. code-block:: python
experimenter.fill_table_from_config()
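To illustrate what "cartesian product of the keyfields" means, the following sketch builds every combination of keyfield values with ``itertools.product``. The keyfield names ``dataset`` and ``seed`` and their values are assumptions chosen for illustration, and ``cartesian_rows`` is a hypothetical helper, not the code ``PyExperimenter`` runs internally.

```python
from itertools import product

# Hypothetical keyfields with illustrative values.
keyfields = {
    "dataset": ["iris", "wine"],
    "seed": [1, 2, 3],
}


def cartesian_rows(keyfields: dict) -> list:
    """Return one dict per combination of keyfield values."""
    names = list(keyfields)
    return [dict(zip(names, combo)) for combo in product(*keyfields.values())]


rows = cartesian_rows(keyfields)
for row in rows:
    print(row)
```

With two datasets and three seeds, this produces six rows, one for each (dataset, seed) pair.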
Expand All @@ -72,7 +72,7 @@ Fill Table With Specific Rows

Alternatively, or additionally, specific rows can be added to the table. Note that ``rows`` is a list of dicts, where each dict has to contain a value for each keyfield. A more complex example featuring a conditional experiment grid can be found in the :ref:`examples section <examples>`.

.. code-block::
.. code-block:: python
experimenter.fill_table_with_rows(rows=[
{
Expand All @@ -97,7 +97,7 @@ Execute Experiments

An experiment can be executed easily with the following call:

.. code-block::
.. code-block:: python
experimenter.execute(
experiment_function = run_experiment,
Expand All @@ -117,7 +117,7 @@ Reset Experiments

Each database table contains a ``status`` column, summarizing the current state of an experiment. Experiments can be reset based on these states. If this is done, the table rows having a given status will be deleted, and corresponding new rows without results will be created. A comma separated list of ``status`` has to be provided.

.. code-block::
.. code-block:: python
experimenter.reset_experiments(<status>, <status>, ...)
Expand All @@ -138,7 +138,7 @@ Obtain Results

The current content of the database table can be obtained as a ``pandas.DataFrame``. This can, for example, be used to generate a result table and export it to LaTeX.

.. code-block::
.. code-block:: python
result_table = experimenter.get_table()
result_table = result_table.groupby(['dataset']).mean()[['seed']]
Expand All @@ -164,11 +164,11 @@ Tracking information about the carbon footprint of experiments is supported via
Pausing and Unpausing Experiments
---------------------------------

For convenience, we support pausing and unpausing experiments. This means that you can use one ``PyExperimenter`` to start an experiment, which will be paused after certain operations. Therefore, it can be resumed later on. Afterwards, depending on the parametrization of ``execute()`` of the ``PyExperimenter`` instance (see :ref:`asdf <execute_experiments:>`), the experimenter terminates or another experiment will be started.
For convenience, we support pausing and unpausing experiments. This means that you can use one ``PyExperimenter`` to start an experiment, which will be paused after certain operations. Therefore, it can be resumed later on. Afterwards, depending on the parametrization of ``execute()`` of the ``PyExperimenter`` instance (see :ref:`in Execute Experiments <execute_experiments>`), the experimenter terminates or another experiment will be started.

To pause an experiment, the experiment function has to return the state ``ExperimentStatus.PAUSED``:

.. code-block::
.. code-block:: python
def run_experiment_until_pause(keyfields: dict, result_processor: ResultProcessor, custom_fields: dict):
# do something
Expand All @@ -187,7 +187,7 @@ To pause an experiment, the experiment function has to return the state ``Experi
At a later point in time, the experiment can be unpaused and continued. This can be done by calling ``unpause_experiment()`` on a ``PyExperimenter`` instance with the ``experiment_id`` of the experiment to continue, together with a separate experiment function that only contains the experiment code to be executed after the pause. Note that only a single ``experiment_id`` can be unpaused at a time, i.e. unpausing multiple experiments in parallel is not supported.

.. code-block::
.. code-block:: python
def run_experiment_after_pause(keyfields: dict, result_processor: ResultProcessor, custom_fields: dict):
# do something
Expand All @@ -201,3 +201,17 @@ At a later point in time, the experiment can be unpaused and continued. This can
A complete example on how to pause and continue an experiment can be found in the :ref:`examples section <examples>`.



.. _close_ssh_tunnel:

----------------
Close SSH Tunnel
----------------

If an SSH tunnel was opened during the creation of the ``PyExperimenter``, it has to be closed manually by calling the following method:

.. code-block:: python
experimenter.execute(...)
experimenter.close_ssh_tunnel()
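Because the tunnel has to be closed manually, it can be useful to close it even when ``execute()`` raises an exception. A minimal sketch using ``try``/``finally`` is given below. The wrapper ``execute_and_close_tunnel`` and the stand-in class ``_FakeExperimenter`` are hypothetical and exist only so that the sketch runs without a database; with a real ``PyExperimenter`` instance only the wrapper would be needed.

```python
def execute_and_close_tunnel(experimenter, **execute_kwargs):
    """Run execute() and always close the SSH tunnel afterwards.

    Hypothetical wrapper for illustration; not part of PyExperimenter.
    """
    try:
        return experimenter.execute(**execute_kwargs)
    finally:
        # Runs on success and on failure, so the tunnel is not left open.
        experimenter.close_ssh_tunnel()


class _FakeExperimenter:
    """Stand-in that records calls so the sketch runs without a database."""

    def __init__(self):
        self.calls = []

    def execute(self, **kwargs):
        self.calls.append("execute")
        raise RuntimeError("simulated failure during execution")

    def close_ssh_tunnel(self):
        self.calls.append("close_ssh_tunnel")


fake = _FakeExperimenter()
try:
    execute_and_close_tunnel(fake, experiment_function=lambda *args: None)
except RuntimeError:
    pass  # The error still propagates; the tunnel was closed first.

print(fake.calls)
```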
6 changes: 3 additions & 3 deletions docs/source/usage/experiment_function.rst
Expand Up @@ -6,7 +6,7 @@ Experiment Function

The execution of a single experiment has to be defined within a function. The function is called with the ``keyfields`` values of a database entry. The results are meant to be processed to be written into the database, i.e. as ``resultfields``. During the experiment different information can be logged into ``logtables``.

.. code-block::
.. code-block:: python
import os
from py_experimenter.result_processor import ResultProcessor
Expand Down Expand Up @@ -58,7 +58,7 @@ Push Data To Resultfields

``Resultfields`` can be filled any time during the execution process by calling the following code within your experiment function, e.g. ``run_ml``. Note that a resultfield is meant to be written once, if you re-write a resultfield, the old value will be overwritten. Furthermore note that you do not have to write all resultfields at once, but can also only write a subset as demonstrated in the example above. Multiple in-depth examples showcasing the usage of resultfields can be found within the :ref:`examples section <examples>`.

.. code-block::
.. code-block:: python
result_processor.process_results({
'<resultfield_name>': <resultfield_value>,
Expand All @@ -75,7 +75,7 @@ Push Data To Logtables

``Logtables`` can be filled any time during the execution process by calling the following code within your experiment function, e.g. ``run_ml``. An in-depth example showcasing the usage of logtables can be found within the :ref:`examples section <examples>`.

.. code-block::
.. code-block:: python
result_processor.process_logs({
'<logtable_name>': {
Expand Down