From f3ea38c84206f9ef0581d921b277552dc3944683 Mon Sep 17 00:00:00 2001
From: kurman <74940256+kurman@users.noreply.github.com>
Date: Wed, 15 Jun 2022 15:58:26 -0700
Subject: [PATCH] torchx/release - 0.2.0 (#522)

Summary:
Release: 0.2.0

Pull Request resolved: https://github.com/pytorch/torchx/pull/522

Reviewed By: d4l3k

Differential Revision: D37184876

Pulled By: kurman

fbshipit-source-id: 94d5eb073e0f65207547c89e528899747cbf25dd
---
 CHANGELOG.md      | 56 +++++++++++++++++++++++++++++++++++++++++++++++
 torchx/version.py |  2 +-
 2 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index d81a6a5e9..f7c18da5a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,61 @@
 # CHANGELOG
 
+## torchx-0.2.0
+
+* Milestone: https://github.com/pytorch/torchx/milestone/4
+
+* `torchx.schedulers`
+  * DeviceMounts
+    * New mount type 'DeviceMount' that allows mounting a host device into a container in the supported schedulers (Docker, AWS Batch, K8). Custom accelerators and network devices such as Infiniband or Amazon EFA are now supported.
+  * Slurm
+    * Scheduler integration now supports "max_retries" the same way that our other schedulers do. This only handles whole job level retries and doesn't support per replica retries.
+    * Autodetects "nomem" setting by using `sinfo` to get the "Memory" setting for the specified partition
+    * More robust slurmint script
+  * Kubernetes
+    * Support for k8s device plugins/resource limits
+      * Added "devices" list of (str, int) tuples to role/resource
+      * Added devices.py to map from named devices to DeviceMounts
+      * Added logic in kubernetes_scheduler to add devices from resource to resource limits
+      * Added logic in aws_batch_scheduler and docker_scheduler to add DeviceMounts for any devices from resource
+    * Added "priority_class" argument to kubernetes scheduler to set the priorityClassName of the volcano job.
+  * Ray
+    * Fixes for distributed training, now supported in Beta
+
+* `torchx.specs`
+  * Moved factory/builder methods from datastruct specific "specs.api" to "specs.factory" module
+
+* `torchx.runner`
+  * Renamed "stop" method to "cancel" for consistency. `Runner.stop` is now deprecated
+  * Added warning message when "name" parameter is specified. It is used as part of Session name, which is deprecated, making "name" obsolete.
+  * New env variable TORCHXCONFIG for specified config
+
+* `torchx.components`
+  * Removed "base" + "torch_dist_role" since users should prefer to use the `dist.ddp` components instead
+  * Removed custom components for example apps in favor of using builtins.
+  * Added "env", "max_retries" and "mounts" arguments to utils.sh
+
+* `torchx.cli`
+  * Better parsing of configs from a string literal
+  * Added support to delimit kv-pairs and list values with "," and ";" interchangeably
+  * Allow the default scheduler to be specified via .torchxconfig
+  * Better invalid scheduler messaging
+  * Log message about how to disable workspaces
+  * Job cancellation support via `torchx cancel `
+
+* `torchx.workspace`
+  * Support for .dockerignore files used as include lists to fix some behavioral differences between how .dockerignore files are interpreted by torchx and docker
+
+* Testing
+  * Component tests now run sequentially
+  * Components can be tested with a runner using `components.components_test_base.ComponentTestCase#run_component()` method.
+
+* Additional Changes
+  * Updated Pyre configuration to preemptively guard against upcoming semantic changes
+  * Formatting changes from black 22.3.0
+  * Now using pyfmt with usort 1.0 and the new import merging behavior.
+  * Added script to automatically get system diagnostics for reporting purposes
+
+
 ## torchx-0.1.2
 
 Milestone: https://github.com/pytorch/torchx/milestones/3
diff --git a/torchx/version.py b/torchx/version.py
index 3cd3dec45..20b824cda 100644
--- a/torchx/version.py
+++ b/torchx/version.py
@@ -14,7 +14,7 @@
 # 0.1.0bN  # Beta release
 # 0.1.0rcN  # Release Candidate
 # 0.1.0  # Final release
-__version__ = "0.2.0dev0"
+__version__ = "0.2.0"
 
 # Use the github container registry images corresponding to the current package
 # version.
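
Note on the version bump: the comments in `torchx/version.py` list beta (`bN`), release candidate (`rcN`) and final forms, and the removed line shows a `devN` form. The sketch below illustrates why `"0.2.0dev0"` orders before `"0.2.0"` under that scheme. It is not part of the patch or of torchx: `version_key` and the stage ranking are hypothetical names chosen for illustration, and the ordering (dev, then beta, then release candidate, then final) is assumed to follow PEP 440.

```python
import re

# Hypothetical helper, not part of torchx. It only understands the forms
# visible in this patch: X.Y.Z, X.Y.ZdevN, X.Y.ZbN and X.Y.ZrcN.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:(dev|b|rc)(\d+))?$")

# Assumed ordering of pre-release stages (mirrors PEP 440); a version with
# no stage suffix is the final release and ranks last.
_STAGE_RANK = {"dev": 0, "b": 1, "rc": 2, None: 3}


def version_key(version: str) -> tuple:
    """Return a tuple that sorts version strings in release order."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unsupported version string: {version!r}")
    major, minor, patch, stage, number = match.groups()
    return (
        int(major),
        int(minor),
        int(patch),
        _STAGE_RANK[stage],
        int(number or 0),
    )


if __name__ == "__main__":
    candidates = ["0.2.0", "0.2.0rc1", "0.2.0dev0", "0.2.0b2"]
    # Sort from earliest pre-release to the final release.
    print(sorted(candidates, key=version_key))
```

For real projects, the `packaging` library's `Version` class handles the full PEP 440 grammar; the regex here is deliberately narrow so the example stays self-contained.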