5.8.0

@DailyDreaming DailyDreaming released this 04 Jan 23:01
· 393 commits to master since this release
79792b7

Changelog

Highlighted Features Added

  • Toil server now exposes workflow tasks via WES (#4046).
  • Toil server now has a --wes_dialect agc option that will hide any tasks that don't have Amazon Batch job IDs, and put the IDs in the task names for those that do (#4047).
  • Toil jobs now accept an accelerators requirement, like accelerators=1 or accelerators={'kind': 'gpu', 'brand': 'nvidia', 'count': 2} (#4163)
  • Include total requested cores for each job type in toil stats (#4173)
  • Toil jobs now expose job.accelerators to workflows
  • Add prefix and suffix parameters to AbstractFileStore.getLocalTempFile and AbstractFileStore.getLocalTempFileName (#4273)
  • CWL: --no-compute-checksum, --strict-cpu-limit, --disable-validate, and --fast-parser are now available
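
The two forms of the accelerators requirement mentioned above (a bare count, or a dictionary with kind, brand, and count) can be put into a common shape before use. The sketch below is illustrative only: normalize_accelerators is a hypothetical helper, not part of Toil's API, and it assumes that a bare integer means that many GPUs of any brand.

```python
from typing import Any, Dict, List, Union


def normalize_accelerators(
    requirement: Union[int, Dict[str, Any], List[Any]]
) -> List[Dict[str, Any]]:
    """Hypothetical helper: turn an accelerators requirement into a list of dicts.

    Assumption (not confirmed by the release notes): a bare integer N means
    "N accelerators of kind 'gpu', any brand".
    """
    if isinstance(requirement, list):
        # Flatten a list of requirements, normalizing each entry in turn.
        result: List[Dict[str, Any]] = []
        for item in requirement:
            result.extend(normalize_accelerators(item))
        return result
    if isinstance(requirement, bool):
        # bool is a subclass of int, so reject it explicitly.
        raise TypeError("accelerators must be an int, dict, or list")
    if isinstance(requirement, int):
        if requirement < 0:
            raise ValueError("accelerator count cannot be negative")
        return [{"kind": "gpu", "count": requirement}]
    if isinstance(requirement, dict):
        spec = dict(requirement)  # copy so the caller's dict is untouched
        spec.setdefault("count", 1)
        return [spec]
    raise TypeError("accelerators must be an int, dict, or list")


print(normalize_accelerators(1))
print(normalize_accelerators({"kind": "gpu", "brand": "nvidia", "count": 2}))
```

In a real workflow the requirement would be passed to the job itself, as in the accelerators=1 example above; the helper only shows how the two spellings relate.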

Breaking Changes

  • Toil's built-in autoscaler now guesses that some memory and disk space on nodes will not actually be available for jobs; pass --assumeZeroOverhead to revert to the old behavior (#2103)

CWL

  • CWL job unit and display names have been changed to make more sense as task names, and management of them has been unified into a CWLNamedJob. (#4046/#4047)
  • CWL CUDARequirement is parsed by cwltool and turned into a requirement for the minimum requested number of nvidia GPU accelerators (#3982)
  • Fix false warning when outputSource contains only one None value (#4300)

Kubernetes

  • KubernetesBatchSystem can add nvidia.com/gpu and amd.com/gpu resource requests for jobs that request those accelerators (#4163)
  • KubernetesBatchSystem can request GPUs by model key, if nodes are labeled appropriately (#4163)
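
As a rough illustration of the first item above, the mapping from a job's accelerator requirements to Kubernetes resource request keys might look like the following. This is a sketch under stated assumptions, not KubernetesBatchSystem's actual code: gpu_resource_requests and BRAND_TO_RESOURCE are hypothetical names, and only the nvidia.com/gpu and amd.com/gpu keys come from the release notes.

```python
from typing import Any, Dict, Iterable

# Resource keys named in the release notes, indexed by accelerator brand.
BRAND_TO_RESOURCE = {
    "nvidia": "nvidia.com/gpu",
    "amd": "amd.com/gpu",
}


def gpu_resource_requests(accelerators: Iterable[Dict[str, Any]]) -> Dict[str, int]:
    """Hypothetical helper: sum GPU counts per Kubernetes resource key."""
    requests: Dict[str, int] = {}
    for acc in accelerators:
        if acc.get("kind") != "gpu":
            continue  # this sketch only handles GPU accelerators
        key = BRAND_TO_RESOURCE.get(acc.get("brand"))
        if key is None:
            continue  # unknown or missing brand: no resource key to request
        requests[key] = requests.get(key, 0) + int(acc.get("count", 1))
    return requests


print(gpu_resource_requests([{"kind": "gpu", "brand": "nvidia", "count": 2}]))
```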

Dependencies

Misc

  • Toil WES server now accepts requests that leave out workflow_params. (#4037)
  • The MessageBus has been expanded to use pypubsub, and now has MessageInbox and MessageOutbox objects to represent connections to it. (#4046/#4047)
  • ToilMetrics now rides on the MessageBus rails. (#4046/#4047)
  • Toil workflows now have a --writeMessages option, which takes a file to which a line-oriented stream of MessageBus messages will be written. Reading this file will allow you to recover the current state of the workflow. (#4046/#4047)
  • Add code for a warning check to be used when launching a cluster on AWS. (#3514)
  • Use a CI prebake image for gitlab testing. (#4185)
  • Toil clusters now have /var/tmp as the default temporary directory, since they often make large temporary files (#4148)
  • Adds basic testing for slurm using a slurm docker cluster by running sample workflows. (#3856)
  • Add message bus documentation (#4239)
  • SingleMachineBatchSystem can schedule nvidia GPU accelerators, limiting concurrent jobs to the number the available accelerators can support, and setting CUDA_VISIBLE_DEVICES in the tasks' environments to tell them which nvidia GPU(s) to use. (#4163)
  • AWSBatchBatchSystem can use AWS Batch's GPU resource to provide nvidia GPU accelerators (#4163)
  • Toil jobs no longer need to re-run after their child/followOn/service jobs in order to delete themselves. (#3188)
  • Message bus is now thread safe (#4276)
  • Docker build has been updated with new Aventer Mesos deb URL (fixes #4290)
  • docker binary in the container has been updated to that included in the Ubuntu repos (fixes #4282)
  • Singularity in the appliance has been updated to 3.10, which satisfies the >=3.9 requirement for cgroups v2 support.
  • Base Ubuntu container image for the appliance has been updated to 22.04, which has a new enough libc for Debian's Singularity 3.10 debs.
  • Safer type usage checking for systems without boto3 installed
  • Tests are now more runnable post-installation. Temporary paths are not selected based upon the location of the tests themselves. (#4287)
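
For the SingleMachineBatchSystem item above, a task that wants to know which GPUs it was assigned can read CUDA_VISIBLE_DEVICES from its environment. The sketch below is one way to do that; visible_cuda_devices is a hypothetical helper, not a Toil function, and it assumes the usual comma-separated value format for this variable.

```python
import os
from typing import List, Mapping, Optional


def visible_cuda_devices(
    environ: Optional[Mapping[str, str]] = None
) -> Optional[List[str]]:
    """Hypothetical helper: list the device identifiers a task may use.

    Returns None when CUDA_VISIBLE_DEVICES is unset (no restriction applied),
    and an empty list when it is set but empty (no devices visible).
    """
    env = os.environ if environ is None else environ
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    # Entries are comma-separated; keep them as strings, since they may be
    # indices or device UUIDs.
    return [entry.strip() for entry in raw.split(",") if entry.strip()]


print(visible_cuda_devices({"CUDA_VISIBLE_DEVICES": "0,2"}))
```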

Bug Fixes

  • Only use /var/run/user if XDG tells us we have it in our session. Otherwise we will try other places, including /run/lock/toil. (#4170)
  • toil destroy-cluster: terminate stopped instances when destroying the cluster (#4271)
  • fileJobStore: handle arbitrary os.link errors to work on some filesystems (#2232)
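
The os.link item above concerns filesystems where hard links fail. A common way to tolerate that is to fall back to a plain copy, sketched below. This is a generic pattern, not the fileJobStore's actual implementation, and link_or_copy is a hypothetical name.

```python
import os
import shutil


def link_or_copy(src: str, dst: str) -> str:
    """Hypothetical helper: hard-link src to dst, copying if linking fails.

    Returns "linked" or "copied" so callers can tell which path was taken.
    """
    try:
        os.link(src, dst)
        return "linked"
    except OSError:
        # Some filesystems reject hard links (for example across devices or
        # on certain network mounts); fall back to copying the file contents.
        shutil.copyfile(src, dst)
        return "copied"
```

Catching the broad OSError rather than specific errno values trades precision for portability, which matches the "arbitrary os.link errors" wording of the fix.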

Thank you to our contributors!