Simplified Engine. Enhanced support for distributed configuration on GPUs, XLA devices

Pre-release
@vfdev-5 vfdev-5 released this 06 Jun 20:06
· 30 commits to v0.4.0 since this release

PyTorch-Ignite 0.4.0 RC - Release Notes

Core

BC breaking changes

  • Simplified engine (#940, #939, #938)
    • no more internal patching of torch DataLoader.
    • seed argument of Engine.run is deprecated.
    • previous behaviour can be achieved with DeterministicEngine, introduced in #939.
  • Make all Events be CallableEventsWithFilter (#788).
  • Make ignite compatible only with pytorch >1.0 (#1016).
    • ignite is tested on the latest and nightly versions of pytorch.
    • exact compatibility with previous versions can be checked here.
  • Remove deprecated arguments from BaseLogger (#1051).
  • Deprecated CustomPeriodicEvent (#984).
  • RunningAverage now computes the average of the output quantity, instead of its sum, in DDP (#991).
  • Checkpoint now stores files with the .pt extension instead of .pth (#873).
  • The archived argument of Checkpoint and ModelCheckpoint is deprecated (#873).
  • create_supervised_trainer and create_supervised_evaluator no longer move the model to the device (#910).

New Features and bug fixes

Ignite Distributed [Experimental]

  • Introduction of ignite.distributed as idist module (#1045)
    • common interface for distributed applications and helper methods, e.g. get_world_size(), get_rank(), ...
    • supports native torch distributed configurations and XLA devices.
    • metrics computation works in all supported distributed configurations: GPUs and TPUs.

Engine & Events

  • Add flexibility on event handlers by packing triggering events (#868).
  • Engine argument is now optional in event handlers (#889, #919).
  • engine.state is now initialized before engine.run is called (#1028).
  • Engine can run on a dataloader based on IterableDataset without specifying epoch_length (#1077).
  • Added user keys into Engine's state dict (#914).
  • Bug fixes in Engine class (#1048, #994).
  • Now epoch_length argument is optional (#985)
    • suitable for finite iterators of unknown length.
  • Added times in engine.state (#958).

Metrics

  • Add Frequency metric for ops/s calculations (#760, #783, #976).
  • Metrics computation can be customized with the newly introduced MetricUsage (#979, #1054)
    • batch-wise/epoch-wise or custom-programmed calls to a metric's update and compute methods.
  • Metric can be detached (#827).
  • Fixed bug in RunningAverage when output is torch tensor (#943).
  • Improved computation performance of EpochMetric (#967).
  • Fixed average recall value of ConfusionMatrix (#846).
  • Now metrics can be serialized using dill (#930).
  • Added support for nested metric values (#968).

Handlers and utils

  • Checkpoint : improved filename when score value is an integer (#758).
  • Checkpoint : fixed returning the worst of the saved models (#745).
  • Checkpoint : load_objects can load single object checkpoints (#772).
  • Checkpoint : we now save only one checkpoint per priority (#847).
  • Checkpoint : added kwargs to Checkpoint.load_objects (#861).
  • Checkpoint : now saves model.module.state_dict() for DDP and DP (#1086).
  • Checkpoint and related: other improvements (#937).
  • Support namedtuple for convert_tensor (#740).
  • Added decorator one_rank_only (#882).
  • Update common.py (#904).

Contrib

  • Added FastaiLRFinder (#596).

Metrics

  • Added ROC curve and precision/recall curve metrics (#875).

Parameters scheduling

  • Enabled multiple parameter groups for LRScheduler (#1027).
  • Parameters scheduling improvements (#1072, #859).

Support of experiment tracking systems

  • Add NeptuneLogger (#730, #821, #951, #954).
  • Add TrainsLogger (#1020, #1036, #1043).
  • Add WandbLogger (#926).
  • Added visdom_logger to common module (#796).
  • TensorboardX is no longer mandatory if pytorch>=1.2 (#858).
  • Simplified BaseLogger attach APIs (#1006).
  • Added kwargs to loggers' constructors and respective setup functions (#1015).

Time profiling

  • Added basic time profiler to contrib.handlers (#729).

Bug fixes (selected PRs)

  • Fixed ProgressBar output not being in sync with epoch counts (#773).
  • Fixed ProgressBar.log_message (#768).
  • ProgressBar now accounts for the epoch_length argument (#785).
  • Fixed broken ProgressBar if data is iterator without epoch length (#995).
  • Improved setup_logger for multiple calls (#962).
  • Fixed incorrect log position (#1099).
  • Added missing colon to logging message (#1101).

Examples

  • Basic example of FastaiLRFinder on MNIST (#838).
  • CycleGAN automatic mixed precision training example with NVIDIA/Apex or native torch.cuda.amp (#888).
  • Added setup_logger to mnist examples (#953).
  • Added MNIST example on TPU (#956).
  • Benchmark amp on Cifar100 (#917).
  • TrainsLogger semantic segmentation example (#1095).

Housekeeping (selected PRs)


Acknowledgments

🎉 Thanks to our community and all our contributors for the issues, PRs and 🌟 ⭐️ 🌟 !
💯 We really appreciate your involvement in the project (in alphabetical order):

@Crissman, @DhDeepLIT, @GabrielePicco, @InCogNiTo124, @itamarwilf, @joxis, @Muhamob, @Yevgnen, @anmolsjoshi, @bendboaz, @bmartinn, @cajanond, @chm90, @cqql, @czotti, @erip, @fdlm, @hoangmit, @Isolet, @jakubczakon, @jkhenning, @kai-tub, @maxfrei750, @michiboo, @mkartik, @sdesrozis, @sisp, @vfdev-5, @willfrey, @xen0f0n, @y0ast, @ykumards