"ValueError: You selected an invalid strategy name" When DDPStrategy(process_group_backend="gloo") is passed #20526

Open
11philip22 opened this issue Jan 5, 2025 · 1 comment
Labels: bug, repro needed, ver: 2.4.x

Comments


Bug description

When I run the code below on Python 3.12.8 with pytorch-lightning 2.4.0, constructing the Trainer raises a ValueError.

What version are you seeing the problem on?

v2.4

How to reproduce the bug

ddp_gloo = DDPStrategy(process_group_backend="gloo")

trainer = Trainer(
    devices=2,
    # devices=1,
    accelerator='gpu',
    strategy=ddp_gloo,
    benchmark=True,
    logger=logger,
    callbacks=[checkpoint_callback, lr_monitor],
    check_val_every_n_epoch=1,
    max_epochs=30,
    # max_epochs=3,
)
trainer.fit(model, data_module)

Error messages and logs

Traceback (most recent call last):
  File "C:\Users\Philip\source\repos\insightface_alignment_lightning\src\train.py", line 59, in <module>
    main()
  File "C:\Users\Philip\source\repos\insightface_alignment_lightning\src\train.py", line 43, in main
    trainer = Trainer(
              ^^^^^^^^
  File "C:\Users\Philip\.conda\envs\lightning\Lib\site-packages\pytorch_lightning\utilities\argparse.py", line 70, in insert_env_defaults
    return fn(self, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\Philip\.conda\envs\lightning\Lib\site-packages\pytorch_lightning\trainer\trainer.py", line 395, in __init__
    self._accelerator_connector = _AcceleratorConnector(
                                  ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Philip\.conda\envs\lightning\Lib\site-packages\pytorch_lightning\trainer\connectors\accelerator_connector.py", line 130, in __init__
    self._check_config_and_set_final_flags(
  File "C:\Users\Philip\.conda\envs\lightning\Lib\site-packages\pytorch_lightning\trainer\connectors\accelerator_connector.py", line 193, in _check_config_and_set_final_flags
    raise ValueError(
ValueError: You selected an invalid strategy name: `strategy=<lightning.pytorch.strategies.ddp.DDPStrategy object at 0x0000023622FA2240>`. It must be either a string or an instance of `pytorch_lightning.strategies.Strategy`. Example choices: auto, ddp, ddp_spawn, deepspeed, ... Find a complete list of options in our documentation at https://lightning.ai

Environment

Current environment
  • CUDA:
    - GPU:
    - Quadro P6000
    - Quadro P6000
    - available: True
    - version: 12.4
  • Lightning:
    - efficientnet-pytorch: 0.7.1
    - lightning: 2.4.0
    - lightning-utilities: 0.11.9
    - pytorch-lightning: 2.4.0
    - segmentation-models-pytorch: 0.3.5.dev0
    - torch: 2.5.1
    - torchmetrics: 1.6.0
    - torchvision: 0.20.1
  • Packages:
    - absl-py: 2.1.0
    - aiohappyeyeballs: 2.4.4
    - aiohttp: 3.11.11
    - aiosignal: 1.3.2
    - albucore: 0.0.21
    - albumentations: 1.4.23
    - annotated-types: 0.7.0
    - attrs: 24.3.0
    - autocommand: 2.2.2
    - backports.tarfile: 1.2.0
    - brotli: 1.1.0
    - certifi: 2024.12.14
    - cffi: 1.17.1
    - charset-normalizer: 3.4.0
    - colorama: 0.4.6
    - contourpy: 1.3.1
    - cycler: 0.12.1
    - efficientnet-pytorch: 0.7.1
    - eval-type-backport: 0.2.0
    - filelock: 3.16.1
    - fonttools: 4.55.3
    - frozenlist: 1.5.0
    - fsspec: 2024.10.0
    - grpcio: 1.68.1
    - h2: 4.1.0
    - hpack: 4.0.0
    - huggingface-hub: 0.27.0
    - hyperframe: 6.0.1
    - idna: 3.10
    - importlib-metadata: 8.0.0
    - inflect: 7.3.1
    - jaraco.collections: 5.1.0
    - jaraco.context: 5.3.0
    - jaraco.functools: 4.0.1
    - jaraco.text: 3.12.1
    - jinja2: 3.1.4
    - kiwisolver: 1.4.7
    - lightning: 2.4.0
    - lightning-utilities: 0.11.9
    - markdown: 3.7
    - markupsafe: 3.0.2
    - matplotlib: 3.10.0
    - more-itertools: 10.3.0
    - mpmath: 1.3.0
    - multidict: 6.1.0
    - munch: 4.0.0
    - networkx: 3.4.2
    - numpy: 2.2.0
    - opencv-python: 4.10.0.84
    - opencv-python-headless: 4.10.0.84
    - packaging: 24.2
    - pillow: 10.4.0
    - pip: 24.3.1
    - platformdirs: 4.2.2
    - pretrainedmodels: 0.7.4
    - propcache: 0.2.1
    - protobuf: 5.29.2
    - pycocotools: 2.0.8
    - pycparser: 2.22
    - pydantic: 2.10.4
    - pydantic-core: 2.27.2
    - pyparsing: 3.2.0
    - pysocks: 1.7.1
    - python-dateutil: 2.9.0.post0
    - pytorch-lightning: 2.4.0
    - pyyaml: 6.0.2
    - requests: 2.32.3
    - safetensors: 0.5.0
    - scipy: 1.14.1
    - segmentation-models-pytorch: 0.3.5.dev0
    - setuptools: 75.6.0
    - simsimd: 6.2.1
    - six: 1.17.0
    - stringzilla: 3.11.2
    - sympy: 1.13.1
    - tensorboard: 2.18.0
    - tensorboard-data-server: 0.7.2
    - timm: 1.0.12
    - tomli: 2.0.1
    - torch: 2.5.1
    - torchmetrics: 1.6.0
    - torchvision: 0.20.1
    - tqdm: 4.67.1
    - typeguard: 4.3.0
    - typing-extensions: 4.12.2
    - urllib3: 2.2.3
    - werkzeug: 3.1.3
    - wheel: 0.45.1
    - win-inet-pton: 1.1.0
    - yarl: 1.18.3
    - zipp: 3.19.2
    - zstandard: 0.23.0
  • System:
    - OS: Windows
    - architecture:
    - 64bit
    - WindowsPE
    - processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
    - python: 3.12.8
    - release: 10
    - version: 10.0.19045

More info

No response

11philip22 added the bug and needs triage labels on Jan 5, 2025
lantiga (Collaborator) commented Jan 6, 2025

Hey @11philip22, can you show the full imports? I'd like to make sure you're not importing the Trainer and the strategy from different packages, like pytorch_lightning and lightning.
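
The traceback in the report is consistent with this question: the rejected object is a `lightning.pytorch.strategies.ddp.DDPStrategy`, while the frames that raise the error live under `pytorch_lightning`. Below is a minimal sketch of a check for that situation. The helper names `root_package` and `same_root_package` are hypothetical (they are not part of either package), and the two stand-in classes only copy the module paths visible in the traceback so that the snippet runs without Lightning installed.

```python
def root_package(obj) -> str:
    """Return the top-level package that defines obj (a class or an instance)."""
    cls = obj if isinstance(obj, type) else type(obj)
    return cls.__module__.split(".")[0]


def same_root_package(*objs) -> bool:
    """Return True when every object is defined under the same top-level package."""
    return len({root_package(o) for o in objs}) <= 1


# Stand-ins that only mimic the module paths from the traceback (assumption:
# the real classes report the same __module__ values).
FakeTrainer = type("Trainer", (), {"__module__": "pytorch_lightning.trainer.trainer"})
FakeDDPStrategy = type("DDPStrategy", (), {"__module__": "lightning.pytorch.strategies.ddp"})

# Compare a class with an instance, as one would with Trainer and ddp_gloo.
mixed = same_root_package(FakeTrainer, FakeDDPStrategy())
print(root_package(FakeTrainer), root_package(FakeDDPStrategy()), mixed)
```

In the reporter's script, the same check could be run on the real `Trainer` class and the `ddp_gloo` instance before the `Trainer(...)` call. If it returns False, importing both names from a single package (either `lightning.pytorch` or `pytorch_lightning`) would be the first thing to try; whether that resolves this particular report is not confirmed in the thread.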

lantiga added the waiting on author label and removed the needs triage label on Jan 6, 2025
lantiga added the repro needed label and removed the waiting on author label on Jan 13, 2025