Go to the Kaggle notebook https://www.kaggle.com/code/vigneshwar472/baseline-residualunetse3d-train, select "Copy & Edit", and "Run All". Training hangs indefinitely. The training loop is:

```python
n = len(folds)
for i in range(n):
    print(f'fold {i} started....')
    model = ResidualUNetSE3D(in_channels=1, out_channels=6)
    lm = CZIILightningModule(model=model)
    logger = CSVLogger(save_dir='/kaggle/working/training_results', name=f'fold_{i}')
    trainer = Trainer(
        accelerator='gpu',
        strategy='ddp_notebook',
        devices=2,
        precision='32',
        gradient_clip_val=None,
        logger=logger,
        max_epochs=15,
        enable_checkpointing=True,
        enable_progress_bar=True,
        enable_model_summary=False,
        inference_mode=True,
        default_root_dir='/kaggle/working/training_results',
        num_sanity_val_steps=0,
    )
    trainer.fit(
        model=lm,
        train_dataloaders=DataLoader(folds[i][0], batch_size=1, num_workers=4, shuffle=True),
        val_dataloaders=DataLoader(folds[i][1], batch_size=1, num_workers=4, shuffle=False),
    )
    del model, lm, logger, trainer
    print(f'fold {i} completed....')
```
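For context, `ddp_notebook` launches its worker processes by forking the running notebook kernel rather than spawning fresh interpreters, which is what makes it usable inside Jupyter/Kaggle. The sketch below is a minimal stdlib illustration of that fork-based launch pattern, under the assumption that each rank processes its own shard of the work; the `launch` and `worker` names are hypothetical and no Lightning or GPU code is involved.

```python
import multiprocessing as mp

def worker(rank, world_size, data, queue):
    # Each rank processes its shard of the data, mimicking how DDP
    # partitions work across devices (illustration only; no real GPUs).
    shard = data[rank::world_size]
    queue.put((rank, sum(shard)))

def launch(world_size, data):
    # ddp_notebook-style launch: fork worker processes from the current
    # interpreter instead of re-importing a script (hypothetical sketch).
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    procs = [ctx.Process(target=worker, args=(r, world_size, data, queue))
             for r in range(world_size)]
    for p in procs:
        p.start()
    # Drain results before joining to avoid blocking on a full pipe.
    results = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    # rank 0 sums the even-indexed items, rank 1 the odd-indexed ones
    print(launch(2, list(range(10))))
```

Because the children are forks of the notebook kernel, anything already initialized in the kernel at `fit()` time is inherited by every worker.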
Maybe there is some serious issue with strategy='ddp_notebook'.
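One known constraint of fork-based launching is that a forked child inherits all of the parent's pre-fork state, and CUDA contexts in particular cannot be re-initialized in a forked subprocess, so any CUDA work done in a notebook cell before `trainer.fit()` can leave the forked workers stuck. The stdlib sketch below only illustrates the inheritance mechanism; `STATE` and `demo` are hypothetical names and no CUDA is involved.

```python
import multiprocessing as mp

# State created in the parent before forking is inherited by the child.
# This is why a CUDA context initialized in the notebook *before*
# trainer.fit() can poison fork-launched DDP workers: they inherit a
# context that CUDA does not allow to be reused after fork.
STATE = {"cuda_initialized": False}

def child(queue):
    # The forked child sees the parent's pre-fork state.
    queue.put(STATE["cuda_initialized"])

def demo():
    STATE["cuda_initialized"] = True  # e.g. a stray .cuda() call in an earlier cell
    ctx = mp.get_context("fork")
    q = ctx.Queue()
    p = ctx.Process(target=child, args=(q,))
    p.start()
    inherited = q.get()
    p.join()
    return inherited

if __name__ == "__main__":
    print(demo())  # True: the child inherited the parent's state
```

If this is the cause here, checking whether any earlier notebook cell touches the GPU before `trainer.fit()` would be a reasonable first diagnostic step.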
Bug description
Notebook - https://www.kaggle.com/code/vigneshwar472/baseline-residualunetse3d
Github repo (ResidualUNetSE3D implementation) - https://github.com/wolny/pytorch-3dunet/tree/master
Issue - Training starts on a single P100 GPU but does not start on 2x T4 GPUs.
I want to use 2 GPUs simultaneously for training (the ddp_notebook strategy), but training does not start and neither GPU is used. I have no idea why it's not working; check the error messages and logs section.
What version are you seeing the problem on?
v2.4
How to reproduce the bug
See the reproduction steps and training loop above.
Error messages and logs
When used with 2x T4 GPUs: (log screenshot not captured)
When used with 1x P100 GPU: (log screenshot not captured)
Environment
Please go to the Kaggle notebook and run it.
More info
No response