num_labeled in DistributedDataParallel #5
I think @LiheYoung is correct. This should be easy to test by running w/o a seed and w/ a seed -- if I'm right, using a seed will reduce the performance substantially. Running this experiment now; will report results here.
In my experiment, performance w/o the seed is substantially better than w/ a seed. I only ran it once, so perhaps this is random variation, but I'm guessing it is due to the issue @LiheYoung and I pointed out above. Red line is w/o seed, blue is w/ seed. Edit: This is for
Sorry, a little confused -- should we set the seed? I think @LiheYoung is correct: with K GPUs, the actual number of labeled samples is K*N rather than N. So should we set the labeled number to N/K, or set the same seed for all GPUs?
FixMatch-pytorch/dataset/cifar.py, lines 102 to 106 at commit 10db592
For each GPU, the corresponding process creates its own CIFAR dataset. Since we don't set a fixed seed, the idx is shuffled (line 104) differently on different GPUs, which results in more labeled samples being used than intended.
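A minimal sketch of the behaviour being described, with illustrative names rather than the repo's exact code: each DDP process runs this selection independently, and the unseeded shuffle makes the chosen labeled indices differ from rank to rank.

```python
import numpy as np

def pick_labeled_idx(labels, num_labeled, num_classes=10):
    """Illustrative per-class labeled-index selection (not the repo's exact code).

    Without a fixed seed, np.random.shuffle orders idx differently in every
    DDP process, so each rank ends up with a different labeled subset and the
    job as a whole can see up to K * num_labeled distinct labeled images.
    """
    labels = np.array(labels)
    per_class = num_labeled // num_classes
    labeled_idx = []
    for c in range(num_classes):
        idx = np.where(labels == c)[0]
        np.random.shuffle(idx)          # rank-dependent unless seeded
        labeled_idx.extend(idx[:per_class])
    return np.array(labeled_idx)
```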
That's right. If you print the idxs, you will find that a different set of idxs is generated in each of the K processes, so the effective amount of labeled data is K times what you set. So we should either set the labeled number to N/K or set the same seed for all GPUs.
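One way to realise the "same seed for all GPUs" fix is sketched below; the function names, the seed value, and the pick_labeled_idx helper from the previous sketch are assumptions, not the repo's code. Seeding every rank identically (or, alternatively, letting rank 0 draw the indices once and broadcasting them) keeps the labeled set at exactly N images.

```python
import numpy as np
import torch.distributed as dist

def build_labeled_split(labels, num_labeled, seed=5):
    # Seed NumPy with the same value on every rank so the shuffle inside
    # pick_labeled_idx is identical across processes and the whole job uses
    # exactly num_labeled labeled images.
    np.random.seed(seed)
    return pick_labeled_idx(labels, num_labeled)

def build_labeled_split_broadcast(labels, num_labeled):
    # Alternative: rank 0 draws the indices once and broadcasts them to the
    # other ranks, so no seed coordination is needed at all.
    obj = [pick_labeled_idx(labels, num_labeled) if dist.get_rank() == 0 else None]
    dist.broadcast_object_list(obj, src=0)
    return obj[0]
```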
I think there is a bug in the DDP implementation; please see the discussion above.
Will it be solved by using a seed? |
Is it necessary to use 4 GPUs to reproduce the results with 40 labels?
Hey! I am wondering if you could reproduce the results with 1 GPU? |
When using DistributedDataParallel with N labeled training images and K GPUs, should we set num_labeled = N / K instead of N, since np.random.shuffle(idx) generates different idxs in different processes?
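A quick way to check this in an actual DDP run (a sketch under the assumption that each rank already holds its own labeled_idx array from the split): gather every rank's indices and count the union; a result near K * N rather than N confirms that the ranks picked different labeled subsets.

```python
import torch.distributed as dist

def count_effective_labeled(labeled_idx):
    # Gather each rank's labeled indices and count the distinct images that
    # are actually treated as labeled across the whole DDP job: roughly N if
    # all ranks agree, up to K * N if each rank drew its own subset.
    world_size = dist.get_world_size()
    gathered = [None] * world_size
    dist.all_gather_object(gathered, list(labeled_idx))
    union = set()
    for idxs in gathered:
        union.update(idxs)
    return len(union)
```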