num_labeled in DistributedDataParallel #5

Open
LiheYoung opened this issue Mar 27, 2020 · 9 comments

@LiheYoung

When using DistributedDataParallel, if N labeled training images and K GPUs are used, should we set num_labeled = N / K instead of N, since np.random.shuffle(idx) generates different indices in different processes?
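
To make the concern concrete, here is a minimal standalone sketch (not the repository's code; the names just mirror the snippet quoted later in this thread). It simulates K processes each drawing its own labeled split with an independent RNG, the way unseeded DDP workers would, and counts how many distinct labeled images end up being used in total.

import numpy as np

num_classes = 10
num_labeled = 250                                   # N, as passed via --num-labeled
label_per_class = num_labeled // num_classes
labels = np.random.randint(0, num_classes, 50000)   # stand-in for the CIFAR-10 label array

def labeled_split(rng):
    """Mimics the per-class labeled selection done independently in each process."""
    labeled_idx = []
    for i in range(num_classes):
        idx = np.where(labels == i)[0]
        rng.shuffle(idx)
        labeled_idx.extend(idx[:label_per_class])
    return set(labeled_idx)

K = 4  # number of GPUs, i.e. DDP processes
splits = [labeled_split(np.random.RandomState(rank)) for rank in range(K)]  # different RNG state per process
print(len(set().union(*splits)))  # roughly K * num_labeled = 1000 distinct labeled images, not 250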

@bkj

bkj commented Sep 17, 2020

I think @LiheYoung is correct -- w/ DistributedDataParallel you launch one copy of the program per GPU. If you don't set the seed, then np.random samples the labeled set differently in each copy, and w/ 4 GPUs you end up w/ roughly 4 * N labeled samples instead of the N you're expecting.

This should be easy to test by running w/o a seed and w/ a seed -- if I'm right, using a seed will reduce the performance substantially. Running this experiment now; I'll report results here.

@bkj

bkj commented Sep 18, 2020

In my experiment, performance w/o the seed is substantially better than w/ a seed. I only ran once, so perhaps this is random variation, but I'm guessing this is due to the issue @LiheYoung and I pointed out above.

[Screenshot: performance curves for the two runs.]

Red line is w/o seed, blue w/ seed.

Edit: This is for

python -m torch.distributed.launch --nproc_per_node 4 train.py \
    --dataset        cifar10 \
    --num-labeled    250 \
    --arch           wideresnet \
    --batch-size     16 \
    --lr             0.03

@chongruo

chongruo commented Oct 12, 2020

I think @LiheYoung is correct -- w/ DistributedDataParallel you launch one copy of the program per GPU. If you don't set the seed, then np.random samples the labeled set differently in each copy, and w/ 4 GPUs you end up w/ roughly 4 * N labeled samples instead of the N you're expecting.
This should be easy to test by running w/o a seed and w/ a seed -- if I'm right, using a seed will reduce the performance substantially. Running this experiment now; I'll report results here.

That's right. I should have mentioned that you should not use a seed.

Sorry, I'm a little confused. Should we set the seed?

I think @LiheYoung is correct. With K GPUs, the actual number of labeled samples is K*N rather than N. So should we set num_labeled to N/K, or set the same seed on all GPUs?

@chongruo

for i in range(num_classes):
    idx = np.where(labels == i)[0]
    np.random.shuffle(idx)
    labeled_idx.extend(idx[:label_per_class])
    unlabeled_idx.extend(idx[:])

For each GPU, the corresponding process creates its own CIFAR dataset. Since we don't set a fixed seed, idx is shuffled (line 104) differently on each GPU, which results in more labeled samples than intended.
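
One possible fix, sketched here under the assumption that only the split needs to be deterministic (this is not the repository's actual code; the function and argument names are illustrative): give the labeled/unlabeled split its own RNG seeded identically on every rank, so all processes select the same labeled indices while augmentation and batch shuffling can stay unseeded.

import numpy as np

def split_with_shared_seed(labels, num_classes, label_per_class, split_seed=42):
    """Labeled/unlabeled split that is identical on every rank because the
    shuffle uses a dedicated RNG with a seed shared by all processes."""
    rng = np.random.RandomState(split_seed)   # same seed everywhere -> same shuffle order
    labeled_idx, unlabeled_idx = [], []
    for i in range(num_classes):
        idx = np.where(labels == i)[0]
        rng.shuffle(idx)                       # identical on all ranks
        labeled_idx.extend(idx[:label_per_class])
        unlabeled_idx.extend(idx)              # all images stay in the unlabeled pool
    return np.array(labeled_idx), np.array(unlabeled_idx)

Seeding just the split keeps the effective labeled-set size at N without forcing the whole training run to be deterministic.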

@zhifanwu

I think @LiheYoung is correct -- w/ DistributedDataParallel you launch one copy of the program per GPU. If you don't set the seed, then np.random samples the labeled set differently in each copy, and w/ 4 GPUs you end up w/ roughly 4 * N labeled samples instead of the N you're expecting.
This should be easy to test by running w/o a seed and w/ a seed -- if I'm right, using a seed will reduce the performance substantially. Running this experiment now; I'll report results here.

That's right. I should have mentioned that you should not use a seed.

Sorry, I'm a little confused. Should we set the seed?

I think @LiheYoung is correct. With K GPUs, the actual number of labeled samples is K*N rather than N. So should we set num_labeled to N/K, or set the same seed on all GPUs?

That's right. If you print the indices, you will see that K different sets are generated, so the labeled data is actually K times what you set. So we should either set num_labeled to N/K or set the same seed on all GPUs.
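
An alternative sketch, assuming torch.distributed is already initialized and a PyTorch version that provides broadcast_object_list (>= 1.7; the function name below is illustrative, not from this repository): build the split on rank 0 and broadcast it, so every rank trains on exactly the same num_labeled images regardless of seeding.

import numpy as np
import torch.distributed as dist

def broadcast_split(labels, num_classes, label_per_class):
    """Builds the labeled/unlabeled split on rank 0 and broadcasts it,
    so every process trains on exactly the same labeled images."""
    if dist.get_rank() == 0:
        labeled_idx, unlabeled_idx = [], []
        for i in range(num_classes):
            idx = np.where(labels == i)[0]
            np.random.shuffle(idx)
            labeled_idx.extend(idx[:label_per_class])
            unlabeled_idx.extend(idx)
        payload = [labeled_idx, unlabeled_idx]
    else:
        payload = [None, None]                  # placeholders, filled by the broadcast
    dist.broadcast_object_list(payload, src=0)  # ranks != 0 receive rank 0's split
    return np.array(payload[0]), np.array(payload[1])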

@zhifanwu

zhifanwu commented Nov 21, 2020

I think @LiheYoung is correct -- w/ DistributedDataParallel you launch one copy of the program per GPU. If you don't set the seed, then np.random samples the labeled set differently in each copy, and w/ 4 GPUs you end up w/ roughly 4 * N labeled samples instead of the N you're expecting.
This should be easy to test by running w/o a seed and w/ a seed -- if I'm right, using a seed will reduce the performance substantially. Running this experiment now; I'll report results here.

That's right. I should have mentioned that you should not use a seed.

I think there is a bug in the DDP training implementation -- please see the discussion above.

@kekmodel
Owner

kekmodel commented Dec 11, 2020

Will it be solved by using a seed?

@moucheng2017

for i in range(num_classes):
    idx = np.where(labels == i)[0]
    np.random.shuffle(idx)
    labeled_idx.extend(idx[:label_per_class])
    unlabeled_idx.extend(idx[:])

For each GPU, the corresponding process creates its own CIFAR dataset. Since we don't set a fixed seed, idx is shuffled (line 104) differently on each GPU, which results in more labeled samples than intended.

Is it necessary to use 4 GPUs to reproduce the results with 40 labels?

@moucheng2017

In my experiment, performance w/o the seed is substantially better than w/ a seed. I only ran once, so perhaps this is random variation, but I'm guessing this is due to the issue @LiheYoung and I pointed out above.

[Screenshot: performance curves for the two runs.]

Red line is w/o seed, blue w/ seed.

Edit: This is for

python -m torch.distributed.launch --nproc_per_node 4 train.py \
    --dataset        cifar10 \
    --num-labeled    250 \
    --arch           wideresnet \
    --batch-size     16 \
    --lr             0.03

Hey! I am wondering: could you reproduce the results with 1 GPU?
