
Switch to PyTorch #18

Open
wants to merge 11 commits into base: main


NotNANtoN

Many issues have been raised here because of the use of TensorFlow, and I also struggle to get it to work. Hence I switched from the TF StyleGAN repo to https://github.com/NVlabs/stylegan2-ada-pytorch. This should make this repo much easier to use. It works quite nicely in my tests, and it might also be faster (NVIDIA claims the PyTorch version is faster; I did not benchmark it).

The only issue I see is that the wikiart model is not compatible with the PyTorch repo; at least it throws an error when I try to convert it from TF to torch. I "solved" this by falling back to the TF repo when wikiart is used. This is not beautiful, but it should work. Unfortunately, I cannot test this, as I am too stupid to set up TF properly with my GPU. So if anyone can test whether wikiart still works with TF in this PR, or knows how to convert it to torch, that would be great.

@julienbeisel

Hi @NotNANtoN, thanks for this work! I wanted to try implementing it myself, but I saw you did it first :)

Which script did you use to convert the pre-trained TF models to PyTorch? This one looks promising, but maybe it's the one you used: https://github.com/rosinality/stylegan2-pytorch#convert-weight-from-official-checkpoints

@NotNANtoN
Author

Hi @julienbeisel! I used https://github.com/NVlabs/stylegan2-ada-pytorch/blob/main/legacy.py. We could try using the rosinality weight conversion for the wikiart weights (or for any weights that legacy.py cannot load).

@@ -19,7 +19,7 @@
 "License :: OSI Approved :: MIT License",
 "Operating System :: OS Independent",
 ],
-install_requires=['tensorflow==1.15',
+install_requires=['torch',


click should be added to the dependencies.

It is used by dnnlib: https://github.com/NVlabs/stylegan2-ada-pytorch/blob/main/legacy.py#L9
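Because legacy.py imports click at module level, a missing click package only shows up once the network loader is imported. A minimal pre-flight check, assuming you only want to know whether the packages are importable; the module list is illustrative (torch from the diff above, click from this comment), not the project's full requirement list:

```python
import importlib.util

# Illustrative list only: 'torch' comes from the diff above and 'click' from this
# review comment; the project's full dependency list is not shown here.
RUNTIME_MODULES = ["torch", "click"]


def missing_modules(names):
    """Return the module names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(RUNTIME_MODULES)
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```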

@@ -82,14 +96,49 @@ def __init__(self,
self.num_possible_classes = num_possible_classes
self.style_exists = False

# some stylegan models cannot be converted to pytorch (wikiart)
self.use_tf = style in ("wikiart",)


I tried using 'abstract photos' and it does not work either.

Error:

Loading networks from abstract photos.pkl...
Traceback (most recent call last):
  File "script.py", line 6, in <module>
    L.hallucinate(file_name="song.mp4")
  File "/Users/julienbeisel/Documents/git repos/dev/lucid-sonic-dreams/lucidsonicdreams/main.py", line 702, in hallucinate
    self.stylegan_init()
  File "/Users/julienbeisel/Documents/git repos/dev/lucid-sonic-dreams/lucidsonicdreams/main.py", line 182, in stylegan_init
    self.Gs = self.legacy.load_network_pkl(f)['G_ema'].to(device) # type: ignore
  File "stylegan2/legacy.py", line 26, in load_network_pkl
    G = convert_tf_generator(tf_G)
  File "stylegan2/legacy.py", line 111, in convert_tf_generator
    raise ValueError('TensorFlow pickle version too low')
ValueError: TensorFlow pickle version too low

I am going to try to convert it differently since other models seem to work.
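Instead of hard-coding `use_tf` for wikiart, one option is to attempt the PyTorch conversion first and fall back to TF only when it fails with a ValueError like the one above. A minimal sketch; `load_generator`, `load_torch` and `load_tf` are hypothetical names standing in for a wrapper around `legacy.load_network_pkl(f)['G_ema']` and the existing TF loading path:

```python
from typing import Any, Callable, Tuple


def load_generator(
    path: str,
    load_torch: Callable[[str], Any],
    load_tf: Callable[[str], Any],
) -> Tuple[Any, bool]:
    """Load a generator with PyTorch if possible, otherwise fall back to TensorFlow.

    Returns (generator, use_tf). Both loader callables are hypothetical; the
    PyTorch one would wrap legacy.load_network_pkl(f)['G_ema'], which raises
    ValueError for pickles it cannot convert ('TensorFlow pickle version too low').
    """
    try:
        return load_torch(path), False
    except ValueError as err:
        print(f"Could not convert {path!r} to PyTorch ({err}); falling back to TensorFlow.")
        return load_tf(path), True
```

Catching only ValueError keeps unrelated failures (a missing file, CUDA errors) visible instead of silently switching engines.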

@julienbeisel

Ok thanks! I will try to work on it :)

I forked the repo to try to make it work on my laptop. I will leave comments where I think some parts can be improved; I can also make a branch and open a PR later if you want!

@@ -82,14 +96,49 @@ def __init__(self,
self.num_possible_classes = num_possible_classes
self.style_exists = False

# some stylegan models cannot be converted to pytorch (wikiart)
self.use_tf = style in ("wikiart",)


I think a nice way to handle models that work with TF but not with torch would be to:

1 - Define an engine variable in the LucidSonicDream object
2 - Create a compatibility dictionary (example: compatibility_dict = {'tf': ['model_1', 'model_2'], 'torch': ['model_2']})
3 - Raise an error if the model chosen when the LucidSonicDream object is created is not compatible with the engine (tf or torch)
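The three steps above can be sketched as follows. This is a hypothetical helper, not code from the PR; the model names are the placeholders from the example dictionary, not real compatibility data:

```python
VALID_ENGINES = ("tf", "torch")

# Placeholder model names from the example above, not real compatibility data.
compatibility_dict = {
    "tf": ["model_1", "model_2"],
    "torch": ["model_2"],
}


class IncompatibleModelError(ValueError):
    """Raised when the chosen model cannot run on the chosen engine."""


def check_compatibility(style, engine, compatibility=compatibility_dict):
    """Raise if `style` is not listed for `engine`; return None otherwise."""
    if engine not in VALID_ENGINES:
        raise ValueError(f"engine must be one of {VALID_ENGINES}, got {engine!r}")
    if style not in compatibility.get(engine, []):
        # Tell the user which engines would work for this model, if any.
        supported = [name for name, models in compatibility.items() if style in models]
        raise IncompatibleModelError(
            f"{style!r} is not compatible with engine {engine!r}; "
            f"supported engines: {supported or 'none'}"
        )
```

The LucidSonicDream constructor could call `check_compatibility(style, engine)` early (with `engine` as a hypothetical new argument). Custom .pkl paths would not appear in the dictionary, so they would need to bypass this check.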

@NotNANtoN
Author

Thanks for your feedback! Please do a PR on this PR ;)

It would be best to just convert all TF models to Torch, but I'm not sure if this is easily doable for conditional GANs.

@julienbeisel

Alright, I will submit a PR once it's done, and I'll try to convert the models. I will also try to address the comments on the PR.

@julienbeisel

@NotNANtoN I spent some time working on it and I really couldn't figure out how to convert these models for PyTorch (nothing is working). I guess the only way to make it work would be to re-train them but it's a lot of work...

@NotNANtoN
Author

@julienbeisel I assume it's simply not possible with some models unless we know their exact architecture. Maybe you can just make a PR with your changes, and then we'll use PyTorch wherever possible and TF for the remaining models?

At least this is working, and some PyTorch is better than none imo. Have you tried the batch_size argument? When I tried it, it just slowed things down, which I don't really understand. But I'll look into it.

@NotNANtoN
Author

Major update: speed increased massively, by a factor of 5-7. Generating a video for a 90-second piece of music now takes 4.30 minutes at 60 fps, whereas it took over 20 minutes at 43 fps before.
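The wall-clock numbers alone understate the gain, because the new run also renders more frames (60 fps instead of 43). A quick per-frame check, assuming "4.30 minutes" means about 4.3 minutes:

```python
def render_throughput(song_seconds, fps, wall_minutes):
    """Frames rendered per second of wall-clock time."""
    return song_seconds * fps / (wall_minutes * 60)


new = render_throughput(90, 60, 4.3)  # "4.30 minutes", read here as ~4.3 min
old = render_throughput(90, 43, 20)   # "over 20 minutes", so this is an upper bound

print(f"new: {new:.1f} frames/s, old: <= {old:.1f} frames/s")
print(f"wall-clock ratio: {20 / 4.3:.1f}x, per-frame speedup: >= {new / old:.1f}x")
```

This gives roughly 20.9 frames/s now versus at most 3.2 frames/s before, a per-frame speedup of at least about 6.5x (the wall-clock ratio alone is about 4.7x). Reading 4.30 as 4 min 30 s instead gives 20 frames/s and about 6.2x; both are consistent with the stated factor of 5-7.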

@MoemaMike

Is the objective of this branch to allow StyleGAN TF models to work with PyTorch? I already have a model trained with stylegan2-ada-pytorch. Is there a ready-made solution for that use case?

@NotNANtoN
Author

@MoemaMike The objective is to switch to PyTorch completely. All PyTorch models trained with NVIDIA's PyTorch repositories work with this branch.

Some TF models cannot be converted to PyTorch at the moment; in those cases the library simply loads them with TensorFlow.

Maybe @mikaelalafriz can review and/or merge this branch soonish.

@MoemaMike

ok, thanks, looking forward to trying it out on Colab

@Breeze-Zero

Unfortunately, my process sometimes gets killed.
How can I solve this?

@Breeze-Zero

I also get this warning: Setting up PyTorch plugin "upfirdn2d_plugin"... Failed

@NotNANtoN
Author

Your process was killed because your RAM is full. Either get more RAM, or the code needs to save intermediate results to disk.

As for the plugin: either install a CUDA toolkit (nvcc, e.g. the "nvidia-cuda-toolkit" package) that is compatible with your PyTorch version, or add a "return False" at approximately line 27 of upfirdn2d.py in stylegan2/torch_utils/ops to disable the initialization of the plugin (the code then falls back to the slower reference implementation).
Not really beautiful, I know.
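A mismatch between the CUDA version PyTorch was built with and the nvcc on your PATH is one common reason custom ops like upfirdn2d_plugin fail to build. A small diagnostic sketch; the helper names are hypothetical, and treating a major.minor match as "compatible" is a simplification:

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract 'major.minor' from `nvcc --version` output ('release 11.1, V11.1.105' -> '11.1')."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    return f"{match.group(1)}.{match.group(2)}" if match else None


def nvcc_release():
    """Return the CUDA release of the nvcc on PATH, or None if nvcc is not installed."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


def same_major_minor(torch_cuda, nvcc_cuda):
    """Compare two CUDA version strings (e.g. '11.1' and '11.1') on major.minor only."""
    if not torch_cuda or not nvcc_cuda:
        return False
    return torch_cuda.split(".")[:2] == nvcc_cuda.split(".")[:2]


if __name__ == "__main__":
    try:
        import torch
        torch_cuda = torch.version.cuda  # None for CPU-only PyTorch builds
    except ImportError:
        torch_cuda = None
    nvcc_cuda = nvcc_release()
    print(f"PyTorch CUDA: {torch_cuda}, nvcc: {nvcc_cuda}, "
          f"match: {same_major_minor(torch_cuda, nvcc_cuda)}")
```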

@timelf123

timelf123 commented Jun 7, 2021

I can't get this working with wildlife or with my own PyTorch-trained models:

ValueError                                Traceback (most recent call last)

<ipython-input-9-2b5a7926efc3> in <module>()
      8 L.hallucinate(file_name = 'x.mp4',
---> 9               fps = 60)
     10 
     11 files.download("x.mp4")

6 frames

/usr/local/lib/python3.7/dist-packages/PIL/Image.py in frombytes(self, data, decoder_name, *args)
    798 
    799         if s[0] >= 0:
--> 800             raise ValueError("not enough image data")
    801         if s[1] != 0:
    802             raise ValueError("cannot decode image data")

ValueError: not enough image data

@MaxJohnsen

This happens when batch_size is set to 1 (the error occurs at line 600 of main.py). Try increasing the batch size.

@chloebubble

I ran into the same issue as well, and I can confirm that increasing the batch size resolves it.

@etrh

etrh commented Sep 28, 2021

I've tried everything I possibly could, and for the past five days I haven't been able to get this to work at all (neither the original lucidsonicdreams nor @NotNANtoN's clmr_clip branch) on A100, V100, and RTX 3090 GPUs. Does anyone here have suggestions for getting this to work? I'm seriously out of ideas, and I'm really confused as to why it's so hard to set this up on the Ampere architecture. I've tinkered a lot with dnnlib/tflib/custom_ops.py as well, and I have reinstalled CUDA, etc. Nothing helps!

It works well on Google Colab and on 1070 Ti, but seems impossible to install on RTX 3090, V100, A100.

I've followed this link step-by-step: https://developer.nvidia.com/blog/accelerating-tensorflow-on-a100-gpus/

I've also tried nvcr.io/nvidia/tensorflow:20.06-tf1-py3 and nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04 Docker images and they also haven't been helpful.

@MaxJohnsen

MaxJohnsen commented Sep 28, 2021


Frustrating. I currently have a working setup on my RTX 3090 running on Ubuntu 20.04.02. Here is some info about my environment, maybe it can point you in the right direction.

Python 3.7

PyTorch:
torch 1.8.1+cu111
torchaudio 0.8.1
torchvision 0.9.1+cu111

Nvidia:
NVIDIA-SMI 460.91.03 Driver Version: 460.91.03 CUDA Version: 11.2
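To compare setups like this one, it helps to report the same details. A minimal sketch (not part of the project) that collects them from Python; the driver version still has to come from nvidia-smi:

```python
import importlib
import platform


def environment_report():
    """Collect Python, PyTorch and GPU details similar to the setup listed above."""
    report = {"python": platform.python_version()}
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        report["torch"] = None
        return report
    report["torch"] = torch.__version__
    report["torch_cuda"] = torch.version.cuda  # None for CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["gpu"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```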

@Michipulatos

Michipulatos commented Oct 12, 2021

Long shot, but figured I'd share: I'm having trouble using LSD with a custom class-conditioned pkl trained with the PyTorch version of StyleGAN2.

Hallucinating...

Generating frames:   0%|                                                                       | 0/7296 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "test.py", line 7, in <module>
    L.hallucinate(file_name = 'song.mp4')
  File "/home/bethos/tryagain/lucid-sonic-dreams/lucidsonicdreams/main.py", line 754, in hallucinate
    self.generate_frames()
  File "/home/bethos/tryagain/lucid-sonic-dreams/lucidsonicdreams/main.py", line 596, in generate_frames
    w_batch = self.Gs.mapping(noise_batch, class_batch.to(device), truncation_psi=self.truncation_psi)
  File "/home/bethos/anaconda3/envs/sonicstylegan-classes/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "<string>", line 222, in forward
  File "stylegan2/torch_utils/misc.py", line 93, in assert_shape
    raise AssertionError(f'Wrong size for dimension {idx}: got {size}, expected {ref_size}')
AssertionError: Wrong size for dimension 1: got 0, expected 20
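The assertion says the mapping network received a class tensor whose dimension 1 has size 0, while this model expects 20 classes. For a class-conditional stylegan2-ada-pytorch generator, the labels passed as `c` need shape [batch, G.c_dim]. A minimal sketch for building one-hot labels of that shape; the helper is hypothetical, and how LSD builds `class_batch` internally is not shown here:

```python
def one_hot_class_batch(class_indices, c_dim):
    """Return a [len(class_indices), c_dim] nested list of one-hot class labels."""
    if c_dim <= 0:
        raise ValueError("c_dim must be positive for a class-conditional generator")
    batch = []
    for idx in class_indices:
        if not 0 <= idx < c_dim:
            raise ValueError(f"class index {idx} is out of range for c_dim={c_dim}")
        row = [0.0] * c_dim
        row[idx] = 1.0  # exactly one active class per sample
        batch.append(row)
    return batch
```

With PyTorch installed, `torch.tensor(one_hot_class_batch(indices, G.c_dim))` yields a float tensor of the expected shape; `torch.nn.functional.one_hot` is an alternative.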

@fractaldna22

Isn't wikiart fundamentally a VQGAN model? That one is trained with PyTorch out of the box.
config: http://eaidata.bmk.sh/data/Wikiart_16384/wikiart_f16_16384_8145600.yaml
checkpoint: http://eaidata.bmk.sh/data/Wikiart_16384/wikiart_f16_16384_8145600.ckpt

@Chauban

Chauban commented Dec 22, 2021

How do I use it? I went to NotNANtoN's repo, but the instructions still say pip install lucidsonicdreams, and that version cannot run PyTorch models yet. What should I do? My test environment is Colab. Thanks so much.
