A Minimal Latent Diffusion Model (LDM)

This project implements a latent diffusion model for image generation using PyTorch and the diffusers library. It first pretrains an autoencoder to compress images into a latent space, then performs diffusion in that latent space, which can be more efficient than diffusion in pixel space.
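To give a feel for why latent-space diffusion can be cheaper, the sketch below compares the number of values the denoiser has to process per image. The shapes are illustrative assumptions only (a 256x256 RGB image and a 4-channel latent downsampled 8x); this README does not state the resolution or latent shape used by the project.

```python
# Illustrative sizes only: the image resolution and latent shape used by
# this project are not stated in the README, so these are assumptions.
def num_elements(channels: int, height: int, width: int) -> int:
    """Number of scalar values in a (channels, height, width) tensor."""
    return channels * height * width

pixel_elems = num_elements(3, 256, 256)  # assumed RGB input image
latent_elems = num_elements(4, 32, 32)   # assumed 8x-downsampled, 4-channel latent
ratio = pixel_elems / latent_elems

print(f"pixel: {pixel_elems}, latent: {latent_elems}, ratio: {ratio:.0f}x")
```

Under these assumed shapes the diffusion model sees 48 times fewer values per sample; the real saving depends on the autoencoder configured in autoencoder.py.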

Dependencies

Install the required packages using pip:

pip install torch torchvision diffusers tqdm

Usage

Download the dataset

Download the CelebA images from Google Drive and extract them to a directory. The images must be placed at <DATA_ROOT>/<sub_folder_name>/123.jpg, e.g. data_root/celeba/123.jpg, and data_root must not contain any other subfolders.

You can also use other datasets; just make sure all images are in a single subfolder under data_root.
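The layout rule above can be checked before starting a long training run. The following is a minimal sketch using only the standard library; check_layout and IMAGE_EXTS are hypothetical names introduced here, not part of this repository, and the accepted extensions are an assumption.

```python
from pathlib import Path

# Assumed set of accepted image extensions; adjust to match your dataset.
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}

def check_layout(data_root: str) -> int:
    """Return the image count if data_root holds exactly one subfolder of images."""
    root = Path(data_root)
    subdirs = [p for p in root.iterdir() if p.is_dir()]
    if len(subdirs) != 1:
        raise ValueError(
            f"expected exactly 1 subfolder in {root}, found {len(subdirs)}"
        )
    images = [
        p for p in subdirs[0].iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    ]
    if not images:
        raise ValueError(f"no images found in {subdirs[0]}")
    return len(images)
```

For example, check_layout("data_root") returns the number of images in data_root/celeba, and raises ValueError if data_root contains zero or several subfolders.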

Training the Autoencoder (takes 1-2 GPU-days)

The autoencoder needs to be trained first. You can train it using the autoencoder.py script.

python -m torch.distributed.run --nproc_per_node=NUM_GPUS autoencoder.py

Replace NUM_GPUS with the number of GPUs you want to use. Adjust the hyperparameters in autoencoder.py as needed. To debug or run on a single GPU, launch the script directly with python autoencoder.py.
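A script can support both launch modes because torch.distributed.run exports RANK, LOCAL_RANK, and WORLD_SIZE to every worker process, while a plain python invocation leaves them unset. The sketch below shows one way to read them; launch_info is a hypothetical helper, and the actual logic in util.py may differ.

```python
import os

def launch_info(env=None) -> dict:
    """Read the rank variables that torch.distributed.run exports to each worker."""
    env = os.environ if env is None else env
    world_size = int(env.get("WORLD_SIZE", "1"))
    return {
        "rank": int(env.get("RANK", "0")),              # global process index
        "local_rank": int(env.get("LOCAL_RANK", "0")),  # GPU index on this node
        "world_size": world_size,                       # total number of processes
        # Assumption of this sketch: a world size of 1 is treated as single-GPU.
        "distributed": world_size > 1,
    }

# Plain `python autoencoder.py`: no variables set, so single-process defaults apply.
print(launch_info({}))
# Second worker of a two-GPU launch.
print(launch_info({"RANK": "1", "LOCAL_RANK": "1", "WORLD_SIZE": "2"}))
```

Falling back to rank 0 and world size 1 lets the same code path serve debugging runs without a process group.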

Training the Latent Diffusion Model (takes 2-4 GPU-days)

Once the autoencoder is trained, you can train the latent diffusion model using latent_diffusion.py.

Set the autoencoder checkpoint path: Update AUTOENCODER_CKPT_PATH in latent_diffusion.py to point to the saved autoencoder checkpoint.
Run the training script:

python -m torch.distributed.run --nproc_per_node=NUM_GPUS latent_diffusion.py
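For intuition about what this training stage does to the encoded images, here is a dependency-free sketch of the standard DDPM forward (noising) step applied to a latent vector. It is not the code in latent_diffusion.py: the linear schedule, its endpoints, the step count, and the function names are assumptions chosen to match commonly used DDPM defaults, and the real script relies on PyTorch and diffusers.

```python
import math
import random

def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def add_noise(latent, t, alpha_bars, rng):
    """Sample z_t = sqrt(a_bar_t) * z_0 + sqrt(1 - a_bar_t) * eps."""
    a_bar = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in latent]
    noisy = [
        math.sqrt(a_bar) * z + math.sqrt(1.0 - a_bar) * e
        for z, e in zip(latent, noise)
    ]
    return noisy, noise

alpha_bars = linear_alpha_bars()
rng = random.Random(0)
z0 = [0.5, -1.0, 0.25, 2.0]  # stand-in for a flattened latent from the frozen encoder
zt, eps = add_noise(z0, t=500, alpha_bars=alpha_bars, rng=rng)
print(f"alpha_bar at t=500: {alpha_bars[500]:.4f}")
```

In the usual noise-prediction setup, the denoising network receives zt and t and is trained to predict eps with a mean squared error loss, while the autoencoder weights stay fixed.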

Generating Samples

During training, latent_diffusion.py will periodically generate and save sample images in the current directory. You can monitor these to track the progress of training.

Files

autoencoder.py: Trains the autoencoder.
latent_diffusion.py: Trains the latent diffusion model.
util.py: Contains utility functions for distributed training and seeding.

Notes

This is a minimal implementation of the latent diffusion model. I skipped the adversarial loss during autoencoder pretraining. The hyperparameters are not well tuned, and the model has not been trained for long.
