Training with PPON architecture

Note: this document is outdated and needs to be modified to match the current PPON implementation. Refer to PPON training for current information.

Tips:

  • The training strategy in the paper was:

    * COBranch: lr = 2e−4, decreased by a factor of 2 every 1000 epochs (1.38e+5 iterations)

    * SOBranch: λ = 1e+3, lr = 1e−4, halved every 250 epochs (3.45e+4 iterations)

    * POBranch: η = 5e−3, lr = 1e−4, halved every 250 epochs (3.45e+4 iterations)

    Using pretrained models, this strategy is not efficient, as the models will stop changing too soon. The JSON currently has the original parameters to replicate the paper, but a configuration with "lr_gamma": 0.9 and "lr_step_sizes": [2000, 1000, 1000] can yield better results (see the sketch below). It's also possible to try adding more intermediate restarts inside each phase.
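
    A minimal sketch of that fragment of the training JSON, assuming a step scheduler with restarts as in BasicSR-style configs (the "lr_scheme", "restarts" and "restart_weights" names are assumptions here; match them to the keys already in your file):

    ```json
    "train": {
        "lr_scheme": "StepLR_Restart",        // assumed scheduler name, verify against your JSON
        "lr_gamma": 0.9,                      // gentler decay than the paper's factor of 0.5
        "lr_step_sizes": [2000, 1000, 1000],  // decay step (in iterations) for each phase
        "restarts": [10000, 20000],           // hypothetical intermediate restarts inside the phases
        "restart_weights": [1, 1]             // lr multiplier applied at each restart
    }
    ```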

  • The phases configuration is made using the "train_phase", "phase1_s", "phase2_s" and "phase3_s" parameters. The first one defines from which phase training starts (1, 2 or 3); the other three define at which iteration each phase stops training and hands over to the next one. To skip a phase (for example, phase 1), set "train_phase" to the starting phase (i.e. 2) and the parameters of the phases to skip (like "phase1_s") to -1, as in the sketch below. The phase changes should be coordinated with the learning rate restarts via "lr_step_sizes".
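
    A sketch of skipping phase 1 and training only phases 2 and 3 (the iteration counts here are placeholders, not recommendations):

    ```json
    "train_phase": 2,   // start training directly at phase 2
    "phase1_s": -1,     // -1 marks phase 1 as skipped
    "phase2_s": 20000,  // phase 2 trains until iteration 20000
    "phase3_s": 40000   // then phase 3 trains until iteration 40000
    ```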

  • By default, PPON uses a patch size of 192, but it's possible to change it to 128 if the VGG model is also changed to 128 (see the sketch below). A new VGG feature extractor for a patch size of 256 is available too.
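
    A rough sketch of the two places in the JSON that have to agree when moving to 128, assuming BasicSR-style option names ("HR_size" for the training patch and "which_model_D" for the VGG-style model are assumptions here; check them against your own file):

    ```json
    "datasets": {
        "train": {
            "HR_size": 128  // training patch size, reduced from the default 192
        }
    },
    "network_D": {
        "which_model_D": "discriminator_vgg_128"  // VGG model matching the new patch size
    }
    ```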

Comparing with ESRGAN:

With PPON, training is divided into three phases, each building on the previous one. The heavy part is the first phase, because it learns most of the feature extraction in addition to the basic content (pixel) loss calculation. With just that phase, you get results equivalent to training a plain SR model with BasicSR (no GAN, etc.) on the ESRGAN architecture.

The second phase is much faster than the first (it can be twice as fast per iteration) and has no equivalent in ESRGAN. I’ve seen that it helps correct colors that sometimes shift during upscaling. It also helps remove some artefacts and, in theory, its loss function (MS-SSIM) should help a lot in denoising tasks, so the model can recover the original structure of the image instead of the noise.

The last phase is the feature and adversarial loss, and it’s when those components are actually activated (instead of having everything on at the same time). This phase adds all the details, features, textures, etc. (high frequency information). If you think about it, it automates the ESRGAN paper’s approach of first training a simple SR model to use as a pretrained starting point to stabilize the GAN training.

In my tests, using a pretrained PPON model has meant that all three phases can be trained in under 10k iterations each; after that, the model doesn’t change much.

It’s possible to fine-tune the specific phase you want to target. If your pretrained model is similar enough and phase 1 looks fine, you can skip it and train only phases 2 and 3. An example of how to set the variables in the JSON to train only phase 3 for 10k iterations:
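
```json
"train_phase": 3,
"phase1_s": -1,
"phase2_s": -1,
"phase3_s": 10000
```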

You can use different datasets for each phase to experiment. It’s better if all three go in the same general direction, perhaps with progressive difficulty: for example, phases 1 and 2 can use general images and phase 3 only faces, so the difficult things are learned last.

The model can always output images for all three phases (it’s the default behaviour), so you can see exactly what each of them is doing.
