SegNet and Bayesian SegNet Tutorial

This repository contains all the files for you to complete the 'Getting Started with SegNet' and the 'Bayesian SegNet' tutorials here: http://mi.eng.cam.ac.uk/projects/segnet/tutorial.html

If you follow that tutorial, please note that the folder names in this repository have been modified.

Caffe-SegNet

SegNet requires a modified version of Caffe to run. Please see the caffe-segnet-cudnn7 submodule within this repository, and follow the installation instructions.

Getting Started

To start, you can use the scripts/inference/inference.py script. It is recommended to use this with the models/inference/SegNet/CityScapes/segnet_cityscapes.prototxt model and Timo Sämann's trained weights, which are available for download here.

The inference script can be used as follows:

python scripts/inference/inference.py models/inference/SegNet/CityScapes/segnet_cityscapes.prototxt \
/PATH/TO/segnet_iter_30000_timo.caffemodel data/test_segmentation.avi [--cpu]

If the --cpu flag is set, then inference will be run on your CPU instead of your GPU. This is not recommended, unless you don't have a GPU.

The script uses OpenCV's VideoCapture to parse the data. An example video file has been provided for testing, data/test_segmentation.avi.

The easiest way to specify your own segmentation data is via a video file, such as an .mp4 or .avi. Otherwise, you must specify a folder of images named in the format that VideoCapture requires.
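If your data is a folder of still images, one option is to copy them into a zero-padded numbered sequence, since OpenCV's VideoCapture can open printf-style patterns such as frame_%05d.png. The sketch below is a hypothetical helper and not part of this repository; the function names and the frame_%05d naming are assumptions.

```python
import shutil
from pathlib import Path


def number_frames(src_dir, dst_dir, ext=".png"):
    """Copy images from src_dir into dst_dir as frame_00000.png, frame_00001.png, ...

    Returns the printf-style pattern that cv2.VideoCapture can open.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    # Sort by name so the frame order is deterministic.
    images = sorted(p for p in src.iterdir() if p.suffix.lower() == ext)
    for index, image in enumerate(images):
        shutil.copy(image, dst / f"frame_{index:05d}{ext}")
    return str(dst / f"frame_%05d{ext}")


def read_frames(source):
    """Yield frames from a video file or an image-sequence pattern."""
    import cv2  # imported lazily so number_frames works without OpenCV

    capture = cv2.VideoCapture(source)
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            yield frame
    finally:
        capture.release()
```

The pattern returned by number_frames can then be passed to the inference script in place of the video path, assuming the script hands its input argument straight to VideoCapture.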

Example Models

A number of example models for indoor and outdoor road scene understanding can be found in the SegNet Model Zoo.

Training

Getting training data from Cityscapes

Cityscapes is a dataset that can be used to train SegNet/Bayesian SegNet, but a few preparation steps are required first. You can download the dataset here and the Cityscapes scripts repo here. Once downloaded, follow these steps:

  1. Edit /cityscapesScripts/cityscapesscripts/helpers/labels.py to contain the classes you want to train on.
  2. Set the CITYSCAPES_DATASET environment variable to wherever you downloaded the Cityscapes dataset.
  3. Run python /cityscapesScripts/cityscapesscripts/preparation/createTrainIdLabelImgs.py to create the labeled images.
  4. Once the script has completed, you should find the newly created ${CITYSCAPES_DATASET}/gtFine/*/labelTrainIds.png images.
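To confirm that step 4 produced output, a short check like the one below can count the generated label images. This is a minimal sketch and not part of the repository; find_train_id_labels is a hypothetical name, and the recursive search assumes the label files sit somewhere below the gtFine directory with names ending in labelTrainIds.png.

```python
import glob
import os


def find_train_id_labels(dataset_root=None):
    """Return sorted paths of every *labelTrainIds.png file below <root>/gtFine."""
    # Fall back to the environment variable set in step 2.
    root = dataset_root or os.environ["CITYSCAPES_DATASET"]
    pattern = os.path.join(root, "gtFine", "**", "*labelTrainIds.png")
    return sorted(glob.glob(pattern, recursive=True))


if __name__ == "__main__":
    print(len(find_train_id_labels()), "label images found")
```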

Preprocessing the data

Before training, the data generated by the Cityscapes scripts must be preprocessed. This is done by running python scripts/data_prep/preprocessor.py.

Within this script, there are several parameters that must be modified. Please open the file and verify the values of these parameters.

  • IMG_PATH: Path to the images created from the previous step. The first directory in the array is all the raw images and the second directory is the ground truth
  • OUT_PATH: The directory where the processed images will reside. NOTE: You must create all the directories yourself. If you are creating a training set, you must also create the subdirectories <OUT_PATH>/train/ and <OUT_PATH>/trainannot/. Likewise, if you are creating a validation set, you need <OUT_PATH>/val/ and <OUT_PATH>/valannot/.
  • DATA_TYPE: The type of data you are processing. Choose from train and val.
  • TXT_PATH: Location of the text file that lists the paths of all the processed images. This is needed for Caffe.
  • RESIZE_IMGS: Flag to resize images.
  • WIDTH, HEIGHT: Desired width and height of the processed images.
  • INTERPOLATION: Type of interpolation done for the actual images and the ground truth images.
  • CROP_TO_ASPECT_RATIO: Flag to crop input images to the aspect ratio given by WIDTH and HEIGHT.
  • CROP_HEIGHT_POSITION: Where to start the vertical crop. Options are top, middle, and bottom.
  • CROP_WIDTH_POSITION: Where to start the horizontal crop. Options are left, middle, and right.
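To illustrate how the crop parameters interact, here is a minimal sketch of an aspect-ratio crop calculation. It is not taken from preprocessor.py; crop_box and its arguments are hypothetical, and the actual script may round or position the crop differently.

```python
def crop_box(src_w, src_h, dst_w, dst_h, height_pos="middle", width_pos="middle"):
    """Return (x0, y0, crop_w, crop_h): the largest region with the target aspect ratio."""
    # Compare aspect ratios by cross-multiplication to avoid float error.
    if src_w * dst_h > src_h * dst_w:
        # Source is too wide: keep the full height and trim the width.
        crop_h = src_h
        crop_w = src_h * dst_w // dst_h
    else:
        # Source is too tall (or already matches): keep the full width.
        crop_w = src_w
        crop_h = src_w * dst_h // dst_w
    x_offsets = {"left": 0, "middle": (src_w - crop_w) // 2, "right": src_w - crop_w}
    y_offsets = {"top": 0, "middle": (src_h - crop_h) // 2, "bottom": src_h - crop_h}
    return x_offsets[width_pos], y_offsets[height_pos], crop_w, crop_h


# A 2048x1024 image cropped to a 4:3 target, centred horizontally.
print(crop_box(2048, 1024, 480, 360))  # (341, 0, 1365, 1024)
```

The cropped region would then be resized to WIDTH x HEIGHT. For ground-truth images, nearest-neighbour interpolation is the usual choice, because other modes blend neighbouring class IDs into values that are not valid labels.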

You can set these parameters directly in the file or using command line arguments. Both your training and test sets should be processed using this script. NOTE: When processing the training set, make sure to save the class weights that are printed when the script finishes. These will be needed for your training .prototxt file.
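For intuition about what class weights represent, the sketch below computes them by median frequency balancing, the scheme described in the SegNet tutorial. Whether preprocessor.py uses exactly this formula is an assumption, and median_frequency_weights is a hypothetical helper operating on plain nested lists rather than image files.

```python
from statistics import median


def median_frequency_weights(label_images, num_classes):
    """Compute per-class weights as median(freq) / freq[c].

    label_images: iterable of 2-D lists (rows of integer class IDs).
    freq[c] = pixels of class c / total pixels of the images containing c.
    Classes that never appear get weight 0.0; IDs outside the class range
    (for example an ignore label) are not counted.
    """
    class_pixels = [0] * num_classes
    image_pixels = [0] * num_classes
    for image in label_images:
        flat = [pixel for row in image for pixel in row]
        for c in range(num_classes):
            count = flat.count(c)
            if count:
                class_pixels[c] += count
                image_pixels[c] += len(flat)
    freqs = [class_pixels[c] / image_pixels[c] if image_pixels[c] else 0.0
             for c in range(num_classes)]
    med = median(f for f in freqs if f > 0)
    return [med / f if f > 0 else 0.0 for f in freqs]
```

Rare classes end up with weights above 1 and dominant classes with weights below 1, which offsets class imbalance in the loss.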

Configuring .prototxt files

To complete training, you must have a solver.prototxt and a train.prototxt. If you are performing inference with your generated model on the validation set, you will need a test.prototxt as well. Here are some things to look out for when configuring these files.

solver.prototxt

  • snapshot_prefix: This is not the directory of the snapshots. Your snapshots will look like <snapshot_prefix>_iter_10.caffemodel.
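As a quick illustration of how the prefix turns into file names, the sketch below builds the paths for a given iteration and creates the parent directory up front. The helper names are hypothetical; the directory step rests on the assumption that Caffe does not create missing snapshot directories itself.

```python
import os


def snapshot_paths(snapshot_prefix, iteration):
    """Return the (.caffemodel, .solverstate) paths written for an iteration."""
    stem = f"{snapshot_prefix}_iter_{iteration}"
    return stem + ".caffemodel", stem + ".solverstate"


def ensure_snapshot_dir(snapshot_prefix):
    """Create the directory part of the prefix so snapshots can be written."""
    directory = os.path.dirname(snapshot_prefix)
    if directory:
        os.makedirs(directory, exist_ok=True)
    return directory


print(snapshot_paths("snapshots/segnet", 10))
# ('snapshots/segnet_iter_10.caffemodel', 'snapshots/segnet_iter_10.solverstate')
```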

train.prototxt

  • A couple of notes on python_param.param_str:
    • data_dirs: locations of the datasets you preprocessed
    • data_proportions: proportion of data you want to save for testing
    • batch_size: number of images trained per iteration of stochastic gradient descent
  • Make sure the number of classes you are training on matches the number of outputs (num_output) of the last convolution layer, which feeds the softmax layer.
  • Remember the class weights you saved during preprocessing? You need to enter them in your loss layer as class weightings.
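The last two points can be checked and applied together. The sketch below formats saved weights as class_weighting entries, the field name used in the SegNet tutorial's loss_param, and refuses to continue if the count does not match num_output. The helper itself is hypothetical; confirm the field name against your own train.prototxt before pasting the result.

```python
def class_weighting_lines(weights, num_output, indent="    "):
    """Format class weights as class_weighting entries, one line per class."""
    if len(weights) != num_output:
        raise ValueError(
            f"{len(weights)} class weights but the last convolution has "
            f"num_output={num_output}"
        )
    return "\n".join(f"{indent}class_weighting: {w:.4f}" for w in weights)


print(class_weighting_lines([0.5, 1.25], num_output=2))
```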

test.prototxt

  • The first layer's dense_image_data_param.source parameter should point to the .txt file of your validation set.

Training your models

scripts/training/train_config.ini is where you can specify all the models you want to train.

  • Solvers: Solver file (.prototxt file)
  • Init_Weights: Initial weights of the model (.caffemodel file)
  • Inference_Weights: Final inference weights of the model once training is completed
  • Solverstates: Solver state to load (.solverstate file). This lets you resume training from an intermediate step. Typically blank.
  • Test_Models: Test model that is run on snapshots while training is done in parallel (.prototxt file)
  • Test_Images: File where all the test images are stored. This should be generated by the preprocessing step (.txt file)
  • Log_Dirs: Directory where the logs are stored. NOTE: Logging currently only works for testing
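Before launching a long run, it can help to check that the ini file contains every key listed above. The sketch below does this with Python's configparser. The layout it expects (one section per model, one value per key) is an assumption; compare it with the shipped scripts/training/train_config.ini, which may be organised differently. load_train_config is a hypothetical name.

```python
import configparser

EXPECTED_KEYS = [
    "Solvers", "Init_Weights", "Inference_Weights", "Solverstates",
    "Test_Models", "Test_Images", "Log_Dirs",
]


def load_train_config(path):
    """Read the ini file and return {section: {key: value}}, rejecting missing keys."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case; configparser lowercases by default
    if not parser.read(path):
        raise FileNotFoundError(path)
    config = {}
    for section in parser.sections():
        missing = [k for k in EXPECTED_KEYS if k not in parser[section]]
        if missing:
            raise KeyError(f"[{section}] is missing: {', '.join(missing)}")
        config[section] = dict(parser[section])
    return config
```

A blank entry such as an empty Solverstates value is read as an empty string, which matches the "typically blank" note above.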

Once all the models you want to train are in the ini file, you can run python train_and_test.py with the following arguments:

  • --config: Location of ini file
  • --run_inference: Flag to test in parallel while training the model. Will only occur for each snapshot created
  • --train_gpu: GPU ID of where the training will occur. Only matters if you have multiple GPUs.
  • --test_gpu: GPU ID of where the testing will occur. Only matters if you have multiple GPUs.
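If you launch training from another script, the arguments above can be assembled as a list for subprocess.run. This is a minimal sketch using only the flags listed here; build_train_command is a hypothetical helper, and it assumes --run_inference takes no value while the GPU options take an integer ID.

```python
def build_train_command(config, run_inference=False, train_gpu=None, test_gpu=None,
                        script="train_and_test.py"):
    """Return an argument list for subprocess.run, using only the documented flags."""
    command = ["python", script, "--config", str(config)]
    if run_inference:
        command.append("--run_inference")
    if train_gpu is not None:
        command += ["--train_gpu", str(train_gpu)]
    if test_gpu is not None:
        command += ["--test_gpu", str(test_gpu)]
    return command


# Train on GPU 0 while testing each snapshot on GPU 1.
print(" ".join(build_train_command("scripts/training/train_config.ini",
                                   run_inference=True, train_gpu=0, test_gpu=1)))
```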

Congrats! Your model should be training.

Things to keep in mind

Testing in parallel with training

If you are testing at the same time as training, make sure the intervals between snapshots are long enough. Computing the batch norm statistics for each snapshot takes time (around 10 to 20 minutes), so if snapshots are created faster than they can be tested, the script may queue up many inference runs and potentially crash.
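A rough lower bound for the snapshot interval follows from simple arithmetic: the training time between two snapshots should exceed the time one inference run takes. The sketch below is a back-of-the-envelope helper with a hypothetical name; the 20-minute default is the upper end of the estimate above, and the seconds per iteration must be measured on your own setup.

```python
import math


def min_snapshot_interval(seconds_per_iteration, inference_minutes=20.0):
    """Smallest snapshot interval (in iterations) that outlasts one inference run."""
    if seconds_per_iteration <= 0:
        raise ValueError("seconds_per_iteration must be positive")
    return math.ceil(inference_minutes * 60.0 / seconds_per_iteration)


# At 1.5 s per training iteration, snapshot no more often than every 800 iterations.
print(min_snapshot_interval(1.5))  # 800
```

The result is the minimum value to use for the snapshot field in solver.prototxt; adding some margin on top is sensible.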

Error: Did not match C++ signature

If you notice an error that looks like the following:

Boost.Python.ArgumentError: Python argument types in
Net.__init__(Net, str, str, int)
did not match C++ signature:
 __init__(boost::python::api::object, std::string, std::string, int)
__init__(boost::python::api::object, std::string, int)

This usually means the strings you pass to Caffe functions need to be wrapped with str(). This is only an issue with Python 2, and it is explained in more detail here.
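A minimal sketch of the workaround is shown below. The helper names are hypothetical; the caffe.Net call mirrors the (str, str, int) signature from the error message above, and caffe is imported lazily because it requires the caffe-segnet Python bindings.

```python
def native_str_args(*values):
    """Convert every argument to the interpreter's native str type."""
    return tuple(str(value) for value in values)


def load_net(model_path, weights_path):
    """Build a test-phase caffe.Net from paths that may be unicode under Python 2."""
    import caffe  # imported lazily; needs the caffe-segnet Python bindings

    model, weights = native_str_args(model_path, weights_path)
    return caffe.Net(model, weights, caffe.TEST)
```

Under Python 2, paths read from config files or JSON often arrive as unicode objects, which is a common way this mismatch is triggered.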

Publications

For more information about the SegNet architecture:

http://arxiv.org/abs/1511.02680 Alex Kendall, Vijay Badrinarayanan and Roberto Cipolla "Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding." arXiv preprint arXiv:1511.02680, 2015.

http://arxiv.org/abs/1511.00561 Vijay Badrinarayanan, Alex Kendall and Roberto Cipolla "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation." PAMI, 2017.

License

This software is released under a Creative Commons license which allows for personal and research use only. For a commercial license, please contact the authors. You can view a license summary here: http://creativecommons.org/licenses/by-nc/4.0/

Contact

Alex Kendall

[email protected]

Cambridge University