Time to try our hand at something more than just digits. How about some cars ... planes ... maybe a few animals here and there? Welcome to our experiments with advanced concepts on the CIFAR-10 dataset.
- Understanding the CIFAR-10 dataset
- Concept Time
- Objectives
- Code Structure
- Logs
- Conclusions and notes
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.
The ten classes in the dataset are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.
The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes pickup trucks.
Source: https://www.cs.toronto.edu/~kriz/cifar.html
Source: Rohan Shravan
Dilated convolution increases the receptive field (the global view) of the network exponentially, while the number of parameters grows only linearly. This makes it useful in applications that need to integrate knowledge of the wider context at low cost.
The key application the dilated convolution authors have in mind is dense prediction: vision applications where the predicted object has a similar size and structure to the input image. Examples include semantic segmentation with one label per pixel, image super-resolution, denoising, demosaicing, bottom-up saliency, and keypoint detection.
In many such applications one wants to integrate information from different spatial scales and balance two properties:
∙ local, pixel-level accuracy, such as precise detection of edges, and
∙ integrating the knowledge of the wider, global context
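To make the "exponential view, linear parameters" claim concrete, here is a minimal sketch in plain Python. It assumes stride-1 layers with 3x3 kernels and a doubling dilation schedule (1, 2, 4, 8); the function names and the channel count of 32 are illustrative choices, not taken from our model.

```python
def effective_kernel(k: int, dilation: int) -> int:
    """Span covered by a k-tap kernel whose taps sit `dilation` pixels apart."""
    return k + (k - 1) * (dilation - 1)


def stacked_receptive_field(k: int, dilations: list) -> int:
    """Receptive field of stride-1 conv layers applied one after another."""
    rf = 1
    for d in dilations:
        rf += effective_kernel(k, d) - 1
    return rf


def weights_per_layer(k: int, channels: int) -> int:
    """Weights in one k x k conv with `channels` in and out (bias ignored)."""
    return k * k * channels * channels


# Four plain 3x3 layers versus four layers with doubling dilation.
plain = stacked_receptive_field(3, [1, 1, 1, 1])
dilated = stacked_receptive_field(3, [1, 2, 4, 8])

# Dilation does not add weights: both stacks cost the same (32 channels assumed).
total_weights = 4 * weights_per_layer(3, 32)

print(plain, dilated, total_weights)
```

Both stacks hold the same number of weights, yet the plain one sees a 9x9 window while the dilated one sees 31x31. Each extra layer with a doubled dilation roughly doubles the view at a fixed per-layer cost.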
- GPU-based code with a model architecture of C1C2C3C40 (no MaxPooling; use three 3x3 layers with a stride of 2 instead. It would be a bonus if we can figure out how to use dilated kernels instead of MaxPooling or strided convolution)
- Total receptive field of more than 52
- Two of the layers must use depthwise separable convolution
- One of the layers must use dilated convolution
- Use GAP (compulsory), mapped to the number of classes: CANNOT add an FC layer after GAP to target the number of classes
- Use the albumentations library and apply:
  - Horizontal flip
  - shiftScaleRotate
  - coarseDropout (max_holes = 1, max_height=16px, max_width=16px, min_holes = 1, min_height=16px, min_width=16px, fill_value=(mean of your dataset), mask_fill_value = None)
  - grayscale
- Minimum 87% test accuracy
- Total parameters below 100K
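To show what the coarse dropout objective asks for, here is a hand-rolled sketch in plain Python. It is not the albumentations implementation and not our training code: it assumes a single 16x16 hole, images stored as nested lists of (r, g, b) tuples, and hypothetical helper names `dataset_mean` and `coarse_dropout`.

```python
import random


def dataset_mean(images):
    """Per-channel mean over every pixel of every image.

    Each image is a list of rows; each row is a list of (r, g, b) tuples.
    """
    sums = [0.0, 0.0, 0.0]
    count = 0
    for image in images:
        for row in image:
            for pixel in row:
                for c in range(3):
                    sums[c] += pixel[c]
                count += 1
    return tuple(s / count for s in sums)


def coarse_dropout(image, hole=16, fill=(0.0, 0.0, 0.0), rng=random):
    """Return a copy of `image` with one hole x hole square set to `fill`."""
    height, width = len(image), len(image[0])
    top = rng.randint(0, height - hole)   # randint is inclusive at both ends
    left = rng.randint(0, width - hole)
    out = [list(row) for row in image]    # copy rows so the input stays untouched
    for y in range(top, top + hole):
        for x in range(left, left + hole):
            out[y][x] = fill
    return out


# Toy 32x32 image with random RGB values in [0, 1], standing in for a CIFAR-10 sample.
rng = random.Random(0)
image = [[(rng.random(), rng.random(), rng.random()) for _ in range(32)]
         for _ in range(32)]

mean = dataset_mean([image])  # in practice, computed once over the training set
augmented = coarse_dropout(image, hole=16, fill=mean, rng=rng)
```

Filling the hole with the dataset mean rather than black means the patch lands at zero after normalization, so the network sees "no information" there instead of a strong dark square.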
Code is split into different modules (as it should be!). If you are looking for the final notebook, you can find it here.
- dataset contains the code for data downloading, prepping and preprocessing. You can find code related to transformations and augmentations here.
  - dataset.py: Data loading and processing code is here.
- models will take you to our modelling directory, which contains code for our network structure and the training and testing modules.
- utils has code for our visualization needs.
  - plots.py: Visualization for train and test logs and sample images.
- CIFAR10_Image_Recognition.ipynb is the one notebook to rule them all! Open it to see the final results of the experiments.
- GPU-based code with a model architecture of C1C2C3C40 (no MaxPooling; use three 3x3 layers with a stride of 2 instead. It would be a bonus if we can figure out how to use dilated kernels instead of MaxPooling or strided convolution)
  - Dilated convolution in place of MaxPooling: achieved!
- Total receptive field of more than 52: receptive field of 107 achieved
- Two of the layers must use depthwise separable convolution
- One of the layers must use dilated convolution
- Use GAP (compulsory), mapped to the number of classes: CANNOT add an FC layer after GAP to target the number of classes
- Use the albumentations library and apply:
  - Horizontal flip
  - shiftScaleRotate
  - coarseDropout (max_holes = 1, max_height=16px, max_width=16px, min_holes = 1, min_height=16px, min_width=16px, fill_value=(mean of your dataset), mask_fill_value = None)
  - grayscale
- Minimum 87% test accuracy: achieved a maximum of 89.35%
- Total parameters below 100K: 96,436 parameters
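For readers who want to check a receptive field figure by hand, here is a small calculator sketch. The layer stack below is hypothetical (it is not our network, which reports 107); it only illustrates how strides and dilation combine to pass the target of 52.

```python
def receptive_field(layers):
    """Receptive field of a conv stack.

    `layers` is a sequence of (kernel, stride, dilation) tuples, input to output.
    `jump` tracks how many input pixels one step in the current feature map spans.
    """
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        k_eff = kernel + (kernel - 1) * (dilation - 1)  # span of the dilated kernel
        rf += (k_eff - 1) * jump
        jump *= stride
    return rf


# Hypothetical four-block stack: three blocks end in a stride-2 3x3 conv,
# and the third block contains one dilated (dilation=2) layer.
stack = [
    (3, 1, 1), (3, 1, 1), (3, 2, 1),   # block 1
    (3, 1, 1), (3, 1, 1), (3, 2, 1),   # block 2
    (3, 1, 1), (3, 1, 2), (3, 2, 1),   # block 3
    (3, 1, 1), (3, 1, 1), (3, 1, 1),   # block 4
]

rf = receptive_field(stack)
print(rf)  # 99
```

Layers placed after a strided convolution contribute more, because each step in their input already spans several pixels of the original image.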
- In place of max pooling, we employed a depthwise convolution with a kernel size of 3 and a stride of 2, which halves the spatial size of the feature maps.
- Using depthwise convolution greatly reduced the number of parameters required, as there is only one depthwise filter for each input channel.
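The arithmetic behind these two notes can be sketched as follows. The channel count of 64 and the padding of 1 are assumptions chosen for illustration, biases are ignored, and the function names are our own.

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out


def depthwise_params(k, channels):
    """Weights in a depthwise k x k convolution: one filter per input channel."""
    return k * k * channels


def depthwise_separable_params(k, c_in, c_out):
    """Depthwise k x k convolution followed by a 1x1 pointwise convolution."""
    return depthwise_params(k, c_in) + c_in * c_out


def conv_output_size(size, k, stride, padding, dilation=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - dilation * (k - 1) - 1) // stride + 1


standard = conv_params(3, 64, 64)                   # 36864
depthwise = depthwise_params(3, 64)                 # 576
separable = depthwise_separable_params(3, 64, 64)   # 4672

# A 3x3, stride-2 convolution with padding 1 turns a 32x32 map into 16x16.
downsampled = conv_output_size(32, k=3, stride=2, padding=1)

print(standard, depthwise, separable, downsampled)
```

With these assumed sizes the depthwise separable layer needs roughly one eighth of the weights of a standard 3x3 convolution, which is what makes a sub-100K parameter budget realistic.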
Abhiram Gurijala
Arijit Ganguly
Rohin Sequeira