
RohinSequeira/CIFAR10_Image_Recognition


EVA6_Session7_Advanced_Concepts

Time to try our hands on something more than just digits. How about some cars ... planes ... maybe a few animals here and there? Welcome to our experiments with advanced convolution concepts on the CIFAR-10 dataset.

Topics

Understanding the CIFAR-10 dataset

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.

Here are the classes in the dataset, as well as 10 random images from each:

[Figure: the 10 CIFAR-10 classes, with 10 random images from each]

The classes are completely mutually exclusive. There is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, things of that sort. "Truck" includes only big trucks. Neither includes pickup trucks.

Source: https://www.cs.toronto.edu/~kriz/cifar.html

Concept Time!

Dilated Convolution

[Figure: dilated convolution]

Source: Rohan Shravan

Dilated convolution is a way of increasing the receptive field (global view) of the network exponentially while the number of parameters grows only linearly (the exponential growth holds when the dilation rate doubles from one layer to the next). This makes it useful in applications that care about integrating knowledge of the wider context at low cost.
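To make the "exponential receptive field, linear parameters" claim concrete, here is a small pure-Python sketch. It assumes stride-1 3x3 convolutions whose dilation rate doubles at every layer (1, 2, 4, ...), which is the setting where the exponential growth holds. The function names are hypothetical helpers for illustration, not code from this repository.

```python
from typing import List


def effective_kernel(k: int, d: int) -> int:
    """Span of a k x k kernel with dilation d: k + (k - 1) * (d - 1)."""
    return k + (k - 1) * (d - 1)


def stacked_receptive_field(k: int, dilations: List[int]) -> int:
    """Receptive field of stride-1 k x k convs applied one after another.

    With stride 1 the spacing between outputs stays 1 pixel, so each
    layer widens the receptive field by (effective kernel - 1) pixels.
    """
    rf = 1
    for d in dilations:
        rf += effective_kernel(k, d) - 1
    return rf


def weights_per_channel_pair(k: int, num_layers: int) -> int:
    """Dilation adds no weights: every layer still has k * k taps."""
    return k * k * num_layers


if __name__ == "__main__":
    for n in range(1, 5):
        dilations = [2 ** i for i in range(n)]  # 1, 2, 4, 8, ...
        print(n, stacked_receptive_field(3, dilations), weights_per_channel_pair(3, n))
    # receptive fields 3, 7, 15, 31 vs. weights 9, 18, 27, 36
```

With doubling dilation the receptive field follows 2^(n+1) - 1, while each extra layer adds only 9 weights per input/output channel pair.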

The key application the dilated convolution authors had in mind is dense prediction: vision applications where the predicted output has a similar size and structure to the input image. Examples include semantic segmentation with one label per pixel, image super-resolution, denoising, demosaicing, bottom-up saliency and keypoint detection.

In many such applications one wants to integrate information from different spatial scales and balance two properties:

∙ local, pixel-level accuracy, such as precise detection of edges, and

∙ integrating the knowledge of the wider, global context

[Figure: dilated convolution]

Source: Rohan Shravan

[Figure: dilated convolution]

Source: Rohan Shravan

Depthwise Separable Convolution

[Figure: depthwise separable convolution]

Source: Rohan Shravan
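A depthwise separable convolution splits a standard convolution into a depthwise step (one k x k filter per input channel) followed by a pointwise 1x1 convolution that mixes channels; in PyTorch this is commonly written as a `Conv2d` with `groups=in_channels` followed by a 1x1 `Conv2d`. The pure-Python sketch below only counts parameters to show why this saves so much. The channel sizes (64 in, 128 out) are hypothetical and not taken from this repository's model.

```python
def standard_conv_params(k: int, c_in: int, c_out: int, bias: bool = False) -> int:
    """Every output filter spans all input channels: k * k * c_in * c_out."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def depthwise_separable_params(k: int, c_in: int, c_out: int, bias: bool = False) -> int:
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 conv mixes channels into c_out maps
    biases = (c_in + c_out) if bias else 0
    return depthwise + pointwise + biases


if __name__ == "__main__":
    k, c_in, c_out = 3, 64, 128  # hypothetical layer sizes
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print(std, sep, round(std / sep, 2))  # 73728 8768 8.41
```

For a 3x3 kernel the saving approaches a factor of 9 as the number of output channels grows, which is why two such layers help keep the model under the 100K parameter budget.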

Objectives

  • GPU-based code with a model architecture of C1C2C3C40 (no MaxPooling; use three 3x3 layers with a stride of 2 instead. Bonus if we can figure out how to use dilated kernels instead of MaxPooling or strided convolution)
  • Total Receptive Field of more than 52
  • Two of the layers must use Depthwise Separable Convolution
  • One of the layers must use Dilated Convolution
  • Use GAP (compulsory), mapped to the number of classes: we CANNOT add an FC layer after GAP to reach the number of classes
  • Use albumentation library and apply:
    • Horizontal flip
    • ShiftScaleRotate
    • CoarseDropout (max_holes = 1, max_height=16px, max_width=16px, min_holes = 1, min_height=16px, min_width=16px, fill_value=(mean of your dataset), mask_fill_value = None)
    • grayscale
  • Minimum 87% Test Accuracy
  • Total Parameters below 100K
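As a rough check on the receptive-field objective, the sketch below applies the standard recurrence (rf += (k_eff - 1) * jump, then jump *= stride) to a chain of convolutions and also tracks the spatial size. The layer list is a hypothetical C1C2C3C4 layout with stride-2 transitions and one dilated layer, assuming 32x32 inputs and padding that preserves size at stride 1; it is not this repository's actual model.

```python
from typing import List, NamedTuple, Tuple


class Conv(NamedTuple):
    kernel: int = 3
    stride: int = 1
    dilation: int = 1
    padding: int = 1


def trace(layers: List[Conv], size: int = 32) -> Tuple[int, int]:
    """Return (receptive_field, final_spatial_size) for a chain of convs."""
    rf, jump = 1, 1
    for layer in layers:
        k_eff = layer.dilation * (layer.kernel - 1) + 1
        rf += (k_eff - 1) * jump  # growth measured in input pixels
        jump *= layer.stride      # spacing of adjacent outputs, in input pixels
        size = (size + 2 * layer.padding - k_eff) // layer.stride + 1
    return rf, size


# Hypothetical architecture, for illustration only.
HYPOTHETICAL_NET = [
    Conv(), Conv(),                       # C1
    Conv(stride=2),                       # stride-2 transition instead of max pooling
    Conv(), Conv(),                       # C2
    Conv(stride=2),
    Conv(dilation=2, padding=2), Conv(),  # C3 with one dilated layer
    Conv(stride=2),
    Conv(), Conv(),                       # C4, followed by GAP -> 10 classes
]

if __name__ == "__main__":
    print(trace(HYPOTHETICAL_NET))  # (83, 4)
```

Even this modest layout clears the 52-pixel target; the GAP layer that follows then averages the remaining 4x4 map down to one value per class channel without adding any parameters.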

Code Structure

Code is split into different modules (as it should be!). If you are looking for the final notebook, it is CIFAR10_Image_Recognition.ipynb.

  • dataset contains the code for data downloading, prepping and preprocessing. You can find code related to transformations and augmentations here.

    • dataset.py: Data loading and processing code is here.
  • models will take you to our modelling directory which contains code for our network structure and the training and testing modules.

  • utils has code for our visualization needs.

    • plots.py: Visualization for Train, Test logs and sample images.
  • CIFAR10_Image_Recognition.ipynb is the one notebook to rule them all! Open it to see the final results of the experiments.

Logs

Model Summary

[Figure: model summary]

Training and Validation Loss

[Figure: training and validation loss curves]

Training and Validation Accuracy

[Figure: training and validation accuracy curves]

Conclusions and notes

Objectives Achieved

  • GPU-based code with a model architecture of C1C2C3C40 (no MaxPooling; use three 3x3 layers with a stride of 2 instead. Bonus if we can figure out how to use dilated kernels instead of MaxPooling or strided convolution)
    • Dilated Convolution in place of Max Pooling Achieved!
  • Total Receptive Field of more than 52: Receptive Field of 107 achieved
  • Two of the layers must use Depthwise Separable Convolution
  • One of the layers must use Dilated Convolution
  • Use GAP (compulsory), mapped to the number of classes: we CANNOT add an FC layer after GAP to reach the number of classes
  • Use albumentation library and apply:
    • Horizontal flip
    • ShiftScaleRotate
    • CoarseDropout (max_holes = 1, max_height=16px, max_width=16px, min_holes = 1, min_height=16px, min_width=16px, fill_value=(mean of your dataset), mask_fill_value = None)
    • grayscale
  • Minimum 87% Test Accuracy: achieved a maximum of 89.35%
  • Total Parameters below 100K: 96,436 Parameters

Notes:

  • In place of max pooling, we employed a "Depthwise Convolution" with a kernel size of 3 and a stride of 2, which halves the spatial size (height and width) of the feature map while keeping the number of channels unchanged.
  • Using depthwise convolution greatly reduced the number of parameters, since there is only one filter per input channel instead of one filter spanning all input channels for every output channel.
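The two notes above can be checked with a few lines of arithmetic. This pure-Python sketch assumes a 3x3 depthwise convolution with padding 1 and stride 2 applied to a hypothetical 64-channel, 32x32 feature map (biases ignored); the helper names are hypothetical and the sizes are not taken from this repository's model.

```python
from typing import Tuple


def depthwise_stride2(h: int, w: int, channels: int, k: int = 3,
                      padding: int = 1, stride: int = 2) -> Tuple[Tuple[int, int, int], int]:
    """Output shape (C, H, W) and weight count of a depthwise conv (groups == channels)."""
    out_h = (h + 2 * padding - k) // stride + 1
    out_w = (w + 2 * padding - k) // stride + 1
    weights = k * k * channels  # one k x k filter per input channel
    return (channels, out_h, out_w), weights


def standard_weights(channels: int, k: int = 3) -> int:
    """Same-width standard conv: every filter sees every input channel."""
    return k * k * channels * channels


if __name__ == "__main__":
    shape, w = depthwise_stride2(32, 32, 64)
    print(shape, w, standard_weights(64))  # (64, 16, 16) 576 36864
```

Height and width halve while the channel count stays at 64, and the depthwise layer needs 64 times fewer weights than a standard 3x3 convolution with the same number of input and output channels.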

Collaborators

Abhiram Gurijala
Arijit Ganguly
Rohin Sequeira
