Shopee Product Detection Challenge

From the given image dataset, we trained a 42-class image classifier using a ResNeXt-101 32x48d model pretrained on the ImageNet dataset. The trained model can be downloaded from: https://drive.google.com/drive/u/1/folders/1UVnhpJOe0O5yESD473o43ZUfGF8VoXgx
Classification accuracy (42 classes): 86.18%
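A minimal sketch of how this backbone can be set up, assuming the publicly released torch.hub checkpoint (facebookresearch/WSL-Images, resnext101_32x48d_wsl); the actual training script and hyperparameters are not reproduced here:

```python
# Sketch: load the pretrained ResNeXt-101 32x48d backbone and attach a
# 42-way product-category head. Training details below are placeholders.
import torch
import torch.nn as nn

NUM_CLASSES = 42

# Pretrained ResNeXt-101 32x48d weights released via the WSL-Images hub repo.
model = torch.hub.load('facebookresearch/WSL-Images', 'resnext101_32x48d_wsl')

# Replace the 1000-way ImageNet head with a 42-way classification head.
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# A standard fine-tuning setup (loss/optimizer choices here are assumptions).
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
```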
A significant number of the images contain text, which can be extracted using optical character recognition (OCR). We used:
- Character-region awareness for text detection (CRAFT-PyTorch): https://github.com/clovaai/CRAFT-pytorch
- Deep-text-recognition: https://github.com/clovaai/deep-text-recognition-benchmark
The above two repositories are chained to 1) identify text regions in an image and 2) recognize the text within the identified regions. The text extracted from each image is combined into a single sentence. Doing this for all images generates a new dataset consisting of the text of each image and its label (the image's product class).
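A minimal sketch of this merge step, under the assumption that each repository's inference code is wrapped in a callable (the `detect_fn` / `recognize_fn` names below are hypothetical wrappers, not APIs of either repo, which expose their inference through standalone scripts):

```python
# Sketch: per image, detect text boxes (CRAFT step), recognize each crop
# (recognition step), and join the words into one sentence; then write a
# text/label CSV for the whole image dataset.
import csv
from PIL import Image

def image_to_sentence(image_path, detect_fn, recognize_fn):
    """Return all recognized words in an image joined into one sentence."""
    image = Image.open(image_path).convert('RGB')
    # detect_fn: image -> list of (left, upper, right, lower) boxes (assumed axis-aligned).
    boxes = detect_fn(image)
    # recognize_fn: cropped text region -> recognized string.
    words = [recognize_fn(image.crop(box)) for box in boxes]
    return ' '.join(words)

def build_text_dataset(samples, detect_fn, recognize_fn, out_csv):
    """samples: iterable of (image_path, label) pairs from the image dataset."""
    with open(out_csv, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['text', 'label'])
        for image_path, label in samples:
            writer.writerow([image_to_sentence(image_path, detect_fn, recognize_fn), label])
```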
From the generated text dataset, we formulate a text classification problem:
- Input: Image text
- Output: Label representing product class
For this task, we fine-tune a BERT-based classifier, achieving 55% accuracy. This is significant compared to a random baseline classifier, which achieves only ~2.38% (1/42).
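A minimal sketch of such a fine-tuning setup using the Hugging Face transformers library (the exact model variant, hyperparameters, and training loop used here are not reproduced; the texts and label ids below are illustrative only):

```python
# Sketch: BERT sequence classification over OCR text with 42 product classes.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=42)

# One training step on a toy batch of (image text, product-class label) pairs.
texts = ['running shoes size 42 original', 'phone case tempered glass']  # example OCR output
labels = torch.tensor([3, 17])                                           # hypothetical class ids
batch = tokenizer(texts, padding=True, truncation=True, max_length=64, return_tensors='pt')

outputs = model(**batch, labels=labels)   # returns loss and logits
outputs.loss.backward()                   # standard fine-tuning step
```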
The pre-final layer BERT features can be downloaded from: https://drive.google.com/drive/u/1/folders/1UVnhpJOe0O5yESD473o43ZUfGF8VoXgx
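If you prefer to recompute the features rather than download them, one possible way is sketched below, assuming "pre-final layer" refers to the pooled [CLS] representation that feeds the classification head:

```python
# Sketch: extract pooled [CLS] features from the fine-tuned BERT classifier.
import torch

@torch.no_grad()
def extract_bert_features(model, tokenizer, texts, device='cpu'):
    model.eval().to(device)
    batch = tokenizer(texts, padding=True, truncation=True, max_length=64,
                      return_tensors='pt').to(device)
    # model.bert is the encoder underneath the classification head.
    outputs = model.bert(**batch)
    return outputs.pooler_output  # shape: (batch_size, 768)
```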
A proper fusion of the vision and text modalities is expected to improve performance further. Please use both datasets and see if it helps.
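One possible starting point (a suggested baseline, not something evaluated here) is late fusion: concatenate the penultimate image features and the BERT features and train a small classifier on top. The feature dimensions below (2048 for ResNeXt, 768 for BERT) are assumptions.

```python
# Sketch: late-fusion classifier over concatenated image and text features.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, image_dim=2048, text_dim=768, num_classes=42, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(image_dim + text_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, image_feat, text_feat):
        # image_feat: ResNeXt penultimate features; text_feat: BERT features.
        return self.net(torch.cat([image_feat, text_feat], dim=-1))

# Example forward pass on random features of the assumed dimensions.
fusion = LateFusionClassifier()
logits = fusion(torch.randn(4, 2048), torch.randn(4, 768))  # shape: (4, 42)
```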