This repository is shared between Rissalat Ahmed and Saniya Nafees. Our goal is to build a full-stack web application with payment integration.
More to come.
- User uploads the image
- We generate a caption for the image, which becomes the input to GPT-2
- GPT-2 generates text based on that caption
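The three steps above can be sketched as a small function that chains a captioner and a text generator. This is only an illustration of the data flow: `run_pipeline`, `fake_captioner`, and `fake_gpt2` are hypothetical names that do not exist in this repository, and the two stand-in functions return fixed strings instead of calling the real captioning model or GPT-2.

```python
from typing import Callable


def run_pipeline(
    image_path: str,
    caption_image: Callable[[str], str],
    generate_text: Callable[[str], str],
) -> dict:
    """Caption an image, then pass the caption to a text generator as its prompt."""
    caption = caption_image(image_path)  # step 2: image -> caption
    text = generate_text(caption)        # step 3: caption -> generated text
    return {"image": image_path, "caption": caption, "text": text}


# Hypothetical stand-ins so the sketch runs without any model weights.
def fake_captioner(image_path: str) -> str:
    return "a dog running on the beach"


def fake_gpt2(prompt: str) -> str:
    return prompt + " while the sun sets behind it."


result = run_pipeline("uploads/photo.jpg", fake_captioner, fake_gpt2)
print(result["text"])
```

Passing the captioner and generator in as arguments keeps the web layer independent of the models, so either one could be swapped out later without touching the upload handling.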
- Python 3.6
- Torch
- Torchvision
- SciPy version 1.1.0
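A quick way to confirm that the requirements above are present is to try importing each package and read its version. The helper names below (`installed_version`, `environment_report`) are made up for this sketch, and it assumes each package exposes a `__version__` attribute, which `torch`, `torchvision`, and `scipy` normally do.

```python
import importlib
import sys


def installed_version(module_name):
    """Return the module's __version__, "unknown" if it has none, or None if it is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def environment_report(module_names=("torch", "torchvision", "scipy")):
    """Map "python" and each module name to the version found in this environment."""
    report = {"python": "{}.{}".format(*sys.version_info[:2])}
    for name in module_names:
        report[name] = installed_version(name)
    return report
```

Calling `environment_report()` inside the virtual environment returns a dictionary in which a value of `None` marks a package that still needs to be installed. Note that importing `torch` can take a few seconds.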
I found this great pretrained model, along with its word map, in Sagar Vinodababu's repository.
Ideally, set this up in a fresh Anaconda virtual environment.
- Install the dependencies, the pretrained model, and the word map
- Clone the repo:
git clone https://github.com/saniyanafees6/ai-vision.git
- Open a terminal in the repository folder and run the following command:
python -W ignore caption.py --img='/path/to/image.jpg' --model='path/to/BEST_checkpoint_coco_5_cap_per_img_5_min_word_freq.pth.tar' --word_map='path/to/WORDMAP_coco_5_cap_per_img_5_min_word_freq.json' --beam_size=5
- more steps to come
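The caption command above can also be launched from Python, which may be convenient once the web backend needs to call it. This is a minimal sketch: the flags `--img`, `--model`, `--word_map`, and `--beam_size` are taken from the command above, while `build_caption_command`, `run_caption`, and the `repo_dir` parameter are hypothetical names introduced here. It uses `sys.executable` in place of `python` so that the interpreter of the active environment is used.

```python
import subprocess
import sys


def build_caption_command(img, model, word_map, beam_size=5):
    """Assemble the caption.py invocation as an argument list."""
    return [
        sys.executable, "-W", "ignore", "caption.py",
        "--img={}".format(img),
        "--model={}".format(model),
        "--word_map={}".format(word_map),
        "--beam_size={}".format(beam_size),
    ]


def run_caption(img, model, word_map, beam_size=5, repo_dir="."):
    """Run caption.py from repo_dir; raises CalledProcessError on a non-zero exit."""
    command = build_caption_command(img, model, word_map, beam_size)
    return subprocess.run(command, cwd=repo_dir, check=True)
```

Passing the arguments as a list avoids shell quoting problems when an image path contains spaces.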