Inputting an image to GPT2 model to generate text

kukuhaza/ai-vision

 
 

AI Vision

This is a shared repository between Rissalat Ahmed and Saniya Nafees. Our goal is to build a full-stack web application with payment integration.

Our Idea

More details to come. The current plan:

  • User uploads the image
  • We generate an image caption, which becomes an input to GPT2
  • GPT2 generates text based on the input received from the image caption
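The flow above can be sketched as two stages chained together. This is only an illustration of how the pieces connect, not the repository's implementation: `image_to_story`, `fake_captioner`, and `fake_generator` are hypothetical names, and the two stand-in functions take the place of the real captioning model and GPT2.

```python
from typing import Callable


def image_to_story(
    image_path: str,
    captioner: Callable[[str], str],
    generator: Callable[[str], str],
) -> str:
    """Caption the image, then hand the caption to the text generator as its prompt."""
    caption = captioner(image_path).strip()
    if not caption:
        raise ValueError("captioner returned an empty caption")
    return generator(caption)


# Stand-ins for illustration only (hypothetical): real versions would wrap
# the captioning model and GPT2 respectively.
def fake_captioner(image_path: str) -> str:
    return "a dog running on the beach"


def fake_generator(prompt: str) -> str:
    return prompt + " ... (generated continuation)"


story = image_to_story("/path/to/image.jpg", fake_captioner, fake_generator)
print(story)
```

Keeping the two stages behind plain callables would let either model be swapped out without touching the web layer.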

Dependencies

  • Python 3.6
  • PyTorch (torch)
  • torchvision
  • SciPy 1.1.0
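A quick way to confirm the environment matches this list is to print the versions Python actually sees. The helper below is a minimal sketch; `check_dependencies` is a hypothetical name and is not part of this repository.

```python
import importlib
import sys


def check_dependencies(module_names):
    """Map each module name to its version string, "unknown", or None if not importable."""
    found = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
            continue
        # Not every module exposes __version__, so fall back to "unknown".
        found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    print("Python", "%d.%d" % sys.version_info[:2])
    for name, version in check_dependencies(["torch", "torchvision", "scipy"]).items():
        print(name, version if version is not None else "NOT INSTALLED")
```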

Pretrained Model and Word Map

The pretrained captioning model and its word map come from Sagar Vinodbabu's repository.
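To show what the word map is for, here is a small sketch of turning predicted token indices back into a caption. It assumes the word map is a JSON object mapping each word to an integer index and that it includes special tokens such as `<start>`, `<end>`, `<pad>`, and `<unk>`. The tiny inline map and the `decode` helper are made up for illustration.

```python
import json

# A toy word map (assumption: the real file has the same word -> index shape).
word_map = json.loads(
    '{"<pad>": 0, "<start>": 1, "<end>": 2, "<unk>": 3, "a": 4, "dog": 5}'
)

# Invert the map so indices produced by the model can be looked up as words.
rev_word_map = {index: word for word, index in word_map.items()}


def decode(indices, rev_word_map, special=("<start>", "<end>", "<pad>")):
    """Convert a sequence of indices to a caption, dropping special tokens."""
    words = [rev_word_map.get(i, "<unk>") for i in indices]
    return " ".join(w for w in words if w not in special)


print(decode([1, 4, 5, 2], rev_word_map))  # prints "a dog"
```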

Getting Started

Ideally, set this up in a fresh Anaconda virtual environment.

  1. Install the dependencies and download the pretrained model and word map

  2. Clone the repo

git clone https://github.com/saniyanafees6/ai-vision.git

  3. Open a terminal in the cloned folder and run the following command:

python -W ignore caption.py --img='/path/to/image.jpg' --model='path/to/BEST_checkpoint_coco_5_cap_per_img_5_min_word_freq.pth.tar' --word_map='path/to/WORDMAP_coco_5_cap_per_img_5_min_word_freq.json' --beam_size=5

  4. More steps to come
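If the captioning step is later called from the web application rather than typed by hand, the same command could be assembled in Python. This is a sketch under that assumption. `build_caption_command` and `run_caption` are hypothetical helpers, and the only flags used are the ones shown in the command above.

```python
import subprocess
import sys


def build_caption_command(img, model, word_map, beam_size=5):
    """Build the argument list for caption.py, mirroring the command-line call."""
    return [
        sys.executable, "-W", "ignore", "caption.py",
        "--img=" + img,
        "--model=" + model,
        "--word_map=" + word_map,
        "--beam_size=" + str(beam_size),
    ]


def run_caption(img, model, word_map, beam_size=5):
    """Run caption.py from the current directory and fail loudly on a non-zero exit."""
    cmd = build_caption_command(img, model, word_map, beam_size)
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when image paths contain spaces.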

Credits

GitHub Repositories

Research Papers
