Team code repository for image caption task

Download the dataset

Download the dataset via:

https://challenger.ai/competition/caption/subject

Alternative link:

https://drive.google.com/open?id=0ByB0MjjNghlyNkdhR3lIZGJneGM

Setting up the data path

Create a symbolic link named data that points to the dataset directory:

ln -s [the-path-to-data] data

The relative paths of the training, validation, and test data (with annotations where provided) should now look like this:

data/ai_challenger_caption_train_20170902/caption_train_images_20170902/*.jpg
data/ai_challenger_caption_train_20170902/caption_train_annotations_20170902.json

data/ai_challenger_caption_validation_20170910/caption_validation_images_20170910/*.jpg
data/ai_challenger_caption_validation_20170910/caption_validation_annotations_20170910.json

data/ai_challenger_caption_test1_20170923/caption_test1_images_20170923/*.jpg
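A quick way to confirm that the link resolves to the expected layout is to test each path listed above. The following is a minimal sketch; check_layout is a hypothetical helper and is not part of the repository.

```shell
# Sketch: report which of the expected dataset paths are missing.
# check_layout is a hypothetical helper, not a script from the repository.
check_layout() {
  local root="${1:-data}" missing=0 p
  for p in \
    ai_challenger_caption_train_20170902/caption_train_images_20170902 \
    ai_challenger_caption_train_20170902/caption_train_annotations_20170902.json \
    ai_challenger_caption_validation_20170910/caption_validation_images_20170910 \
    ai_challenger_caption_validation_20170910/caption_validation_annotations_20170910.json \
    ai_challenger_caption_test1_20170923/caption_test1_images_20170923
  do
    # -e follows symbolic links, so a dangling link is reported as missing.
    if [ ! -e "$root/$p" ]; then
      echo "missing: $root/$p"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, check_layout data && echo "layout looks complete" prints the confirmation only when every path exists.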

Generate the tfrecord files

If there is no TFRecord_data directory under data, the tfrecord files have probably not been generated yet. After setting up the data path, run the script:

bash scripts/build_tfrecords.sh

to create the tfrecord files. It may take 20-30 minutes to create all the files.
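Because the build takes a while, it can help to run it only when the output directory is actually missing or empty. A minimal sketch follows; needs_tfrecords is a hypothetical helper, and the directory name data/TFRecord_data is taken from the description above.

```shell
# Sketch: decide whether the tfrecord files still need to be generated.
# needs_tfrecords is a hypothetical helper, not a script from the repository.
needs_tfrecords() {
  local dir="${1:-data/TFRecord_data}"
  # Succeeds (exit status 0) when the directory is missing or has no entries.
  [ ! -d "$dir" ] || [ -z "$(ls -A "$dir" 2>/dev/null)" ]
}

# Usage (commented out so that sourcing this sketch has no side effects):
# if needs_tfrecords; then bash scripts/build_tfrecords.sh; fi
```

Note that this only checks for the presence of files, not whether an earlier run finished successfully.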

To support data augmentation, we modified build_tfrecords.py. Two things are worth noting:

If the train_captions_file contains the key "flip_caption", the code writes a "flip_caption" feature to the tfrecords to support image flipping.
A vocabulary is needed to generate caption ids. You can either build the vocabulary from the training set (leave the "word_counts_input_file" flag empty and set the "word_counts_output_file" flag) or load an existing vocabulary (set the "word_counts_input_file" flag).

For guidance, compare "scripts/build_tfrecords.sh" with "scripts/new_build_tfrecords.sh".
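One way to make that comparison is a unified diff of the two scripts. The sketch below wraps it in a hypothetical helper, compare_build_scripts, whose defaults are the two paths named above.

```shell
# Sketch: show the differences between the two build scripts.
# compare_build_scripts is a hypothetical helper, not part of the repository.
compare_build_scripts() {
  # diff exits with status 1 when the files differ, which is the expected
  # case here, so the status is deliberately ignored.
  diff -u "${1:-scripts/build_tfrecords.sh}" \
          "${2:-scripts/new_build_tfrecords.sh}" || true
}
```

Running compare_build_scripts from the repository root prints lines prefixed with - for the first script and + for the second.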

Training

First, get the Inception v3 network checkpoint. Go to the directory pretrained_model/inception_v3 and run:

bash get_model.sh

It will automatically download the checkpoint.

Then you can run bash train.sh to train the baseline.

You can also create another training script with a different configuration.

Inference

You will need GNU parallel. You can install it by running sudo apt-get install parallel.

Set the model name and checkpoint number in inference.sh and run:

bash inference.sh

It will print the path of the JSON file it writes.

You may want to change num_processes and gpu_fraction to fit your GPU memory. You may see CUDA errors if the number of processes is too large.
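As a rough starting point, the two settings can be chosen together so that all processes fit on one GPU. The sketch below assumes that gpu_fraction is the share of GPU memory each process may claim, which the text above does not state explicitly. The 0.9 headroom factor and the helper name suggest_gpu_fraction are illustrative assumptions.

```shell
# Sketch: split roughly 90% of GPU memory evenly across the processes.
# Assumption: gpu_fraction is a per-process share of GPU memory.
# suggest_gpu_fraction is a hypothetical helper, not part of inference.sh.
suggest_gpu_fraction() {
  local num_processes="$1"
  # LC_ALL=C keeps the decimal separator a dot regardless of the locale.
  LC_ALL=C awk -v n="$num_processes" 'BEGIN { printf "%.3f\n", 0.9 / n }'
}
```

If CUDA errors persist with the suggested value, lowering num_processes is the safer adjustment.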

Validate

Before validating your JSON result, you need to generate the reference JSON file. Run:

bash scripts/build_reference_file.sh

Then, set the model name and checkpoint number in eval.sh and run:

bash eval.sh

It will show and save the metrics.

Preparation for reranking

To generate the reranking dataset, write the relative model paths into resources/inference_all.list and run the dataset builder script:

bash -x scripts/ranker_build_dataset.sh

The resulting tfrecord files will be saved in data/Ranker_TFRecord_data/[sha1sum-of-model-list]/.
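To locate that directory later, the hash can be recomputed from the model list. The sketch below assumes that [sha1sum-of-model-list] is the SHA-1 of the contents of resources/inference_all.list as printed by sha1sum; the builder script may derive it differently. ranker_data_dir is a hypothetical helper.

```shell
# Sketch: print the expected output directory for a given model list.
# Assumption: the directory name is the SHA-1 of the list file's contents.
# ranker_data_dir is a hypothetical helper, not part of the repository.
ranker_data_dir() {
  local list="${1:-resources/inference_all.list}"
  local hash
  # sha1sum prints "<hash>  <file>"; keep only the hash.
  hash="$(sha1sum "$list" | cut -d' ' -f1)"
  echo "data/Ranker_TFRecord_data/$hash"
}
```

Under that assumption, any change to the list, including reordering the models, produces a different directory.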

After generating the dataset, you can run the evaluation script:

bash -x scripts/ranker_evaluate_oracle.sh

This script estimates an upper limit (which may not be a tight upper bound) for reranking on the validation set.

Use lexical embedding

In ShowAndTellAdvancedModel, you can augment the word embedding with an embedding of the word's part-of-speech (POS) tag. The mapping between words and POS tags must be generated beforehand with the following command:

bash scripts/build_postag_dict.sh

This will take roughly an hour.

Important notice

Commit as soon as your branch has been merged with origin/master and tested; beware of silent merge conflicts.
Commit from the machine where you edit. DO NOT edit on Windows, transfer to Linux, and commit on Linux (or vice versa), as this will cause inconsistent line endings.
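Before committing, it may be worth checking whether any text file has picked up Windows line endings. A minimal sketch follows; list_crlf_files is a hypothetical helper and relies on the --exclude-dir option of GNU grep.

```shell
# Sketch: list text files that contain carriage returns (CRLF line endings).
# list_crlf_files is a hypothetical helper, not part of the repository.
list_crlf_files() {
  local cr
  # Command substitution strips trailing newlines only, so the CR survives.
  cr="$(printf '\r')"
  # -r recurses, -I skips binary files, -l prints only the file names.
  grep -rIl --exclude-dir=.git "$cr" "${1:-.}"
}
```

An empty result, with a non-zero exit status from grep, means no file with carriage returns was found.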
