ByteVideoLLM

Welcome to the official repository for the upcoming work: ByteVideoLLM.

Release

  • [2024/12/13] 🔥 The technical report is released! We will release the training data, new checkpoints, and the RFT pipeline for VideoLLM (including data and code) in the coming weeks.
  • [2024/11/01] 🔥 [Zero-shot task transfer] ByteVideoLLM-14B ranks 1st among all open-source models on MMIU, even though we did not use any multi-image data (other than video) during training. Check [MMIU]!
  • [2024/10/21] 🔥 ByteVideoLLM-14B ranks 1st among 13B/14B models on the VideoMME leaderboard. Check [VideoMME]!
  • [2024/10/15] 🔥 ByteVideoLLM-14B and the inference code are released. Check [Checkpoint]!

Overview

ByteVideoLLM aims to strike a balance between performance and token consumption in video LLMs.

Features

In the coming weeks, we will be releasing the following components:

  • Data: Approximately 1 million high-quality synthetic video-QA samples curated by our team.
  • Model Checkpoint: Pre-trained model checkpoints at different scales.
  • Training Code: The codebase to reproduce our experiments. We are continuously improving the model's performance.

Inference

We provide an evaluation script so that our results can be reproduced.

First, process the LaSOT annotations into a JSON format consistent with that of the training set, as sketched below.
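
As an illustration, here is a minimal conversion sketch. It assumes a LaSOT-style layout in which each sequence directory contains a groundtruth.txt (one bounding box per frame) and an nlp.txt with a language description of the target; the output follows the annotation format described under Data Preparation below. Paths, question phrasing, and metadata keys are placeholders, not part of the official pipeline.

import json
from pathlib import Path

def convert_lasot(lasot_root: str, out_path: str) -> None:
    """Convert LaSOT-style sequence annotations into the annotation format
    described under Data Preparation (placeholder question phrasing)."""
    entries = []
    for seq_dir in sorted(Path(lasot_root).glob("*/*")):  # e.g. airplane/airplane-1
        gt_file = seq_dir / "groundtruth.txt"
        nlp_file = seq_dir / "nlp.txt"
        if not gt_file.exists():
            continue
        boxes = [line.strip() for line in gt_file.read_text().splitlines() if line.strip()]
        description = nlp_file.read_text().strip() if nlp_file.exists() else ""
        entries.append({
            "source": "lasot",                      # dataset name
            "id": seq_dir.name,                     # sequence id, e.g. "airplane-1"
            "vid": seq_dir.name,                    # video identifier
            "metadata": {"num_frames": len(boxes)}, # hypothetical metadata fields
            "vqa": [
                # Placeholder QA pair; adapt to the prompts used in the training set.
                {"from": "human", "value": "Describe the target object in this video."},
                {"from": "gpt", "value": description},
            ],
        })
    with open(out_path, "w") as f:
        json.dump(entries, f, indent=2)

if __name__ == "__main__":
    convert_lasot("LaSOT", "lasot_annotations.json")  # hypothetical paths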

Second, refer to config.sample_config.yaml, fill in the correct data path under data.predict.data_fetch, and then run the following command:

deepspeed --master_port={PORT} eval/eval.py {YOUR_CONFIG_PATH}
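
The full schema of the sample config is not reproduced here, but a small sketch like the one below (an assumption, not part of the repo) can confirm that data.predict.data_fetch has been filled in before launching DeepSpeed; only the nested key names data.predict.data_fetch come from the instruction above.

import sys
import yaml  # pip install pyyaml

def check_config(config_path: str) -> None:
    """Verify that data.predict.data_fetch is set in the YAML config."""
    with open(config_path) as f:
        cfg = yaml.safe_load(f)
    try:
        data_fetch = cfg["data"]["predict"]["data_fetch"]
    except (KeyError, TypeError):
        sys.exit("data.predict.data_fetch is missing from the config")
    print(f"data.predict.data_fetch -> {data_fetch}")

if __name__ == "__main__":
    check_config(sys.argv[1])  # e.g. python check_config.py sample_config.yaml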

Data Preparation

If you want to use your own data, please process it into the following annotation format:

{
    "source": ,
    "id": ,
    "vid": ,
    "metadata": ,
    "vqa": [
      {
        "from": "human", "value": "[YOUR_QUESTION]"
      },
      {
        "from": "gpt", "value": "[YOUR_ANSWER]"
      },
      ...
    ]
}
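
A minimal helper for producing entries in this format is sketched below; the field contents (source name, ids, metadata keys) are placeholder values and should be adapted to your data.

import json

def make_entry(source, sample_id, vid, metadata, qa_pairs):
    """Build one annotation entry in the format shown above.
    qa_pairs is a list of (question, answer) tuples."""
    vqa = []
    for question, answer in qa_pairs:
        vqa.append({"from": "human", "value": question})
        vqa.append({"from": "gpt", "value": answer})
    return {"source": source, "id": sample_id, "vid": vid,
            "metadata": metadata, "vqa": vqa}

# Example: a single video with one QA turn (placeholder values).
entry = make_entry(
    source="my_dataset",
    sample_id="000001",
    vid="video_000001.mp4",
    metadata={"fps": 30, "duration": 12.4},
    qa_pairs=[("What is happening in the video?", "A person is riding a bicycle.")],
)

with open("annotations.json", "w") as f:
    json.dump([entry], f, indent=2)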

Stay tuned for updates and the release of these valuable resources!

About

This is the official repo for ByteVideoLLM/Dynamic-VLM.
