[Docs] add en docs (#15)
* add en docs

* update

---------

Co-authored-by: gaotongxiao <[email protected]>
yingfhu and gaotongxiao authored Jul 6, 2023
1 parent 07dfe8c commit 7f8eee4
Showing 15 changed files with 193 additions and 91 deletions.
2 changes: 1 addition & 1 deletion LICENSE
Original file line number Diff line number Diff line change
@@ -200,4 +200,4 @@ Copyright 2020 OpenCompass Authors. All rights reserved.
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
limitations under the License.
4 changes: 1 addition & 3 deletions README.md
@@ -39,8 +39,6 @@ We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for commun

[![image](https://github.com/InternLM/OpenCompass/assets/7881589/6b56c297-77c0-4e1a-9acc-24a45c5a734a)](https://opencompass.org.cn/rank)



## Dataset Support

<table align="center">
@@ -245,7 +243,7 @@ We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for commun
</tr>
<tr valign="top">
<td>

- InternLM
- LLaMA
- Vicuna
2 changes: 0 additions & 2 deletions README_zh-CN.md
@@ -40,10 +40,8 @@ OpenCompass 是面向大模型评测的一站式平台。其主要特点如下

我们将陆续提供开源模型和API模型的具体性能榜单,请见 [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) 。如需加入评测,请提供模型仓库地址或标准的 API 接口至邮箱 `[email protected]`.


![image](https://github.com/InternLM/OpenCompass/assets/7881589/fddc8ab4-d2bd-429d-89f0-4ca90606599a)


## 数据集支持

<table align="center">
58 changes: 56 additions & 2 deletions docs/en/advanced_guides/new_dataset.md
@@ -1,3 +1,57 @@
# New Dataset
# Add a dataset

Coming soon.
Although OpenCompass already includes most commonly used datasets, users who want to support a new dataset can follow the steps below:

1. Add a dataset script `mydataset.py` to the `opencompass/datasets` folder. This script should include:

- The dataset and its loading method. Define a `MyDataset` class that implements the data loading method `load` as a static method. This method should return data of type `datasets.Dataset`. We use the Hugging Face dataset as the unified interface for datasets to avoid introducing additional logic. Here's an example:

```python
import datasets
from .base import BaseDataset

class MyDataset(BaseDataset):

    @staticmethod
    def load(**kwargs) -> datasets.Dataset:
        pass
```

- (Optional) If the existing evaluators in OpenCompass do not meet your needs, define a `MyDatasetEvaluator` class that implements the scoring method `score`. This method takes the `predictions` and `references` lists as input; since a dataset may have multiple metrics, it should return a dictionary mapping each metric to its score. Here's an example:

```python
from typing import List

from opencompass.openicl.icl_evaluator import BaseEvaluator

class MyDatasetEvaluator(BaseEvaluator):

    def score(self, predictions: List, references: List) -> dict:
        pass
```
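For intuition, here is a minimal standalone sketch of what a `score` body might compute. Exact-match accuracy is our assumption for illustration, not a prescribed OpenCompass metric, and the `BaseEvaluator` base class is omitted so the snippet runs on its own:

```python
from typing import List


def exact_match_score(predictions: List[str], references: List[str]) -> dict:
    """Return a metric-name -> score dict, the shape score() is expected to produce."""
    assert len(predictions) == len(references)
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return {"accuracy": 100 * correct / len(references)}


print(exact_match_score(["A", "B", "C", "D"], ["A", "B", "C", "A"]))  # {'accuracy': 75.0}
```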

- (Optional) If the existing postprocessors in OpenCompass do not meet your needs, you need to define the `mydataset_postprocess` method. This method takes an input string and returns the corresponding postprocessed result string. Here's an example:

```python
def mydataset_postprocess(text: str) -> str:
    pass
```
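As a hypothetical illustration, a postprocessor that pulls a multiple-choice option letter out of a free-form model answer could be sketched as follows (the regex and the A-D option set are our assumptions):

```python
import re


def mydataset_postprocess(text: str) -> str:
    """Extract the first standalone option letter A-D from a model answer."""
    match = re.search(r"\b([A-D])\b", text)
    return match.group(1) if match else ""


print(mydataset_postprocess("The answer is B, because..."))  # B
```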

2. After defining the dataset loading, data postprocessing, and evaluator methods, you need to add the following configurations to the configuration file:

```python
from opencompass.datasets import MyDataset, MyDatasetEvaluator, mydataset_postprocess

mydataset_eval_cfg = dict(
    evaluator=dict(type=MyDatasetEvaluator),
    pred_postprocessor=dict(type=mydataset_postprocess))

mydataset_datasets = [
    dict(
        type=MyDataset,
        ...,
        reader_cfg=...,
        infer_cfg=...,
        eval_cfg=mydataset_eval_cfg)
]
```
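Tying step 1 together, a hypothetical `load` body for a JSON Lines file might look like the sketch below. In a real `MyDataset.load` the record list would be wrapped with `datasets.Dataset.from_list`; the sketch stops at plain records so it runs with the standard library alone, and the field names are made up:

```python
import io
import json
from typing import IO, List


def load_records(fp: IO[str]) -> List[dict]:
    """Read one JSON object per non-blank line (JSON Lines)."""
    return [json.loads(line) for line in fp if line.strip()]


# Stand-in for a real file on disk; field names are hypothetical:
sample = io.StringIO('{"question": "1+1=?", "answer": "2"}\n'
                     '{"question": "2+2=?", "answer": "4"}\n')
records = load_records(sample)
print(len(records), records[0]["answer"])  # 2 2
```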

Once the dataset is configured, you can refer to the instructions on [Get started](../get_started.md) for other requirements.
74 changes: 72 additions & 2 deletions docs/en/advanced_guides/new_model.md
@@ -1,3 +1,73 @@
# New A Model
# Add a Model

Coming soon.
Currently, we support HF models, some model APIs, and some third-party models.

## Adding API Models

To add a new API-based model, create a new file named `mymodel_api.py` under the `opencompass/models` directory. In this file, inherit from `BaseAPIModel` and implement the `generate` method for inference and the `get_token_len` method for computing token lengths. Once the model is defined, modify the corresponding configuration file.

```python
from typing import Dict, List, Optional

from ..base_api import BaseAPIModel

class MyModelAPI(BaseAPIModel):

    is_api: bool = True

    def __init__(self,
                 path: str,
                 max_seq_len: int = 2048,
                 query_per_second: int = 1,
                 retry: int = 2,
                 meta_template: Optional[Dict] = None,
                 **kwargs):
        super().__init__(path=path,
                         max_seq_len=max_seq_len,
                         meta_template=meta_template,
                         query_per_second=query_per_second,
                         retry=retry)
        ...

    def generate(
        self,
        inputs,
        max_out_len: int = 512,
        temperature: float = 0.7,
    ) -> List[str]:
        """Generate results given a list of inputs."""
        pass

    def get_token_len(self, prompt: str) -> int:
        """Get lengths of the tokenized string."""
        pass
```
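The `query_per_second` argument above suggests client-side rate limiting between API calls. A minimal sketch of that idea (the `RateLimiter` class is ours for illustration, not an OpenCompass internal) could look like:

```python
import time


class RateLimiter:
    """Allow at most query_per_second calls per second by sleeping between calls."""

    def __init__(self, query_per_second: int = 1):
        self.min_interval = 1.0 / query_per_second
        self.last_call = float("-inf")  # no call has happened yet

    def wait(self) -> float:
        """Sleep until the next call is allowed; return the time slept."""
        now = time.monotonic()
        delay = max(0.0, self.last_call + self.min_interval - now)
        if delay:
            time.sleep(delay)
        self.last_call = time.monotonic()
        return delay


limiter = RateLimiter(query_per_second=5)
delays = [limiter.wait() for _ in range(3)]
print(delays[0])  # the first call never waits -> 0.0
```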

## Adding Third-Party Models

To add a new third-party model, create a new file named `mymodel.py` under the `opencompass/models` directory. In this file, inherit from `BaseModel` and implement the `generate` method for generative inference, the `get_ppl` method for discriminative inference, and the `get_token_len` method for computing token lengths. Once the model is defined, modify the corresponding configuration file.

```python
from typing import Dict, List, Optional

from ..base import BaseModel

class MyModel(BaseModel):

    def __init__(self,
                 pkg_root: str,
                 ckpt_path: str,
                 tokenizer_only: bool = False,
                 meta_template: Optional[Dict] = None,
                 **kwargs):
        ...

    def get_token_len(self, prompt: str) -> int:
        """Get lengths of the tokenized strings."""
        pass

    def generate(self, inputs: List[str], max_out_len: int) -> List[str]:
        """Generate results given a list of inputs."""
        pass

    def get_ppl(self,
                inputs: List[str],
                mask_length: Optional[List[int]] = None) -> List[float]:
        """Get perplexity scores given a list of inputs."""
        pass
```
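For intuition about what `get_ppl` returns: perplexity is the exponential of the negative mean per-token log-likelihood. A small sketch from first principles, with made-up log-probabilities:

```python
import math
from typing import List


def perplexity(token_logprobs: List[float]) -> float:
    """exp(-mean log-likelihood): lower is better, 1.0 means a perfect fit."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# A hypothetical 4-token continuation, each token assigned probability 1/2,
# gives a perplexity of ~2.0:
print(perplexity([math.log(0.5)] * 4))
```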
2 changes: 1 addition & 1 deletion docs/en/get_started.md
@@ -107,7 +107,7 @@ models = [llama_7b]
</details>

<details>
<summary>Lauch Evalution</summary>
<summary>Launch Evaluation</summary>

First, we can start the task in **debug mode** to check for any exceptions in model loading, dataset reading, or incorrect cache usage.

2 changes: 1 addition & 1 deletion docs/en/index.rst
@@ -79,4 +79,4 @@ Indexes & Tables
==================

* :ref:`genindex`
* :ref:`search`
* :ref:`search`
48 changes: 20 additions & 28 deletions docs/en/user_guides/dataset_prepare.md
@@ -8,34 +8,26 @@ First, let's introduce the structure under the `configs/datasets` directory in O

```
configs/datasets/
├── ChineseUniversal # Ability dimension
│ ├── CLUE_afqmc # Dataset under this dimension
│ │ ├── CLUE_afqmc_gen_db509b.py # Different configuration files for this dataset
│ │ ├── CLUE_afqmc_gen.py
│ │ ├── CLUE_afqmc_ppl_00b348.py
│ │ ├── CLUE_afqmc_ppl_2313cf.py
│ │ └── CLUE_afqmc_ppl.py
│ ├── CLUE_C3
│ │ ├── ...
│ ├── ...
├── Coding
├── collections
├── Completion
├── EnglishUniversal
├── Exam
├── glm
├── LongText
├── MISC
├── NLG
├── QA
├── Reasoning
├── Security
└── Translation
├── agieval
├── apps
├── ARC_c
├── ...
├── CLUE_afqmc # dataset
│   ├── CLUE_afqmc_gen_901306.py # different version of config
│   ├── CLUE_afqmc_gen.py
│   ├── CLUE_afqmc_ppl_378c5b.py
│   ├── CLUE_afqmc_ppl_6507d7.py
│   ├── CLUE_afqmc_ppl_7b0c1e.py
│   └── CLUE_afqmc_ppl.py
├── ...
├── XLSum
├── Xsum
└── z_bench
```

In the `configs/datasets` directory structure, we have divided the datasets into over ten dimensions based on ability dimensions, such as: Chinese and English Universal, Exam, QA, Reasoning, Security, etc. Each dimension contains a series of datasets, and there are multiple dataset configurations in the corresponding folder of each dataset.
In the `configs/datasets` directory structure, all datasets are laid out flat, and each dataset has multiple configuration files in its corresponding folder.

The naming of the dataset configuration file is made up of `{dataset name}_{evaluation method}_{prompt version number}.py`. For example, `ChineseUniversal/CLUE_afqmc/CLUE_afqmc_gen_db509b.py`, this configuration file is the `CLUE_afqmc` dataset under the Chinese universal ability, the corresponding evaluation method is `gen`, i.e., generative evaluation, and the corresponding prompt version number is `db509b`; similarly, `CLUE_afqmc_ppl_00b348.py` indicates that the evaluation method is `ppl`, i.e., discriminative evaluation, and the prompt version number is `00b348`.
The dataset configuration file name is made up of `{dataset name}_{evaluation method}_{prompt version number}.py`. For example, `CLUE_afqmc/CLUE_afqmc_gen_db509b.py` is a configuration for the `CLUE_afqmc` dataset: the evaluation method is `gen`, i.e., generative evaluation, and the prompt version number is `db509b`. Similarly, `CLUE_afqmc_ppl_00b348.py` indicates that the evaluation method is `ppl`, i.e., discriminative evaluation, with prompt version number `00b348`.

In addition, files without a version number, such as `CLUE_afqmc_gen.py`, point to the latest prompt configuration file for that evaluation method, which is usually the prompt with the highest accuracy.
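The naming convention above can be sketched as a small parser; the regex is ours for illustration, not part of OpenCompass, and it assumes 6-character hexadecimal version suffixes like `db509b`:

```python
import re


def parse_config_name(filename: str) -> dict:
    """Split {dataset}_{gen|ppl}[_{version}].py into its parts, or {} if it doesn't match."""
    m = re.match(
        r"(?P<dataset>.+)_(?P<method>gen|ppl)(?:_(?P<version>[0-9a-f]{6}))?\.py$",
        filename)
    return m.groupdict() if m else {}


print(parse_config_name("CLUE_afqmc_ppl_00b348.py"))
# {'dataset': 'CLUE_afqmc', 'method': 'ppl', 'version': '00b348'}
```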

@@ -49,13 +41,13 @@ The datasets supported by OpenCompass mainly include two parts:

2. OpenCompass Self-built Datasets

In addition to supporting Huggingface's existing datasets, OpenCompass also provides some self-built CN datasets. In the future, a dataset-related Repo will be provided for users to download and use. Following the instructions in the document to place the datasets uniformly in the `./data` directory can complete dataset preparation.
In addition to supporting Hugging Face's existing datasets, OpenCompass also provides some self-built Chinese datasets. In the future, a download link will be provided for users to obtain them. Placing the datasets in the `./data` directory as instructed in the documentation completes dataset preparation.

It is important to note that the repository contains not only self-built datasets but also some HF-supported datasets, for the convenience of testing.

## Dataset Selection

In each dataset configuration file, the dataset will be defined in the `{}_datasets` variable, such as `afqmc_datasets` in `ChineseUniversal/CLUE_afqmc/CLUE_afqmc_gen_db509b.py`.
In each dataset configuration file, the dataset will be defined in the `{}_datasets` variable, such as `afqmc_datasets` in `CLUE_afqmc/CLUE_afqmc_gen_db509b.py`.

```python
afqmc_datasets = [
@@ -70,7 +70,7 @@ afqmc_datasets = [
]
```

And `afqmc_datasets` in `ChineseUniversal/CLUE_cmnli/CLUE_cmnli_ppl_b78ad4.py`.
And `cmnli_datasets` in `CLUE_cmnli/CLUE_cmnli_ppl_b78ad4.py`.

```python
cmnli_datasets = [
12 changes: 5 additions & 7 deletions docs/zh_cn/advanced_guides/new_dataset.md
@@ -4,7 +4,7 @@

1.`opencompass/datasets` 文件夹新增数据集脚本 `mydataset.py`, 该脚本需要包含:

- 数据集及其加载方式,需要定义一个 `MyDataset` 类,实现数据集加载方法 `load` ,该方法为静态方法,需要返回 `datasets.Dataset` 类型的数据。这里我们使用 huggingface dataset 作为数据集的统一接口,避免引入额外的逻辑。具体示例如下:
- 数据集及其加载方式,需要定义一个 `MyDataset` 类,实现数据集加载方法 `load`,该方法为静态方法,需要返回 `datasets.Dataset` 类型的数据。这里我们使用 huggingface dataset 作为数据集的统一接口,避免引入额外的逻辑。具体示例如下:

```python
import datasets
@@ -17,10 +17,9 @@
pass
```

- (可选)如果OpenCompass已有的evaluator不能满足需要,需要用户定义 `MyDatasetlEvaluator` 类,实现评分方法 `score` ,需要根据输入的 `predictions``references` 列表,得到需要的字典。由于一个数据集可能存在多种metric,需要返回一个 metrics 以及对应 scores 的相关字典。具体示例如下:
- (可选)如果 OpenCompass 已有的评测器不能满足需要,需要用户定义 `MyDatasetlEvaluator` 类,实现评分方法 `score`,需要根据输入的 `predictions``references` 列表,得到需要的字典。由于一个数据集可能存在多种 metric,需要返回一个 metrics 以及对应 scores 的相关字典。具体示例如下:

```python

from opencompass.openicl.icl_evaluator import BaseEvaluator

class MyDatasetlEvaluator(BaseEvaluator):
@@ -30,14 +29,14 @@

```

- (可选)如果 OpenCompass 已有的 postprocesser 不能满足需要,需要用户定义 `mydataset_postprocess` 方法,根据输入的字符串得到相应后处理的结果。具体示例如下:
- (可选)如果 OpenCompass 已有的后处理方法不能满足需要,需要用户定义 `mydataset_postprocess` 方法,根据输入的字符串得到相应后处理的结果。具体示例如下:

```python
def mydataset_postprocess(text: str) -> str:
pass
```

2. 在定义好数据集加载,数据后处理以及 `evaluator` 等方法之后,需要在配置文件中新增以下配置:
2. 在定义好数据集加载、评测以及数据后处理等方法之后,需要在配置文件中新增以下配置:

```python
from opencompass.datasets import MyDataset, MyDatasetlEvaluator, mydataset_postprocess
@@ -56,5 +55,4 @@
]
```

配置好数据集之后,其他需要的配置文件直接参考如何启动评测任务教程即可。

配置好数据集之后,其他需要的配置文件直接参考[快速上手](../get_started.md)教程即可。
2 changes: 1 addition & 1 deletion docs/zh_cn/advanced_guides/new_model.md
@@ -1,6 +1,6 @@
# 支持新模型

目前我们已经支持的模型有 HF 模型、部分模型 API 、自建模型和部分第三方模型
目前我们已经支持的模型有 HF 模型、部分模型 API 、部分第三方模型

## 新增API模型

2 changes: 1 addition & 1 deletion docs/zh_cn/index.rst
@@ -79,4 +79,4 @@ OpenCompass 上手路线
==================

* :ref:`genindex`
* :ref:`search`
* :ref:`search`
2 changes: 1 addition & 1 deletion docs/zh_cn/prompt/prompt_template.md
@@ -1,3 +1,3 @@
# Prompt 模板

Coming soon.
Coming soon.