[Docs] add en docs (#15)
* add en docs

* update

---------

Co-authored-by: gaotongxiao <[email protected]>
yingfhu and gaotongxiao authored Jul 6, 2023
1 parent 07dfe8c commit 7f8eee4
Showing 15 changed files with 193 additions and 91 deletions.
2 changes: 1 addition & 1 deletion LICENSE
Original file line number Diff line number Diff line change
@@ -200,4 +200,4 @@ Copyright 2020 OpenCompass Authors. All rights reserved.
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
limitations under the License.
4 changes: 1 addition & 3 deletions README.md
@@ -39,8 +39,6 @@ We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for commun

[![image](https://github.com/InternLM/OpenCompass/assets/7881589/6b56c297-77c0-4e1a-9acc-24a45c5a734a)](https://opencompass.org.cn/rank)



## Dataset Support

<table align="center">
@@ -245,7 +243,7 @@ We provide [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) for commun
</tr>
<tr valign="top">
<td>

- InternLM
- LLaMA
- Vicuna
2 changes: 0 additions & 2 deletions README_zh-CN.md
@@ -40,10 +40,8 @@ OpenCompass 是面向大模型评测的一站式平台。其主要特点如下

我们将陆续提供开源模型和API模型的具体性能榜单,请见 [OpenCompass Leaderbaord](https://opencompass.org.cn/rank) 。如需加入评测,请提供模型仓库地址或标准的 API 接口至邮箱 `[email protected]`.


![image](https://github.com/InternLM/OpenCompass/assets/7881589/fddc8ab4-d2bd-429d-89f0-4ca90606599a)


## 数据集支持

<table align="center">
58 changes: 56 additions & 2 deletions docs/en/advanced_guides/new_dataset.md
@@ -1,3 +1,57 @@
# New Dataset
# Add a dataset

Coming soon.
Although OpenCompass already includes most commonly used datasets, users who want to support a new dataset can follow the steps below:

1. Add a dataset script `mydataset.py` to the `opencompass/datasets` folder. This script should include:

- The dataset and its loading method. Define a `MyDataset` class that implements the data loading method `load` as a static method. This method should return data of type `datasets.Dataset`. We use the Hugging Face dataset as the unified interface for datasets to avoid introducing additional logic. Here's an example:

```python
import datasets
from .base import BaseDataset

class MyDataset(BaseDataset):

    @staticmethod
    def load(**kwargs) -> datasets.Dataset:
        pass
```

- (Optional) If the existing evaluators in OpenCompass do not meet your needs, define a `MyDatasetEvaluator` class that implements the scoring method `score`. This method takes the `predictions` and `references` lists as input; since a dataset may have multiple metrics, it should return a dictionary mapping each metric to its score. Here's an example:

```python
from typing import List

from opencompass.openicl.icl_evaluator import BaseEvaluator

class MyDatasetEvaluator(BaseEvaluator):

    def score(self, predictions: List, references: List) -> dict:
        pass
```
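For intuition, here is a minimal standalone sketch of what a `score` body might compute. Exact-match accuracy is our assumption for illustration, not a prescribed OpenCompass metric, and the `BaseEvaluator` base class is omitted so the snippet runs on its own:

```python
from typing import List


def exact_match_score(predictions: List[str], references: List[str]) -> dict:
    """Return a metric-name -> score dict, the shape score() is expected to produce."""
    assert len(predictions) == len(references)
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return {"accuracy": 100 * correct / len(references)}


print(exact_match_score(["A", "B", "C", "D"], ["A", "B", "C", "A"]))  # {'accuracy': 75.0}
```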

- (Optional) If the existing postprocessors in OpenCompass do not meet your needs, you need to define the `mydataset_postprocess` method. This method takes an input string and returns the corresponding postprocessed result string. Here's an example:

```python
def mydataset_postprocess(text: str) -> str:
    pass
```
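As a hypothetical illustration, a postprocessor that pulls a multiple-choice option letter out of a free-form model answer could be sketched as follows (the regex and the A-D option set are our assumptions):

```python
import re


def mydataset_postprocess(text: str) -> str:
    """Extract the first standalone option letter A-D from a model answer."""
    match = re.search(r"\b([A-D])\b", text)
    return match.group(1) if match else ""


print(mydataset_postprocess("The answer is B, because..."))  # B
```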

2. After defining the dataset loading, data postprocessing, and evaluator methods, you need to add the following configurations to the configuration file:

```python
from opencompass.datasets import MyDataset, MyDatasetEvaluator, mydataset_postprocess

mydataset_eval_cfg = dict(
    evaluator=dict(type=MyDatasetEvaluator),
    pred_postprocessor=dict(type=mydataset_postprocess))

mydataset_datasets = [
    dict(
        type=MyDataset,
        ...,
        reader_cfg=...,
        infer_cfg=...,
        eval_cfg=mydataset_eval_cfg)
]
```
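Tying step 1 together, a hypothetical `load` body for a JSON Lines file might look like the sketch below. In a real `MyDataset.load` the record list would be wrapped with `datasets.Dataset.from_list`; the sketch stops at plain records so it runs with the standard library alone, and the field names are made up:

```python
import io
import json
from typing import IO, List


def load_records(fp: IO[str]) -> List[dict]:
    """Read one JSON object per non-blank line (JSON Lines)."""
    return [json.loads(line) for line in fp if line.strip()]


# Stand-in for a real file on disk; field names are hypothetical:
sample = io.StringIO('{"question": "1+1=?", "answer": "2"}\n'
                     '{"question": "2+2=?", "answer": "4"}\n')
records = load_records(sample)
print(len(records), records[0]["answer"])  # 2 2
```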

Once the dataset is configured, you can refer to the instructions on [Get started](../get_started.md) for other requirements.
74 changes: 72 additions & 2 deletions docs/en/advanced_guides/new_model.md
@@ -1,3 +1,73 @@
# New A Model
# Add a Model

Coming soon.
Currently, we support HF models, some model APIs, and some third-party models.

## Adding API Models

To add a new API-based model, create a new file named `mymodel_api.py` under the `opencompass/models` directory. In this file, inherit from `BaseAPIModel` and implement the `generate` method for inference and the `get_token_len` method for computing token lengths. Once the model is defined, modify the corresponding configuration file.

```python
from typing import Dict, List, Optional

from ..base_api import BaseAPIModel

class MyModelAPI(BaseAPIModel):

    is_api: bool = True

    def __init__(self,
                 path: str,
                 max_seq_len: int = 2048,
                 query_per_second: int = 1,
                 retry: int = 2,
                 meta_template: Optional[Dict] = None,
                 **kwargs):
        super().__init__(path=path,
                         max_seq_len=max_seq_len,
                         meta_template=meta_template,
                         query_per_second=query_per_second,
                         retry=retry)
        ...

    def generate(
        self,
        inputs,
        max_out_len: int = 512,
        temperature: float = 0.7,
    ) -> List[str]:
        """Generate results given a list of inputs."""
        pass

    def get_token_len(self, prompt: str) -> int:
        """Get lengths of the tokenized string."""
        pass
```
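The `query_per_second` argument above suggests client-side rate limiting between API calls. A minimal sketch of that idea (the `RateLimiter` class is ours for illustration, not an OpenCompass internal) could look like:

```python
import time


class RateLimiter:
    """Allow at most query_per_second calls per second by sleeping between calls."""

    def __init__(self, query_per_second: int = 1):
        self.min_interval = 1.0 / query_per_second
        self.last_call = float("-inf")  # no call has happened yet

    def wait(self) -> float:
        """Sleep until the next call is allowed; return the time slept."""
        now = time.monotonic()
        delay = max(0.0, self.last_call + self.min_interval - now)
        if delay:
            time.sleep(delay)
        self.last_call = time.monotonic()
        return delay


limiter = RateLimiter(query_per_second=5)
delays = [limiter.wait() for _ in range(3)]
print(delays[0])  # the first call never waits -> 0.0
```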

## Adding Third-Party Models

To add a new third-party model, create a new file named `mymodel.py` under the `opencompass/models` directory. In this file, inherit from `BaseModel` and implement the `generate` method for generative inference, the `get_ppl` method for discriminative inference, and the `get_token_len` method for computing token lengths. Once the model is defined, modify the corresponding configuration file.

```python
from typing import Dict, List, Optional

from ..base import BaseModel

class MyModel(BaseModel):

    def __init__(self,
                 pkg_root: str,
                 ckpt_path: str,
                 tokenizer_only: bool = False,
                 meta_template: Optional[Dict] = None,
                 **kwargs):
        ...

    def get_token_len(self, prompt: str) -> int:
        """Get lengths of the tokenized strings."""
        pass

    def generate(self, inputs: List[str], max_out_len: int) -> List[str]:
        """Generate results given a list of inputs."""
        pass

    def get_ppl(self,
                inputs: List[str],
                mask_length: Optional[List[int]] = None) -> List[float]:
        """Get perplexity scores given a list of inputs."""
        pass
```
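For intuition about what `get_ppl` returns: perplexity is the exponential of the negative mean per-token log-likelihood. A small sketch from first principles, with made-up log-probabilities:

```python
import math
from typing import List


def perplexity(token_logprobs: List[float]) -> float:
    """exp(-mean log-likelihood): lower is better, 1.0 means a perfect fit."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


# A hypothetical 4-token continuation, each token assigned probability 1/2,
# gives a perplexity of ~2.0:
print(perplexity([math.log(0.5)] * 4))
```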
2 changes: 1 addition & 1 deletion docs/en/get_started.md
@@ -107,7 +107,7 @@ models = [llama_7b]
</details>

<details>
<summary>Lauch Evalution</summary>
<summary>Launch Evaluation</summary>

First, we can start the task in **debug mode** to check for any exceptions in model loading, dataset reading, or incorrect cache usage.

2 changes: 1 addition & 1 deletion docs/en/index.rst
@@ -79,4 +79,4 @@ Indexes & Tables
==================

* :ref:`genindex`
* :ref:`search`
* :ref:`search`
48 changes: 20 additions & 28 deletions docs/en/user_guides/dataset_prepare.md
@@ -8,34 +8,26 @@ First, let's introduce the structure under the `configs/datasets` directory in O

```
configs/datasets/
├── ChineseUniversal # Ability dimension
│ ├── CLUE_afqmc # Dataset under this dimension
│ │ ├── CLUE_afqmc_gen_db509b.py # Different configuration files for this dataset
│ │ ├── CLUE_afqmc_gen.py
│ │ ├── CLUE_afqmc_ppl_00b348.py
│ │ ├── CLUE_afqmc_ppl_2313cf.py
│ │ └── CLUE_afqmc_ppl.py
│ ├── CLUE_C3
│ │ ├── ...
│ ├── ...
├── Coding
├── collections
├── Completion
├── EnglishUniversal
├── Exam
├── glm
├── LongText
├── MISC
├── NLG
├── QA
├── Reasoning
├── Security
└── Translation
├── agieval
├── apps
├── ARC_c
├── ...
├── CLUE_afqmc # dataset
│   ├── CLUE_afqmc_gen_901306.py # different version of config
│   ├── CLUE_afqmc_gen.py
│   ├── CLUE_afqmc_ppl_378c5b.py
│   ├── CLUE_afqmc_ppl_6507d7.py
│   ├── CLUE_afqmc_ppl_7b0c1e.py
│   └── CLUE_afqmc_ppl.py
├── ...
├── XLSum
├── Xsum
└── z_bench
```

In the `configs/datasets` directory structure, we have divided the datasets into over ten dimensions based on ability dimensions, such as: Chinese and English Universal, Exam, QA, Reasoning, Security, etc. Each dimension contains a series of datasets, and there are multiple dataset configurations in the corresponding folder of each dataset.
In the `configs/datasets` directory structure, all datasets are laid out flat, and each dataset has multiple configuration files in its corresponding folder.

The naming of the dataset configuration file is made up of `{dataset name}_{evaluation method}_{prompt version number}.py`. For example, `ChineseUniversal/CLUE_afqmc/CLUE_afqmc_gen_db509b.py`, this configuration file is the `CLUE_afqmc` dataset under the Chinese universal ability, the corresponding evaluation method is `gen`, i.e., generative evaluation, and the corresponding prompt version number is `db509b`; similarly, `CLUE_afqmc_ppl_00b348.py` indicates that the evaluation method is `ppl`, i.e., discriminative evaluation, and the prompt version number is `00b348`.
The dataset configuration file name is made up of `{dataset name}_{evaluation method}_{prompt version number}.py`. For example, `CLUE_afqmc/CLUE_afqmc_gen_db509b.py` is a configuration for the `CLUE_afqmc` dataset: the evaluation method is `gen`, i.e., generative evaluation, and the prompt version number is `db509b`. Similarly, `CLUE_afqmc_ppl_00b348.py` indicates that the evaluation method is `ppl`, i.e., discriminative evaluation, with prompt version number `00b348`.

In addition, files without a version number, such as `CLUE_afqmc_gen.py`, point to the latest prompt configuration file for that evaluation method, which is usually the prompt with the highest accuracy.
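The naming convention above can be sketched as a small parser; the regex is ours for illustration, not part of OpenCompass, and it assumes 6-character hexadecimal version suffixes like `db509b`:

```python
import re


def parse_config_name(filename: str) -> dict:
    """Split {dataset}_{gen|ppl}[_{version}].py into its parts, or {} if it doesn't match."""
    m = re.match(
        r"(?P<dataset>.+)_(?P<method>gen|ppl)(?:_(?P<version>[0-9a-f]{6}))?\.py$",
        filename)
    return m.groupdict() if m else {}


print(parse_config_name("CLUE_afqmc_ppl_00b348.py"))
# {'dataset': 'CLUE_afqmc', 'method': 'ppl', 'version': '00b348'}
```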

@@ -49,13 +41,13 @@ The datasets supported by OpenCompass mainly include two parts:

2. OpenCompass Self-built Datasets

In addition to supporting Huggingface's existing datasets, OpenCompass also provides some self-built CN datasets. In the future, a dataset-related Repo will be provided for users to download and use. Following the instructions in the document to place the datasets uniformly in the `./data` directory can complete dataset preparation.
In addition to supporting Hugging Face's existing datasets, OpenCompass also provides some self-built Chinese datasets. In the future, a download link will be provided for users to obtain them. Placing the datasets in the `./data` directory as instructed in the documentation completes dataset preparation.

It is important to note that the repository contains not only self-built datasets but also some HF-supported datasets, for the convenience of testing.

## Dataset Selection

In each dataset configuration file, the dataset will be defined in the `{}_datasets` variable, such as `afqmc_datasets` in `ChineseUniversal/CLUE_afqmc/CLUE_afqmc_gen_db509b.py`.
In each dataset configuration file, the dataset will be defined in the `{}_datasets` variable, such as `afqmc_datasets` in `CLUE_afqmc/CLUE_afqmc_gen_db509b.py`.

```python
afqmc_datasets = [
@@ -70,7 +70,7 @@ afqmc_datasets = [
]
```

And `afqmc_datasets` in `ChineseUniversal/CLUE_cmnli/CLUE_cmnli_ppl_b78ad4.py`.
And `cmnli_datasets` in `CLUE_cmnli/CLUE_cmnli_ppl_b78ad4.py`.

```python
cmnli_datasets = [
12 changes: 5 additions & 7 deletions docs/zh_cn/advanced_guides/new_dataset.md
@@ -4,7 +4,7 @@

1.`opencompass/datasets` 文件夹新增数据集脚本 `mydataset.py`, 该脚本需要包含:

- 数据集及其加载方式,需要定义一个 `MyDataset` 类,实现数据集加载方法 `load` ,该方法为静态方法,需要返回 `datasets.Dataset` 类型的数据。这里我们使用 huggingface dataset 作为数据集的统一接口,避免引入额外的逻辑。具体示例如下:
- 数据集及其加载方式,需要定义一个 `MyDataset` 类,实现数据集加载方法 `load`,该方法为静态方法,需要返回 `datasets.Dataset` 类型的数据。这里我们使用 huggingface dataset 作为数据集的统一接口,避免引入额外的逻辑。具体示例如下:

```python
import datasets
@@ -17,10 +17,9 @@
pass
```

- (可选)如果OpenCompass已有的evaluator不能满足需要,需要用户定义 `MyDatasetlEvaluator` 类,实现评分方法 `score` ,需要根据输入的 `predictions``references` 列表,得到需要的字典。由于一个数据集可能存在多种metric,需要返回一个 metrics 以及对应 scores 的相关字典。具体示例如下:
- (可选)如果 OpenCompass 已有的评测器不能满足需要,需要用户定义 `MyDatasetlEvaluator` 类,实现评分方法 `score`,需要根据输入的 `predictions``references` 列表,得到需要的字典。由于一个数据集可能存在多种 metric,需要返回一个 metrics 以及对应 scores 的相关字典。具体示例如下:

```python

from opencompass.openicl.icl_evaluator import BaseEvaluator

class MyDatasetlEvaluator(BaseEvaluator):
@@ -30,14 +29,14 @@

```

- (可选)如果 OpenCompass 已有的 postprocesser 不能满足需要,需要用户定义 `mydataset_postprocess` 方法,根据输入的字符串得到相应后处理的结果。具体示例如下:
- (可选)如果 OpenCompass 已有的后处理方法不能满足需要,需要用户定义 `mydataset_postprocess` 方法,根据输入的字符串得到相应后处理的结果。具体示例如下:

```python
def mydataset_postprocess(text: str) -> str:
pass
```

2. 在定义好数据集加载,数据后处理以及 `evaluator` 等方法之后,需要在配置文件中新增以下配置:
2. 在定义好数据集加载、评测以及数据后处理等方法之后,需要在配置文件中新增以下配置:

```python
from opencompass.datasets import MyDataset, MyDatasetlEvaluator, mydataset_postprocess
@@ -56,5 +55,4 @@
]
```

配置好数据集之后,其他需要的配置文件直接参考如何启动评测任务教程即可。

配置好数据集之后,其他需要的配置文件直接参考[快速上手](../get_started.md)教程即可。
2 changes: 1 addition & 1 deletion docs/zh_cn/advanced_guides/new_model.md
@@ -1,6 +1,6 @@
# 支持新模型

目前我们已经支持的模型有 HF 模型、部分模型 API 、自建模型和部分第三方模型
目前我们已经支持的模型有 HF 模型、部分模型 API 、部分第三方模型

## 新增API模型

2 changes: 1 addition & 1 deletion docs/zh_cn/index.rst
@@ -79,4 +79,4 @@ OpenCompass 上手路线
==================

* :ref:`genindex`
* :ref:`search`
* :ref:`search`
2 changes: 1 addition & 1 deletion docs/zh_cn/prompt/prompt_template.md
@@ -1,3 +1,3 @@
# Prompt 模板

Coming soon.
Coming soon.