diff --git a/docs/_docs/home.md b/docs/_docs/home.md
index face1b5..d746e1e 100644
--- a/docs/_docs/home.md
+++ b/docs/_docs/home.md
@@ -173,15 +173,15 @@ As you can see, the core `weblinx` has many useful functions to process the data
 
 You can also take a look at some useful `processing` functions in the [documentation of the processing module]({{'/docs/processing/' | relative_url }}), and some useful `utils` functions in the [documentation of the utils module]({{'/docs/utils/' | relative_url }}).
 
-## Accessing element ranking scores
+## Accessing element ranking scores to select candidate elements
 
-To access the elements and scores generated by the MiniLM-L6-dmr model, you can download them with:
+You may want to access the most "relevant" elements in the DOM tree (i.e., the elements most likely to be used in one of the actions). To do that, you need a model that assigns a "relevance" score to each element, and you then keep only the highest-scoring ones; we call these elements *candidate elements*, or more concisely, *candidates*. Our preferred model for this task is a `MiniLM-L6` fine-tuned for this specific goal on the WebLINX training set (more details in the paper), which we call [`MiniLM-L6-dmr`](https://huggingface.co/McGill-NLP/MiniLM-L6-dmr). You can download the *candidate elements* generated by `MiniLM-L6-dmr` with the following code:
 
 ```python
 from huggingface_hub import snapshot_download
 from weblinx.processing import load_candidate_elements
 
-# Download the candidates generated by the MiniLM-L6-dmr model
+# Download the candidate elements generated by the MiniLM-L6-dmr model
 snapshot_download(
     repo_id="McGill-NLP/WebLINX-full",
     repo_type="dataset",