From 7b881858eb01ff1e394de756e2fe4fb67a6690fb Mon Sep 17 00:00:00 2001
From: aetter
Date: Wed, 18 Nov 2020 11:25:46 -0800
Subject: [PATCH 01/28] Add ODFE CLI

---
 docs/ad/cli.md    | 100 ----------------------------------------------
 docs/cli/index.md |  95 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 95 insertions(+), 100 deletions(-)
 delete mode 100644 docs/ad/cli.md
 create mode 100644 docs/cli/index.md

diff --git a/docs/ad/cli.md b/docs/ad/cli.md
deleted file mode 100644
index 3dd749f8..00000000
--- a/docs/ad/cli.md
+++ /dev/null
@@ -1,100 +0,0 @@
---
layout: default
title: Anomaly Detection CLI
parent: Anomaly Detection
nav_order: 3
---

# Anomaly Detection CLI

The anomaly detection CLI lets you call anomaly detection APIs with the `esad` command.

You can use the CLI to:

* Create detectors
* Start, stop, and delete detectors
* Create named profiles to connect to your cluster

Install the anomaly detection plugin on your Elasticsearch instance, run the CLI using macOS or Linux, and connect to any valid Elasticsearch endpoint.

## Install

Launch your local Elasticsearch instance and make sure you have the anomaly detection plugin installed.

To install the anomaly detection CLI:

1. Download and extract [esad binaries](https://github.com/opendistro-for-elasticsearch/anomaly-detection/actions/runs/224422434).

2. Make the `esad` file executable:
```bash
chmod +x ./esad
```

3. For root users, move the binary into your path:
```bash
sudo mv ./esad /usr/local/bin/esad
```
Or add it to the current path:
```bash
export PATH=$PATH:$(pwd)
```

4. Check that the CLI is installed:
```bash
esad --version
```
The command prints the version of `esad` that you installed.


## Configure

Before using the CLI, you must configure your credentials.
To quickly get started, run the `esad profile create` command:
```
esad profile create

Enter profile's name: dev
ES Anomaly Detection Endpoint: https://localhost:9200
ES Anomaly Detection User: admin
ES Anomaly Detection Password:
```

Specify a unique profile name. The `create` command doesn't allow duplicate profiles.

Alternatively, you can use a configuration file:
```yaml
profiles:
- endpoint: https://localhost:9200
  username: admin
  password: foobar
  name: default
- endpoint: https://odfe-node1:9200
  username: admin
  password: foobar
  name: dev
```

Save the file in `~/.esad/config.yaml`. If you save the file to a different location, set the appropriate environment variable:
```
export ESAD_CONFIG_FILE=/path/to/config_file
```

## Using the CLI

1. The complete syntax for an `esad` command is as follows:
```
esad <command> <subcommand> [flags and parameters]
```

1. To start a detector:
```
esad start [detector-name-pattern]
```

1. To see help documentation:
```
esad --help
esad <command> --help
esad <command> <subcommand> --help
```
diff --git a/docs/cli/index.md b/docs/cli/index.md
new file mode 100644
index 00000000..55e318de
--- /dev/null
+++ b/docs/cli/index.md
@@ -0,0 +1,95 @@
---
layout: default
title: ODFE CLI
nav_order: 52
has_children: false
---

# ODFE CLI

The Open Distro for Elasticsearch command line interface (odfe-cli) lets you manage your ODFE cluster from the command line and automate tasks.

Currently, odfe-cli only supports the [Anomaly Detection](../ad/) plugin. You can create and delete detectors, start and stop them, and use profiles to easily access different clusters or sign requests with different credentials.
This example moves a detector (`ecommerce-count-quantity`) from a staging cluster to a production cluster:

```bash
odfe-cli ad get ecommerce-count-quantity --profile staging > ecommerce-count-quantity.json
odfe-cli ad create ecommerce-count-quantity.json --profile production
odfe-cli ad start ecommerce-count-quantity.json --profile production
odfe-cli ad stop ecommerce-count-quantity --profile staging
odfe-cli ad delete ecommerce-count-quantity --profile staging
```


## Install

1. Download and extract the installation package.

1. Make the `odfe-cli` file executable:

    ```bash
    chmod +x ./odfe-cli
    ```

1. Add the command to your path:

    ```bash
    export PATH=$PATH:$(pwd)
    ```

1. Check that the CLI is working properly:

    ```bash
    odfe-cli --version
    ```


## Profiles

Profiles let you easily switch between different clusters and user credentials. To get started, run `odfe-cli profile create` and specify a unique profile name:

```
$ odfe-cli profile create
Enter profile's name: default
Elasticsearch Endpoint: https://localhost:9200
User Name:
Password:
```

Alternatively, save a configuration file to `~/.odfe-cli/config.yaml`:

```yaml
profiles:
- endpoint: https://localhost:9200
  username: admin
  password: foobar
  name: default
- endpoint: https://odfe-node1:9200
  username: admin
  password: foobar
  name: dev
```


## Usage

odfe-cli commands use the following syntax:

```bash
odfe-cli <command> <subcommand> <flags>
```

For example, the following command retrieves information about a detector:

```bash
odfe-cli ad get my-detector --profile dev
```

Use the `-h` or `--help` flag to see all supported commands, subcommands, or usage for a specific command:

```bash
odfe-cli -h
odfe-cli ad -h
odfe-cli ad get -h
```
From 37f4d783006b2c278431297896c7c8c03d785cb0 Mon Sep 17 00:00:00 2001
From: aetter
Date: Fri, 15 Jan 2021 13:36:57 -0800
Subject: [PATCH 02/28] Add DL link

---
 docs/cli/index.md | 2 +-
 1 file changed,
1 insertion(+), 1 deletion(-) diff --git a/docs/cli/index.md b/docs/cli/index.md index 55e318de..02381e34 100644 --- a/docs/cli/index.md +++ b/docs/cli/index.md @@ -24,7 +24,7 @@ odfe-cli ad delete ecommerce-count-quantity --profile staging ## Install -1. Download and extract the installation package. +1. [Download](https://opendistro.github.io/for-elasticsearch/downloads.html){:target='\_blank'} and extract the appropriate installation package for your computer. 1. Make the `odfe-cli` file executable: From ab19995b557d9a3822a6b7a0f7a7ad27683526a9 Mon Sep 17 00:00:00 2001 From: keithhc2 Date: Thu, 21 Jan 2021 17:31:03 -0800 Subject: [PATCH 03/28] Added information about k-NN plugin's Warmup API --- docs/knn/warmup.md | 38 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 38 insertions(+) create mode 100644 docs/knn/warmup.md diff --git a/docs/knn/warmup.md b/docs/knn/warmup.md new file mode 100644 index 00000000..8bcf1df4 --- /dev/null +++ b/docs/knn/warmup.md @@ -0,0 +1,38 @@ +--- +layout: default +title: Warmup API +parent: KNN +nav_order: 5 +has_children: false +has_toc: false +has_math: false +--- + +# Warmup API +## Overview +The HNSW graphs used to perform k-Approximate Nearest Neighbor Search are stored as `.hnsw` files with other Lucene segment files. In order to perform search on these graphs, they need to be loaded into native memory. If the graphs have not yet been loaded into native memory, upon search, they will first be loaded and then searched. This can cause high latency during initial queries. To avoid this, users will often run random queries during a warmup period. After this warmup period, the graphs will be loaded into native memory and their production workloads can begin. This process is indirect and requires extra effort. + +As an alternative, a user can run the k-NN plugin's warmup API on whatever indices they are interested in searching over. 
This API will load all the graphs for all of the shards (primaries and replicas) of all the indices specified in the request into native memory. After this process completes, a user will be able to start searching against their indices with no initial latency penalties. The warmup API is idempotent, so if a segment's graphs are already loaded into memory, this operation will have no impact on them. It only loads graphs that are not currently in memory. + +## Usage +This command will perform warmup on index1, index2, and index3: +``` +GET /_opendistro/_knn/warmup/index1,index2,index3?pretty +{ + "_shards" : { + "total" : 6, + "successful" : 6, + "failed" : 0 + } +} +``` +`total` indicates how many shards the warmup operation was performed on. `successful` indicates how many shards succeeded and `failed` indicates how many shards have failed. + +The call will not return until the warmup operation is complete or the request times out. If the request times out, the operation will still be going on in the cluster. To monitor this, use the Elasticsearch `_tasks` API. + +Following the completion of the operation, use the k-NN `_stats` API to see what has been loaded into the graph. + +## Best practices +In order for the warmup API to function properly, you will need to follow a few best practices. First, you should not be running any merge operations on the indices you want to warm up. The reason for this is that, during merge, the k-NN plugin creates new segments, and old segments are (sometimes) deleted. You may see the situation where the warmup API loads graphs A and B into native memory, but segment C is created from segments A and B being merged. The graphs for A and B will no longer be in memory and neither will the graph for C. Then, the initial penalty of loading graph C on the first queries will still be present. + +Second, you should first confirm that all of the graphs of interest are able to fit into native memory before running warmup. 
If they cannot all fit into memory, then the cache will thrash. From 9015ab59a6f07918cda35cc46b5ab27f76958648 Mon Sep 17 00:00:00 2001 From: keithhc2 Date: Fri, 22 Jan 2021 11:45:02 -0800 Subject: [PATCH 04/28] Added more information about not indexing documents to be loaded --- docs/knn/warmup.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/docs/knn/warmup.md b/docs/knn/warmup.md index 8bcf1df4..8203d313 100644 --- a/docs/knn/warmup.md +++ b/docs/knn/warmup.md @@ -12,7 +12,7 @@ has_math: false ## Overview The HNSW graphs used to perform k-Approximate Nearest Neighbor Search are stored as `.hnsw` files with other Lucene segment files. In order to perform search on these graphs, they need to be loaded into native memory. If the graphs have not yet been loaded into native memory, upon search, they will first be loaded and then searched. This can cause high latency during initial queries. To avoid this, users will often run random queries during a warmup period. After this warmup period, the graphs will be loaded into native memory and their production workloads can begin. This process is indirect and requires extra effort. -As an alternative, a user can run the k-NN plugin's warmup API on whatever indices they are interested in searching over. This API will load all the graphs for all of the shards (primaries and replicas) of all the indices specified in the request into native memory. After this process completes, a user will be able to start searching against their indices with no initial latency penalties. The warmup API is idempotent, so if a segment's graphs are already loaded into memory, this operation will have no impact on them. It only loads graphs that are not currently in memory. +As an alternative, you can run the k-NN plugin's warmup API on whatever indices you are interested in searching over. 
This API will load all the graphs for all of the shards (primaries and replicas) of all the indices specified in the request into native memory. After this process completes, you will be able to start searching against their indices with no initial latency penalties. The warmup API is idempotent, so if a segment's graphs are already loaded into memory, this operation will have no impact on them. It only loads graphs that are not currently in memory. ## Usage This command will perform warmup on index1, index2, and index3: @@ -33,6 +33,8 @@ The call will not return until the warmup operation is complete or the request t Following the completion of the operation, use the k-NN `_stats` API to see what has been loaded into the graph. ## Best practices -In order for the warmup API to function properly, you will need to follow a few best practices. First, you should not be running any merge operations on the indices you want to warm up. The reason for this is that, during merge, the k-NN plugin creates new segments, and old segments are (sometimes) deleted. You may see the situation where the warmup API loads graphs A and B into native memory, but segment C is created from segments A and B being merged. The graphs for A and B will no longer be in memory and neither will the graph for C. Then, the initial penalty of loading graph C on the first queries will still be present. +In order for the warmup API to function properly, you need to follow a few best practices. First, you should not be running any merge operations on the indices you want to warm up. The reason for this is that, during merge, the k-NN plugin creates new segments, and old segments are (sometimes) deleted. You may see the situation where the warmup API loads graphs A and B into native memory, but segment C is created from segments A and B being merged. The graphs for A and B will no longer be in memory and neither will the graph for C. 
Then, the initial penalty of loading graph C on the first queries will still be present. -Second, you should first confirm that all of the graphs of interest are able to fit into native memory before running warmup. If they cannot all fit into memory, then the cache will thrash. +Second, you should first confirm that all of the graphs of interest can fit into native memory before running warmup. If they cannot all fit into memory, the cache will thrash. + +Lastly, you should not index any documents you want to load into the cache. Writing new information to segments prevents the Warmup API from loading the graphs until they are searchable, so you would have to run the Warmup API again after indexing is complete. From 98ab5b916cb6d51c46656e78ac6bac8a86803eb9 Mon Sep 17 00:00:00 2001 From: keithhc2 Date: Fri, 22 Jan 2021 16:55:43 -0800 Subject: [PATCH 05/28] Language fixes --- docs/knn/warmup.md | 24 +++++++++++++----------- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/docs/knn/warmup.md b/docs/knn/warmup.md index 8203d313..69b02cdb 100644 --- a/docs/knn/warmup.md +++ b/docs/knn/warmup.md @@ -9,14 +9,15 @@ has_math: false --- # Warmup API -## Overview -The HNSW graphs used to perform k-Approximate Nearest Neighbor Search are stored as `.hnsw` files with other Lucene segment files. In order to perform search on these graphs, they need to be loaded into native memory. If the graphs have not yet been loaded into native memory, upon search, they will first be loaded and then searched. This can cause high latency during initial queries. To avoid this, users will often run random queries during a warmup period. After this warmup period, the graphs will be loaded into native memory and their production workloads can begin. This process is indirect and requires extra effort. -As an alternative, you can run the k-NN plugin's warmup API on whatever indices you are interested in searching over. 
This API will load all the graphs for all of the shards (primaries and replicas) of all the indices specified in the request into native memory. After this process completes, you will be able to start searching against their indices with no initial latency penalties. The warmup API is idempotent, so if a segment's graphs are already loaded into memory, this operation will have no impact on them. It only loads graphs that are not currently in memory. +The HNSW graphs used to perform k-Approximate Nearest Neighbor Search are stored as `.hnsw` files with other Lucene segment files. In order to perform search on these graphs, they need to be loaded into native memory. If the graphs have not yet been loaded into native memory, upon search, they will first be loaded and then searched. This loading time can cause high latency during initial queries. To avoid this situation, users will often run random queries during a warmup period. After this warmup period, the graphs will be loaded into native memory and their production workloads can begin. This loading process is indirect and requires extra effort. + +As an alternative, you can run the k-NN plugin's warmup API on whatever indices you are interested in searching over. This API loads all the graphs for all of the shards (primaries and replicas) of all the indices specified in the request into native memory. After this process completes, you can start searching against their indices with no initial latency penalties. The warmup API is idempotent, so if a segment's graphs are already loaded into memory, this operation has no impact on those graphs. It only loads graphs not currently in memory. 
## Usage
This request performs a warmup on three indices:

```json
GET /_opendistro/_knn/warmup/index1,index2,index3?pretty
{
  "_shards" : {
    "total" : 6,
    "successful" : 6,
    "failed" : 0
  }
}
```

`total` indicates how many shards the k-NN plugin attempted to warm up. The response also includes the number of shards the plugin succeeded and failed to warm up.

The call does not return until the warmup operation is complete or the request times out. If the request times out, the operation still continues in the cluster. To monitor this, use the Elasticsearch `_tasks` API.

Following the completion of the operation, use the k-NN `_stats` API to see what the k-NN plugin loaded into the graph.

## Best practices
In order for the warmup API to function properly, you need to follow a few best practices. First, do not run merge operations on indices that you want to warm up. During merge, the k-NN plugin creates new segments, and old segments are (sometimes) deleted. For example, you could encounter a situation in which the warmup API loads graphs A and B into native memory, but segment C is created from segments A and B being merged. The graphs for A and B will no longer be in memory and neither will the graph for C. Then, the initial penalty of loading graph C on the first queries is still present.
+In order for the warmup API to function properly, you need to follow a few best practices. First, do not run merge operations on indices that you want to warm up. During merge, the k-NN plugin creates new segments, and old segments are (sometimes) deleted. For example, you could encounter a situation in which the warmup API loads graphs A and B into native memory, but segment C is created from segments A and B being merged. The graphs for A and B will no longer be in memory and neither will the graph for C. Then, the initial penalty of loading graph C on the first queries is still be present. -Second, you should first confirm that all of the graphs of interest can fit into native memory before running warmup. If they cannot all fit into memory, the cache will thrash. +Second, confirm that all graphs you want to warm up fit into native memory. See the [knn.memory.circuit_breaker.limit statistic](../settings/#cluster-settings) for more information about the native memory limit. High graph memory usage causes cache thrashing. -Lastly, you should not index any documents you want to load into the cache. Writing new information to segments prevents the Warmup API from loading the graphs until they are searchable, so you would have to run the Warmup API again after indexing is complete. +Lastly, do not index any documents you want to load into the cache. Writing new information to segments prevents the warmup API from loading the graphs until they are searchable, so you would have to run the warmup API again after indexing finishes. 
From 5e3bd7ceb4dc8f3d8b712bee167b6d143ff10733 Mon Sep 17 00:00:00 2001 From: keithhc2 Date: Mon, 25 Jan 2021 16:35:47 -0800 Subject: [PATCH 06/28] Added information about _tasks API --- docs/knn/settings.md | 55 +++++++++++++++++++++++++++++++++++++++++++- docs/knn/warmup.md | 10 ++++---- 2 files changed, 59 insertions(+), 6 deletions(-) diff --git a/docs/knn/settings.md b/docs/knn/settings.md index 5be12c4c..4fc4d53d 100644 --- a/docs/knn/settings.md +++ b/docs/knn/settings.md @@ -1,6 +1,6 @@ --- layout: default -title: Settings and Statistics +title: Settings, Statistics, and Tasks parent: KNN nav_order: 10 --- @@ -64,3 +64,56 @@ Statistic | Description `script_compilation_errors` | The number of errors during script compilation. `script_query_requests` | The number of query requests that use [the KNN script](../#custom-scoring). `script_query_errors` | The number of errors during script queries. + +## Tasks + +You can use the `_tasks` API to see what tasks are currently executing on your indices. + +```json +GET /_tasks +``` + +This sample request returns the tasks currently running on a node named `odfe-node1`. 
+ +```json +GET /_tasks?nodes=odfe-node1 +{ + "nodes": { + "Mgqdm0r9SEGClWxp_RbnaQ": { + "name": "odfe-node1", + "transport_address": "sample_address", + "host": "sample_host", + "ip": "sample_ip", + "roles": [ + "data", + "ingest", + "master", + "remote_cluster_client" + ], + "tasks": { + "Mgqdm0r9SEGClWxp_RbnaQ:24578": { + "node": "Mgqdm0r9SEGClWxp_RbnaQ", + "id": 24578, + "type": "transport", + "action": "cluster:monitor/tasks/lists", + "start_time_in_millis": 1611612517044, + "running_time_in_nanos": 638700, + "cancellable": false, + "headers": {} + }, + "Mgqdm0r9SEGClWxp_RbnaQ:24579": { + "node": "Mgqdm0r9SEGClWxp_RbnaQ", + "id": 24579, + "type": "direct", + "action": "cluster:monitor/tasks/lists[n]", + "start_time_in_millis": 1611612517044, + "running_time_in_nanos": 222200, + "cancellable": false, + "parent_task_id": "Mgqdm0r9SEGClWxp_RbnaQ:24578", + "headers": {} + } + } + } + } +} +``` diff --git a/docs/knn/warmup.md b/docs/knn/warmup.md index 69b02cdb..83824a33 100644 --- a/docs/knn/warmup.md +++ b/docs/knn/warmup.md @@ -12,7 +12,7 @@ has_math: false The HNSW graphs used to perform k-Approximate Nearest Neighbor Search are stored as `.hnsw` files with other Lucene segment files. In order to perform search on these graphs, they need to be loaded into native memory. If the graphs have not yet been loaded into native memory, upon search, they will first be loaded and then searched. This loading time can cause high latency during initial queries. To avoid this situation, users will often run random queries during a warmup period. After this warmup period, the graphs will be loaded into native memory and their production workloads can begin. This loading process is indirect and requires extra effort. -As an alternative, you can run the k-NN plugin's warmup API on whatever indices you are interested in searching over. This API loads all the graphs for all of the shards (primaries and replicas) of all the indices specified in the request into native memory. 
After this process completes, you can start searching against their indices with no initial latency penalties. The warmup API is idempotent, so if a segment's graphs are already loaded into memory, this operation has no impact on those graphs. It only loads graphs not currently in memory. +As an alternative, you can run the k-NN plugin's warmup API on whatever indices you are interested in searching. This API loads all the graphs for all of the shards (primaries and replicas) of all the indices specified in the request into native memory. After the process completes, you can start searching against their indices with no initial latency penalties. The warmup API is idempotent, so if a segment's graphs are already loaded into memory, this operation has no impact on those graphs. It only loads graphs not currently in memory. ## Usage This request performs a warmup on three indices: @@ -30,13 +30,13 @@ GET /_opendistro/_knn/warmup/index1,index2,index3?pretty `total` indicates how many shards the k-NN plugin attempted to warm up. The response also includes the number of shards the plugin succeeded and failed to warm up. -The call does not return until the warmup operation is complete or the request times out. If the request times out, the operation still continues in the cluster. To monitor this, use the Elasticsearch `_tasks` API. +The call does not return until the warmup operation is complete or the request times out. If the request times out, the operation still continues on the cluster. To monitor the warmup operation, use the [Elasticsearch `_tasks` API](../settings#tasks). -Following the completion of the operation, use the k-NN `_stats` API to see what the k-NN plugin loaded into the graph. +Following the completion of the operation, use the [k-NN `_stats` API](../settings#statistics) to see what the k-NN plugin loaded into the graph. ## Best practices -In order for the warmup API to function properly, you need to follow a few best practices. 
First, do not run merge operations on indices that you want to warm up. During merge, the k-NN plugin creates new segments, and old segments are (sometimes) deleted. For example, you could encounter a situation in which the warmup API loads graphs A and B into native memory, but segment C is created from segments A and B being merged. The graphs for A and B will no longer be in memory and neither will the graph for C. Then, the initial penalty of loading graph C on the first queries is still be present. +For the warmup API to function properly, follow these best practices. First, do not run merge operations on indices that you want to warm up. During merge, the k-NN plugin creates new segments, and old segments are (sometimes) deleted. For example, you could encounter a situation in which the warmup API loads graphs A and B into native memory, but segment C is created from segments A and B being merged. The graphs for A and B would no longer be in memory, and neither is graph C. In this case, the initial penalty for loading graph C is still present. -Second, confirm that all graphs you want to warm up fit into native memory. See the [knn.memory.circuit_breaker.limit statistic](../settings/#cluster-settings) for more information about the native memory limit. High graph memory usage causes cache thrashing. +Second, confirm that all graphs you want to warm up can fit into native memory. See the [knn.memory.circuit_breaker.limit statistic](../settings/#cluster-settings) for more information about the native memory limit. High graph memory usage causes cache thrashing. Lastly, do not index any documents you want to load into the cache. Writing new information to segments prevents the warmup API from loading the graphs until they are searchable, so you would have to run the warmup API again after indexing finishes. 
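The indexing caveat above implies an ordering for bulk loads: finish writing, make the new segments searchable, and only then warm up. A minimal sketch of that sequence, using a hypothetical index and vector field (`my-knn-index` and `my_vector` are illustrative names, not part of the plugin):

```json
PUT /my-knn-index/_doc/1
{
  "my_vector": [1.5, 2.5]
}

POST /my-knn-index/_refresh

GET /_opendistro/_knn/warmup/my-knn-index?pretty
```

Running warmup before the refresh would load only the graphs that were already searchable, so the new document's segment would still pay the loading cost on its first query.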
From 766561d3ddf10470c6b6b9950206a3804f2ee844 Mon Sep 17 00:00:00 2001
From: John Mazanec
Date: Tue, 26 Jan 2021 15:57:49 -0800
Subject: [PATCH 07/28] refactor knn docs

---
 docs/knn/api.md                | 129 +++++++++++++++
 docs/knn/approximate-knn.md    | 130 +++++++++++++++
 docs/knn/index.md              | 284 +--------------------------------
 docs/knn/jni-library.md        |  13 ++
 docs/knn/knn-score-script.md   | 186 +++++++++++++++++++++
 docs/knn/painless-functions.md |  10 ++
 docs/knn/performance-tuning.md |  97 +++++++++++
 docs/knn/settings.md           |  38 +----
 8 files changed, 573 insertions(+), 314 deletions(-)
 create mode 100644 docs/knn/api.md
 create mode 100644 docs/knn/approximate-knn.md
 create mode 100644 docs/knn/jni-library.md
 create mode 100644 docs/knn/knn-score-script.md
 create mode 100644 docs/knn/painless-functions.md
 create mode 100644 docs/knn/performance-tuning.md

diff --git a/docs/knn/api.md b/docs/knn/api.md
new file mode 100644
index 00000000..8140fbdd
--- /dev/null
+++ b/docs/knn/api.md
@@ -0,0 +1,129 @@
---
layout: default
title: API
nav_order: 4
parent: KNN
has_children: false
---

# API
The k-NN plugin adds two APIs that allow users to better manage the plugin's functionality.

## Stats
The k-NN stats API provides information about the current status of the k-NN plugin. The plugin keeps track of both cluster-level and node-level stats. Cluster-level stats have a single value for the entire cluster. Node-level stats have a single value for each node in the cluster. You can filter the query by node ID and stat name in the following way:
```
GET /_opendistro/_knn/nodeId1,nodeId2/stats/statName1,statName2
```

Statistic | Description
:--- | :---
`circuit_breaker_triggered` | Indicates whether the circuit breaker is triggered. This is only relevant to approximate k-NN search.
`total_load_time` | The time in nanoseconds that k-NN has taken to load graphs into the cache. This is only relevant to approximate k-NN search.
`eviction_count` | The number of graphs that have been evicted from the cache due to memory constraints or idle time. *Note*: Explicit evictions that occur because of index deletion are not counted. This is only relevant to approximate k-NN search.
`hit_count` | The number of cache hits. A cache hit occurs when a user queries a graph that is already loaded into memory. This is only relevant to approximate k-NN search.
`miss_count` | The number of cache misses. A cache miss occurs when a user queries a graph that has not yet been loaded into memory. This is only relevant to approximate k-NN search.
`graph_memory_usage` | The current cache size (total size of all graphs in memory) in kilobytes. This is only relevant to approximate k-NN search.
`graph_memory_usage_percentage` | The current weight of the cache as a percentage of the maximum cache capacity.
`graph_index_requests` | The number of requests to add the `knn_vector` field of a document to a graph.
`graph_index_errors` | The number of requests to add the `knn_vector` field of a document to a graph that have produced an error.
`graph_query_requests` | The number of graph queries that have been made.
`graph_query_errors` | The number of graph queries that have produced an error.
`knn_query_requests` | The number of k-NN query requests received.
`cache_capacity_reached` | Whether `knn.memory.circuit_breaker.limit` has been reached. This is only relevant to approximate k-NN search.
`load_success_count` | The number of times k-NN successfully loaded a graph into the cache. This is only relevant to approximate k-NN search.
`load_exception_count` | The number of times an exception occurred when trying to load a graph into the cache. This is only relevant to approximate k-NN search.
`indices_in_cache` | For each index that has graphs in the cache, this stat provides the number of graphs that index has and the total `graph_memory_usage` that index is using, in kilobytes.
+`script_compilations` | The number of times the KNN script has been compiled. This value should usually be 1 or 0, but if the cache containing the compiled scripts is filled, the KNN script might be recompiled. This is only relevant to k-NN score script search. +`script_compilation_errors` | The number of errors during script compilation. This is only relevant to k-NN score script search. +`script_query_requests` | The total number of script queries. This is only relevant to k-NN score script search. +`script_query_errors` | The number of errors during script queries. This is only relevant to k-NN score script search. + +### Examples +``` + +GET /_opendistro/_knn/stats?pretty +{ + "_nodes" : { + "total" : 1, + "successful" : 1, + "failed" : 0 + }, + "cluster_name" : "_run", + "circuit_breaker_triggered" : false, + "nodes" : { + "HYMrXXsBSamUkcAjhjeN0w" : { + "eviction_count" : 0, + "miss_count" : 1, + "graph_memory_usage" : 1, + "graph_memory_usage_percentage" : 3.68, + "graph_index_requests" : 7, + "graph_index_errors" : 1, + "knn_query_requests" : 4, + "graph_query_requests" : 30, + "graph_query_errors" : 15, + "indices_in_cache" : { + "myindex" : { + "graph_memory_usage" : 2, + "graph_memory_usage_percentage" : 3.68, + "graph_count" : 2 + } + }, + "cache_capacity_reached" : false, + "load_exception_count" : 0, + "hit_count" : 0, + "load_success_count" : 1, + "total_load_time" : 2878745, + "script_compilations" : 1, + "script_compilation_errors" : 0, + "script_query_requests" : 534, + "script_query_errors" : 0 + } + } +} +``` + +``` +GET /_opendistro/_knn/HYMrXXsBSamUkcAjhjeN0w/stats/circuit_breaker_triggered,graph_memory_usage?pretty +{ + "_nodes" : { + "total" : 1, + "successful" : 1, + "failed" : 0 + }, + "cluster_name" : "_run", + "circuit_breaker_triggered" : false, + "nodes" : { + "HYMrXXsBSamUkcAjhjeN0w" : { + "graph_memory_usage" : 1 + } + } +} +``` + +## Warmup +The HNSW graphs used to perform k-Approximate Nearest Neighbor Search are stored as `.hnsw` 
files with the other Lucene segment files. In order to perform a search on these graphs, they need to be loaded into native memory. If the graphs have not yet been loaded into native memory, they will first be loaded upon search and then searched. This can cause high latency during initial queries. To avoid this, users will often run random queries during a warmup period. After this warmup period, the graphs will be loaded into native memory and their production workloads can begin. This process is indirect and requires extra effort.
+
+As an alternative, a user can run the warmup API on whatever indices they are interested in searching over. This API loads all the graphs for all of the shards (primaries and replicas) of all the indices specified in the request into native memory. After this process completes, a user can start searching against their indices with no initial latency penalties. The warmup API is idempotent. If a segment's graphs are already loaded into memory, this operation has no impact on them. It only loads graphs that are not currently in memory.
+
+### Example
+This command will perform warmup on index1, index2, and index3:
+```
+GET /_opendistro/_knn/warmup/index1,index2,index3?pretty
+{
+  "_shards" : {
+    "total" : 6,
+    "successful" : 6,
+    "failed" : 0
+  }
+}
+```
+`total` indicates how many shards the warmup operation was performed on. `successful` indicates how many shards succeeded, and `failed` indicates how many shards failed.
+
+The call will not return until the warmup operation is complete or the request times out. If the request times out, the operation continues to run in the cluster. To monitor it, use the Elasticsearch `_tasks` API.
+
+Following the completion of the operation, use the k-NN `_stats` API to see which graphs have been loaded into memory.
+
+### Best practices
+In order for the warmup API to function properly, a few best practices should be followed. 
First, no merge operations should be running on the indices that will be warmed up. The reason is that, during a merge, new segments are created and old segments are (sometimes) deleted. For example, the warmup API could load graphs A and B into native memory, but then segment C could be created from segments A and B being merged. The graphs for A and B would no longer be in memory, and neither would the graph for C. The initial penalty of loading graph C on the first queries would then still be present.
+
+Second, before running warmup, confirm that all of the graphs of interest fit into native memory. If they cannot all fit into memory, the cache will thrash.
diff --git a/docs/knn/approximate-knn.md b/docs/knn/approximate-knn.md
new file mode 100644
index 00000000..a7444674
--- /dev/null
+++ b/docs/knn/approximate-knn.md
+---
+layout: default
+title: Approximate Search
+nav_order: 1
+parent: KNN
+has_children: false
+has_math: true
+---
+
+# Approximate k-NN Search
+
+The approximate k-NN method uses [nmslib's](https://github.com/nmslib/nmslib/) implementation of the HNSW algorithm to power k-NN search. In this case, approximate means that for a given search, the neighbors returned are an estimate of the true k-nearest neighbors. Of the three methods, this method offers the best search scalability for large data sets. Generally speaking, once the data set grows into the hundreds of thousands of vectors, this approach should be preferred.
+
+During indexing, this plugin builds an HNSW graph of the vectors for each `knn_vector` field/Lucene segment pair that can be used to efficiently find the k-nearest neighbors to a query vector during search. These graphs are loaded into native memory during search and managed by a cache. To pre-load the graphs into memory, please refer to the [warmup API](../api#Warmup). 
In order to see which graphs are loaded in memory, as well as other stats, please refer to the [stats API](../api#Stats). To learn more about segments, please refer to [Apache Lucene's documentation](https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). Because the graphs are constructed during indexing, it is not possible to apply a filter on an index and then use this search method. All filters are applied on the results produced by the approximate nearest neighbor search.
+
+## Get started with approximate k-NN
+
+To use the k-NN plugin's approximate search functionality, you must first create a k-NN index with the index setting `index.knn` set to `true`. This setting tells the plugin to create HNSW graphs for the index.
+
+Additionally, if you are using the approximate k-nearest neighbor method, you should set `index.knn.space_type` to the space you are interested in. This setting cannot be changed after it is set. Please refer to the [spaces section](#spaces) to see which spaces we support. By default, `index.knn.space_type` is `l2`. For more information on index settings, such as algorithm parameters that can be tweaked to tune performance, please refer to the [documentation](../settings#IndexSettings).
+
+Next, you must add one or more fields of the `knn_vector` data type. Here is an example that creates an index with two `knn_vector` fields and uses cosine similarity:
+
+```json
+PUT my-knn-index-1
+{
+  "settings": {
+    "index": {
+      "knn": true,
+      "knn.space_type": "cosinesimil"
+    }
+  },
+  "mappings": {
+    "properties": {
+      "my_vector1": {
+        "type": "knn_vector",
+        "dimension": 2
+      },
+      "my_vector2": {
+        "type": "knn_vector",
+        "dimension": 4
+      }
+    }
+  }
+}
+```
+
+The `knn_vector` data type supports a vector of floats that can have a dimension of up to 10,000, as set by the `dimension` mapping parameter.
+
+In Elasticsearch, codecs handle the storage and retrieval of indices. 
The k-NN plugin uses a custom codec to write vector data to graphs so that the underlying k-NN search library can read it. +{: .tip } + +After you create the index, you can add some data to it: + +```json +POST _bulk +{ "index": { "_index": "my-knn-index-1", "_id": "1" } } +{ "my_vector1": [1.5, 2.5], "price": 12.2 } +{ "index": { "_index": "my-knn-index-1", "_id": "2" } } +{ "my_vector1": [2.5, 3.5], "price": 7.1 } +{ "index": { "_index": "my-knn-index-1", "_id": "3" } } +{ "my_vector1": [3.5, 4.5], "price": 12.9 } +{ "index": { "_index": "my-knn-index-1", "_id": "4" } } +{ "my_vector1": [5.5, 6.5], "price": 1.2 } +{ "index": { "_index": "my-knn-index-1", "_id": "5" } } +{ "my_vector1": [4.5, 5.5], "price": 3.7 } +{ "index": { "_index": "my-knn-index-1", "_id": "6" } } +{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 10.3 } +{ "index": { "_index": "my-knn-index-1", "_id": "7" } } +{ "my_vector2": [2.5, 3.5, 5.6, 6.7], "price": 5.5 } +{ "index": { "_index": "my-knn-index-1", "_id": "8" } } +{ "my_vector2": [4.5, 5.5, 6.7, 3.7], "price": 4.4 } +{ "index": { "_index": "my-knn-index-1", "_id": "9" } } +{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 8.9 } + +``` + +Then you can execute an approximate nearest neighbor search on the data using the `knn` query type: + +```json +GET my-knn-index-1/_search +{ + "size": 2, + "query": { + "knn": { + "my_vector2": { + "vector": [2, 3, 5, 6], + "k": 2 + } + } + } +} +``` + +`k` is the number of neighbors the search of each graph will return. You must also include the `size` option. This will determine how many results the query will actually return. `k` results will be returned for each shard (and each segment) and `size` results for the entire query. The plugin supports a maximum `k` value of 10,000. + +### Using approximate k-NN with filters +If you use the `knn` query alongside filters or other clauses (e.g. `bool`, `must`, `match`), you might receive fewer than `k` results. 
In this example, `post_filter` reduces the number of results from 2 to 1: + +```json +GET my-knn-index-1/_search +{ + "size": 2, + "query": { + "knn": { + "my_vector2": { + "vector": [2, 3, 5, 6], + "k": 2 + } + } + }, + "post_filter": { + "range": { + "price": { + "gte": 5, + "lte": 10 + } + } + } +} +``` + +## Spaces + +A space corresponds to the function used to measure the distance between 2 points in order to determine the k-nearest neighbors. From the k-NN perspective, a lower score equates to a closer and better result. This is the opposite of how Elasticsearch scores results, where a greater score equates to a better result. To convert distances to Elasticsearch scores, we take 1/(1 + distance). Currently, the k-NN plugin supports the following spaces: + +spaceType | Distance Function | Elasticsearch Score +:--- | :--- | :--- +l2 | \[ Distance(X, Y) = \sum_{i=1}^n (X_i - Y_i)^2 \] | 1 / (1 + Distance Function) +cosinesimil | \[ Distance(X, Y) = 1 - {A · B \over \|A\| · \|B\|} \] | 1 / (1 + Distance Function) diff --git a/docs/knn/index.md b/docs/knn/index.md index 9a0efc31..2a77b9cd 100644 --- a/docs/knn/index.md +++ b/docs/knn/index.md @@ -4,290 +4,14 @@ title: KNN nav_order: 50 has_children: true has_toc: false -has_math: true --- # k-NN -Short for *k-nearest neighbors*, the k-NN plugin enables users to search for the k-nearest neighbors to a query point across an index of points. To determine the neighbors, a user can specify the space (the distance function) they want to use to measure the distance between points. Currently, the k-NN plugin supports Euclidean, cosine similarity, and Hamming bit spaces. +Short for *k-nearest neighbors*, the k-NN plugin enables users to search for the k-nearest neighbors to a query point across an index of vectors. To determine the neighbors, a user can specify the space (the distance function) they want to use to measure the distance between points. 
-Use cases include recommendations (for example, an "other songs you might like" feature in a music application), image recognition, and fraud detection. For background information on the algorithm, see [Wikipedia](https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm). +Use cases include recommendations (for example, an "other songs you might like" feature in a music application), image recognition, and fraud detection. For background information on the k-NN search, see [Wikipedia](https://en.wikipedia.org/wiki/Nearest_neighbor_search). -The k-NN plugin supports two methods of k-NN search. The first method uses the HNSW algorithm to return the approximate k-nearest neighbors. This algorithm sacrifices indexing speed and search accuracy in return for lower latency and more scalable search. To learn more about the algorithm, please refer to [nmslib's documentation](https://github.com/nmslib/nmslib/) or [the paper introducing the algorithm](https://arxiv.org/abs/1603.09320). Currently, only the Euclidean and cosine similarity spaces are available for this method. +This plugin supports three different methods for obtaining the k-nearest neighbors from an index of vectors. The first method takes an approximate nearest neighbor approach; it uses the HNSW algorithm to return the approximate k-nearest neighbors to a query vector. This algorithm sacrifices indexing speed and search accuracy in return for lower latency and more scalable search. To learn more about the algorithm, please refer to [nmslib's documentation](https://github.com/nmslib/nmslib/) or [the paper introducing the algorithm](https://arxiv.org/abs/1603.09320). The second method extends Elasticsearch's script scoring functionality to execute a brute force, exact k-NN search. With this approach, users are able to run k-NN search on a subset of vectors in their index (sometimes referred to as a pre-filter search). 
The third method adds the distance functions as painless extensions that can be used in more complex combinations. -The second method extends Elasticsearch's custom scoring functionality to execute a brute force, exact k-NN search. With the custom scoring approach, users are able to run k-NN search on a subset of vectors in their index (sometimes referred to as a pre-filter search). The Euclidean, cosine similarity, and Hamming bit spaces are available in this method. - -For larger data sets, users should generally choose the approximate nearest neighbor method, because it scales much better. For smaller data sets, where a user may want to apply a filter, they should choose the custom scoring approach. - - -## Get started - -To use the k-NN plugin's search functionality, you must first create a k-NN index. If you want to use the approximate k-nearest neighbors method, you will need to set the index setting, `index.knn` to `true`. This setting tells the plugin to create HNSW graphs for the index. If you only want to use the custom scoring method, you can use `false`. - -Additionally, if you are using the approximate k-nearest neighbor method, you should specify `knn.space_type` to the space that you are interested in. This is not necessary for the custom scoring approach, because space is set for each query. Currently, we only support two spaces in the approximate nearest neighbor method: `l2` to use Euclidean distance or `cosinesimil` to use cosine similarity. By default, `index.knn.space_type` is `l2`. - -For both methods, you must add one or more fields of the `knn_vector` data type. However, if you are using Hamming distance with the custom scoring method, you should use the long or binary field type. 
Here is an example that creates an index with two `knn_vector` fields and uses cosine similarity: - -```json -PUT my-knn-index-1 -{ - "settings": { - "index": { - "knn": true, - "knn.space_type": "cosinesimil" - } - }, - "mappings": { - "properties": { - "my_vector1": { - "type": "knn_vector", - "dimension": 2 - }, - "my_vector2": { - "type": "knn_vector", - "dimension": 4 - } - } - } -} -``` - -The `knn_vector` data type supports a single list of up to 10,000 floats, with the number of floats defined by the required dimension parameter. - -In Elasticsearch, codecs handle the storage and retrieval of indices. The k-NN plugin uses a custom codec to write vector data to a graph so that the underlying KNN search library can read it. -{: .tip } - -After you create the index, add some data to it: - -```json -POST _bulk -{ "index": { "_index": "my-knn-index-1", "_id": "1" } } -{ "my_vector1": [1.5, 2.5], "price": 12.2 } -{ "index": { "_index": "my-knn-index-1", "_id": "2" } } -{ "my_vector1": [2.5, 3.5], "price": 7.1 } -{ "index": { "_index": "my-knn-index-1", "_id": "3" } } -{ "my_vector1": [3.5, 4.5], "price": 12.9 } -{ "index": { "_index": "my-knn-index-1", "_id": "4" } } -{ "my_vector1": [5.5, 6.5], "price": 1.2 } -{ "index": { "_index": "my-knn-index-1", "_id": "5" } } -{ "my_vector1": [4.5, 5.5], "price": 3.7 } -{ "index": { "_index": "my-knn-index-1", "_id": "6" } } -{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 10.3 } -{ "index": { "_index": "my-knn-index-1", "_id": "7" } } -{ "my_vector2": [2.5, 3.5, 5.6, 6.7], "price": 5.5 } -{ "index": { "_index": "my-knn-index-1", "_id": "8" } } -{ "my_vector2": [4.5, 5.5, 6.7, 3.7], "price": 4.4 } -{ "index": { "_index": "my-knn-index-1", "_id": "9" } } -{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 8.9 } - -``` - -Then you can execute an approximate nearest neighbor search on the data using the `knn` query type: - -```json -GET my-knn-index-1/_search -{ - "size": 2, - "query": { - "knn": { - "my_vector2": { - "vector": 
[2, 3, 5, 6], - "k": 2 - } - } - } -} -``` - -In this case, `k` is the number of neighbors you want the query to return, but you must also include the `size` option. Otherwise, you get `k` results for each shard (and each segment) rather than `k` results for the entire query. The plugin supports a maximum `k` value of 10,000. - -Additionally, you can execute an exact nearest neighbor search on the data using the `knn` script: - -```json -GET my-knn-index-1/_search -{ - "size": 4, - "query": { - "script_score": { - "query": { - "match_all": {} - }, - "script": { - "source": "knn_score", - "lang": "knn", - "params": { - "field": "my_vector2", - "query_value": [2, 3, 5, 6], - "space_type": "cosinesimil" - } - } - } - } -} -``` - -Euclidian distance formula: - -

- \[ Distance(X, Y) = \sqrt{\sum_{i=1}^n (X_i - Y_i)^2} \] -

- -Cosine similarity formula: - -

- \[ {A · B \over \|A\| · \|B\|} = - - {\sum_{i=1}^n (A_i · B_i) \over \sqrt{\sum_{i=1}^n A_i^2} · \sqrt{\sum_{i=1}^n B_i^2}}\] - where \(\|A\|\) and \(\|B\|\) represent normalized vectors. -

- -After calculations, the k-NN plugin performs the following conversions to get the final Elasticsearch score. Note that these conversions are how the k-NN plugin calculates approximate k-NN scores. For custom scoring conversions, see the [custom scoring](#Pre-filtering-with-custom-scoring) section. - -[Euclidian distance conversion](https://github.com/opendistro-for-elasticsearch/k-NN/blob/0da03b29f1367b7f555e14b4ea4002626160bb35/src/main/java/com/amazon/opendistroforelasticsearch/knn/index/KNNWeight.java#L113): - -``` -Elasticsearch score = 1 / (1 + Euclidian distance) -``` - -[Cosine similarity conversion](https://github.com/opendistro-for-elasticsearch/k-NN/blob/0da03b29f1367b7f555e14b4ea4002626160bb35/src/main/java/com/amazon/opendistroforelasticsearch/knn/index/KNNWeight.java#L113): - -``` -Elasticsearch score = 1 / (1 + Cosine similarity) -``` - - -## Compound queries with KNN - -If you use the `knn` query alongside filters or other clauses (e.g. `bool`, `must`, `match`), you might receive fewer than `k` results. In this example, `post_filter` reduces the number of results from 2 to 1: - -```json -GET my-knn-index-1/_search -{ - "size": 2, - "query": { - "knn": { - "my_vector2": { - "vector": [2, 3, 5, 6], - "k": 2 - } - } - }, - "post_filter": { - "range": { - "price": { - "gte": 5, - "lte": 10 - } - } - } -} -``` - - -## Pre-filtering with custom scoring - -The [previous example](#mixing-queries) shows a search that returns fewer than `k` results. If you want to avoid this situation, the custom scoring method lets you essentially invert the order of events. In other words, you can filter down the set of documents you want to execute the k-nearest neighbor search over. - -If you *only* want to use custom scoring, you can omit `"index.knn": true`. The benefit of this approach is faster indexing speed and lower memory usage, but you lose the ability to perform standard k-NN queries on the index. 
-{: .tip} - -This example shows a pre-filter approach to k-NN search with custom scoring. First, create the index: - -```json -PUT my-knn-index-2 -{ - "mappings": { - "properties": { - "my_vector": { - "type": "knn_vector", - "dimension": 2 - }, - "color": { - "type": "keyword" - } - } - } -} -``` - -Then add some documents: - -```json -POST _bulk -{ "index": { "_index": "my-knn-index-2", "_id": "1" } } -{ "my_vector": [1, 1], "color" : "RED" } -{ "index": { "_index": "my-knn-index-2", "_id": "2" } } -{ "my_vector": [2, 2], "color" : "RED" } -{ "index": { "_index": "my-knn-index-2", "_id": "3" } } -{ "my_vector": [3, 3], "color" : "RED" } -{ "index": { "_index": "my-knn-index-2", "_id": "4" } } -{ "my_vector": [10, 10], "color" : "BLUE" } -{ "index": { "_index": "my-knn-index-2", "_id": "5" } } -{ "my_vector": [20, 20], "color" : "BLUE" } -{ "index": { "_index": "my-knn-index-2", "_id": "6" } } -{ "my_vector": [30, 30], "color" : "BLUE" } - -``` - -Finally, use the `script_score` query to pre-filter your documents before identifying nearest neighbors: - -```json -GET my-knn-index-2/_search -{ - "size": 2, - "query": { - "script_score": { - "query": { - "bool": { - "filter": { - "term": { - "color": "BLUE" - } - } - } - }, - "script": { - "lang": "knn", - "source": "knn_score", - "params": { - "field": "my_vector", - "query_value": [9.9, 9.9], - "space_type": "l2" - } - } - } - } -} -``` - -All parameters are required. - -- `lang` is the script type. This value is usually `painless`, but here you must specify `knn`. -- `source` is the name of the script, `knn_score`. - - This script is part of the KNN plugin and isn't available at the standard `_scripts` path. A GET request to `_cluster/state/metadata` doesn't return it, either. - -- `field` is the field that contains your vector data. -- `query_value` is the point you want to find the nearest neighbors for. 
For the Euclidean and cosine similarity spaces, the value must be an array of floats that matches the dimension set in the field's mapping. For Hamming bit distance, this value can be either of type signed long or a base64-encoded string (for the long and binary field types, respectively). -- `space_type` is either `l2`, `cosinesimil`, or `hammingbit`. - -[Euclidian distance conversion](https://github.com/opendistro-for-elasticsearch/k-NN/blob/0da03b29f1367b7f555e14b4ea4002626160bb35/src/main/java/com/amazon/opendistroforelasticsearch/knn/plugin/script/KNNScoringSpace.java#L71): - -``` -Elasticsearch score = 1 / (1 + Euclidian distance) -``` - -[Cosine similarity conversion](https://github.com/opendistro-for-elasticsearch/k-NN/blob/3595be5b044205fbf5c02b2ecb68ff1df2a85b53/src/main/java/com/amazon/opendistroforelasticsearch/knn/plugin/script/KNNScoringSpace.java#L102): - -``` -Elasticsearch score = 1 + Cosine similarity -``` - -Cosine similarity returns a number between -1 and 1, and because Elasticsearch relevance scores can't be below 0, the k-NN plugin adds 1 to get the final score. - - -## Performance considerations - -The standard KNN query and custom scoring option perform differently. Test using a representative set of documents to see if the search results and latencies match your expectations. - -Custom scoring works best if the initial filter reduces the number of documents to no more than 20,000. Increasing shard count can improve latencies, but be sure to keep shard size within [the recommended guidelines](../elasticsearch/#primary-and-replica-shards). +For larger data sets, users should generally choose the approximate nearest neighbor method, because it scales significantly better. For smaller data sets, where a user may want to apply a filter, they should choose the custom scoring approach. If users have a more complex use case where they need to use a distance function as part of their scoring method, they should use the painless scripting approach. 
diff --git a/docs/knn/jni-library.md b/docs/knn/jni-library.md
new file mode 100644
index 00000000..355b415d
--- /dev/null
+++ b/docs/knn/jni-library.md
+---
+layout: default
+title: JNI Library
+nav_order: 5
+parent: KNN
+has_children: false
+---
+
+# JNI Library
+In order to integrate nmslib's approximate k-NN functionality, which is implemented in C++, into the k-NN plugin, which is implemented in Java, we created a Java Native Interface library. Check out [this wiki](https://en.wikipedia.org/wiki/Java_Native_Interface) to learn more about JNI. This library allows the k-NN plugin to leverage nmslib's functionality.
+
+## Artifacts
+We build and distribute binary library artifacts with Open Distro for Elasticsearch. We build the library binary, RPM, and DEB in [this GitHub action](https://github.com/opendistro-for-elasticsearch/k-NN/blob/master/.github/workflows/CD.yml). We use CentOS 7 with g++ 4.8.5 to build the DEB, RPM, and ZIP. Additionally, in order to provide as much general compatibility as possible, we compile the library without optimized instruction sets enabled. Users who want to get the most out of the library should build it from source in their production environment, so that if optimized instruction sets are available there, the library takes advantage of them. The documentation for this can be found [here](https://github.com/opendistro-for-elasticsearch/k-NN#jni-library-artifacts).
diff --git a/docs/knn/knn-score-script.md b/docs/knn/knn-score-script.md
new file mode 100644
index 00000000..b6f57555
--- /dev/null
+++ b/docs/knn/knn-score-script.md
+---
+layout: default
+title: Exact k-NN with Scoring Script
+nav_order: 2
+parent: KNN
+has_children: false
+has_math: true
+---
+
+# Exact k-NN with Scoring Script
+The k-NN plugin implements an Elasticsearch score script that can be used to find the exact k-nearest neighbors to a given query point. 
Using the k-NN score script, a filter can be applied on an index before executing the nearest neighbor search. This is useful for dynamic search cases, where the index body may vary based on other conditions. Because this approach executes a brute force search, it does not scale as well as the [approximate approach](../approximate-knn). In some cases, it may be better to refactor your workflow or index structure to use the approximate approach instead.
+
+## Getting started with the score script
+
+Similar to approximate nearest neighbor search, in order to use the score script on a body of vectors, you must first create an index with one or more `knn_vector` fields. If you intend to only use the script score approach (and not the approximate approach), `index.knn` can be set to `false` and `index.knn.space_type` does not need to be set. The space type can be chosen during search. See the [spaces section](#spaces) to see which spaces the k-NN score script supports. Here is an example that creates an index with two `knn_vector` fields:
+
+```json
+PUT my-knn-index-1
+{
+  "mappings": {
+    "properties": {
+      "my_vector1": {
+        "type": "knn_vector",
+        "dimension": 2
+      },
+      "my_vector2": {
+        "type": "knn_vector",
+        "dimension": 4
+      }
+    }
+  }
+}
+```
+
+*Note* -- For binary spaces, such as the Hamming bit space, the field type needs to be either `binary` or `long`. The binary data then needs to be encoded either as a base64 string or as a long (if the data is 64 bits or less).
+
+If you *only* want to use the score script, you can omit `"index.knn": true`. The benefit of this approach is faster indexing speed and lower memory usage, but you lose the ability to perform standard k-NN queries on the index.
+{: .tip} + +After you create the index, you can add some data to it: + +```json +POST _bulk +{ "index": { "_index": "my-knn-index-1", "_id": "1" } } +{ "my_vector1": [1.5, 2.5], "price": 12.2 } +{ "index": { "_index": "my-knn-index-1", "_id": "2" } } +{ "my_vector1": [2.5, 3.5], "price": 7.1 } +{ "index": { "_index": "my-knn-index-1", "_id": "3" } } +{ "my_vector1": [3.5, 4.5], "price": 12.9 } +{ "index": { "_index": "my-knn-index-1", "_id": "4" } } +{ "my_vector1": [5.5, 6.5], "price": 1.2 } +{ "index": { "_index": "my-knn-index-1", "_id": "5" } } +{ "my_vector1": [4.5, 5.5], "price": 3.7 } +{ "index": { "_index": "my-knn-index-1", "_id": "6" } } +{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 10.3 } +{ "index": { "_index": "my-knn-index-1", "_id": "7" } } +{ "my_vector2": [2.5, 3.5, 5.6, 6.7], "price": 5.5 } +{ "index": { "_index": "my-knn-index-1", "_id": "8" } } +{ "my_vector2": [4.5, 5.5, 6.7, 3.7], "price": 4.4 } +{ "index": { "_index": "my-knn-index-1", "_id": "9" } } +{ "my_vector2": [1.5, 5.5, 4.5, 6.4], "price": 8.9 } + +``` + +Finally, you can execute an exact nearest neighbor search on the data using the `knn` script: +```json +GET my-knn-index-1/_search +{ + "size": 4, + "query": { + "script_score": { + "query": { + "match_all": {} + }, + "script": { + "source": "knn_score", + "lang": "knn", + "params": { + "field": "my_vector2", + "query_value": [2, 3, 5, 6], + "space_type": "cosinesimil" + } + } + } + } +} +``` + +All parameters are required. + +- `lang` is the script type. This value is usually `painless`, but here you must specify `knn`. +- `source` is the name of the script, `knn_score`. + + This script is part of the KNN plugin and isn't available at the standard `_scripts` path. A GET request to `_cluster/state/metadata` doesn't return it, either. + +- `field` is the field that contains your vector data. +- `query_value` is the point you want to find the nearest neighbors for. 
For the Euclidean and cosine similarity spaces, the value must be an array of floats that matches the dimension set in the field's mapping. For Hamming bit distance, this value can be either of type signed long or a base64-encoded string (for the long and binary field types, respectively).
+- `space_type` corresponds to the distance function. See [the spaces section](#spaces).
+
+
+*Note* -- After ODFE 1.11, `vector` was replaced by `query_value` due to the addition of the `hammingbit` space.
+
+
+The [post filter example in the approximate approach](../approximate-knn#UsingApproximatek-NNWithFilters) shows a search that returns fewer than `k` results. If you want to avoid this situation, the score script method lets you essentially invert the order of events. In other words, you can filter down the set of documents over which you want to execute the k-nearest neighbor search.
+
+This example shows a pre-filter approach to k-NN search with the score script. First, create the index:
+
+```json
+PUT my-knn-index-2
+{
+  "mappings": {
+    "properties": {
+      "my_vector": {
+        "type": "knn_vector",
+        "dimension": 2
+      },
+      "color": {
+        "type": "keyword"
+      }
+    }
+  }
+}
+```
+
+Then add some documents:
+
+```json
+POST _bulk
+{ "index": { "_index": "my-knn-index-2", "_id": "1" } }
+{ "my_vector": [1, 1], "color" : "RED" }
+{ "index": { "_index": "my-knn-index-2", "_id": "2" } }
+{ "my_vector": [2, 2], "color" : "RED" }
+{ "index": { "_index": "my-knn-index-2", "_id": "3" } }
+{ "my_vector": [3, 3], "color" : "RED" }
+{ "index": { "_index": "my-knn-index-2", "_id": "4" } }
+{ "my_vector": [10, 10], "color" : "BLUE" }
+{ "index": { "_index": "my-knn-index-2", "_id": "5" } }
+{ "my_vector": [20, 20], "color" : "BLUE" }
+{ "index": { "_index": "my-knn-index-2", "_id": "6" } }
+{ "my_vector": [30, 30], "color" : "BLUE" }
+
+```
+
+Finally, use the `script_score` query to pre-filter your documents before identifying nearest neighbors:
+
+```json
+GET my-knn-index-2/_search
+{
+  "size": 2,
+  "query": {
+    "script_score": {
+      "query": {
+        "bool": {
+          "filter": {
+            "term": {
+              "color": "BLUE"
+            }
+          }
+        }
+      },
+      "script": {
+        "lang": "knn",
+        "source": "knn_score",
+        "params": {
+          "field": "my_vector",
+          "query_value": [9.9, 9.9],
+          "space_type": "l2"
+        }
+      }
+    }
+  }
+}
+```
+
+## Spaces
+
+A space corresponds to the function used to measure the distance between two points in order to determine the k-nearest neighbors. From the k-NN perspective, a lower score equates to a closer and better result. This is the opposite of how Elasticsearch scores results, where a greater score equates to a better result. We include the conversion to Elasticsearch scores in the table below:
+
+spaceType | Distance Function | Elasticsearch Score
+:--- | :--- | :---
+l2 | \[ Distance(X, Y) = \sum_{i=1}^n (X_i - Y_i)^2 \] | 1 / (1 + Distance Function)
+cosinesimil | \[ Distance(X, Y) = {A · B \over \|A\| · \|B\|} \] | 1 + Distance Function
+hammingbit | Distance = ones(X xor Y) | 1 / (1 + Distance Function)
+
+
+Cosine similarity returns a number between -1 and 1, and because Elasticsearch relevance scores can't be below 0, the k-NN plugin adds 1 to get the final score.
diff --git a/docs/knn/painless-functions.md b/docs/knn/painless-functions.md
new file mode 100644
index 00000000..af8c632f
--- /dev/null
+++ b/docs/knn/painless-functions.md
+---
+layout: default
+title: k-NN Painless Extensions
+nav_order: 3
+parent: KNN
+has_children: false
+has_math: true
+---
+
+ 
\ No newline at end of file
diff --git a/docs/knn/performance-tuning.md b/docs/knn/performance-tuning.md
new file mode 100644
index 00000000..634d1ac6
--- /dev/null
+++ b/docs/knn/performance-tuning.md
+---
+layout: default
+title: Performance Tuning
+parent: KNN
+nav_order: 7
+---
+
+# Performance Tuning
+
+This section provides recommendations for performance tuning to improve indexing and search performance for approximate k-NN. 
At a high level, k-NN works according to the following principles:
+* Graphs are created per `knn_vector` field / (Lucene) segment pair.
+* Queries execute on segments sequentially inside the shard (same as any other Elasticsearch query).
+* Each graph in a segment returns at most `k` neighbors.
+* The coordinator node picks the final `size` neighbors from the neighbors returned by each shard.
+
+Additionally, this section provides recommendations for comparing approximate k-NN to exact k-NN with the score script.
+
+## Indexing Performance Tuning
+
+The following steps can help improve indexing performance, especially when you plan to index a large number of vectors at once.
+1. Disable the refresh interval (default = 1 sec), or set a long duration for the refresh interval, to avoid creating multiple small segments:
+```
+PUT /<index_name>/_settings
+{
+  "index" : {
+    "refresh_interval" : "-1"
+  }
+}
+```
+*Note* -- Be sure to re-enable `refresh_interval` after indexing finishes.
+
+2. Disable replicas (no Elasticsearch replica shard).
+
+Setting replicas to 0 avoids duplicate construction of graphs on both primary and replica shards. When replicas are enabled after indexing, the serialized graphs are copied directly. Having no replicas means that losing a node(s) may incur data loss, so it is important that the data lives elsewhere so that this initial load can be retried in case of an issue.
+
+3. Increase the number of indexing threads.
+
+If the hardware has multiple cores, you can allow multiple threads in graph construction and thereby speed up the indexing process. You can determine the number of threads to be allotted by using the `knn.algo_param.index_thread_qty` setting.
+
+Keep an eye on CPU utilization and choose the right number of threads. Because graph construction is costly, multiple threads can put additional load on the CPU.
+
+## Search Performance Tuning
+
+1. 
Have fewer segments
+
+To improve search performance, it is necessary to keep the number of segments under control. Lucene's IndexSearcher searches over all of the segments in a shard to find the `size` best results. Because the complexity of search for the HNSW algorithm is logarithmic with respect to the number of vectors, searching over 5 graphs with 100 vectors each and then taking the top `size` results from 5*k results takes longer than searching over 1 graph with 500 vectors and then taking the top `size` results from k results. Ideally, having 1 segment per shard gives the optimal performance with respect to search latency. You can configure an index to have multiple shards to avoid giant shards and achieve more parallelism.
+
+You can control the number of segments by asking Elasticsearch to slow down segment creation during indexing, either by disabling the refresh interval or by choosing a larger refresh interval.
+
+2. Warm up the index
+
+The graphs are constructed during indexing, but they are loaded into memory during the first search. The way search works in Lucene is that each segment is searched sequentially (so, for k-NN, each segment returns up to k nearest neighbors of the query point), and the top `size` number of results, based on the score, are returned from all of the results returned by the segments at a shard level (a higher score equates to a better result).
+
+Once a graph is loaded (graphs are loaded outside the Elasticsearch JVM), we cache the graphs in memory. The initial queries are expensive, on the order of a few seconds, and subsequent queries should be faster, on the order of milliseconds (assuming the k-NN circuit breaker isn't hit).
+
+To avoid this latency penalty during your first queries, you should run the warmup API on the indices you want to search. 
The API looks like this:
+
+```
+GET /_opendistro/_knn/warmup/index1,index2,index3?pretty
+{
+  "_shards" : {
+    "total" : 6,
+    "successful" : 6,
+    "failed" : 0
+  }
+}
+```
+
+The API loads all of the graphs for all of the shards (primaries and replicas) of the specified indices into the cache. Thus, there is no penalty for loading graphs during initial searches. *Note* -- This API only loads the segments of the indices it sees into the cache. If a merge or refresh operation finishes after the API runs, or if new documents are added, the API needs to be rerun to load those graphs into memory.
+
+3. Avoid reading stored fields
+
+If the use case is to only read the IDs and scores of the nearest neighbors, you can disable reading stored fields, which saves time that would otherwise be spent retrieving the vectors from stored fields.
+
+## Improving Recall
+
+Recall depends on multiple factors, such as the number of vectors, number of dimensions, segments, and so on. Searching over a large number of small segments and aggregating the results leads to better recall than searching over a small number of large segments and aggregating the results. The larger the graph, the greater the chance of losing recall if you stick with smaller algorithm parameters. Choosing larger values for the algorithm parameters should help solve this issue, but at the cost of search latency and indexing time. That being said, it is important to understand your system's requirements for latency and accuracy, and then to choose the number of segments you want your index to have based on experimentation.
+
+You can configure recall by adjusting the algorithm parameters of the HNSW algorithm exposed through index settings. The algorithm parameters that control recall are `m`, `ef_construction`, and `ef_search`. For more details on the influence of these parameters on indexing and search recall, see the [hnswlib algorithm parameters documentation](https://github.com/nmslib/hnswlib/blob/master/ALGO_PARAMS.md). 
Increasing these values can improve recall (leading to better search results) but at the cost of higher memory utilization and increased indexing time. Our default values work for a broad set of use cases based on our experiments, but we encourage you to run your own experiments on your data sets and choose appropriate values. You can refer to these settings in [this section](https://github.com/opendistro-for-elasticsearch/k-NN#index-level-settings). We will add details about our experiments here shortly.
+
+## Estimating Memory Usage
+
+Typically, in an Elasticsearch cluster, a certain portion of RAM is set aside for the JVM heap. The k-NN plugin allocates graphs in a portion of the remaining RAM. This portion's size is determined by the `circuit_breaker_limit` cluster setting. By default, the circuit breaker limit is set at 50%.
+
+The memory required for graphs can be estimated as `1.1 * (4 * dimension + 8 * M)` bytes/vector.
+
+For example, assume that you have 1 million vectors with a dimension of 256 and M of 16. The memory requirement can be estimated as follows:
+```
+1.1 * (4 * 256 + 8 * 16) * 1,000,000 ~= 1.26 GB
+```
+
+*Note* -- Remember that having a replica doubles the total number of vectors.
+
+## Approximate nearest neighbor vs. score script
+
+The standard k-NN query and the custom scoring option perform differently. Test using a representative set of documents to see if the search results and latencies match your expectations.
+
+Custom scoring works best if the initial filter reduces the number of documents to no more than 20,000. Increasing the shard count can improve latencies, but be sure to keep shard size within [the recommended guidelines](../elasticsearch/#primary-and-replica-shards). 
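As a sanity check, the memory estimate described in this section can be computed directly. The sketch below is just the arithmetic of the `1.1 * (4 * dimension + 8 * M)` bytes/vector formula; the helper name is ours and is not part of the plugin's API:

```python
def estimate_graph_memory_bytes(num_vectors: int, dimension: int, m: int) -> float:
    """Estimate of native memory used by HNSW graphs:
    1.1 * (4 * dimension + 8 * M) bytes per vector."""
    return 1.1 * (4 * dimension + 8 * m) * num_vectors

# The example from this section: 1 million vectors, dimension 256, M = 16.
total = estimate_graph_memory_bytes(1_000_000, 256, 16)
print(round(total))  # 1267200000 bytes, i.e. the ~1.26 GB estimate
```

Remember to double the vector count if the index has one replica, since replicas hold their own copies of the graphs.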
diff --git a/docs/knn/settings.md b/docs/knn/settings.md index 5be12c4c..4e13787e 100644 --- a/docs/knn/settings.md +++ b/docs/knn/settings.md @@ -1,13 +1,13 @@ --- layout: default -title: Settings and Statistics +title: Settings parent: KNN -nav_order: 10 +nav_order: 6 --- -# KNN Settings and statistics +# KNN Settings -The KNN plugin adds several new index settings, cluster settings, and statistics. +The KNN plugin adds several new index and cluster settings. ## Index settings @@ -34,33 +34,3 @@ Setting | Default | Description `knn.memory.circuit_breaker.limit` | 60% | The native memory limit for graphs. At the default value, if a machine has 100 GB of memory and the JVM uses 32 GB, KNN uses 60% of the remaining 68 GB (40.8 GB). If memory usage exceeds this value, KNN removes the least recently used graphs. `knn.memory.circuit_breaker.enabled` | true | Whether to enable the KNN memory circuit breaker. `knn.plugin.enabled`| true | Enables or disables the KNN plugin. - - -## Statistics - -KNN includes statistics that can give you a sense for how the plugin is performing: - -``` -GET _opendistro/_knn/stats -``` - -You can also filter by node and/or statistic: - -``` -GET /_opendistro/_knn/nodeId1,nodeId2/stats/statName1,statName2 -``` - -Statistic | Description -:--- | :--- -`totalLoadTime` | The time in nanoseconds that KNN has taken to load graphs into the cache. -`evictionCount` | The number of graphs that have been evicted from the cache due to memory constraints or idle time. -`hitCount` | The number of cache hits. A cache hit occurs when a user queries a graph and it is already loaded into memory. -`cacheCapacityReached` | Whether `knn.memory.circuit_breaker.limit` has been reached. -`loadSuccessCount` | The number of times KNN successfully loaded a graph into the cache. -`graphMemoryUsage` | Current cache size (total size of all graphs in memory) in kilobytes. -`missCount` | The number of cache misses. 
A cache miss occurs when a user queries a graph and it has not yet been loaded into memory. -`loadExceptionCount` | The number of times an exception occurred when trying to load a graph into the cache. -`script_compilations` | The number of times the KNN script has been compiled. This value should usually be 1 or 0, but if the cache containing the compiled scripts is filled, the KNN script might be recompiled. -`script_compilation_errors` | The number of errors during script compilation. -`script_query_requests` | The number of query requests that use [the KNN script](../#custom-scoring). -`script_query_errors` | The number of errors during script queries. From 9ed5ebe2dd871c885b70e59075a59242f68c03bc Mon Sep 17 00:00:00 2001 From: ashwinkumar12345 Date: Wed, 27 Jan 2021 21:35:09 -0800 Subject: [PATCH 08/28] minor fixes --- docs/ism/api.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/ism/api.md b/docs/ism/api.md index 72d694e9..48af2606 100644 --- a/docs/ism/api.md +++ b/docs/ism/api.md @@ -396,11 +396,11 @@ POST _opendistro/_ism/remove/index_1 ## Update managed index policy -Updates the managed index policy to a new policy (or to a new version of the policy). You can use an index pattern to update multiple indices at once. When updating multiple indices, you might want to include a state filter to only affect certain managed indices. This will filter out all the existing managed indices and only apply the change to the ones in the specified state. You can also directly set the state the managed index should transition to after the change policy happens. +Updates the managed index policy to a new policy (or to a new version of the policy). You can use an index pattern to update multiple indices at once. When updating multiple indices, you might want to include a state filter to only affect certain managed indices. 
The change policy filters out all the existing managed indices and only applies the change to the ones in the state that you specify. You can also explicitly specify the state that the managed index transitions to after the change policy takes effect. -The changing of a policy is an asynchronous background process. The change is queued up to happen instead of taking affect immediately. This is to protect the currently running managed indices from being put into a broken state. If the policy you are changing to only has some small configuration changes such as changing `min_index_age` in the rollover condition from `"1000d"` to `"100d"` then the change will happen immediately on the next execution. If the change modifies the state, actions, or order of actions of the current state the index is in then it will happen at the end of the current state before transitioning to a new state. +A policy change is an asynchronous background process. The changes are queued and are not executed immediately by the background process. This delay in execution protects the currently running managed indices from being put into a broken state. If the policy you are changing to has only some small configuration changes, then the change takes place immediately. For example, if the policy changes the `min_index_age` parameter in a rollover condition from `1000d` to `100d`, this change takes place immediately in its next execution. If the change modifies the state, actions, or the order of actions of the current state the index is in, then the change happens at the end of its current state before transitioning to a new state. -The example below is changing the policy on the index `index_1` to `policy_1` which could be a completely new policy or an updated version of the existing policy. It will only apply the change if the index is currently in the `searches` state. And once the change goes through it will transition to the `delete` state and start from there. 
+In this example, the policy applied on the `index_1` index is changed to `policy_1`, which could either be a completely new policy or an updated version of its existing policy. The process only applies the change if the index is currently in the `searches` state. After this change in policy takes place, `index_1` transitions to the `delete` state. #### Request From 523b79e4589c321466cda29d3a061c79cf93940b Mon Sep 17 00:00:00 2001 From: Simon Thorley Date: Fri, 29 Jan 2021 09:20:15 +0000 Subject: [PATCH 09/28] Corrected ISM settings override template --- docs/ism/settings.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/docs/ism/settings.md b/docs/ism/settings.md index dc14589f..4e0e740a 100644 --- a/docs/ism/settings.md +++ b/docs/ism/settings.md @@ -38,9 +38,11 @@ PUT _index_template/ism_history_indices "index_patterns": [ ".opendistro-ism-managed-index-history-*" ], - "settings": { - "number_of_shards": 1, - "number_of_replicas": 0 + “template”: { + "settings": { + "number_of_shards": 1, + "number_of_replicas": 0 + } } } ``` From 641a2ca76e9f21760e99d86d01ccb05d1a8ded1b Mon Sep 17 00:00:00 2001 From: Simon Thorley Date: Fri, 29 Jan 2021 09:27:03 +0000 Subject: [PATCH 10/28] Corected double quote char --- docs/ism/settings.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/ism/settings.md b/docs/ism/settings.md index 4e0e740a..c09b24b7 100644 --- a/docs/ism/settings.md +++ b/docs/ism/settings.md @@ -38,7 +38,7 @@ PUT _index_template/ism_history_indices "index_patterns": [ ".opendistro-ism-managed-index-history-*" ], - “template”: { + "template": { "settings": { "number_of_shards": 1, "number_of_replicas": 0 From 766335c49c5144426122e81c0ee4ba49108b2559 Mon Sep 17 00:00:00 2001 From: John Mazanec Date: Fri, 29 Jan 2021 10:00:05 -0800 Subject: [PATCH 11/28] update tables --- docs/knn/knn-score-script.md | 33 +++++++++++++++++++++++++++------ 1 file changed, 27 insertions(+), 6 deletions(-) diff --git 
a/docs/knn/knn-score-script.md b/docs/knn/knn-score-script.md index b6f57555..836e9404 100644 --- a/docs/knn/knn-score-script.md +++ b/docs/knn/knn-score-script.md @@ -8,7 +8,7 @@ has_math: true --- # Exact k-NN with Scoring Script -The k-NN plugin implements the Elasticsearch score script plugin that can be used to find the exact k-nearest neighbors to a given query point. Using the k-NN score script, a filter can be applied on an index before executing the nearest neighbor search. This is useful for dynamic search cases where the index body may vary based on other conditions. Because this approach executes a brute force search, it will not scale as well as the [Approximate approach](../approximate-knn). In some cases, it may be better to think about refactoring your workflow or index structure to use the Approximate approach instead of this approach. +The k-NN plugin implements the Elasticsearch score script plugin that can be used to find the exact k-nearest neighbors to a given query point. Using the k-NN score script, a filter can be applied on an index before executing the nearest neighbor search. This is useful for dynamic search cases where the index body may vary based on other conditions. Because this approach executes a brute force search, it will not scale as well as the [Approximate approach](../approximate-knn). In some cases, it may be better to think about refactoring your workflow or index structure to use the Approximate approach instead of this approach. ## Getting started with the score script @@ -176,11 +176,32 @@ GET my-knn-index-2/_search A space corresponds to the function used to measure the distance between 2 points in order to determine the k-nearest neighbors. From the k-NN perspective, a lower score equates to a closer and better result. This is the opposite of how Elasticsearch scores results, where a greater score equates to a better result. 
We include the conversion to Elasticsearch scores in the table below::
 
-spaceType | Distance Function | Elasticsearch Score
-:--- | :--- | :---
-l2 | \[ Distance(X, Y) = \sum_{i=1}^n (X_i - Y_i)^2 \] | 1 / (1 + Distance Function)
-cosinesimil | \[ Distance(X, Y) = {A · B \over \|A\| · \|B\|} \] | 1 + Distance Function)
-hammingbit | Distance = ones(X xor Y) | 1/(1 + Distance Function)
+<table>
+  <thead style="text-align: left">
+    <tr>
+      <th>spaceType</th>
+      <th>Distance Function</th>
+      <th>Elasticsearch Score</th>
+    </tr>
+  </thead>
+  <tr>
+    <td>l2</td>
+    <td>\[ Distance(X, Y) = \sum_{i=1}^n (X_i - Y_i)^2 \]</td>
+    <td>1 / (1 + Distance Function)</td>
+  </tr>
+  <tr>
+    <td>cosinesimil</td>
+    <td>\[ {A · B \over \|A\| · \|B\|} =
+    {\sum_{i=1}^n (A_i · B_i) \over \sqrt{\sum_{i=1}^n A_i^2} · \sqrt{\sum_{i=1}^n B_i^2}}\],
+    where \(\|A\|\) and \(\|B\|\) represent the norms of vectors A and B</td>
+    <td>1 + Distance Function</td>
+  </tr>
+  <tr>
+    <td>hammingbit</td>
+    <td>Distance = countSetBits(X \(\oplus\) Y)</td>
+    <td>1 / (1 + Distance Function)</td>
+  </tr>
+</table>
Cosine similarity returns a number between -1 and 1, and because Elasticsearch relevance scores can't be below 0, the k-NN plugin adds 1 to get the final score. From bc5040eb70ecb333dcc43036fa62d226a0bbb003 Mon Sep 17 00:00:00 2001 From: aetter Date: Mon, 1 Feb 2021 09:35:16 -0800 Subject: [PATCH 12/28] Correct plugin strings --- docs/kibana/plugins.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/docs/kibana/plugins.md b/docs/kibana/plugins.md index 7856c1a2..c44d3857 100644 --- a/docs/kibana/plugins.md +++ b/docs/kibana/plugins.md @@ -46,23 +46,23 @@ opendistroTraceAnalyticsKibana 1.12.0.0 7.9.1 -
opendistro-anomaly-detection-kibana    1.10.1.0, 1.11.0.0
-opendistro_alerting-kibana             1.10.1.1, 1.11.0.2
-opendistro_index_management-kibana     1.10.1.0, 1.11.0.0
-opendistro_security_kibana             1.10.1.1, 1.11.0.0
-opendistro-query-workbench             1.11.0.0
-opendistro-notebooks-kibana            1.11.0.0
+      
opendistro-anomaly-detection-kibana  1.10.1.0, 1.11.0.0
+opendistro-alerting                  1.10.1.1, 1.11.0.2
+opendistro_index_management_kibana   1.10.1.0, 1.11.0.0
+opendistro_security_kibana           1.10.1.1, 1.11.0.0
+opendistro-query-workbench           1.11.0.0
+opendistro-notebooks-kibana          1.11.0.0
 
7.8.0 -
opendistro-anomaly-detection-kibana    1.9.0.0
-opendistro_alerting-kibana             1.9.0.0
-opendistro_index_management-kibana     1.9.0.0
-opendistro_security_kibana             1.9.0.0
-opendistro_sql_workbench               1.9.0.0
+      
opendistro-anomaly-detection-kibana  1.9.0.0
+opendistro-alerting                  1.9.0.0
+opendistro_index_management_kibana   1.9.0.0
+opendistro_security_kibana_plugin    1.9.0.0
+opendistro-sql-workbench             1.9.0.0
 
From cf0f7ebedcd272410532b698e308b1b59120dc3f Mon Sep 17 00:00:00 2001 From: aetter Date: Mon, 1 Feb 2021 09:38:20 -0800 Subject: [PATCH 13/28] Update plugins.md --- docs/kibana/plugins.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/kibana/plugins.md b/docs/kibana/plugins.md index c44d3857..539d1af9 100644 --- a/docs/kibana/plugins.md +++ b/docs/kibana/plugins.md @@ -49,7 +49,7 @@ opendistroTraceAnalyticsKibana 1.12.0.0
opendistro-anomaly-detection-kibana  1.10.1.0, 1.11.0.0
 opendistro-alerting                  1.10.1.1, 1.11.0.2
 opendistro_index_management_kibana   1.10.1.0, 1.11.0.0
-opendistro_security_kibana           1.10.1.1, 1.11.0.0
+opendistro_security_kibana_plugin    1.10.1.1, 1.11.0.0
 opendistro-query-workbench           1.11.0.0
 opendistro-notebooks-kibana          1.11.0.0
 
From 11127d6e2fd11ff7e296cdf90c99d2bc948e1b72 Mon Sep 17 00:00:00 2001 From: ashwinkumar12345 Date: Mon, 1 Feb 2021 10:09:30 -0800 Subject: [PATCH 14/28] Historical detector updates --- docs/ad/api.md | 577 +++++++++++++++++++++++++++++++++++++++++++- docs/ad/index.md | 18 +- docs/ad/settings.md | 4 + 3 files changed, 593 insertions(+), 6 deletions(-) diff --git a/docs/ad/api.md b/docs/ad/api.md index df84bd6f..0996d90e 100644 --- a/docs/ad/api.md +++ b/docs/ad/api.md @@ -238,6 +238,56 @@ POST _opendistro/_anomaly_detection/detectors } ``` +To create a historical detector: + +#### Request + +```json +POST _opendistro/_anomaly_detection/detectors +{ + "name": "test1", + "description": "test historical detector", + "time_field": "timestamp", + "indices": [ + "nab_art_daily_jumpsdown" + ], + "filter_query": { + "match_all": { + "boost": 1 + } + }, + "detection_interval": { + "period": { + "interval": 1, + "unit": "Minutes" + } + }, + "window_delay": { + "period": { + "interval": 1, + "unit": "Minutes" + } + }, + "feature_attributes": [ + { + "feature_name": "F1", + "feature_enabled": true, + "aggregation_query": { + "f_1": { + "sum": { + "field": "value" + } + } + } + } + ], + "detection_date_range": { + "start_time": 1577840401000, + "end_time": 1606121925000 + } +} +``` + You can specify the following options. Options | Description | Type | Required @@ -251,6 +301,7 @@ Options | Description | Type | Required `detection_interval` | The time interval for your anomaly detector. | `object` | Yes `window_delay` | Add extra processing time for data collection. | `object` | No `category_field` | Categorizes or slices data with a dimension. Similar to `GROUP BY` in SQL. | `list` | No +`detection_date_range` | Specify the start time and end time for a historical detector. | `object` | No --- @@ -395,7 +446,8 @@ If you specify a category field, each result is associated with an entity: ## Start detector job -Starts an anomaly detector job. 
+Starts a real-time or historical detector job. + #### Request @@ -419,7 +471,7 @@ POST _opendistro/_anomaly_detection/detectors//_start ## Stop detector job -Stops an anomaly detector job. +Stops a real-time or historical anomaly detector job. #### Request @@ -721,14 +773,159 @@ POST _opendistro/_anomaly_detection/detectors/results/_search ] } } -... ``` +In historical detectors, specify the `detector_id`: + +#### Request + +```json +GET _opendistro/_anomaly_detection/detectors/results/_search +{ + "query": { + "term": { + "detector_id": { + "value": "dZc8WncBgO2zoQoFWVBA" + } + } + } +} +``` + +#### Sample response + +```json +{ + "took": 1, + "timed_out": false, + "_shards": { + "total": 1, + "successful": 1, + "skipped": 0, + "failed": 0 + }, + "hits": { + "total": { + "value": 1, + "relation": "eq" + }, + "max_score": 2.1366, + "hits": [ + { + "_index": ".opendistro-anomaly-detection-state", + "_type": "_doc", + "_id": "CoM8WncBtt2qvI-LZO7_", + "_version": 8, + "_seq_no": 1351, + "_primary_term": 3, + "_score": 2.1366, + "_source": { + "detector_id": "dZc8WncBgO2zoQoFWVBA", + "worker_node": "dk6-HuKQRMKm2fi8TSDHsg", + "task_progress": 0.09486946, + "last_update_time": 1612126667008, + "execution_start_time": 1612126643455, + "state": "RUNNING", + "coordinating_node": "gs213KqjS4q7H4Bmn_ZuLA", + "current_piece": 1583503800000, + "task_type": "HISTORICAL", + "started_by": "admin", + "init_progress": 1, + "is_latest": true, + "detector": { + "description": "test", + "ui_metadata": { + "features": { + "F1": { + "aggregationBy": "sum", + "aggregationOf": "value", + "featureType": "simple_aggs" + } + } + }, + "detection_date_range": { + "start_time": 1580504240308, + "end_time": 1612126640308 + }, + "feature_attributes": [ + { + "feature_id": "dJc8WncBgO2zoQoFWVAt", + "feature_enabled": true, + "feature_name": "F1", + "aggregation_query": { + "f_1": { + "sum": { + "field": "value" + } + } + } + } + ], + "schema_version": 0, + "time_field": "timestamp", + 
"last_update_time": 1612126640448, + "indices": [ + "nab_art_daily_jumpsdown" + ], + "window_delay": { + "period": { + "unit": "Minutes", + "interval": 1 + } + }, + "detection_interval": { + "period": { + "unit": "Minutes", + "interval": 10 + } + }, + "name": "test-historical-detector", + "filter_query": { + "match_all": { + "boost": 1 + } + }, + "shingle_size": 8, + "user": { + "backend_roles": [ + "admin" + ], + "custom_attribute_names": [], + "roles": [ + "all_access", + "own_index" + ], + "name": "admin", + "user_requested_tenant": "__user__" + }, + "detector_type": "HISTORICAL_SINGLE_ENTITY" + }, + "user": { + "backend_roles": [ + "admin" + ], + "custom_attribute_names": [], + "roles": [ + "all_access", + "own_index" + ], + "name": "admin", + "user_requested_tenant": "__user__" + } + } + } + ] + } +} +``` + + --- ## Delete detector Deletes a detector based on the `detector_id`. +To delete a historical detector, you need to first stop the detector. #### Request @@ -762,7 +959,7 @@ DELETE _opendistro/_anomaly_detection/detectors/ ## Update detector -Updates a detector with any changes, including the description or adding or removing of features. +Updates a detector with any changes, including the description or adding or removing of features. You can't update a real-time detector to a historical detector or vice versa. #### Request @@ -878,6 +1075,55 @@ PUT _opendistro/_anomaly_detection/detectors/ } ``` +To update a historical detector, you need to first stop the detector. 
+ +#### Request + +```json +PUT _opendistro/_anomaly_detection/detectors/ +{ + "name": "test1", + "description": "test historical detector", + "time_field": "timestamp", + "indices": [ + "nab_art_daily_jumpsdown" + ], + "filter_query": { + "match_all": { + "boost": 1 + } + }, + "detection_interval": { + "period": { + "interval": 1, + "unit": "Minutes" + } + }, + "window_delay": { + "period": { + "interval": 1, + "unit": "Minutes" + } + }, + "feature_attributes": [ + { + "feature_name": "F1", + "feature_enabled": true, + "aggregation_query": { + "f_1": { + "sum": { + "field": "value" + } + } + } + } + ], + "detection_date_range": { + "start_time": 1577840401000, + "end_time": 1606121925000 + } +} +``` --- @@ -1042,6 +1288,189 @@ GET _opendistro/_anomaly_detection/detectors/?job=true } ``` +Use `task=true` to get historical detector task information. + +#### Request + +```json +GET _opendistro/_anomaly_detection/detectors/?task=true +``` + +#### Sample response + +```json +{ + "_id": "BwzKQXcB89DLS7G9rg7Y", + "_version": 1, + "_primary_term": 2, + "_seq_no": 10, + "anomaly_detector": { + "name": "test-ylwu1", + "description": "test", + "time_field": "timestamp", + "indices": [ + "nab*" + ], + "filter_query": { + "match_all": { + "boost": 1 + } + }, + "detection_interval": { + "period": { + "interval": 10, + "unit": "Minutes" + } + }, + "window_delay": { + "period": { + "interval": 1, + "unit": "Minutes" + } + }, + "shingle_size": 8, + "schema_version": 0, + "feature_attributes": [ + { + "feature_id": "BgzKQXcB89DLS7G9rg7G", + "feature_name": "F1", + "feature_enabled": true, + "aggregation_query": { + "f_1": { + "sum": { + "field": "value" + } + } + } + } + ], + "ui_metadata": { + "features": { + "F1": { + "aggregationBy": "sum", + "aggregationOf": "value", + "featureType": "simple_aggs" + } + } + }, + "last_update_time": 1611716538071, + "user": { + "name": "admin", + "backend_roles": [ + "admin" + ], + "roles": [ + "all_access", + "own_index" + ], + 
"custom_attribute_names": [], + "user_requested_tenant": "__user__" + }, + "detector_type": "HISTORICAL_SINGLE_ENTITY", + "detection_date_range": { + "start_time": 1580094137997, + "end_time": 1611716537997 + } + }, + "anomaly_detection_task": { + "task_id": "sgxaRXcB89DLS7G9RfIO", + "last_update_time": 1611776648699, + "started_by": "admin", + "state": "FINISHED", + "detector_id": "BwzKQXcB89DLS7G9rg7Y", + "task_progress": 1, + "init_progress": 1, + "current_piece": 1611716400000, + "execution_start_time": 1611776279822, + "execution_end_time": 1611776648679, + "is_latest": true, + "task_type": "HISTORICAL", + "coordinating_node": "gs213KqjS4q7H4Bmn_ZuLA", + "worker_node": "PgfR3JhbT7yJMx7bwQ6E3w", + "detector": { + "name": "test-ylwu1", + "description": "test", + "time_field": "timestamp", + "indices": [ + "nab*" + ], + "filter_query": { + "match_all": { + "boost": 1 + } + }, + "detection_interval": { + "period": { + "interval": 10, + "unit": "Minutes" + } + }, + "window_delay": { + "period": { + "interval": 1, + "unit": "Minutes" + } + }, + "shingle_size": 8, + "schema_version": 0, + "feature_attributes": [ + { + "feature_id": "BgzKQXcB89DLS7G9rg7G", + "feature_name": "F1", + "feature_enabled": true, + "aggregation_query": { + "f_1": { + "sum": { + "field": "value" + } + } + } + } + ], + "ui_metadata": { + "features": { + "F1": { + "aggregationBy": "sum", + "aggregationOf": "value", + "featureType": "simple_aggs" + } + } + }, + "last_update_time": 1611716538071, + "user": { + "name": "admin", + "backend_roles": [ + "admin" + ], + "roles": [ + "all_access", + "own_index" + ], + "custom_attribute_names": [], + "user_requested_tenant": "__user__" + }, + "detector_type": "HISTORICAL_SINGLE_ENTITY", + "detection_date_range": { + "start_time": 1580094137997, + "end_time": 1611716537997 + } + }, + "user": { + "name": "admin", + "backend_roles": [ + "admin" + ], + "roles": [ + "all_access", + "own_index" + ], + "custom_attribute_names": [], + "user_requested_tenant": 
"__user__" + } + } +} +``` + --- ## Search detector @@ -1219,6 +1648,35 @@ GET _opendistro/_anomaly_detection/stats/ } ``` +You see additional fields for a historical detector: + +#### Sample response + +```json +{ + "anomaly_detectors_index_status": "yellow", + "anomaly_detection_state_status": "yellow", + "historical_detector_count": 3, + "detector_count": 7, + "anomaly_detection_job_index_status": "yellow", + "models_checkpoint_index_status": "yellow", + "anomaly_results_index_status": "yellow", + "nodes": { + "Mz9HDZnuQwSCw0UiisxwWg": { + "ad_execute_request_count": 0, + "models": [], + "ad_canceled_batch_task_count": 2, + "ad_hc_execute_request_count": 0, + "ad_hc_execute_failure_count": 0, + "ad_execute_failure_count": 0, + "ad_batch_task_failure_count": 0, + "ad_executing_batch_task_count": 1, + "ad_total_batch_task_count": 8 + } + } +} +``` + --- ## Create monitor @@ -1565,7 +2023,7 @@ GET /_opendistro/_anomaly_detection/detectors//_profile?_all=true&pr } ``` -The `profile` operation also provides information about each entity, such as the entity’s `last_sample_timestamp` and `last_active_timestamp`. +The `profile` operation also provides information about each entity, such as the entity’s `last_sample_timestamp` and `last_active_timestamp`. No anomaly results for an entity indicates that either the entity doesn't have any sample data or its model is removed from the model cache. 
@@ -1593,5 +2051,114 @@ GET /_opendistro/_anomaly_detection/detectors//_profile?_all=true&en } ``` +For a historical detector, specify `_all` or `ad_task` to see information about its latest task: + +#### Request + +```json +GET _opendistro/_anomaly_detection/detectors//_profile?_all +GET _opendistro/_anomaly_detection/detectors//_profile/ad_task +``` + +#### Sample Responses + +```json +{ + "ad_task": { + "ad_task": { + "task_id": "JXxyG3YBv5IHYYfMlFS2", + "last_update_time": 1606778263543, + "state": "STOPPED", + "detector_id": "SwvxCHYBPhugfWD9QAL6", + "task_progress": 0.010480972, + "init_progress": 1, + "current_piece": 1578140400000, + "execution_start_time": 1606778262709, + "is_latest": true, + "task_type": "HISTORICAL", + "detector": { + "name": "historical_test1", + "description": "test", + "time_field": "timestamp", + "indices": [ + "nab_art_daily_jumpsdown" + ], + "filter_query": { + "match_all": { + "boost": 1 + } + }, + "detection_interval": { + "period": { + "interval": 5, + "unit": "Minutes" + } + }, + "window_delay": { + "period": { + "interval": 1, + "unit": "Minutes" + } + }, + "shingle_size": 8, + "schema_version": 0, + "feature_attributes": [ + { + "feature_id": "zgvyCHYBPhugfWD9Ap_F", + "feature_name": "sum", + "feature_enabled": true, + "aggregation_query": { + "sum": { + "sum": { + "field": "value" + } + } + } + }, + { + "feature_id": "zwvyCHYBPhugfWD9Ap_G", + "feature_name": "max", + "feature_enabled": true, + "aggregation_query": { + "max": { + "max": { + "field": "value" + } + } + } + } + ], + "ui_metadata": { + "features": { + "max": { + "aggregationBy": "max", + "aggregationOf": "value", + "featureType": "simple_aggs" + }, + "sum": { + "aggregationBy": "sum", + "aggregationOf": "value", + "featureType": "simple_aggs" + } + }, + "filters": [], + "filterType": "simple_filter" + }, + "last_update_time": 1606467935713, + "detector_type": "HISTORICAL_SIGLE_ENTITY", + "detection_date_range": { + "start_time": 1577840400000, + "end_time": 
1606463775000
+      }
+    }
+  },
+  "shingle_size": 8,
+  "rcf_total_updates": 1994,
+  "threshold_model_trained": true,
+  "threshold_model_training_data_size": 0,
+  "node_id": "Q9yznwxvTz-yJxtz7rJlLg"
+}
+```

---
diff --git a/docs/ad/index.md b/docs/ad/index.md
index 1f02de51..0ff96f1a 100644
--- a/docs/ad/index.md
+++ b/docs/ad/index.md
@@ -139,7 +139,23 @@ To see all the configuration settings, choose the **Detector configuration** tab
+1. To enable or disable features, in the **Features** section, choose **Edit** and adjust the feature settings as needed. After you make your changes, choose **Save and start detector**.
- Choose between automatically starting the detector (recommended) or manually starting the detector at a later time.
-### Step 6: Manage your detectors
+### Step 6: Analyze historical data
+
+Analyzing historical data helps you get familiar with the anomaly detection plugin. You can also evaluate the performance of a detector with historical data to further fine-tune it.
+
+To use a historical detector, the date range that you specify must have data present in at least 1,000 detection intervals.
+{: .note }
+
+1. Choose **Historical detectors** and **Create historical detector**.
+1. Enter the **Name** of the detector and a brief **Description**.
+1. For **Data source**, choose the index that you want to use as the data source. You can optionally use index patterns to choose multiple indices.
+1. For **Time range**, select a time range for historical analysis.
+1. For **Detector settings**, you can choose to use the settings of an existing detector, or choose the **Timestamp field** in your index, add individual features to the detector, and set the detector interval.
+1. You can choose to run the historical detector automatically after creating it.
+1. Choose **Create**.
+- You can stop the historical detector even before it completes.
+
+### Step 7: Manage your detectors
 
Go to the **Detector details** page to change or delete your detectors. 
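The 1,000-detection-interval requirement for historical detectors can be checked with simple arithmetic on the epoch-millisecond values used throughout these API examples. A minimal sketch, assuming millisecond timestamps as in `detection_date_range` (the helper name is ours, not part of the plugin):

```python
def detection_intervals(start_ms: int, end_ms: int, interval_minutes: int) -> int:
    """Number of whole detection intervals covered by a detection_date_range."""
    return (end_ms - start_ms) // (interval_minutes * 60 * 1000)

# Date range from the historical detector example, with a 1-minute interval.
intervals = detection_intervals(1577840401000, 1606121925000, 1)
print(intervals >= 1000)  # True
```

If this check fails for your chosen time range, widen the range or shorten the detector interval before creating the historical detector.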
diff --git a/docs/ad/settings.md b/docs/ad/settings.md index e308d39a..9b5fc284 100644 --- a/docs/ad/settings.md +++ b/docs/ad/settings.md @@ -36,3 +36,7 @@ Setting | Default | Description `opendistro.anomaly_detection.max_primary_shards` | 10 | The maximum number of primary shards an anomaly detection index can have. `opendistro.anomaly_detection.filter_by_backend_roles` | False | When you enable the security plugin and set this to `true`, the plugin filters results based on the user's backend role(s). `opendistro.anomaly_detection.max_cache_miss_handling_per_second` | 100 | High cardinality detectors use a cache to store active models. In the event of a cache miss, the cache gets the models from the model checkpoint index. Use this setting to limit the rate of fetching models. Because the thread pool for a GET operation has a queue of 1,000, we recommend setting this value below 1,000. +`opendistro.anomaly_detection.max_batch_task_per_node` | 2 | Starting a historical detector triggers a batch task. This setting is the number of batch tasks that you can run per data node. You can tune this setting from 1 to 1000. If the data nodes can't support all batch tasks, add more data nodes instead of changing this setting to a higher value. +`opendistro.anomaly_detection.max_old_ad_task_docs_per_detector` | 10 | You can run the same historical detector many times. For each run, the anomaly detection plugin creates a new task. This setting is the number of previous tasks the plugin keeps. Set this value to at least 1 to track its last run. You can keep a maximum of 1,000 old tasks to avoid overwhelming the cluster. +`opendistro.anomaly_detection.batch_task_piece_size` | 1000 | The date range for a historical task is split into smaller pieces and the task is run piece by piece. Each piece contains 1,000 detection intervals by default. For example, if detector interval is 1 minute and one piece is 1000 minutes, the feature data is queried every 1,000 minutes. 
You can change this setting from 1 to 10,000. +`opendistro.anomaly_detection.batch_task_piece_interval_seconds` | 5 | Add a time interval between historical detector tasks. This interval prevents the task from consuming too much of the available resources and starving other operations like search and bulk index. You can change this setting from 1 to 600 seconds. From 0fa672841079d886a0828573724f516ea985b2d5 Mon Sep 17 00:00:00 2001 From: John Mazanec Date: Mon, 1 Feb 2021 10:21:25 -0800 Subject: [PATCH 15/28] update table --- docs/knn/approximate-knn.md | 27 ++++++++++++++++++++++----- 1 file changed, 22 insertions(+), 5 deletions(-) diff --git a/docs/knn/approximate-knn.md b/docs/knn/approximate-knn.md index a7444674..64e4f07a 100644 --- a/docs/knn/approximate-knn.md +++ b/docs/knn/approximate-knn.md @@ -9,7 +9,7 @@ has_math: true # Approximate k-NN Search -The approximate k-NN method uses [nmslib's](https://github.com/nmslib/nmslib/) implementation of the HNSW algorithm to power k-NN search. In this case, approximate means that for a given search, the neighbors returned are an estimate of the true k-nearest neighbors. Of the three methods, this method offers the best search scalability for large data sets. Generally speaking, once the data set gets into the hundreds of thousands of vectors, this approach should be preferred. +The approximate k-NN method uses [nmslib's](https://github.com/nmslib/nmslib/) implementation of the HNSW algorithm to power k-NN search. In this case, approximate means that for a given search, the neighbors returned are an estimate of the true k-nearest neighbors. Of the three methods, this method offers the best search scalability for large data sets. Generally speaking, once the data set gets into the hundreds of thousands of vectors, this approach should be preferred. 
This plugin builds an HNSW graph of the vectors for each "knn-vector field"/"Lucene segment" pair during indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. These graphs are loaded into native memory during search and managed by a cache. To pre-load the graphs into memory, please refer to the [warmup API](../api#Warmup). In order to see what graphs are loaded in memory as well as other stats, please refer to the [stats API](../api#Stats). To learn more about segments, please refer to [Apache Lucene's documentation](https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). Because the graphs are constructed during indexing, it is not possible to apply a filter on an index and then use this search method. All filters will be applied on the results produced by the approximate nearest neighbor search.

@@ -124,7 +124,24 @@ GET my-knn-index-1/_search

 A space corresponds to the function used to measure the distance between 2 points in order to determine the k-nearest neighbors. From the k-NN perspective, a lower score equates to a closer and better result. This is the opposite of how Elasticsearch scores results, where a greater score equates to a better result. To convert distances to Elasticsearch scores, we take 1/(1 + distance). Currently, the k-NN plugin supports the following spaces:

-spaceType | Distance Function | Elasticsearch Score
-:--- | :--- | :---
-l2 | \[ Distance(X, Y) = \sum_{i=1}^n (X_i - Y_i)^2 \] | 1 / (1 + Distance Function)
-cosinesimil | \[ Distance(X, Y) = 1 - {A · B \over \|A\| · \|B\|} \] | 1 / (1 + Distance Function)
+<table>
+  <thead style="text-align: left">
+    <tr>
+      <th>spaceType</th>
+      <th>Distance Function</th>
+      <th>Elasticsearch Score</th>
+    </tr>
+  </thead>
+  <tr>
+    <td>l2</td>
+    <td>\[ Distance(X, Y) = \sum_{i=1}^n (X_i - Y_i)^2 \]</td>
+    <td>1 / (1 + Distance Function)</td>
+  </tr>
+  <tr>
+    <td>cosinesimil</td>
+    <td>\[ {A · B \over \|A\| · \|B\|} =
+      {\sum_{i=1}^n (A_i · B_i) \over \sqrt{\sum_{i=1}^n A_i^2} · \sqrt{\sum_{i=1}^n B_i^2}}\]
+      where \(\|A\|\) and \(\|B\|\) represent the norms of vectors A and B respectively.</td>
+    <td>1 / (1 + Distance Function)</td>
+  </tr>
+</table>
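The space table converts each distance to an Elasticsearch score with 1 / (1 + distance), so a smaller distance yields a higher score. Here is an illustrative sketch of that math in plain Python (these helpers are written for this example and are not part of the plugin):

```python
import math

def l2_distance(x, y):
    """l2 space: sum of squared coordinate differences (no square root)."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def cosine_similarity(a, b):
    """Cosine similarity A·B / (||A|| ||B||), the expression shown for cosinesimil."""
    dot = sum(ai * bi for ai, bi in zip(a, b))
    norm_a = math.sqrt(sum(ai * ai for ai in a))
    norm_b = math.sqrt(sum(bi * bi for bi in b))
    return dot / (norm_a * norm_b)

def es_score(distance):
    """Convert a k-NN distance to an Elasticsearch score."""
    return 1.0 / (1.0 + distance)

x, y = [1.0, 2.0, 3.0], [1.0, 2.0, 5.0]
print(l2_distance(x, y))            # 4.0
print(es_score(l2_distance(x, y)))  # 0.2
```

Identical vectors give an l2 distance of 0 and therefore the maximum score of 1, which matches the "lower distance, better result" convention described above.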
From 9dcd65f852f0457f044a38f1a097ee483a94af9b Mon Sep 17 00:00:00 2001 From: John Mazanec Date: Mon, 1 Feb 2021 10:25:24 -0800 Subject: [PATCH 16/28] update methods sections --- docs/knn/index.md | 29 +++++++++++++++++++++++++++-- 1 file changed, 27 insertions(+), 2 deletions(-) diff --git a/docs/knn/index.md b/docs/knn/index.md index 2a77b9cd..82547cb3 100644 --- a/docs/knn/index.md +++ b/docs/knn/index.md @@ -12,6 +12,31 @@ Short for *k-nearest neighbors*, the k-NN plugin enables users to search for the Use cases include recommendations (for example, an "other songs you might like" feature in a music application), image recognition, and fraud detection. For background information on the k-NN search, see [Wikipedia](https://en.wikipedia.org/wiki/Nearest_neighbor_search). -This plugin supports three different methods for obtaining the k-nearest neighbors from an index of vectors. The first method takes an approximate nearest neighbor approach; it uses the HNSW algorithm to return the approximate k-nearest neighbors to a query vector. This algorithm sacrifices indexing speed and search accuracy in return for lower latency and more scalable search. To learn more about the algorithm, please refer to [nmslib's documentation](https://github.com/nmslib/nmslib/) or [the paper introducing the algorithm](https://arxiv.org/abs/1603.09320). The second method extends Elasticsearch's script scoring functionality to execute a brute force, exact k-NN search. With this approach, users are able to run k-NN search on a subset of vectors in their index (sometimes referred to as a pre-filter search). The third method adds the distance functions as painless extensions that can be used in more complex combinations. +This plugin supports three different methods for obtaining the k-nearest neighbors from an index of vectors: + +1. **Approximate k-NN** -For larger data sets, users should generally choose the approximate nearest neighbor method, because it scales significantly better. 
For smaller data sets, where a user may want to apply a filter, they should choose the custom scoring approach. If users have a more complex use case where they need to use a distance function as part of their scoring method, they should use the painless scripting approach. + The first method takes an approximate nearest neighbor approach; it uses the HNSW algorithm to return the approximate k-nearest neighbors to a query vector. This algorithm sacrifices indexing speed and search accuracy in return for lower latency and more scalable search. To learn more about the algorithm, please refer to [nmslib's documentation](https://github.com/nmslib/nmslib/) or [the paper introducing the algorithm](https://arxiv.org/abs/1603.09320). + + Approximate k-NN is the best choice for searches over large indices (i.e. hundreds of thousands of vectors or more) that require low latency. Approximate k-NN should not be used if a filter will be applied on the index before the k-NN search, greatly reducing the number of vectors to be searched. In this case, either the script scoring method or the painless extensions should be used. + + For more details refer to the [Approximate k-NN section](../approximate-knn). + +2. **Script Score k-NN** + + The second method extends Elasticsearch's script scoring functionality to execute a brute force, exact k-NN search over "knn_vector" fields or fields that can represent binary objects. With this approach, users are able to run k-NN search on a subset of vectors in their index (sometimes referred to as a pre-filter search). + + This approach should be used for searches over smaller bodies of documents or when a pre-filter is needed. Using this approach on large indices may lead to high latencies. + + For more details refer to the [k-NN Script Score section](../knn-score-script). + +3. **Painless extensions** + + The third method adds the distance functions as painless extensions that can be used in more complex combinations. 
Similar to the k-NN Script Score, this method can be used to perform a brute force, exact k-NN search across an index and supports pre-filtering.
+
+   This approach has slightly slower query performance compared to Script Score k-NN. This approach should be preferred over Script Score k-NN if the use case requires more customization over the final score.
+
+   For more details refer to the [painless functions section](../painless-functions).
+
+
+Overall, for larger data sets, users should generally choose the approximate nearest neighbor method, because it scales significantly better. For smaller data sets, where a user may want to apply a filter, they should choose the custom scoring approach. If users have a more complex use case where they need to use a distance function as part of their scoring method, they should use the painless scripting approach.

From a7ca587f51b77174b5c2cfb6f140d64cbfc4c43c Mon Sep 17 00:00:00 2001
From: ashwinkumar12345
Date: Mon, 1 Feb 2021 21:23:19 -0800
Subject: [PATCH 17/28] incorporated feedback

---
 docs/ad/api.md      | 2 +-
 docs/ad/index.md    | 8 ++++----
 docs/ad/settings.md | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/docs/ad/api.md b/docs/ad/api.md
index 0996d90e..6990bc9e 100644
--- a/docs/ad/api.md
+++ b/docs/ad/api.md
@@ -1648,7 +1648,7 @@ GET _opendistro/_anomaly_detection/stats/
 }
 ```

-You see additional fields for a historical detector:
+Historical detectors contain additional fields:

 #### Sample response

diff --git a/docs/ad/index.md b/docs/ad/index.md
index 0ff96f1a..937a9d47 100644
--- a/docs/ad/index.md
+++ b/docs/ad/index.md
@@ -150,10 +150,10 @@ To use a historical detector, the date range that you specify must have data pre
 1. Enter the **Name** of the detector and a brief **Description**.
 1. For **Data source**, choose the index that you want to use as the data source. You can optionally use index patterns to choose multiple indices.
 1. For **Time range**, select a time range for historical analysis.
-1. For **Detector settings**, you can choose to use settings of an existing detector. Or, choose the **Timestamp field** in your index, add individual features to the detector, and set the detector interval. +1. For **Detector settings**, choose to use settings of an existing detector. Or choose the **Timestamp field** in your index, add individual features to the detector, and set the detector interval. 1. You can choose to run the historical detector automatically after creating. 1. Choose **Create**. -- You can stop the historical detector even before it completes. + - You can stop the historical detector even before it completes. ### Step 7: Manage your detectors @@ -161,7 +161,7 @@ Go to the **Detector details** page to change or delete your detectors. 1. To make changes to your detector, choose the detector name to open the detector details page. 1. Choose **Actions**, and then choose **Edit detector**. - - You need to stop the detector to change the detector configuration. In the pop-up box, confirm that you want to stop the detector and proceed. + - You need to stop the detector to change the detector configuration. In the pop-up box, confirm that you want to stop the detector and proceed. 1. After making your changes, choose **Save changes**. 1. To delete your detector, choose **Actions**, and then choose **Delete detector**. -- In the pop-up box, type `delete` to confirm and choose **Delete**. + - In the pop-up box, type `delete` to confirm and choose **Delete**. diff --git a/docs/ad/settings.md b/docs/ad/settings.md index 9b5fc284..4b232d73 100644 --- a/docs/ad/settings.md +++ b/docs/ad/settings.md @@ -38,5 +38,5 @@ Setting | Default | Description `opendistro.anomaly_detection.max_cache_miss_handling_per_second` | 100 | High cardinality detectors use a cache to store active models. In the event of a cache miss, the cache gets the models from the model checkpoint index. Use this setting to limit the rate of fetching models. 
Because the thread pool for a GET operation has a queue of 1,000, we recommend setting this value below 1,000. `opendistro.anomaly_detection.max_batch_task_per_node` | 2 | Starting a historical detector triggers a batch task. This setting is the number of batch tasks that you can run per data node. You can tune this setting from 1 to 1000. If the data nodes can't support all batch tasks, add more data nodes instead of changing this setting to a higher value. `opendistro.anomaly_detection.max_old_ad_task_docs_per_detector` | 10 | You can run the same historical detector many times. For each run, the anomaly detection plugin creates a new task. This setting is the number of previous tasks the plugin keeps. Set this value to at least 1 to track its last run. You can keep a maximum of 1,000 old tasks to avoid overwhelming the cluster. -`opendistro.anomaly_detection.batch_task_piece_size` | 1000 | The date range for a historical task is split into smaller pieces and the task is run piece by piece. Each piece contains 1,000 detection intervals by default. For example, if detector interval is 1 minute and one piece is 1000 minutes, the feature data is queried every 1,000 minutes. You can change this setting from 1 to 10,000. +`opendistro.anomaly_detection.batch_task_piece_size` | 1000 | The date range for a historical task is split into smaller pieces and the anomaly detection plugin runs the task piece by piece. Each piece contains 1,000 detection intervals by default. For example, if detector interval is 1 minute and one piece is 1000 minutes, the feature data is queried every 1,000 minutes. You can change this setting from 1 to 10,000. `opendistro.anomaly_detection.batch_task_piece_interval_seconds` | 5 | Add a time interval between historical detector tasks. This interval prevents the task from consuming too much of the available resources and starving other operations like search and bulk index. You can change this setting from 1 to 600 seconds. 
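The `batch_task_piece_size` and `batch_task_piece_interval_seconds` settings above interact: the piece size determines how many pieces a historical run splits into, and the piece interval adds a pause between them. A back-of-the-envelope sketch (illustrative Python under the default values; `historical_run_pieces` is a made-up helper, not plugin code):

```python
import math

def historical_run_pieces(total_intervals: int, piece_size: int = 1000,
                          piece_interval_seconds: int = 5):
    """Split a historical run into pieces and total the pauses between them.
    Defaults mirror the setting defaults in the table above."""
    pieces = math.ceil(total_intervals / piece_size)
    pause_seconds = max(pieces - 1, 0) * piece_interval_seconds
    return pieces, pause_seconds

# A 30-day range at a 1-minute detector interval is 43,200 intervals:
print(historical_run_pieces(43_200))  # (44, 215): 44 pieces, ~215 s of pauses
```

Raising the piece interval slows a historical run down but leaves more headroom for searches and bulk indexing, which is the trade-off the setting description calls out.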
From 5b6069e05cc651f9d17d0aed4362a6d6f63b1e11 Mon Sep 17 00:00:00 2001 From: ashwinkumar12345 Date: Mon, 1 Feb 2021 22:45:20 -0800 Subject: [PATCH 18/28] incorporated more comments --- docs/ad/api.md | 24 +++++++++++++++++------- docs/ad/settings.md | 2 +- 2 files changed, 18 insertions(+), 8 deletions(-) diff --git a/docs/ad/api.md b/docs/ad/api.md index 6990bc9e..9e6b909f 100644 --- a/docs/ad/api.md +++ b/docs/ad/api.md @@ -249,7 +249,7 @@ POST _opendistro/_anomaly_detection/detectors "description": "test historical detector", "time_field": "timestamp", "indices": [ - "nab_art_daily_jumpsdown" + "host-cloudwatch" ], "filter_query": { "match_all": { @@ -775,7 +775,16 @@ POST _opendistro/_anomaly_detection/detectors/results/_search } ``` -In historical detectors, specify the `detector_id`: +In historical detectors, specify the `detector_id`. +To get the latest task: + +#### Request + +```json +GET _opendistro/_anomaly_detection/detectors/?task=true +``` + +To query the anomaly results with `task_id`: #### Request @@ -784,8 +793,8 @@ GET _opendistro/_anomaly_detection/detectors/results/_search { "query": { "term": { - "detector_id": { - "value": "dZc8WncBgO2zoQoFWVBA" + "task_id": { + "value": "NnlV9HUBQxqfQ7vBJNzy" } } } @@ -925,7 +934,7 @@ GET _opendistro/_anomaly_detection/detectors/results/_search ## Delete detector Deletes a detector based on the `detector_id`. -To delete a historical detector, you need to first stop the detector. +To delete a detector, you need to first stop the detector. #### Request @@ -959,7 +968,8 @@ DELETE _opendistro/_anomaly_detection/detectors/ ## Update detector -Updates a detector with any changes, including the description or adding or removing of features. You can't update a real-time detector to a historical detector or vice versa. +Updates a detector with any changes, including the description or adding or removing of features. +To update a detector, you need to first stop the detector. 
#### Request

@@ -1075,7 +1085,7 @@ PUT _opendistro/_anomaly_detection/detectors/
 }
 ```

-To update a historical detector, you need to first stop the detector.
+To update a historical detector:

 #### Request

diff --git a/docs/ad/settings.md b/docs/ad/settings.md
index 4b232d73..f8be70d3 100644
--- a/docs/ad/settings.md
+++ b/docs/ad/settings.md
@@ -36,7 +36,7 @@ Setting | Default | Description
 `opendistro.anomaly_detection.max_primary_shards` | 10 | The maximum number of primary shards an anomaly detection index can have.
 `opendistro.anomaly_detection.filter_by_backend_roles` | False | When you enable the security plugin and set this to `true`, the plugin filters results based on the user's backend role(s).
 `opendistro.anomaly_detection.max_cache_miss_handling_per_second` | 100 | High cardinality detectors use a cache to store active models. In the event of a cache miss, the cache gets the models from the model checkpoint index. Use this setting to limit the rate of fetching models. Because the thread pool for a GET operation has a queue of 1,000, we recommend setting this value below 1,000.
-`opendistro.anomaly_detection.max_batch_task_per_node` | 2 | Starting a historical detector triggers a batch task. This setting is the number of batch tasks that you can run per data node. You can tune this setting from 1 to 1000. If the data nodes can't support all batch tasks, add more data nodes instead of changing this setting to a higher value.
+`opendistro.anomaly_detection.max_batch_task_per_node` | 2 | Starting a historical detector triggers a batch task. This setting is the number of batch tasks that you can run per data node. You can tune this setting from 1 to 1000. If the data nodes can't support all batch tasks, and you're not sure whether they can run more historical detectors, add more data nodes instead of raising this setting.
`opendistro.anomaly_detection.max_old_ad_task_docs_per_detector` | 10 | You can run the same historical detector many times. For each run, the anomaly detection plugin creates a new task. This setting is the number of previous tasks the plugin keeps. Set this value to at least 1 to track its last run. You can keep a maximum of 1,000 old tasks to avoid overwhelming the cluster. `opendistro.anomaly_detection.batch_task_piece_size` | 1000 | The date range for a historical task is split into smaller pieces and the anomaly detection plugin runs the task piece by piece. Each piece contains 1,000 detection intervals by default. For example, if detector interval is 1 minute and one piece is 1000 minutes, the feature data is queried every 1,000 minutes. You can change this setting from 1 to 10,000. `opendistro.anomaly_detection.batch_task_piece_interval_seconds` | 5 | Add a time interval between historical detector tasks. This interval prevents the task from consuming too much of the available resources and starving other operations like search and bulk index. You can change this setting from 1 to 600 seconds. From 88d3b08398e2ece06f012a9d3aa4c22c8200a53f Mon Sep 17 00:00:00 2001 From: aetter Date: Tue, 2 Feb 2021 12:06:17 -0800 Subject: [PATCH 19/28] Add -subj flag to certificate generation All credit goes to @neographikal. 
--- docs/security/configuration/generate-certificates.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/docs/security/configuration/generate-certificates.md b/docs/security/configuration/generate-certificates.md index 1e46b526..e1bdc82f 100755 --- a/docs/security/configuration/generate-certificates.md +++ b/docs/security/configuration/generate-certificates.md @@ -112,6 +112,12 @@ rm node-key-temp.pem rm node.csr ``` +If you already know the certificate details and don't want to specify them as the script runs, use the `-subj` option in your `root-ca.pem` and CSR commands: + +```bash +openssl req -new -key node-key.pem -subj "/C=CA/ST=ONTARIO/L=TORONTO/O=ORG/OU=UNIT/CN=node1.example.com" -out node.csr +``` + ## Get distinguished names From 5d9368da5e851b6f67f626dc271b77ef460c99ec Mon Sep 17 00:00:00 2001 From: keithhc2 Date: Wed, 3 Feb 2021 16:01:49 -0800 Subject: [PATCH 20/28] Language fixes and clarification --- docs/knn/settings.md | 55 +------------------------------------------- docs/knn/warmup.md | 26 ++++++++++++++------- 2 files changed, 19 insertions(+), 62 deletions(-) diff --git a/docs/knn/settings.md b/docs/knn/settings.md index 4fc4d53d..5be12c4c 100644 --- a/docs/knn/settings.md +++ b/docs/knn/settings.md @@ -1,6 +1,6 @@ --- layout: default -title: Settings, Statistics, and Tasks +title: Settings and Statistics parent: KNN nav_order: 10 --- @@ -64,56 +64,3 @@ Statistic | Description `script_compilation_errors` | The number of errors during script compilation. `script_query_requests` | The number of query requests that use [the KNN script](../#custom-scoring). `script_query_errors` | The number of errors during script queries. - -## Tasks - -You can use the `_tasks` API to see what tasks are currently executing on your indices. - -```json -GET /_tasks -``` - -This sample request returns the tasks currently running on a node named `odfe-node1`. 
- -```json -GET /_tasks?nodes=odfe-node1 -{ - "nodes": { - "Mgqdm0r9SEGClWxp_RbnaQ": { - "name": "odfe-node1", - "transport_address": "sample_address", - "host": "sample_host", - "ip": "sample_ip", - "roles": [ - "data", - "ingest", - "master", - "remote_cluster_client" - ], - "tasks": { - "Mgqdm0r9SEGClWxp_RbnaQ:24578": { - "node": "Mgqdm0r9SEGClWxp_RbnaQ", - "id": 24578, - "type": "transport", - "action": "cluster:monitor/tasks/lists", - "start_time_in_millis": 1611612517044, - "running_time_in_nanos": 638700, - "cancellable": false, - "headers": {} - }, - "Mgqdm0r9SEGClWxp_RbnaQ:24579": { - "node": "Mgqdm0r9SEGClWxp_RbnaQ", - "id": 24579, - "type": "direct", - "action": "cluster:monitor/tasks/lists[n]", - "start_time_in_millis": 1611612517044, - "running_time_in_nanos": 222200, - "cancellable": false, - "parent_task_id": "Mgqdm0r9SEGClWxp_RbnaQ:24578", - "headers": {} - } - } - } - } -} -``` diff --git a/docs/knn/warmup.md b/docs/knn/warmup.md index 83824a33..77fd3151 100644 --- a/docs/knn/warmup.md +++ b/docs/knn/warmup.md @@ -8,11 +8,15 @@ has_toc: false has_math: false --- -# Warmup API +# Warmup API operation for the k-NN plugin -The HNSW graphs used to perform k-Approximate Nearest Neighbor Search are stored as `.hnsw` files with other Lucene segment files. In order to perform search on these graphs, they need to be loaded into native memory. If the graphs have not yet been loaded into native memory, upon search, they will first be loaded and then searched. This loading time can cause high latency during initial queries. To avoid this situation, users will often run random queries during a warmup period. After this warmup period, the graphs will be loaded into native memory and their production workloads can begin. This loading process is indirect and requires extra effort. 
+The Hierarchical Navigable Small World (HNSW) graphs that are used to perform an approximate k-Nearest Neighbor (k-NN) search are stored as `.hnsw` files with other Apache Lucene segment files. In order for you to perform a search on these graphs using the k-NN plugin, these files need to be loaded into native memory. -As an alternative, you can run the k-NN plugin's warmup API on whatever indices you are interested in searching. This API loads all the graphs for all of the shards (primaries and replicas) of all the indices specified in the request into native memory. After the process completes, you can start searching against their indices with no initial latency penalties. The warmup API is idempotent, so if a segment's graphs are already loaded into memory, this operation has no impact on those graphs. It only loads graphs not currently in memory. +If the plugin has not loaded the graphs into native memory, it loads them when it receives a search request. This loading time can cause high latency during initial queries. To avoid this situation, users often run random queries during a warmup period. After this warmup period, the graphs are loaded into native memory and their production workloads can begin. This loading process is indirect and requires extra effort. + +As an alternative, you can avoid this latency issue by running the k-NN plugin warmup API operation on whatever indices you're interested in searching. This operation loads all the graphs for all of the shards (primaries and replicas) of all the indices specified in the request into native memory. + +After the process finishes, you can start searching against the indices with no initial latency penalties. The warmup API operation is idempotent, so if a segment's graphs are already loaded into memory, this operation has no impact on those graphs. It only loads graphs that aren't currently in memory. 
## Usage This request performs a warmup on three indices: @@ -30,13 +34,19 @@ GET /_opendistro/_knn/warmup/index1,index2,index3?pretty `total` indicates how many shards the k-NN plugin attempted to warm up. The response also includes the number of shards the plugin succeeded and failed to warm up. -The call does not return until the warmup operation is complete or the request times out. If the request times out, the operation still continues on the cluster. To monitor the warmup operation, use the [Elasticsearch `_tasks` API](../settings#tasks). +The call does not return until the warmup operation is complete or the request times out. If the request times out, the operation still continues on the cluster. To monitor the warmup operation, use the Elasticsearch `_tasks` API: + +```json +GET /_tasks +``` -Following the completion of the operation, use the [k-NN `_stats` API](../settings#statistics) to see what the k-NN plugin loaded into the graph. +After the operation has finished, use the [k-NN `_stats` API operation](../settings#statistics) to see what the k-NN plugin loaded into the graph. ## Best practices -For the warmup API to function properly, follow these best practices. First, do not run merge operations on indices that you want to warm up. During merge, the k-NN plugin creates new segments, and old segments are (sometimes) deleted. For example, you could encounter a situation in which the warmup API loads graphs A and B into native memory, but segment C is created from segments A and B being merged. The graphs for A and B would no longer be in memory, and neither is graph C. In this case, the initial penalty for loading graph C is still present. +For the warmup API to function properly, follow these best practices. + +First, don't run merge operations on indices that you want to warm up. During merge, the k-NN plugin creates new segments, and old segments are (sometimes) deleted. 
For example, you could encounter a situation in which the warmup API operation loads graphs A and B into native memory, but segment C is created from segments A and B being merged. The graphs for A and B would no longer be in memory, and graph C would also not be in memory. In this case, the initial penalty for loading graph C is still present. -Second, confirm that all graphs you want to warm up can fit into native memory. See the [knn.memory.circuit_breaker.limit statistic](../settings/#cluster-settings) for more information about the native memory limit. High graph memory usage causes cache thrashing. +Second, confirm that all graphs you want to warm up can fit into native memory. For more information about the native memory limit, see the [knn.memory.circuit_breaker.limit statistic](../settings/#cluster-settings). High graph memory usage causes cache thrashing, which can lead to operations constantly failing and attempting to run again. -Lastly, do not index any documents you want to load into the cache. Writing new information to segments prevents the warmup API from loading the graphs until they are searchable, so you would have to run the warmup API again after indexing finishes. +Finally, don't index any documents that you want to load into the cache. Writing new information to segments prevents the warmup API operation from loading the graphs until they're searchable. This means that you would have to run the warmup operation again after indexing finishes. 
From 0da316dd5851b4ee1ae6f2c1a51e08a3945b9a43 Mon Sep 17 00:00:00 2001 From: aetter Date: Fri, 5 Feb 2021 10:47:35 -0800 Subject: [PATCH 21/28] Update README.md --- README.md | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 5cb98b64..0e950942 100644 --- a/README.md +++ b/README.md @@ -70,11 +70,9 @@ There are three ways to contribute content, depending on the magnitude of the ch ### Trivial changes -If you just need to fix a typo or add a sentence, this method works well: +If you just need to fix a typo or add a sentence, this web-based method works well: -1. In your web browser, navigate to the appropriate Markdown file. For example, [cluster.md](https://github.com/opendistro/for-elasticsearch-docs/blob/master/docs/elasticsearch/cluster.md). - -1. Click the **Edit this file** button (the pencil). +1. On any page in the documentation, click the **Edit this page** link in the lower-left. 1. Make your changes. From 9bb0093bc3e1c1df23dddbe02cb05e8717e9e027 Mon Sep 17 00:00:00 2001 From: John Mazanec Date: Fri, 5 Feb 2021 15:58:01 -0800 Subject: [PATCH 22/28] factor in warmup doc change --- docs/knn/api.md | 34 +++++++++++++++++++++--------- docs/knn/warmup.md | 52 ---------------------------------------------- 2 files changed, 24 insertions(+), 62 deletions(-) delete mode 100644 docs/knn/warmup.md diff --git a/docs/knn/api.md b/docs/knn/api.md index 8140fbdd..1b6ea3e6 100644 --- a/docs/knn/api.md +++ b/docs/knn/api.md @@ -101,13 +101,18 @@ GET /_opendistro/_knn/HYMrXXsBSamUkcAjhjeN0w/stats/circuit_breaker_triggered,gra ``` ## Warmup -The HNSW graphs used to perform k-Approximate Nearest Neighbor Search are stored as `.hnsw` files with the other Lucene segment files. In order to perform search on these graphs, they need to be loaded into native memory. If the graphs have not yet been loaded into native memory, upon search, they will first be loaded and then searched. This can cause high latency during initial queries. 
To avoid this, users will often run random queries during a warmup period. After this warmup period, the graphs will be loaded into native memory and their production workloads can begin. This process is indirect and requires extra effort. +The Hierarchical Navigable Small World (HNSW) graphs that are used to perform an approximate k-Nearest Neighbor (k-NN) search are stored as `.hnsw` files with other Apache Lucene segment files. In order for you to perform a search on these graphs using the k-NN plugin, these files need to be loaded into native memory. -As an alternative, a user can run the warmup API on whatever indices they are interested in searching over. This API will load all the graphs for all of the shards (primaries and replicas) of all the indices specified in the request into native memory. After this process completes, a user will be able to start searching against their indices with no initial latency penalties. The warmup API is idempotent. If a segment's graphs are already loaded into memory, this operation will have no impact on them. It only loads graphs that are not currently in memory. +If the plugin has not loaded the graphs into native memory, it loads them when it receives a search request. This loading time can cause high latency during initial queries. To avoid this situation, users often run random queries during a warmup period. After this warmup period, the graphs are loaded into native memory and their production workloads can begin. This loading process is indirect and requires extra effort. -### Example -This command will perform warmup on index1, index2, and index3: -``` +As an alternative, you can avoid this latency issue by running the k-NN plugin warmup API operation on whatever indices you're interested in searching. This operation loads all the graphs for all of the shards (primaries and replicas) of all the indices specified in the request into native memory. 
+ +After the process finishes, you can start searching against the indices with no initial latency penalties. The warmup API operation is idempotent, so if a segment's graphs are already loaded into memory, this operation has no impact on those graphs. It only loads graphs that aren't currently in memory. + +### Usage +This request performs a warmup on three indices: + +```json GET /_opendistro/_knn/warmup/index1,index2,index3?pretty { "_shards" : { @@ -117,13 +122,22 @@ GET /_opendistro/_knn/warmup/index1,index2,index3?pretty } } ``` -`total` indicates how many shards the warmup operation was performed on. `successful` indicates how many shards succeeded and `failed` indicates how many shards have failed. -The call will not return until the warmup operation is complete or the request times out. If the request times out, the operation will still be going on in the cluster. To monitor this, use the Elasticsearch `_tasks` API. +`total` indicates how many shards the k-NN plugin attempted to warm up. The response also includes the number of shards the plugin succeeded and failed to warm up. -Following the completion of the operation, use the k-NN `_stats` API to see what has been loaded into the graph. +The call does not return until the warmup operation is complete or the request times out. If the request times out, the operation still continues on the cluster. To monitor the warmup operation, use the Elasticsearch `_tasks` API: + +```json +GET /_tasks +``` + +After the operation has finished, use the [k-NN `_stats` API operation](#Stats) to see what the k-NN plugin loaded into the graph. ### Best practices -In order for the warmup API to function properly, a few best practices should be followed. First, no merge operations should be currently running on the indices that will be warmed up. The reason for this is that, during merge, new segments are created and old segments are (sometimes) deleted. 
The situation may arise where the warmup API loads graphs A and B into native memory, but then segment C is created from segments A and B being merged. The graphs for A and B will no longer be in memory and neither will the graph for C. Then, the initial penalty of loading graph C on the first queries will still be present. +For the warmup API to function properly, follow these best practices. + +First, don't run merge operations on indices that you want to warm up. During merge, the k-NN plugin creates new segments, and old segments are (sometimes) deleted. For example, you could encounter a situation in which the warmup API operation loads graphs A and B into native memory, but segment C is created from segments A and B being merged. The graphs for A and B would no longer be in memory, and graph C would also not be in memory. In this case, the initial penalty for loading graph C is still present. + +Second, confirm that all graphs you want to warm up can fit into native memory. For more information about the native memory limit, see the [knn.memory.circuit_breaker.limit statistic](../settings/#cluster-settings). High graph memory usage causes cache thrashing, which can lead to operations constantly failing and attempting to run again. -Second, it should first be confirmed that all of the graphs of interest are able to fit into native memory before running warmup. If they all cannot fit into memory, then the cache will thrash. +Finally, don't index any documents that you want to load into the cache. Writing new information to segments prevents the warmup API operation from loading the graphs until they're searchable. This means that you would have to run the warmup operation again after indexing finishes. 
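
The warmup request and its `_shards` response shown above can be handled in a few lines of client code. A minimal sketch (plain Python, no client library; the host, port, and index names are assumptions for illustration):

```python
# Sketch: build a k-NN warmup request URL and interpret the _shards
# summary from the response. The endpoint path follows the docs above;
# the sample response is illustrative, not captured from a live cluster.

def warmup_url(indices, host="http://localhost:9200"):
    """The warmup API takes a comma-separated list of index names."""
    return f"{host}/_opendistro/_knn/warmup/{','.join(indices)}"

def warmup_succeeded(response):
    """True only if every shard the plugin attempted was warmed up."""
    shards = response["_shards"]
    return shards["failed"] == 0 and shards["successful"] == shards["total"]

url = warmup_url(["index1", "index2", "index3"])
sample = {"_shards": {"total": 6, "successful": 6, "failed": 0}}
print(url, warmup_succeeded(sample))
```

Because the call blocks until the warmup operation completes or the request times out, a real client would send this with a generous timeout and fall back to polling `GET /_tasks` on timeout, as described above.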
diff --git a/docs/knn/warmup.md b/docs/knn/warmup.md deleted file mode 100644 index 77fd3151..00000000 --- a/docs/knn/warmup.md +++ /dev/null @@ -1,52 +0,0 @@ ---- -layout: default -title: Warmup API -parent: KNN -nav_order: 5 -has_children: false -has_toc: false -has_math: false ---- - -# Warmup API operation for the k-NN plugin - -The Hierarchical Navigable Small World (HNSW) graphs that are used to perform an approximate k-Nearest Neighbor (k-NN) search are stored as `.hnsw` files with other Apache Lucene segment files. In order for you to perform a search on these graphs using the k-NN plugin, these files need to be loaded into native memory. - -If the plugin has not loaded the graphs into native memory, it loads them when it receives a search request. This loading time can cause high latency during initial queries. To avoid this situation, users often run random queries during a warmup period. After this warmup period, the graphs are loaded into native memory and their production workloads can begin. This loading process is indirect and requires extra effort. - -As an alternative, you can avoid this latency issue by running the k-NN plugin warmup API operation on whatever indices you're interested in searching. This operation loads all the graphs for all of the shards (primaries and replicas) of all the indices specified in the request into native memory. - -After the process finishes, you can start searching against the indices with no initial latency penalties. The warmup API operation is idempotent, so if a segment's graphs are already loaded into memory, this operation has no impact on those graphs. It only loads graphs that aren't currently in memory. - -## Usage -This request performs a warmup on three indices: - -```json -GET /_opendistro/_knn/warmup/index1,index2,index3?pretty -{ - "_shards" : { - "total" : 6, - "successful" : 6, - "failed" : 0 - } -} -``` - -`total` indicates how many shards the k-NN plugin attempted to warm up. 
The response also includes the number of shards the plugin succeeded and failed to warm up. - -The call does not return until the warmup operation is complete or the request times out. If the request times out, the operation still continues on the cluster. To monitor the warmup operation, use the Elasticsearch `_tasks` API: - -```json -GET /_tasks -``` - -After the operation has finished, use the [k-NN `_stats` API operation](../settings#statistics) to see what the k-NN plugin loaded into the graph. - -## Best practices -For the warmup API to function properly, follow these best practices. - -First, don't run merge operations on indices that you want to warm up. During merge, the k-NN plugin creates new segments, and old segments are (sometimes) deleted. For example, you could encounter a situation in which the warmup API operation loads graphs A and B into native memory, but segment C is created from segments A and B being merged. The graphs for A and B would no longer be in memory, and graph C would also not be in memory. In this case, the initial penalty for loading graph C is still present. - -Second, confirm that all graphs you want to warm up can fit into native memory. For more information about the native memory limit, see the [knn.memory.circuit_breaker.limit statistic](../settings/#cluster-settings). High graph memory usage causes cache thrashing, which can lead to operations constantly failing and attempting to run again. - -Finally, don't index any documents that you want to load into the cache. Writing new information to segments prevents the warmup API operation from loading the graphs until they're searchable. This means that you would have to run the warmup operation again after indexing finishes. 
From ed84553520254ebe8ce95e1c9111b28fee772ae9 Mon Sep 17 00:00:00 2001
From: ashwinkumar12345
Date: Mon, 8 Feb 2021 08:55:39 -0800
Subject: [PATCH 23/28] added aggregation to query

---
 docs/ism/index-rollups.md | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/docs/ism/index-rollups.md b/docs/ism/index-rollups.md
index d2b285c5..83aed4f4 100644
--- a/docs/ism/index-rollups.md
+++ b/docs/ism/index-rollups.md
@@ -62,7 +62,7 @@ Review your configuration and select **Create**.

You can use the standard `_search` API to search the target index. Make sure that the query matches the constraints of the target index. For example, if you don’t set up terms aggregations on a field, you don’t receive results for terms aggregations. If you don’t set up maximum aggregations, you don’t receive results for maximum aggregations.

-You can’t access the internal structure of the data in the target index because the plugin automatically rewrites the query in the background to suit the target index. This is to make sure you can use the same query for the source and target index.
+You can’t access the internal structure of the data in the target index because the plugin automatically rewrites the query in the background to suit the target index. This is to make sure you can use the same query for the source and target index.

To query the target index, set `size` to 0:

@@ -71,13 +71,16 @@ GET target_index/_search
{
  "size": 0,
  "query": {
-    "term": {
-      "timezone": "America/Los_Angeles"
+    "match_all": {}
+  },
+  "aggs": {
+    "avg_cpu": {
+      "avg": {
+        "field": "cpu_usage"
+      }
    }
  }
}
```
-You can also search both your source and target indices in the same query.
-
Consider a scenario where you collect rolled up data from 1 PM to 9 PM at hourly intervals and live data from 7 PM to 11 PM at one-minute intervals.
If you execute an aggregation over these in the same query, for the 7 PM to 9 PM window you see an overlap of rolled up data and live data, and they get counted twice in the aggregations.

From 5ae56e1d0743390683d0121690db0f623281d4b0 Mon Sep 17 00:00:00 2001
From: John Mazanec
Date: Mon, 8 Feb 2021 12:54:21 -0800
Subject: [PATCH 24/28] update painless

---
 docs/knn/painless-functions.md | 70 +++++++++++++++++++++++++++++++++-
 1 file changed, 69 insertions(+), 1 deletion(-)

diff --git a/docs/knn/painless-functions.md b/docs/knn/painless-functions.md
index af8c632f..a9b6251a 100644
--- a/docs/knn/painless-functions.md
+++ b/docs/knn/painless-functions.md
@@ -7,4 +7,72 @@ has_children: false
 has_math: true
 ---
-
\ No newline at end of file
+# Painless Scripting Functions
+
+With the k-NN Plugin's Painless Scripting extensions, you can use k-NN distance functions directly in your Painless scripts to perform operations on `knn_vector` fields. Painless has a strict list of allowed functions and classes per context to ensure its scripts are secure. The k-NN plugin has added painless extensions to a few of the distance functions used in [k-NN score script](../knn-score-script) so that you can utilize them when you need more customization with respect to your k-NN workload.
+
+## Get started with k-NN's Painless Scripting Functions
+
+To use k-NN's Painless Scripting functions, first, you still need to create an index with `knn_vector` fields as was done in [k-NN score script](../knn-score-script#Getting_started_with_the_score_script).
Once the index is created and you have ingested some data, you can use the painless extensions like so: + +``` +GET my-knn-index-2/_search +{ + "size": 2, + "query": { + "script_score": { + "query": { + "bool": { + "filter": { + "term": { + "color": "BLUE" + } + } + } + }, + "script": { + "source": "1.0 + cosineSimilarity(params.query_value, doc[params.field])", + "params": { + "field": "my_vector", + "query_value": [9.9, 9.9], + } + } + } + } +} +``` + +The `field` needs to map to a `knn_vector` field and the `query_value` needs to be a floating point array with the same dimension as `field`. + +## Function Types +The following table contains the available painless functions the k-NN plugin provides: + + + + + + + + + + + + + + + + + + + +
+<tr>
+<th>Function Name</th>
+<th>Function Signature</th>
+<th>Description</th>
+</tr>
+<tr>
+<td>l2Squared</td>
+<td>`float l2Squared (float[] queryVector, doc['vector field'])`</td>
+<td>This function calculates the square of the L2 distance (Euclidean distance) between a given query vector and document vectors. The shorter the distance, the more relevant the document is, so this example inverts the return value of the l2Squared function. If the document vector matches the query vector, the result is 0, so this example also adds 1 to the distance to avoid divide by zero errors.</td>
+</tr>
+<tr>
+<td>cosineSimilarity</td>
+<td>float cosineSimilarity (float[] queryVector, doc['vector field'])</td>
+<td>Cosine similarity is inner product of the query vector and document vector normalized to both have length 1. If magnitude of the query vector does not change throughout the query, users can pass magnitude of query vector optionally to improve the performance instead of calculating the magnitude every time for every filtered document: `float cosineSimilarity (float[] queryVector, doc['vector field'], float normQueryVector)`. In general, range of cosine similarity is [-1, 1], but in case of information retrieval, the cosine similarity of two documents will range from 0 to 1, since tf-idf cannot be negative. Hence, we add 1.0 to the cosine similarity to score always positive.</td>
+</tr>
+ + +## Constraints +1. If a document’s knn_vector field has different dimensions than the query, the function throws an IllegalArgumentException. +2. If a vector field doesn't have a value, the function throws an IllegalStateException. + You can avoid this situation by first checking if a document has a value for the field: +``` + "source": "doc[params.field].size() == 0 ? 0 : 1 / (1 + l2Squared(params.query_value, doc[params.field]))", +``` +Since scores can only be positive, this script ranks documents with vector fields higher than those without. From a5cf609f9bd9b80fd300fa15bd94873946d05385 Mon Sep 17 00:00:00 2001 From: John Mazanec Date: Mon, 8 Feb 2021 13:31:13 -0800 Subject: [PATCH 25/28] update knn docs --- docs/knn/api.md | 4 ++-- docs/knn/approximate-knn.md | 6 +++--- docs/knn/index.md | 8 ++++---- docs/knn/jni-library.md | 2 +- docs/knn/knn-score-script.md | 4 ++-- docs/knn/painless-functions.md | 12 ++++++------ docs/knn/performance-tuning.md | 6 +++--- docs/knn/settings.md | 2 +- 8 files changed, 22 insertions(+), 22 deletions(-) diff --git a/docs/knn/api.md b/docs/knn/api.md index 1b6ea3e6..906ec4cb 100644 --- a/docs/knn/api.md +++ b/docs/knn/api.md @@ -2,7 +2,7 @@ layout: default title: API nav_order: 4 -parent: KNN +parent: k-NN has_children: false --- @@ -38,7 +38,7 @@ Statistic | Description `script_query_requests` | The total number of script queries. This is only relevant to k-NN score script search. `script_query_errors` | The number of errors during script queries. This is only relevant to k-NN score script search. 
-### Examples +### Usage ``` GET /_opendistro/_knn/stats?pretty diff --git a/docs/knn/approximate-knn.md b/docs/knn/approximate-knn.md index 64e4f07a..4b96c1ed 100644 --- a/docs/knn/approximate-knn.md +++ b/docs/knn/approximate-knn.md @@ -2,7 +2,7 @@ layout: default title: Approximate Search nav_order: 1 -parent: KNN +parent: k-NN has_children: false has_math: true --- @@ -11,13 +11,13 @@ has_math: true The approximate k-NN method uses [nmslib's](https://github.com/nmslib/nmslib/) implementation of the HNSW algorithm to power k-NN search. In this case, approximate means that for a given search, the neighbors returned are an estimate of the true k-nearest neighbors. Of the three methods, this method offers the best search scalability for large data sets. Generally speaking, once the data set gets into the hundreds of thousands of vectors, this approach should be preferred. -This plugin builds an HNSW graph of the vectors for each "knn-vector field"/"Lucene segment" pair during indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. These graphs are loaded into native memory during search and managed by a cache. To pre-load the graphs into memory, please refer to the [warmup API](../api#Warmup). In order to see what graphs are loaded in memory as well as other stats, please refer to the [stats API](../api#Stats). To learn more about segments, please refer to [Apache Lucene's documentation](https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). Because the graphs are constructed during indexing, it is not possible to apply a filter on an index and then use this search method. All filters will be applied on the results produced by the approximate nearest neighbor search. 
+This plugin builds an HNSW graph of the vectors for each "knn-vector field"/"Lucene segment" pair during indexing that can be used to efficiently find the k-nearest neighbors to a query vector during search. These graphs are loaded into native memory during search and managed by a cache. To pre-load the graphs into memory, please refer to the [warmup API](api#Warmup). In order to see what graphs are loaded in memory as well as other stats, please refer to the [stats API](api#Stats). To learn more about segments, please refer to [Apache Lucene's documentation](https://lucene.apache.org/core/8_7_0/core/org/apache/lucene/codecs/lucene87/package-summary.html#package.description). Because the graphs are constructed during indexing, it is not possible to apply a filter on an index and then use this search method. All filters will be applied on the results produced by the approximate nearest neighbor search. ## Get started with approximate k-NN To use the k-NN plugin's approximate search functionality, you must first create a k-NN index with the index setting, `index.knn` to `true`. This setting tells the plugin to create HNSW graphs for the index. -Additionally, if you are using the approximate k-nearest neighbor method, you should specify `knn.space_type` to the space that you are interested in. This setting cannot be changed after it is set. Please refer to the [spaces section](#spaces) to see what spaces we support! By default, `index.knn.space_type` is `l2`. For more information on index settings, such as algorithm parameters that can be tweaked to tune performance, please refer to the [documentation](../settings#IndexSettings). +Additionally, if you are using the approximate k-nearest neighbor method, you should specify `knn.space_type` to the space that you are interested in. This setting cannot be changed after it is set. Please refer to the [spaces section](#spaces) to see what spaces we support! By default, `index.knn.space_type` is `l2`. 
For more information on index settings, such as algorithm parameters that can be tweaked to tune performance, please refer to the [documentation](settings#IndexSettings). Next, you must add one or more fields of the `knn_vector` data type. Here is an example that creates an index with two `knn_vector` fields and uses cosine similarity: diff --git a/docs/knn/index.md b/docs/knn/index.md index 82547cb3..0a318da6 100644 --- a/docs/knn/index.md +++ b/docs/knn/index.md @@ -1,6 +1,6 @@ --- layout: default -title: KNN +title: k-NN nav_order: 50 has_children: true has_toc: false @@ -20,7 +20,7 @@ This plugin supports three different methods for obtaining the k-nearest neighbo Approximate k-NN is the best choice for searches over large indices (i.e. hundreds of thousands of vectors or more) that require low latency. Approximate k-NN should not be used if a filter will be applied on the index before the k-NN search, greatly reducing the number of vectors to be searched. In this case, either the script scoring method or the painless extensions should be used. - For more details refer to the [Approximate k-NN section](../approximate-knn). + For more details refer to the [Approximate k-NN section](approximate-knn). 2. **Script Score k-NN** @@ -28,7 +28,7 @@ This plugin supports three different methods for obtaining the k-nearest neighbo This approach should be used for searches over smaller bodies of documents or when a pre-filter is needed. Using this approach on large indices may lead to high latencies. - For more details refer to the [k-NN Script Score section](../knn-score-script). + For more details refer to the [k-NN Script Score section](knn-score-script). 3. **Painless extensions** @@ -36,7 +36,7 @@ This plugin supports three different methods for obtaining the k-nearest neighbo This approach has slightly slower query performance compared to Script Score k-NN. 
This approach should be preferred over Script Score k-NN if the use case requires more customization over the final score. - For more details refer to the [painless functions sectior](../painless-functions). + For more details refer to the [painless functions section](painless-functions). Overall, for larger data sets, users should generally choose the approximate nearest neighbor method, because it scales significantly better. For smaller data sets, where a user may want to apply a filter, they should choose the custom scoring approach. If users have a more complex use case where they need to use a distance function as part of their scoring method, they should use the painless scripting approach. diff --git a/docs/knn/jni-library.md b/docs/knn/jni-library.md index 355b415d..27166ba5 100644 --- a/docs/knn/jni-library.md +++ b/docs/knn/jni-library.md @@ -2,7 +2,7 @@ layout: default title: JNI Library nav_order: 5 -parent: KNN +parent: k-NN has_children: false --- diff --git a/docs/knn/knn-score-script.md b/docs/knn/knn-score-script.md index 836e9404..cd3ed8bb 100644 --- a/docs/knn/knn-score-script.md +++ b/docs/knn/knn-score-script.md @@ -2,7 +2,7 @@ layout: default title: Exact k-NN with Scoring Script nav_order: 2 -parent: KNN +parent: k-NN has_children: false has_math: true --- @@ -101,7 +101,7 @@ All parameters are required. *Note* -- After ODFE 1.11, `vector` was replaced by `query_value` due to the addition of the `bithamming` space. -The [post filter example in the approximate approach](../approximate-knn#UsingApproximatek-NNWithFilters) shows a search that returns fewer than `k` results. If you want to avoid this situation, the score script method lets you essentially invert the order of events. In other words, you can filter down the set of documents you want to execute the k-nearest neighbor search over. 
+The [post filter example in the approximate approach](../approximate-knn/#using-approximate-k-nn-with-filters) shows a search that returns fewer than `k` results. If you want to avoid this situation, the score script method lets you essentially invert the order of events. In other words, you can filter down the set of documents you want to execute the k-nearest neighbor search over. This example shows a pre-filter approach to k-NN search with the score script approach. First, create the index: diff --git a/docs/knn/painless-functions.md b/docs/knn/painless-functions.md index a9b6251a..15e72742 100644 --- a/docs/knn/painless-functions.md +++ b/docs/knn/painless-functions.md @@ -2,7 +2,7 @@ layout: default title: k-NN Painless Extensions nav_order: 3 -parent: KNN +parent: k-NN has_children: false has_math: true --- @@ -13,7 +13,7 @@ With the k-NN Plugin's Painless Scripting extensions, you can use k-NN distance ## Get started with k-NN's Painless Scripting Functions -To use k-NN's Painless Scripting functions, first, you still need to create an index with `knn_vector` fields as was done in [k-NN score script](../knn-score-script#Getting_started_with_the_score_script). Once the index is created and you have ingested some data, you can use the painless extensions like so: +To use k-NN's Painless Scripting functions, first, you still need to create an index with `knn_vector` fields as was done in [k-NN score script](../knn-score-script#Getting-started-with-the-score-script). Once the index is created and you have ingested some data, you can use the painless extensions like so: ``` GET my-knn-index-2/_search @@ -57,19 +57,19 @@ The following table contains the available painless functions the k-NN plugin pr l2Squared - `float l2Squared (float[] queryVector, doc['vector field'])` + float l2Squared (float[] queryVector, doc['vector field']) This function calculates the square of the L2 distance (Euclidean distance) between a given query vector and document vectors. 
The shorter the distance, the more relevant the document is, so this example inverts the return value of the l2Squared function. If the document vector matches the query vector, the result is 0, so this example also adds 1 to the distance to avoid divide by zero errors. cosineSimilarity - float cosineSimilarity (float[] queryVector, doc['vector field']) - Cosine similarity is inner product of the query vector and document vector normalized to both have length 1. If magnitude of the query vector does not change throughout the query, users can pass magnitude of query vector optionally to improve the performance instead of calculating the magnitude every time for every filtered document: `float cosineSimilarity (float[] queryVector, doc['vector field'], float normQueryVector)`. In general, range of cosine similarity is [-1, 1], but in case of information retrieval, the cosine similarity of two documents will range from 0 to 1, since tf-idf cannot be negative. Hence, we add 1.0 to the cosine similarity to score always positive. + float cosineSimilarity (float[] queryVector, doc['vector field']) + Cosine similarity is inner product of the query vector and document vector normalized to both have length 1. If magnitude of the query vector does not change throughout the query, users can pass magnitude of query vector optionally to improve the performance instead of calculating the magnitude every time for every filtered document: float cosineSimilarity (float[] queryVector, doc['vector field'], float normQueryVector). In general, range of cosine similarity is [-1, 1], but in case of information retrieval, the cosine similarity of two documents will range from 0 to 1, since tf-idf cannot be negative. Hence, we add 1.0 to the cosine similarity to score always positive. ## Constraints -1. If a document’s knn_vector field has different dimensions than the query, the function throws an IllegalArgumentException. +1. 
If a document’s `knn_vector` field has different dimensions than the query, the function throws an `IllegalArgumentException`.
2. If a vector field doesn't have a value, the function throws an IllegalStateException.
   You can avoid this situation by first checking if a document has a value for the field:
```
 "source": "doc[params.field].size() == 0 ? 0 : 1 / (1 + l2Squared(params.query_value, doc[params.field]))",
```
Since scores can only be positive, this script ranks documents with vector fields higher than those without.

diff --git a/docs/knn/performance-tuning.md b/docs/knn/performance-tuning.md
index 634d1ac6..271553c1 100644
--- a/docs/knn/performance-tuning.md
+++ b/docs/knn/performance-tuning.md
@@ -1,7 +1,7 @@
 ---
 layout: default
 title: Performance Tuning
-parent: KNN
+parent: k-NN
 nav_order: 7
 ---

@@ -35,7 +35,7 @@ Having replicas set to 0, will avoid duplicate construction of graphs in both pr

3. Increase number of indexing threads

-If the hardware we choose has multiple cores, we could allow multiple threads in graph construction and there by speed up the indexing process. You could determine the number of threads to be alloted by using the [knnalgo_paramindex_thread_qty]() setting.
+If the hardware we choose has multiple cores, we could allow multiple threads in graph construction and thereby speed up the indexing process. You could determine the number of threads to be allotted by using the [knn.algo_param.index_thread_qty](../settings/#Cluster-settings) setting.

Please keep an eye on CPU utilization and choose the right number of threads. Since graph construction is costly, having multiple threads can put additional load on CPU.

@@ -94,4 +94,4 @@ As an example, assume that we have 1 Million vectors with dimension of 256 and M

The standard KNN query and custom scoring option perform differently. Test using a representative set of documents to see if the search results and latencies match your expectations.
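
For intuition about how the custom scoring formulas behave — the `1 / (1 + l2Squared(...))` score from the constraints example and the `cosineSimilarity + 1.0` score from the painless functions table — here is a plain Python sketch of the math (illustrative only; the plugin evaluates these functions natively):

```python
import math

def l2_squared(query, vector):
    """Square of the Euclidean distance between two equal-length vectors."""
    if len(query) != len(vector):
        # Mirrors the IllegalArgumentException thrown on dimension mismatch.
        raise ValueError("dimensions must match")
    return sum((q - v) ** 2 for q, v in zip(query, vector))

def score_l2(query, vector):
    """Shorter distance => higher score; the +1 avoids division by zero."""
    return 1.0 / (1.0 + l2_squared(query, vector))

def score_cosine(query, vector):
    """Cosine similarity shifted by +1.0 so the score is always positive.
    (Zero-magnitude vectors are not handled in this sketch.)"""
    dot = sum(q * v for q, v in zip(query, vector))
    norms = math.sqrt(sum(q * q for q in query)) * math.sqrt(sum(v * v for v in vector))
    return 1.0 + dot / norms

print(score_l2([9.9, 9.9], [9.9, 9.9]))  # identical vectors score 1.0
```

An identical query and document vector gives the maximum l2-based score of 1.0, which matches the divide-by-zero rationale above.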
+Custom scoring works best if the initial filter reduces the number of documents to no more than 20,000. Increasing shard count can improve latencies, but be sure to keep shard size within [the recommended guidelines](../../elasticsearch/#primary-and-replica-shards).

diff --git a/docs/knn/settings.md b/docs/knn/settings.md
index 4e13787e..f5c5b168 100644
--- a/docs/knn/settings.md
+++ b/docs/knn/settings.md
@@ -1,7 +1,7 @@
 ---
 layout: default
 title: Settings
-parent: KNN
+parent: k-NN
 nav_order: 6
 ---

From 70a1fe765deabc474f093f5cf8166e1142284464 Mon Sep 17 00:00:00 2001
From: ashwinkumar12345
Date: Thu, 11 Feb 2021 16:03:18 -0800
Subject: [PATCH 26/28] reporting troubleshooting section

---
 docs/kibana/reporting.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/docs/kibana/reporting.md b/docs/kibana/reporting.md
index ba29f01b..265a1a9d 100644
--- a/docs/kibana/reporting.md
+++ b/docs/kibana/reporting.md
@@ -39,3 +39,13 @@ Definitions let you schedule reports for periodic creation.

For scheduled reports, choose either **Recurring** or **Cron based**. You can receive reports daily or at some other time interval. Cron expressions give you even more flexibility. See [Cron expression reference](../../alerting/cron/) for more information.

1. Choose **Create**.
+
+## Troubleshooting
+
+### Chromium fails to launch
+
+This problem occurs for one of two reasons:
+
+1. You don't have the correct version of `headless-chrome` to match the OS on which Kibana is running. Download the correct version of `headless-chrome` from [here](https://github.com/opendistro-for-elasticsearch/kibana-reports/releases/tag/chromium-1.12.0.0).
+
+2. You're missing additional dependencies. Install the required dependencies for your OS from the [additional libraries](https://github.com/opendistro-for-elasticsearch/kibana-reports/tree/dev/kibana-reports/rendering-engine/headless-chrome) section.
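
The two causes above can be told apart from a terminal. A sketch (the `headless_shell` path is a placeholder — point it at the Chromium binary bundled with the kibana-reports plugin on your system; the `ldd` check applies to Linux):

```python
import os
import subprocess

# Hypothetical path; adjust it to where the plugin unpacked Chromium.
CHROMIUM = "./headless-chrome/headless_shell"

def diagnose(path):
    """Return a hint about which launch-failure cause applies."""
    if not (os.path.isfile(path) and os.access(path, os.X_OK)):
        return "cause 1: wrong or missing headless-chrome build"
    # Missing OS libraries show up as "not found" lines in ldd output.
    out = subprocess.run(["ldd", path], capture_output=True, text=True).stdout
    missing = [line.strip() for line in out.splitlines() if "not found" in line]
    return f"cause 2 candidates: {missing}" if missing else "dependencies look resolved"

print(diagnose(CHROMIUM))
```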
From b3067706ccaef7d98a1148fb9717050f76c5c7da Mon Sep 17 00:00:00 2001 From: ashwinkumar12345 Date: Thu, 11 Feb 2021 23:51:27 -0800 Subject: [PATCH 27/28] added more context --- docs/images/kibana-reporting-error.png | Bin 0 -> 32921 bytes docs/kibana/reporting.md | 8 ++++++-- 2 files changed, 6 insertions(+), 2 deletions(-) create mode 100644 docs/images/kibana-reporting-error.png diff --git a/docs/images/kibana-reporting-error.png b/docs/images/kibana-reporting-error.png new file mode 100644 index 0000000000000000000000000000000000000000..8bb03b2fed69326a18be2edde349f022e9a90bbf GIT binary patch literal 32921 zcmY(q1yodTv_4EX3@Y6vAQB_pA>9fpQUi#Tbk`6=BdK(EOLuoj_t4!#ck>PYzI)gI zvKEWQIp^K+>}T&c&OX5(RAh0n$gmI)5OC$?q|^`)5KrO1Bp7J$D|spo+z1FX2=Y=A zpInmno1N=eO+2<;jh&sKmGGZe79)o<{3cPM5{R@?KnzS;I|3Obg2?i}yf2Z+Edp@R zpMJ-|hxXcr-LNblkOkw$2}YSQKM%a7^1f+;eCc5okR+ z15>v*UuEx!UKHqoYinYQygpAnVJ9dtM*T$}brh(3r$vQl<|zztkGN~bBvy_UBq?#g5FHEd)bY$F?iOUOUuDD_)ZDG2c%`j~Qaae8Gg1Fa!_UFyCu_Z&hUH8@l^6`p4Su z&)SaYBkU|Fc8c0S=sVb*U^N0!yp^nDDpaQ%q&-~xh@?)-SpK~T);$VW6tQ9KM zb1#w;FNzZ{I=Fe9uD-(k1s}u1kh}=?KZ}=QpUs(FYSMthJBkoeI~#EhKgvCA|Mp7$ zC)i_yvs1=E4C|6&FH4*{Y?wIUrY`!HKWO_S+^g+lcFDWm+L{_qx--?-dp(6mFSz&w zr6ut=&CkYPhpRbP?8~-4P@|uZ?SKiks@3<`Gf&L$snL(`cNx6w-XIIL(md&HJ!ze) z{(2tR)XvWJcKa*oxxkU#)X~J$pD2<4!pR{1TN>?bI3@D;;y+8q~?tdAaHaF1v-&Vi`7wVX8jY zsus9vJFX~?T-u$_plLjV#&%X}ZE5)+^S4$0fBn!~CbMA;{AY=M`V+KRqW^)ZhOzU$ zx5hZMpGkV%DClXvcs22wx$BJiUQ~N`qHn|+m+pwO$^Ppy-`N8koI|}+*%P=ZUDc?@klemg|u4kugUnSm#qqbv2I&*;9G(YV@sCke+QfA#q)a zyJYJ!K%DNE9K%b@2Tw5@-K&5B_Qo|IGT=zosohVHZ=Xc5PR+)yTbIc~;~rc{*PtNx zgU_&*uM97PrRXdRtAdX&%uAbpplrD!FL=$6WzR54Z@!WMeI}2)spVq%|plYW^-gjA7tfCP8<_a6a#>O)wG&f$VSxnTn*;Tm*Ozflv%s4A3d_%s!&i$wICMc=9#n?cf+=H_lS3xlKJZUl2{ z3iRDQc$UTRwmmrXj0>|osSnj@H`&@k@O!ibzTvhzI4fU{n9w0U65F;L5 zT$S`-qVVQ2ZKkj!Mk5FMRszIS3@QoR9=jf~ga>b}fdlV9A4**9@%iGUkg+yxN#X3SwG2+o zx3Cue|1fNj5r2G3D(S^*_~Y}$e3jr$>)agK^3Ipf)TIhVPvg_m)1@W^|&XloNTlHKkBWG zaUQyh7adQ*IF?nb#XnFi86!2Odc7>+`dj{Ar#zQKGd??>=J+%%l&AXMi=x>5Q)oOv 
zpF@~RT)U(MkBz92MV(vT*eJK$+zf|wfNx+Y#wv!DE5hej-_YYCT2C4gtV1AA+ESwW zvveJf%yTZQY4Y}$!bJ5-HNN#Wu`4&82CwaZmY4?-6BD1wICuh^!8j4fyu*<|--cRX zJ>y0jq`chNy2bcCVVe9j4SX30u~v(H$)y4H+hc?VErgS)iT0b!ZwMHTELbQSH9umf z-GRFk$Lwa>*lsIC|D+gTf1dV=(MBX~o4l(eiTbCn>EXXK zwgM;}p0MANU1^`Be;7NsdAqkrI`Fp#ldB}Ths5ey3r>+I1;1zdv%Kynv2yTJ4Vu~2 zA9-EDb?&zPTLMM}g8ies1*2tt08tRleX#cx-m1+UVefMA{AO_1ZcrbqiC+8Q=>QK9a$#06Qs!_nndL9d2>$>%`e)0X^;}_X%I+lhb zl7Dc4&n^;!MIV32%0Rn}k~uQb@x|l2r_ckR#Q9LU-k@85A|N%;o# zcQf=EvE2-swhx`2i;W}u8B&v8Ch&bVZmc}G*_g*#=repe=4~D=FP!Xep&0$Zs7Y!TwWoK>ZWLCM5GhENI}IH%$` z8u{|Qy=^?B&#MnyTMqk`xq0$*rTrhcC3vMbYEC;pCc$g=vi#l7$a3qmUj)bk#A5dD zMa6{k#pAftr?2^3UGA5ewz~T>{7FjmuD0=QsJT3ItFD1$J$->x?)q;lgr$&RX6o!mWP{-e1Xav91; zwFMhh0lmEwO}fiWy}vve3@tBPoqKCy&kNT6a8kjfU1(hpo$R>k$n0AQf9^J+Ke*`@ZHT&pv zm#6VA^I=da<}J2KP30%Ci1?>;dWlb;GLxhSpmuV zUjr=$13&{rL;l6lp=XO*?=Z_1mKP_x2O~lFXzh?|pQfwu#)p~QePf9PCMu1kIpu*S zLCq`75C2EMW$E_;)oh2(P=1j(g^Z+8pu7q`4cQtVq-jFV_yC!vj^LR^u)8W^oH8`R_)#W zwc?EO)aZ77Md~2jQj5XF94adm!L6G74&aI*MDGU^$X$4@_Imx#e-{$*Gqu1;ozww$ z!4QL)^pA4l>&dTF{mAwH+Q2N5t&f$Z%jI}dg7+6%p*qd#BmCzyHE91m#$*}=%rSy{ zRQ0!=;yM~K8Qg^B-W`yuM&|WDBG7xDaX)E`KA1_he8ZG7p!8oP*GnW>`x(dtI(cs% zf5!Yjsnx_NdiR~G;gM%N&JyU+N>p1Bz4Y+7)+AOl73%*6|5c|-eW2Y_CJuOlbR}v< z)|hzXW3gS33Gu<-j4jmv3019fXZia%o!{EgnT9cb@9DUPYujZWBL1%z1rDUiQ>8er z_F9*H?qLiJ3;`7dxi|Uafy?bObW{I(DBe0g+2e16xI^NvAP|xF|ArB{l;CoZvw4SY Y!S*L?|8 Date: Fri, 12 Feb 2021 10:07:35 -0800 Subject: [PATCH 28/28] incorporate comments --- docs/knn/jni-library.md | 5 +---- docs/knn/painless-functions.md | 2 +- 2 files changed, 2 insertions(+), 5 deletions(-) diff --git a/docs/knn/jni-library.md b/docs/knn/jni-library.md index 27166ba5..fb818ffc 100644 --- a/docs/knn/jni-library.md +++ b/docs/knn/jni-library.md @@ -7,7 +7,4 @@ has_children: false --- # JNI Library -In order to integrate nmslib's approximate k-NN functionality, which is implemented in C++, into the k-NN plugin, which is implemented in Java, we created a Java Native Interface library. 
Check out [this wiki](https://en.wikipedia.org/wiki/Java_Native_Interface) to learn more about JNI. This library allows the k-NN plugin to leverage nmslib's functionality. - -## Artifacts -We build and distribute binary library artifacts with Opendistro for Elasticsearch. We build the library binary, RPM and DEB in [this GitHub action](https://github.com/opendistro-for-elasticsearch/k-NN/blob/master/.github/workflows/CD.yml). We use Centos 7 with g++ 4.8.5 to build the DEB, RPM and ZIP. Additionally, in order to provide as much general compatibility as possible, we compile the library without optimized instruction sets enabled. For users that want to get the most out of the library, they should build the library from source in their production environment, so that if their environment has optimized instruction sets, they take advantage of them. The documentation for this can be found [here](https://github.com/opendistro-for-elasticsearch/k-NN#jni-library-artifacts). +In order to integrate nmslib's approximate k-NN functionality, which is implemented in C++, into the k-NN plugin, which is implemented in Java, we created a Java Native Interface library. Check out [this wiki](https://en.wikipedia.org/wiki/Java_Native_Interface) to learn more about JNI. This library allows the k-NN plugin to leverage nmslib's functionality. For more information about how we build the JNI library binary and how to get the most out of it in your production environment, see [here](https://github.com/opendistro-for-elasticsearch/k-NN#jni-library-artifacts). diff --git a/docs/knn/painless-functions.md b/docs/knn/painless-functions.md index 15e72742..d03ca95c 100644 --- a/docs/knn/painless-functions.md +++ b/docs/knn/painless-functions.md @@ -34,7 +34,7 @@ GET my-knn-index-2/_search "source": "1.0 + cosineSimilarity(params.query_value, doc[params.field])", "params": { "field": "my_vector", - "query_value": [9.9, 9.9], + "query_value": [9.9, 9.9] } } }
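The last hunk above removes a trailing comma from the example request body. That matters because strict JSON parsers reject trailing commas, so the original snippet would fail to parse before the query ever reached the cluster. A quick illustration, using plain Python as a stand-in for any strict JSON parser:

```python
import json

valid = '{"field": "my_vector", "query_value": [9.9, 9.9]}'
invalid = '{"field": "my_vector", "query_value": [9.9, 9.9],}'

# Well-formed JSON parses without complaint.
print(json.loads(valid))

try:
    json.loads(invalid)
except json.JSONDecodeError as err:
    # The trailing comma after the array makes the body invalid JSON.
    print("rejected:", err.msg)
```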