+{
+ "question": "Where do I live?",
+ "context": "My name is John. I live in New York"
+}
+```
+{% include copy-curl.html %}
+
+The response provides the answer based on the context:
+
+```json
+{
+ "inference_results": [
+ {
+ "output": [
+ {
+ "result": "New York"
+ }
+ ]
+ }
+ ]
+}
+```
\ No newline at end of file
diff --git a/_ml-commons-plugin/remote-models/blueprints.md b/_ml-commons-plugin/remote-models/blueprints.md
index 57e0e4177b..5cac2f3d3b 100644
--- a/_ml-commons-plugin/remote-models/blueprints.md
+++ b/_ml-commons-plugin/remote-models/blueprints.md
@@ -55,32 +55,41 @@ As an ML developer, you can build connector blueprints for other platforms. Usin
## Configuration parameters
-The following configuration parameters are **required** in order to build a connector blueprint.
-
-| Field | Data type | Description |
-| :--- | :--- | :--- |
-| `name` | String | The name of the connector. |
-| `description` | String | A description of the connector. |
-| `version` | Integer | The version of the connector. |
-| `protocol` | String | The protocol for the connection. For AWS services such as Amazon SageMaker and Amazon Bedrock, use `aws_sigv4`. For all other services, use `http`. |
-| `parameters` | JSON object | The default connector parameters, including `endpoint` and `model`. Any parameters indicated in this field can be overridden by parameters specified in a predict request. |
-| `credential` | JSON object | Defines any credential variables required in order to connect to your chosen endpoint. ML Commons uses **AES/GCM/NoPadding** symmetric encryption to encrypt your credentials. When the connection to the cluster first starts, OpenSearch creates a random 32-byte encryption key that persists in OpenSearch's system index. Therefore, you do not need to manually set the encryption key. |
-| `actions` | JSON array | Defines what actions can run within the connector. If you're an administrator creating a connection, add the [blueprint]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/blueprints/) for your desired connection. |
-| `backend_roles` | JSON array | A list of OpenSearch backend roles. For more information about setting up backend roles, see [Assigning backend roles to users]({{site.url}}{{site.baseurl}}/ml-commons-plugin/model-access-control#assigning-backend-roles-to-users). |
-| `access_mode` | String | Sets the access mode for the model, either `public`, `restricted`, or `private`. Default is `private`. For more information about `access_mode`, see [Model groups]({{site.url}}{{site.baseurl}}/ml-commons-plugin/model-access-control#model-groups). |
-| `add_all_backend_roles` | Boolean | When set to `true`, adds all `backend_roles` to the access list, which only a user with admin permissions can adjust. When set to `false`, non-admins can add `backend_roles`. |
-
-The `action` parameter supports the following options.
-
-| Field | Data type | Description |
-| :--- | :--- | :--- |
-| `action_type` | String | Required. Sets the ML Commons API operation to use upon connection. As of OpenSearch 2.9, only `predict` is supported. |
-| `method` | String | Required. Defines the HTTP method for the API call. Supports `POST` and `GET`. |
-| `url` | String | Required. Sets the connection endpoint at which the action occurs. This must match the regex expression for the connection used when [adding trusted endpoints]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index#adding-trusted-endpoints). |
-| `headers` | JSON object | Sets the headers used inside the request or response body. Default is `ContentType: application/json`. If your third-party ML tool requires access control, define the required `credential` parameters in the `headers` parameter. |
-| `request_body` | String | Required. Sets the parameters contained inside the request body of the action. The parameters must include `\"inputText\`, which specifies how users of the connector should construct the request payload for the `action_type`. |
-| `pre_process_function` | String | Optional. A built-in or custom Painless script used to preprocess the input data. OpenSearch provides the following built-in preprocess functions that you can call directly:
- `connector.pre_process.cohere.embedding` for [Cohere](https://cohere.com/) embedding models
- `connector.pre_process.openai.embedding` for [OpenAI](https://openai.com/) embedding models
- `connector.pre_process.default.embedding`, which you can use to preprocess documents in neural search requests so that they are in the format that ML Commons can process with the default preprocessor (OpenSearch 2.11 or later). For more information, see [built-in functions](#built-in-pre--and-post-processing-functions). |
-| `post_process_function` | String | Optional. A built-in or custom Painless script used to post-process the model output data. OpenSearch provides the following built-in post-process functions that you can call directly:
- `connector.pre_process.cohere.embedding` for [Cohere text embedding models](https://docs.cohere.com/reference/embed)
- `connector.pre_process.openai.embedding` for [OpenAI text embedding models](https://platform.openai.com/docs/api-reference/embeddings)
- `connector.post_process.default.embedding`, which you can use to post-process documents in the model response so that they are in the format that neural search expects (OpenSearch 2.11 or later). For more information, see [built-in functions](#built-in-pre--and-post-processing-functions). |
+| Field | Data type | Is required | Description |
+|:------------------------|:------------|:------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `name` | String | Yes | The name of the connector. |
+| `description` | String | Yes | A description of the connector. |
+| `version` | Integer | Yes | The version of the connector. |
+| `protocol` | String | Yes | The protocol for the connection. For AWS services such as Amazon SageMaker and Amazon Bedrock, use `aws_sigv4`. For all other services, use `http`. |
+| `parameters` | JSON object | Yes | The default connector parameters, including `endpoint` and `model`. Any parameters indicated in this field can be overridden by parameters specified in a predict request. |
+| `credential` | JSON object | Yes | Defines any credential variables required to connect to your chosen endpoint. ML Commons uses **AES/GCM/NoPadding** symmetric encryption to encrypt your credentials. When the connection to the cluster first starts, OpenSearch creates a random 32-byte encryption key that persists in OpenSearch's system index. Therefore, you do not need to manually set the encryption key. |
+| `actions` | JSON array | Yes | Defines what actions can run within the connector. If you're an administrator creating a connection, add the [blueprint]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/blueprints/) for your desired connection. |
+| `backend_roles` | JSON array | Yes | A list of OpenSearch backend roles. For more information about setting up backend roles, see [Assigning backend roles to users]({{site.url}}{{site.baseurl}}/ml-commons-plugin/model-access-control#assigning-backend-roles-to-users). |
+| `access_mode` | String | Yes | Sets the access mode for the model, either `public`, `restricted`, or `private`. Default is `private`. For more information about `access_mode`, see [Model groups]({{site.url}}{{site.baseurl}}/ml-commons-plugin/model-access-control#model-groups). |
+| `add_all_backend_roles` | Boolean | Yes | When set to `true`, adds all `backend_roles` to the access list, which only a user with admin permissions can adjust. When set to `false`, non-admins can add `backend_roles`. |
+| `client_config` | JSON object | No | The client configuration object, which provides settings that control the behavior of the client connections used by the connector. These settings allow you to manage connection limits and timeouts, ensuring efficient and reliable communication. |
+
+
+The `actions` parameter supports the following options.
+
+| Field | Data type | Description |
+|:------------------------|:------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `action_type` | String | Required. Sets the ML Commons API operation to use upon connection. As of OpenSearch 2.9, only `predict` is supported. |
+| `method` | String | Required. Defines the HTTP method for the API call. Supports `POST` and `GET`. |
+| `url` | String | Required. Sets the connection endpoint at which the action occurs. This must match the regex expression for the connection used when [adding trusted endpoints]({{site.url}}{{site.baseurl}}/ml-commons-plugin/remote-models/index#adding-trusted-endpoints). |
+| `headers` | JSON object | Sets the headers used inside the request or response body. Default is `ContentType: application/json`. If your third-party ML tool requires access control, define the required `credential` parameters in the `headers` parameter. |
+| `request_body` | String | Required. Sets the parameters contained in the request body of the action. The parameters must include `\"inputText\"`, which specifies how users of the connector should construct the request payload for the `action_type`. |
+| `pre_process_function` | String | Optional. A built-in or custom Painless script used to preprocess the input data. OpenSearch provides the following built-in preprocess functions that you can call directly:
- `connector.pre_process.cohere.embedding` for [Cohere](https://cohere.com/) embedding models
- `connector.pre_process.openai.embedding` for [OpenAI](https://openai.com/) embedding models
- `connector.pre_process.default.embedding`, which you can use to preprocess documents in neural search requests so that they are in the format that ML Commons can process with the default preprocessor (OpenSearch 2.11 or later). For more information, see [Built-in functions](#built-in-pre--and-post-processing-functions). |
+| `post_process_function` | String | Optional. A built-in or custom Painless script used to post-process the model output data. OpenSearch provides the following built-in post-process functions that you can call directly:
- `connector.post_process.cohere.embedding` for [Cohere text embedding models](https://docs.cohere.com/reference/embed)
- `connector.post_process.openai.embedding` for [OpenAI text embedding models](https://platform.openai.com/docs/api-reference/embeddings)
- `connector.post_process.default.embedding`, which you can use to post-process documents in the model response so that they are in the format that neural search expects (OpenSearch 2.11 or later). For more information, see [Built-in functions](#built-in-pre--and-post-processing-functions). |
+
+
+The `client_config` parameter supports the following options.
+
+| Field | Data type | Description |
+|:---------------------|:----------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `max_connection` | Integer | The maximum number of concurrent connections that the client can establish with the server. |
+| `connection_timeout` | Integer | The maximum amount of time (in seconds) that the client will wait while trying to establish a connection to the server. A timeout prevents the client from waiting indefinitely and allows it to recover from unreachable network endpoints. |
+| `read_timeout` | Integer | The maximum amount of time (in seconds) that the client will wait for a response from the server after sending a request. Useful when the server is slow to respond or encounters issues while processing a request. |
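+
+To illustrate how these parameters fit together, the following is a hypothetical blueprint sketch. The endpoint, model name, credential key, and request body are placeholders for your own service, not a working configuration:
+
+```json
+POST /_plugins/_ml/connectors/_create
+{
+  "name": "Example connector",
+  "description": "A sketch of a connector blueprint",
+  "version": 1,
+  "protocol": "http",
+  "parameters": {
+    "endpoint": "api.example.com",
+    "model": "example-model"
+  },
+  "credential": {
+    "api_key": "<YOUR API KEY>"
+  },
+  "actions": [
+    {
+      "action_type": "predict",
+      "method": "POST",
+      "url": "https://${parameters.endpoint}/v1/embeddings",
+      "headers": {
+        "Authorization": "Bearer ${credential.api_key}"
+      },
+      "request_body": "{ \"input\": \"${parameters.inputText}\", \"model\": \"${parameters.model}\" }"
+    }
+  ],
+  "client_config": {
+    "max_connection": 20,
+    "connection_timeout": 30,
+    "read_timeout": 30
+  }
+}
+```
+{% include copy-curl.html %}
+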
## Built-in pre- and post-processing functions
diff --git a/_ml-commons-plugin/remote-models/index.md b/_ml-commons-plugin/remote-models/index.md
index 0b9c6d03ed..657d7254be 100644
--- a/_ml-commons-plugin/remote-models/index.md
+++ b/_ml-commons-plugin/remote-models/index.md
@@ -205,7 +205,18 @@ Take note of the returned `model_id` because you’ll need it to deploy the mode
## Step 4: Deploy the model
-To deploy the registered model, provide its model ID from step 3 in the following request:
+Starting with OpenSearch version 2.13, externally hosted models are deployed automatically by default when you send a Predict API request for the first time. To disable automatic deployment for an externally hosted model, set `plugins.ml_commons.model_auto_deploy.enable` to `false`:
+```json
+PUT _cluster/settings
+{
+ "persistent": {
+ "plugins.ml_commons.model_auto_deploy.enable": "false"
+ }
+}
+```
+{% include copy-curl.html %}
+
+To undeploy the model, use the [Undeploy API]({{site.url}}{{site.baseurl}}/ml-commons-plugin/api/model-apis/undeploy-model/).
+
+To deploy the registered model manually, provide its model ID from step 3 in the following request:
```bash
POST /_plugins/_ml/models/cleMb4kBJ1eYAeTMFFg4/_deploy
diff --git a/_query-dsl/minimum-should-match.md b/_query-dsl/minimum-should-match.md
index 9ec65431b1..e2032b8911 100644
--- a/_query-dsl/minimum-should-match.md
+++ b/_query-dsl/minimum-should-match.md
@@ -26,7 +26,7 @@ GET /shakespeare/_search
}
```
-In this example, the query has three optional clauses that are combined with an `OR`, so the document must match either `prince`, `king`, or `star`.
+In this example, the query has three optional clauses that are combined with an `OR`, so the document must match at least two of the three terms: `prince` and `king`, `prince` and `star`, or `king` and `star`.
## Valid values
@@ -448,4 +448,4 @@ The results contain only four documents that match at least one of the optional
]
}
}
-```
\ No newline at end of file
+```
diff --git a/_search-plugins/caching/index.md b/_search-plugins/caching/index.md
new file mode 100644
index 0000000000..4d0173fdc7
--- /dev/null
+++ b/_search-plugins/caching/index.md
@@ -0,0 +1,32 @@
+---
+layout: default
+title: Caching
+parent: Improving search performance
+has_children: true
+nav_order: 100
+---
+
+# Caching
+
+OpenSearch relies heavily on different on-heap cache types to accelerate data retrieval, providing significant improvements in search latency. However, cache size is limited by the amount of memory available on a node. If you are processing a dataset larger than the cache can hold, the size limit causes frequent cache evictions and misses. Each eviction degrades performance because OpenSearch must process the evicted query again, increasing resource consumption.
+
+Prior to version 2.13, OpenSearch supported the following on-heap cache types:
+
+- **Request cache**: Caches the local results on each shard. This allows frequently used (and potentially resource-heavy) search requests to return results almost instantly.
+- **Query cache**: The shard-level query cache caches common data from similar queries. The query cache is more granular than the request cache and can cache data that is reused in different queries.
+- **Field data cache**: The field data cache contains field data and global ordinals, which are both used to support aggregations on certain field types.
+
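+As a usage sketch, the request cache can be enabled for an individual search request by using the `request_cache` query parameter. The index and field names below are hypothetical:
+
+```json
+GET /my-index/_search?request_cache=true
+{
+  "size": 0,
+  "aggs": {
+    "categories": {
+      "terms": {
+        "field": "category"
+      }
+    }
+  }
+}
+```
+{% include copy-curl.html %}
+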
+## Additional cache stores
+**Introduced 2.13**
+{: .label .label-purple }
+
+This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/10024).
+{: .warning}
+
+In addition to existing OpenSearch custom on-heap cache stores, cache plugins provide the following cache stores:
+
+- **Disk cache**: This cache stores the precomputed result of a query on disk. You can use a disk cache to cache much larger datasets, provided that the disk latencies are acceptable.
+- **Tiered cache**: This is a multi-level cache, in which each tier has its own characteristics and performance levels. For example, a tiered cache can contain on-heap and disk tiers. By combining different tiers, you can achieve a balance between cache performance and size. To learn more, see [Tiered cache]({{site.url}}{{site.baseurl}}/search-plugins/caching/tiered-cache/).
+
+In OpenSearch 2.13, the request cache is integrated with cache plugins. You can use a tiered or disk cache as a request-level cache.
+{: .note}
\ No newline at end of file
diff --git a/_search-plugins/caching/tiered-cache.md b/_search-plugins/caching/tiered-cache.md
new file mode 100644
index 0000000000..3842ebe5a9
--- /dev/null
+++ b/_search-plugins/caching/tiered-cache.md
@@ -0,0 +1,82 @@
+---
+layout: default
+title: Tiered cache
+parent: Caching
+grand_parent: Improving search performance
+nav_order: 10
+---
+
+# Tiered cache
+
+This is an experimental feature and is not recommended for use in a production environment. For updates on the progress of the feature or if you want to leave feedback, see the associated [GitHub issue](https://github.com/opensearch-project/OpenSearch/issues/10024).
+{: .warning}
+
+A tiered cache is a multi-level cache, in which each tier has its own characteristics and performance levels. By combining different tiers, you can achieve a balance between cache performance and size.
+
+## Types of tiered caches
+
+OpenSearch 2.13 provides an implementation of a _tiered spillover cache_. This implementation spills evicted items from the upper tier to the lower tier. The upper tier, such as an on-heap cache, is smaller but offers lower latency. The lower tier, such as a disk cache, is larger but has higher latency. OpenSearch 2.13 offers on-heap and disk tiers.
+
+## Enabling a tiered cache
+
+To enable a tiered cache, configure the following setting:
+
+```yaml
+opensearch.experimental.feature.pluggable.caching.enabled: true
+```
+{% include copy.html %}
+
+For more information about ways to enable experimental features, see [Experimental feature flags]({{site.url}}{{site.baseurl}}/install-and-configure/configuring-opensearch/experimental/).
+
+## Installing required plugins
+
+A tiered cache provides a way to plug in any disk or on-heap tier implementation. You can install the plugins you intend to use in the tiered cache. As of OpenSearch 2.13, the available cache plugin is the `cache-ehcache` plugin. This plugin provides a disk cache implementation to use within a tiered cache as a disk tier.
+
+A tiered cache will fail to initialize if the `cache-ehcache` plugin is not installed or disk cache properties are not set.
+{: .warning}
+
+## Tiered cache settings
+
+In OpenSearch 2.13, a request cache can use a tiered cache. To begin, configure the following settings in the `opensearch.yml` file.
+
+### Cache store name
+
+Set the cache store name to `tiered_spillover` to use the OpenSearch-provided tiered spillover cache implementation:
+
+```yaml
+indices.request.cache.store.name: tiered_spillover
+```
+{% include copy.html %}
+
+### Setting on-heap and disk store tiers
+
+The `opensearch_onheap` setting is the built-in on-heap cache available in OpenSearch. The `ehcache_disk` setting is the disk cache implementation from [Ehcache](https://www.ehcache.org/). This requires installing the `cache-ehcache` plugin:
+
+```yaml
+indices.request.cache.tiered_spillover.onheap.store.name: opensearch_onheap
+indices.request.cache.tiered_spillover.disk.store.name: ehcache_disk
+```
+{% include copy.html %}
+
+For more information about installing non-bundled plugins, see [Additional plugins]({{site.url}}{{site.baseurl}}/install-and-configure/plugins/#additional-plugins).
+
+### Configuring on-heap and disk stores
+
+The following table lists the cache store settings for the `opensearch_onheap` store.
+
+Setting | Default | Description
+:--- | :--- | :---
+`indices.request.cache.opensearch_onheap.size` | 1% of the heap | The size of the on-heap cache. Optional.
+`indices.request.cache.opensearch_onheap.expire` | `MAX_VALUE` (disabled) | Specifies a time-to-live (TTL) for the cached results. Optional.
+
+The following table lists the disk cache store settings for the `ehcache_disk` store.
+
+Setting | Default | Description
+:--- | :--- | :---
+`indices.request.cache.ehcache_disk.max_size_in_bytes` | `1073741824` (1 GB) | Defines the size of the disk cache. Optional.
+`indices.request.cache.ehcache_disk.storage.path` | `""` | Defines the storage path for the disk cache. Required.
+`indices.request.cache.ehcache_disk.expire_after_access` | `MAX_VALUE` (disabled) | Specifies a time-to-live (TTL) for the cached results. Optional.
+`indices.request.cache.ehcache_disk.alias` | `ehcacheDiskCache#INDICES_REQUEST_CACHE` (example for the request cache) | Specifies an alias for the disk cache. Optional.
+`indices.request.cache.ehcache_disk.segments` | `16` | Defines the number of segments the disk cache is separated into. Used for concurrency. Optional.
+`indices.request.cache.ehcache_disk.concurrency` | `1` | Defines the number of distinct write queues created for the disk store, where a group of segments share a write queue. Optional.
+
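+Putting these settings together, a minimal `opensearch.yml` sketch for a tiered request cache might look like the following. The storage path is a placeholder; adjust the sizes for your hardware:
+
+```yaml
+# Enable the experimental pluggable caching feature
+opensearch.experimental.feature.pluggable.caching.enabled: true
+# Use the tiered spillover implementation for the request cache
+indices.request.cache.store.name: tiered_spillover
+# On-heap tier backed by the built-in cache; disk tier backed by Ehcache
+indices.request.cache.tiered_spillover.onheap.store.name: opensearch_onheap
+indices.request.cache.tiered_spillover.disk.store.name: ehcache_disk
+# Required: storage path for the disk cache (placeholder)
+indices.request.cache.ehcache_disk.storage.path: /mnt/disk_cache
+# Optional: limit the disk cache to 1 GB
+indices.request.cache.ehcache_disk.max_size_in_bytes: 1073741824
+```
+{% include copy.html %}
+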
diff --git a/_search-plugins/concurrent-segment-search.md b/_search-plugins/concurrent-segment-search.md
index 58b8d9a8ce..0bb7657937 100644
--- a/_search-plugins/concurrent-segment-search.md
+++ b/_search-plugins/concurrent-segment-search.md
@@ -27,7 +27,7 @@ By default, concurrent segment search is disabled on the cluster. You can enable
- Cluster level
- Index level
-The index-level setting takes priority over the cluster-level setting. Thus, if the cluster setting is enabled but the index setting is disabled, then concurrent segment search will be disabled for that index.
+The index-level setting takes priority over the cluster-level setting. Thus, if the cluster setting is enabled but the index setting is disabled, then concurrent segment search will be disabled for that index. Because of this, the index-level setting is not evaluated unless it is explicitly set, regardless of the default value configured for the setting. You can retrieve the current value of the index-level setting by calling the [Index Settings API]({{site.url}}{{site.baseurl}}/api-reference/index-apis/get-settings/) and omitting the `?include_defaults` query parameter.
{: .note}
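+
+For example, the following sketch returns the index-level value only if it has been explicitly set (`my-index` is a hypothetical index name):
+
+```json
+GET /my-index/_settings/index.search.concurrent_segment_search.enabled
+```
+{% include copy-curl.html %}
+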
To enable concurrent segment search for all indexes in the cluster, set the following dynamic cluster setting:
diff --git a/_search-plugins/hybrid-search.md b/_search-plugins/hybrid-search.md
index ebd014b0de..b0fb4d5bef 100644
--- a/_search-plugins/hybrid-search.md
+++ b/_search-plugins/hybrid-search.md
@@ -146,7 +146,9 @@ PUT /_search/pipeline/nlp-search-pipeline
To perform hybrid search on your index, use the [`hybrid` query]({{site.url}}{{site.baseurl}}/query-dsl/compound/hybrid/), which combines the results of keyword and semantic search.
-The following example request combines two query clauses---a neural query and a `match` query. It specifies the search pipeline created in the previous step as a query parameter:
+#### Example: Combining a neural query and a match query
+
+The following example request combines two query clauses---a `neural` query and a `match` query. It specifies the search pipeline created in the previous step as a query parameter:
```json
GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline
@@ -161,7 +163,7 @@ GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline
"queries": [
{
"match": {
- "text": {
+ "passage_text": {
"query": "Hi world"
}
}
@@ -216,3 +218,355 @@ The response contains the matching document:
}
}
```
+{% include copy-curl.html %}
+
+#### Example: Combining a match query and a term query
+
+The following example request combines two query clauses---a `match` query and a `term` query. It specifies the search pipeline created in the previous step as a query parameter:
+
+```json
+GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline
+{
+ "_source": {
+ "exclude": [
+ "passage_embedding"
+ ]
+ },
+ "query": {
+ "hybrid": {
+ "queries": [
+ {
+ "match":{
+ "passage_text": "hello"
+ }
+ },
+ {
+ "term":{
+ "passage_text":{
+ "value":"planet"
+ }
+ }
+ }
+ ]
+ }
+ }
+}
+```
+{% include copy-curl.html %}
+
+The response contains the matching documents:
+
+```json
+{
+ "took": 11,
+ "timed_out": false,
+ "_shards": {
+ "total": 2,
+ "successful": 2,
+ "skipped": 0,
+ "failed": 0
+ },
+ "hits": {
+ "total": {
+ "value": 2,
+ "relation": "eq"
+ },
+ "max_score": 0.7,
+ "hits": [
+ {
+ "_index": "my-nlp-index",
+ "_id": "2",
+ "_score": 0.7,
+ "_source": {
+ "id": "s2",
+ "passage_text": "Hi planet"
+ }
+ },
+ {
+ "_index": "my-nlp-index",
+ "_id": "1",
+ "_score": 0.3,
+ "_source": {
+ "id": "s1",
+ "passage_text": "Hello world"
+ }
+ }
+ ]
+ }
+}
+```
+{% include copy-curl.html %}
+
+## Hybrid search with post-filtering
+**Introduced 2.13**
+{: .label .label-purple }
+
+You can perform post-filtering on hybrid search results by providing the `post_filter` parameter in your query.
+
+The `post_filter` clause is applied after the search results have been retrieved. Post-filtering is useful for applying additional filters to the search results without impacting the scoring or the order of the results.
+
+Post-filtering does not impact document relevance scores or aggregation results.
+{: .note}
+
+#### Example: Post-filtering
+
+The following example request combines two query clauses---a `term` query and a `match` query. This is the same query as in the [preceding example](#example-combining-a-match-query-and-a-term-query), but it contains a `post_filter`:
+
+```json
+GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline
+{
+ "query": {
+ "hybrid":{
+ "queries":[
+ {
+ "match":{
+ "passage_text": "hello"
+ }
+ },
+ {
+ "term":{
+ "passage_text":{
+ "value":"planet"
+ }
+ }
+ }
+ ]
+ }
+
+ },
+ "post_filter":{
+ "match": { "passage_text": "world" }
+ }
+}
+
+```
+{% include copy-curl.html %}
+
+Compare these results to the results without post-filtering in the [preceding example](#example-combining-a-match-query-and-a-term-query). Unlike the preceding example response, which contains two documents, the response in this example contains only one document because the other document is removed by the `post_filter` clause:
+
+```json
+{
+ "took": 18,
+ "timed_out": false,
+ "_shards": {
+ "total": 2,
+ "successful": 2,
+ "skipped": 0,
+ "failed": 0
+ },
+ "hits": {
+ "total": {
+ "value": 1,
+ "relation": "eq"
+ },
+ "max_score": 0.3,
+ "hits": [
+ {
+ "_index": "my-nlp-index",
+ "_id": "1",
+ "_score": 0.3,
+ "_source": {
+ "id": "s1",
+ "passage_text": "Hello world"
+ }
+ }
+ ]
+ }
+}
+```
+
+
+## Combining hybrid search and aggregations
+**Introduced 2.13**
+{: .label .label-purple }
+
+You can enhance search results by combining a hybrid query clause with any aggregation that OpenSearch supports. Aggregations allow you to use OpenSearch as an analytics engine. For more information about aggregations, see [Aggregations]({{site.url}}{{site.baseurl}}/aggregations/).
+
+Most aggregations are performed on the subset of documents that is returned by a hybrid query. The only aggregation that operates on all documents is the [`global`]({{site.url}}{{site.baseurl}}/aggregations/bucket/global/) aggregation.
+
+To use aggregations with a hybrid query, first create an index. Aggregations are typically used on fields of special types, like `keyword` or `integer`. The following example creates an index with several such fields:
+
+```json
+PUT /my-nlp-index
+{
+ "settings": {
+ "number_of_shards": 2
+ },
+ "mappings": {
+ "properties": {
+ "doc_index": {
+ "type": "integer"
+ },
+ "doc_keyword": {
+ "type": "keyword"
+ },
+ "category": {
+ "type": "keyword"
+ }
+ }
+ }
+}
+```
+{% include copy-curl.html %}
+
+The following request ingests six documents into your new index, `my-nlp-index`, and two documents into a second index, `index-test`:
+
+```json
+POST /_bulk
+{ "index": { "_index": "my-nlp-index" } }
+{ "category": "permission", "doc_keyword": "workable", "doc_index": 4976, "doc_price": 100}
+{ "index": { "_index": "my-nlp-index" } }
+{ "category": "sister", "doc_keyword": "angry", "doc_index": 2231, "doc_price": 200 }
+{ "index": { "_index": "my-nlp-index" } }
+{ "category": "hair", "doc_keyword": "likeable", "doc_price": 25 }
+{ "index": { "_index": "my-nlp-index" } }
+{ "category": "editor", "doc_index": 9871, "doc_price": 30 }
+{ "index": { "_index": "my-nlp-index" } }
+{ "category": "statement", "doc_keyword": "entire", "doc_index": 8242, "doc_price": 350 }
+{ "index": { "_index": "my-nlp-index" } }
+{ "category": "statement", "doc_keyword": "idea", "doc_index": 5212, "doc_price": 200 }
+{ "index": { "_index": "index-test" } }
+{ "category": "editor", "doc_keyword": "bubble", "doc_index": 1298, "doc_price": 130 }
+{ "index": { "_index": "index-test" } }
+{ "category": "editor", "doc_keyword": "bubble", "doc_index": 521, "doc_price": 75 }
+```
+{% include copy-curl.html %}
+
+Now you can combine a hybrid query clause with a `sum` aggregation and a `terms` aggregation:
+
+```json
+GET /my-nlp-index/_search?search_pipeline=nlp-search-pipeline
+{
+ "query": {
+ "hybrid": {
+ "queries": [
+ {
+ "term": {
+ "category": "permission"
+ }
+ },
+ {
+ "bool": {
+ "should": [
+ {
+ "term": {
+ "category": "editor"
+ }
+ },
+ {
+ "term": {
+ "category": "statement"
+ }
+ }
+ ]
+ }
+ }
+ ]
+ }
+ },
+ "aggs": {
+ "total_price": {
+ "sum": {
+ "field": "doc_price"
+ }
+ },
+ "keywords": {
+ "terms": {
+ "field": "doc_keyword",
+ "size": 10
+ }
+ }
+ }
+}
+```
+{% include copy-curl.html %}
+
+The response contains the matching documents and the aggregation results:
+
+```json
+{
+ "took": 9,
+ "timed_out": false,
+ "_shards": {
+ "total": 2,
+ "successful": 2,
+ "skipped": 0,
+ "failed": 0
+ },
+ "hits": {
+ "total": {
+ "value": 4,
+ "relation": "eq"
+ },
+ "max_score": 0.5,
+ "hits": [
+ {
+ "_index": "my-nlp-index",
+ "_id": "mHRPNY4BlN82W_Ar9UMY",
+ "_score": 0.5,
+ "_source": {
+ "doc_price": 100,
+ "doc_index": 4976,
+ "doc_keyword": "workable",
+ "category": "permission"
+ }
+ },
+ {
+ "_index": "my-nlp-index",
+ "_id": "m3RPNY4BlN82W_Ar9UMY",
+ "_score": 0.5,
+ "_source": {
+ "doc_price": 30,
+ "doc_index": 9871,
+ "category": "editor"
+ }
+ },
+ {
+ "_index": "my-nlp-index",
+ "_id": "nXRPNY4BlN82W_Ar9UMY",
+ "_score": 0.5,
+ "_source": {
+ "doc_price": 200,
+ "doc_index": 5212,
+ "doc_keyword": "idea",
+ "category": "statement"
+ }
+ },
+ {
+ "_index": "my-nlp-index",
+ "_id": "nHRPNY4BlN82W_Ar9UMY",
+ "_score": 0.5,
+ "_source": {
+ "doc_price": 350,
+ "doc_index": 8242,
+ "doc_keyword": "entire",
+ "category": "statement"
+ }
+ }
+ ]
+ },
+ "aggregations": {
+ "total_price": {
+ "value": 680
+ },
+ "keywords": {
+ "doc_count_error_upper_bound": 0,
+ "sum_other_doc_count": 0,
+ "buckets": [
+ {
+ "key": "entire",
+ "doc_count": 1
+ },
+ {
+ "key": "idea",
+ "doc_count": 1
+ },
+ {
+ "key": "workable",
+ "doc_count": 1
+ }
+ ]
+ }
+ }
+}
+```
\ No newline at end of file
diff --git a/_search-plugins/knn/approximate-knn.md b/_search-plugins/knn/approximate-knn.md
index 74cf7e39f5..16d1a7e686 100644
--- a/_search-plugins/knn/approximate-knn.md
+++ b/_search-plugins/knn/approximate-knn.md
@@ -287,9 +287,15 @@ Not every method supports each of these spaces. Be sure to check out [the method
nmslib and faiss:\[ score = {1 \over 1 + d } \] Lucene:\[ score = {2 - d \over 2}\] |
- innerproduct (not supported for Lucene) |
- \[ d(\mathbf{x}, \mathbf{y}) = - {\mathbf{x} · \mathbf{y}} = - \sum_{i=1}^n x_i y_i \] |
- \[ \text{If} d \ge 0, \] \[score = {1 \over 1 + d }\] \[\text{If} d < 0, score = −d + 1\] |
+ innerproduct (supported for Lucene in OpenSearch version 2.13 and later) |
+ \[ d(\mathbf{x}, \mathbf{y}) = - {\mathbf{x} · \mathbf{y}} = - \sum_{i=1}^n x_i y_i \]
+ Lucene:
+ \[ d(\mathbf{x}, \mathbf{y}) = {\mathbf{x} · \mathbf{y}} = \sum_{i=1}^n x_i y_i \]
+ |
+ \[ \text{If } d \ge 0, \] \[score = {1 \over 1 + d }\] \[\text{If } d < 0, score = -d + 1\]
+ Lucene:
+ \[ \text{If } d > 0, score = d + 1 \] \[\text{If } d \le 0, \] \[score = {1 \over 1 + (-1 · d) }\]
+ |
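The distance-to-score conversions in the table above can be sketched in Python. This is only an illustration of the formulas, not code from the k-NN plugin, and the function names are hypothetical:

```python
def nmslib_faiss_innerproduct_score(x, y):
    """nmslib/faiss convention: the distance is the negated inner product."""
    d = -sum(a * b for a, b in zip(x, y))
    # If d >= 0, score = 1 / (1 + d); if d < 0, score = -d + 1
    return 1.0 / (1.0 + d) if d >= 0 else 1.0 - d

def lucene_innerproduct_score(x, y):
    """Lucene convention: the "distance" is the raw inner product."""
    d = sum(a * b for a, b in zip(x, y))
    # If d > 0, score = d + 1; if d <= 0, score = 1 / (1 + (-1 * d))
    return d + 1.0 if d > 0 else 1.0 / (1.0 - d)
```

Note that although the two engines define the intermediate distance with opposite signs, both conventions yield the same final relevance score for the same pair of vectors.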
@@ -297,3 +303,8 @@ The cosine similarity formula does not include the `1 -` prefix. However, becaus
smaller scores with closer results, they return `1 - cosineSimilarity` for cosine similarity space---that's why `1 -` is
included in the distance function.
{: .note }
+
+With cosine similarity, it is not valid to pass a zero vector (`[0, 0, ...]`) as input. This is because the magnitude of
+such a vector is 0, which raises a `divide by 0` exception in the corresponding formula. Requests
+containing the zero vector will be rejected and a corresponding exception will be thrown.
+{: .note }
\ No newline at end of file
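The reason for the zero-vector restriction is visible directly in the cosine similarity formula: the vector magnitudes appear in the denominator. The following Python sketch (an illustration under that formula, not the plugin's implementation) reproduces the check:

```python
import math

def cosine_similarity(x, y):
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        # A zero vector has magnitude 0, so the denominator below would be 0.
        raise ValueError("cosine similarity is undefined for the zero vector")
    return sum(a * b for a, b in zip(x, y)) / (norm_x * norm_y)
```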
diff --git a/_search-plugins/knn/knn-index.md b/_search-plugins/knn/knn-index.md
index 4a527f3bcb..1e0c2e84f5 100644
--- a/_search-plugins/knn/knn-index.md
+++ b/_search-plugins/knn/knn-index.md
@@ -17,7 +17,7 @@ Starting with k-NN plugin version 2.9, you can use `byte` vectors with the `luce
## Method definitions
-A method definition refers to the underlying configuration of the Approximate k-NN algorithm you want to use. Method definitions are used to either create a `knn_vector` field (when the method does not require training) or [create a model during training]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model) that can then be used to [create a `knn_vector` field]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model).
+A method definition refers to the underlying configuration of the approximate k-NN algorithm you want to use. Method definitions are used to either create a `knn_vector` field (when the method does not require training) or [create a model during training]({{site.url}}{{site.baseurl}}/search-plugins/knn/api#train-model) that can then be used to [create a `knn_vector` field]({{site.url}}{{site.baseurl}}/search-plugins/knn/approximate-knn/#building-a-k-nn-index-from-a-model).
A method definition will always contain the name of the method, the space_type the method is built for, the engine
(the library) to use, and a map of parameters.
@@ -33,7 +33,7 @@ Mapping parameter | Required | Default | Updatable | Description
Method name | Requires training | Supported spaces | Description
:--- | :--- | :--- | :---
-`hnsw` | false | l2, innerproduct, cosinesimil, l1, linf | Hierarchical proximity graph approach to Approximate k-NN search. For more details on the algorithm, see this [abstract](https://arxiv.org/abs/1603.09320).
+`hnsw` | false | l2, innerproduct, cosinesimil, l1, linf | Hierarchical proximity graph approach to approximate k-NN search. For more details on the algorithm, see this [abstract](https://arxiv.org/abs/1603.09320).
#### HNSW parameters
@@ -52,7 +52,7 @@ An index created in OpenSearch version 2.11 or earlier will still use the old `e
Method name | Requires training | Supported spaces | Description
:--- | :--- | :--- | :---
-`hnsw` | false | l2, innerproduct | Hierarchical proximity graph approach to Approximate k-NN search.
+`hnsw` | false | l2, innerproduct | Hierarchical proximity graph approach to approximate k-NN search.
`ivf` | true | l2, innerproduct | Bucketing approach where vectors are assigned different buckets based on clustering and, during search, only a subset of the buckets is searched.
For hnsw, "innerproduct" is not available when PQ is used.
@@ -90,8 +90,8 @@ Training data can be composed of either the same data that is going to be ingest
### Supported Lucene methods
Method name | Requires training | Supported spaces | Description
-:--- | :--- | :--- | :---
-`hnsw` | false | l2, cosinesimil | Hierarchical proximity graph approach to Approximate k-NN search.
+:--- | :--- | :--- | :---
+`hnsw` | false | l2, cosinesimil, innerproduct (supported in OpenSearch 2.13 and later) | Hierarchical proximity graph approach to approximate k-NN search.
#### HNSW parameters
@@ -259,7 +259,7 @@ At the moment, several parameters defined in the settings are in the deprecation
Setting | Default | Updatable | Description
:--- | :--- | :--- | :---
-`index.knn` | false | false | Whether the index should build native library indexes for the `knn_vector` fields. If set to false, the `knn_vector` fields will be stored in doc values, but Approximate k-NN search functionality will be disabled.
+`index.knn` | false | false | Whether the index should build native library indexes for the `knn_vector` fields. If set to false, the `knn_vector` fields will be stored in doc values, but approximate k-NN search functionality will be disabled.
`index.knn.algo_param.ef_search` | 100 | true | The size of the dynamic list used during k-NN searches. Higher values result in more accurate but slower searches. Only available for NMSLIB.
`index.knn.algo_param.ef_construction` | 100 | false | Deprecated in 1.0.0. Instead, use the [mapping parameters](https://opensearch.org/docs/latest/search-plugins/knn/knn-index/#method-definitions) to set this value.
`index.knn.algo_param.m` | 16 | false | Deprecated in 1.0.0. Use the [mapping parameters](https://opensearch.org/docs/latest/search-plugins/knn/knn-index/#method-definitions) to set this value instead.
diff --git a/_search-plugins/knn/knn-score-script.md b/_search-plugins/knn/knn-score-script.md
index 602346803d..cc79e90850 100644
--- a/_search-plugins/knn/knn-score-script.md
+++ b/_search-plugins/knn/knn-score-script.md
@@ -313,9 +313,11 @@ A space corresponds to the function used to measure the distance between two poi
\[ score = 2 - d \] |
- innerproduct (not supported for Lucene) |
- \[ d(\mathbf{x}, \mathbf{y}) = - {\mathbf{x} · \mathbf{y}} = - \sum_{i=1}^n x_i y_i \] |
- \[ \text{If} d \ge 0, \] \[score = {1 \over 1 + d }\] \[\text{If} d < 0, score = −d + 1\] |
+ innerproduct (supported for Lucene in OpenSearch version 2.13 and later) |
+ \[ d(\mathbf{x}, \mathbf{y}) = - {\mathbf{x} · \mathbf{y}} = - \sum_{i=1}^n x_i y_i \]
+ |
+ \[ \text{If } d \ge 0, \] \[score = {1 \over 1 + d }\] \[\text{If } d < 0, score = -d + 1\]
+ |
hammingbit |
@@ -326,3 +328,8 @@ A space corresponds to the function used to measure the distance between two poi
Cosine similarity returns a number between -1 and 1, and because OpenSearch relevance scores can't be below 0, the k-NN plugin adds 1 to get the final score.
+
+With cosine similarity, it is not valid to pass a zero vector (`[0, 0, ...]`) as input. This is because the magnitude of
+such a vector is 0, which raises a `divide by 0` exception in the corresponding formula. Requests
+containing the zero vector will be rejected and a corresponding exception will be thrown.
+{: .note }
\ No newline at end of file
diff --git a/_search-plugins/knn/painless-functions.md b/_search-plugins/knn/painless-functions.md
index 2b28f753ef..1f27cc29a6 100644
--- a/_search-plugins/knn/painless-functions.md
+++ b/_search-plugins/knn/painless-functions.md
@@ -67,3 +67,8 @@ cosineSimilarity | `float cosineSimilarity (float[] queryVector, doc['vector fie
```
Because scores can only be positive, this script ranks documents with vector fields higher than those without.
+
+With cosine similarity, it is not valid to pass a zero vector (`[0, 0, ...]`) as input. This is because the magnitude of
+such a vector is 0, which raises a `divide by 0` exception when computing the value. Requests
+containing the zero vector will be rejected and a corresponding exception will be thrown.
+{: .note }
\ No newline at end of file
diff --git a/_search-plugins/neural-sparse-search.md b/_search-plugins/neural-sparse-search.md
index 31ae43991e..88d30e4391 100644
--- a/_search-plugins/neural-sparse-search.md
+++ b/_search-plugins/neural-sparse-search.md
@@ -55,6 +55,8 @@ PUT /_ingest/pipeline/nlp-ingest-pipeline-sparse
```
{% include copy-curl.html %}
+To split long text into passages, use the `text_chunking` ingest processor before the `sparse_encoding` processor. For more information, see [Chaining text chunking and embedding processors]({{site.url}}{{site.baseurl}}/ingest-pipelines/processors/text-chunking/#chaining-text-chunking-and-embedding-processors).
+
## Step 2: Create an index for ingestion
In order to use the text embedding processor defined in your pipeline, create a rank features index, adding the pipeline created in the previous step as the default pipeline. Ensure that the fields defined in the `field_map` are mapped as correct types. Continuing with the example, the `passage_embedding` field must be mapped as [`rank_features`]({{site.url}}{{site.baseurl}}/field-types/supported-field-types/rank/#rank-features). Similarly, the `passage_text` field should be mapped as `text`.
@@ -237,3 +239,129 @@ The response contains the matching documents:
}
}
```
+
+## Setting a default model on an index or field
+
+A [`neural_sparse`]({{site.url}}{{site.baseurl}}/query-dsl/specialized/neural-sparse/) query requires a model ID for generating sparse embeddings. To avoid passing the model ID with each `neural_sparse` query request, you can set a default model at the index or field level.
+
+First, create a [search pipeline]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/index/) with a [`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) request processor. To set a default model for an index, provide the model ID in the `default_model_id` parameter. To set a default model for a specific field, provide the field name and the corresponding model ID in the `neural_field_default_id` map. If you provide both `default_model_id` and `neural_field_default_id`, `neural_field_default_id` takes precedence:
+
+```json
+PUT /_search/pipeline/default_model_pipeline
+{
+ "request_processors": [
+ {
+ "neural_query_enricher" : {
+ "default_model_id": "bQ1J8ooBpBj3wT4HVUsb",
+ "neural_field_default_id": {
+ "my_field_1": "uZj0qYoBMtvQlfhaYeud",
+ "my_field_2": "upj0qYoBMtvQlfhaZOuM"
+ }
+ }
+ }
+ ]
+}
+```
+{% include copy-curl.html %}
+
+Then set the default model for your index:
+
+```json
+PUT /my-nlp-index/_settings
+{
+ "index.search.default_pipeline" : "default_model_pipeline"
+}
+```
+{% include copy-curl.html %}
+
+You can now omit the model ID when searching:
+
+```json
+GET /my-nlp-index/_search
+{
+ "query": {
+ "neural_sparse": {
+ "passage_embedding": {
+ "query_text": "Hi world"
+ }
+ }
+ }
+}
+```
+{% include copy-curl.html %}
+
+The response contains both documents:
+
+```json
+{
+ "took" : 688,
+ "timed_out" : false,
+ "_shards" : {
+ "total" : 1,
+ "successful" : 1,
+ "skipped" : 0,
+ "failed" : 0
+ },
+ "hits" : {
+ "total" : {
+ "value" : 2,
+ "relation" : "eq"
+ },
+ "max_score" : 30.0029,
+ "hits" : [
+ {
+ "_index" : "my-nlp-index",
+ "_id" : "1",
+ "_score" : 30.0029,
+ "_source" : {
+ "passage_text" : "Hello world",
+ "passage_embedding" : {
+ "!" : 0.8708904,
+ "door" : 0.8587369,
+ "hi" : 2.3929274,
+ "worlds" : 2.7839446,
+ "yes" : 0.75845814,
+ "##world" : 2.5432441,
+ "born" : 0.2682308,
+ "nothing" : 0.8625516,
+ "goodbye" : 0.17146169,
+ "greeting" : 0.96817183,
+ "birth" : 1.2788506,
+ "come" : 0.1623208,
+ "global" : 0.4371151,
+ "it" : 0.42951578,
+ "life" : 1.5750692,
+ "thanks" : 0.26481047,
+ "world" : 4.7300377,
+ "tiny" : 0.5462298,
+ "earth" : 2.6555297,
+ "universe" : 2.0308156,
+ "worldwide" : 1.3903781,
+ "hello" : 6.696973,
+ "so" : 0.20279501,
+ "?" : 0.67785245
+ },
+ "id" : "s1"
+ }
+ },
+ {
+ "_index" : "my-nlp-index",
+ "_id" : "2",
+ "_score" : 16.480486,
+ "_source" : {
+ "passage_text" : "Hi planet",
+ "passage_embedding" : {
+ "hi" : 4.338913,
+ "planets" : 2.7755864,
+ "planet" : 5.0969057,
+ "mars" : 1.7405145,
+ "earth" : 2.6087382,
+ "hello" : 3.3210192
+ },
+ "id" : "s2"
+ }
+ }
+ ]
+ }
+}
+```
\ No newline at end of file
diff --git a/_search-plugins/search-pipelines/search-processors.md b/_search-plugins/search-pipelines/search-processors.md
index 36b848e6eb..5e53cf5615 100644
--- a/_search-plugins/search-pipelines/search-processors.md
+++ b/_search-plugins/search-pipelines/search-processors.md
@@ -24,7 +24,7 @@ The following table lists all supported search request processors.
Processor | Description | Earliest available version
:--- | :--- | :---
[`filter_query`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/filter-query-processor/) | Adds a filtering query that is used to filter requests. | 2.8
-[`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) | Sets a default model for neural search at the index or field level. | 2.11
+[`neural_query_enricher`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/neural-query-enricher/) | Sets a default model for neural search and neural sparse search at the index or field level. | 2.11 (neural), 2.13 (neural sparse)
[`script`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/script-processor/) | Adds a script that is run on newly indexed documents. | 2.8
[`oversample`]({{site.url}}{{site.baseurl}}/search-plugins/search-pipelines/oversample-processor/) | Increases the search request `size` parameter, storing the original value in the pipeline state. | 2.12
diff --git a/_search-plugins/semantic-search.md b/_search-plugins/semantic-search.md
index f4753bee1c..32bd18cd6c 100644
--- a/_search-plugins/semantic-search.md
+++ b/_search-plugins/semantic-search.md
@@ -48,6 +48,8 @@ PUT /_ingest/pipeline/nlp-ingest-pipeline
```
{% include copy-curl.html %}
+To split long text into passages, use the `text_chunking` ingest processor before the `text_embedding` processor. For more information, see [Chaining text chunking and embedding processors]({{site.url}}{{site.baseurl}}/ingest-pipelines/processors/text-chunking/#chaining-text-chunking-and-embedding-processors).
+
## Step 2: Create an index for ingestion
In order to use the text embedding processor defined in your pipeline, create a k-NN index, adding the pipeline created in the previous step as the default pipeline. Ensure that the fields defined in the `field_map` are mapped as correct types. Continuing with the example, the `passage_embedding` field must be mapped as a k-NN vector with a dimension that matches the model dimension. Similarly, the `passage_text` field should be mapped as `text`.
diff --git a/_search-plugins/sql/ppl/index.md b/_search-plugins/sql/ppl/index.md
index c39e3429e1..56ffebf555 100644
--- a/_search-plugins/sql/ppl/index.md
+++ b/_search-plugins/sql/ppl/index.md
@@ -12,6 +12,8 @@ redirect_from:
- /search-plugins/ppl/index/
- /search-plugins/ppl/endpoint/
- /search-plugins/ppl/protocol/
+ - /search-plugins/sql/ppl/index/
+ - /observability-plugin/ppl/index/
---
# PPL
diff --git a/_security/access-control/anonymous-authentication.md b/_security/access-control/anonymous-authentication.md
index 429daafb9b..cb2f951546 100644
--- a/_security/access-control/anonymous-authentication.md
+++ b/_security/access-control/anonymous-authentication.md
@@ -30,6 +30,19 @@ The following table describes the `anonymous_auth_enabled` setting. For more inf
If you disable anonymous authentication, you must provide at least one `authc` in order for the Security plugin to initialize successfully.
{: .important }
+## OpenSearch Dashboards configuration
+
+To enable anonymous authentication for OpenSearch Dashboards, modify the `opensearch_dashboards.yml` file in the configuration directory of your OpenSearch Dashboards installation.
+
+Add the following setting to `opensearch_dashboards.yml`:
+
+```yml
+opensearch_security.auth.anonymous_auth_enabled: true
+```
+
+Anonymous login for OpenSearch Dashboards requires anonymous authentication to be enabled on the OpenSearch cluster.
+{: .important}
+
## Defining anonymous authentication privileges
When anonymous authentication is enabled, your defined HTTP authenticators still try to find user credentials inside your HTTP request. If credentials are found, the user is authenticated. If none are found, the user is authenticated as an `anonymous` user.
diff --git a/_tuning-your-cluster/availability-and-recovery/remote-store/remote-cluster-state.md b/_tuning-your-cluster/availability-and-recovery/remote-store/remote-cluster-state.md
index 3eb40fe2ed..7cc533fe76 100644
--- a/_tuning-your-cluster/availability-and-recovery/remote-store/remote-cluster-state.md
+++ b/_tuning-your-cluster/availability-and-recovery/remote-store/remote-cluster-state.md
@@ -24,8 +24,12 @@ _Cluster state_ is an internal data structure that contains the metadata of the
The cluster state metadata is managed by the elected cluster manager node and is essential for the cluster to properly function. When the cluster loses the majority of the cluster manager nodes permanently, then the cluster may experience data loss because the latest cluster state metadata might not be present in the surviving cluster manager nodes. Persisting the state of all the cluster manager nodes in the cluster to remote-backed storage provides better durability.
When the remote cluster state feature is enabled, the cluster metadata will be published to a remote repository configured in the cluster.
-Any time new cluster manager nodes are launched after disaster recovery, the nodes will automatically bootstrap using the latest metadata stored in the remote repository.
-After the metadata is restored automatically from the latest metadata stored, and if the data nodes are unchanged in the index data, the metadata lost will be automatically recovered. However, if the data nodes have been replaced, then you can restore the index data by invoking the `_remotestore/_restore` API as described in the [remote store documentation]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/remote-store/index/).
+Any time new cluster manager nodes are launched after disaster recovery, the nodes will automatically bootstrap using the latest metadata stored in the remote repository. This provides metadata durability.
+
+You can enable remote cluster state independently of remote-backed data storage.
+{: .note}
+
+If you require data durability, you must enable remote-backed data storage as described in the [remote store documentation]({{site.url}}{{site.baseurl}}/tuning-your-cluster/availability-and-recovery/remote-store/index/).
## Configuring the remote cluster state
@@ -59,4 +63,3 @@ Setting | Default | Description
The remote cluster state functionality has the following limitations:
- Unsafe bootstrap scripts cannot be run when the remote cluster state is enabled. When a majority of cluster-manager nodes are lost and the cluster goes down, the user needs to replace any remaining cluster manager nodes and reseed the nodes in order to bootstrap a new cluster.
-- The remote cluster state cannot be enabled without first configuring remote-backed storage.
diff --git a/images/dashboards/multidata-hide-localcluster.gif b/images/dashboards/multidata-hide-localcluster.gif
new file mode 100644
index 0000000000..b778063943
Binary files /dev/null and b/images/dashboards/multidata-hide-localcluster.gif differ
diff --git a/images/dashboards/multidata-hide-show-auth.gif b/images/dashboards/multidata-hide-show-auth.gif
new file mode 100644
index 0000000000..9f1f945c44
Binary files /dev/null and b/images/dashboards/multidata-hide-show-auth.gif differ
diff --git a/images/dashboards/vega-2.png b/images/dashboards/vega-2.png
new file mode 100644
index 0000000000..1faa3a6e67
Binary files /dev/null and b/images/dashboards/vega-2.png differ