Remote Vector Index Build Component — Remote Vector Service Client #2393

Open · Tracked by #2391
jed326 opened this issue Jan 14, 2025

See #2391 for background information

Overview

Following up on the RFCs, this is the first part of the low-level design for the Vector Index Build Component. The Vector Index Build Component is a logical component that we split into two subcomponents with the following responsibilities:

  1. Object Store I/O Component
    1. Upload flat vectors to Object Store
    2. Download graph file from object store
  2. Remote Vector Service Client Component
    1. Signal to Remote Vector Index Build Service to begin graph construction after vector files have been uploaded
    2. Receive a signal from Remote Vector Index Build Service to begin graph file download after graph construction is completed

This document contains the low-level design for [2] the Remote Vector Service Client Component, covering the design for the client the vector engine will use to interact with the remote vector build service as well as the workflows associated with using this client.

Tenets

The key tenets of the client are straightforward:

  1. Signal to the remote vector service that vector blob upload is complete and ready for graph construction
  2. Receive a signal from the remote vector service that graph construction is complete and graph download is ready
  3. Handle failures so that they do not fail the merge/flush operation

High level overview:

[Image: high-level overview diagram]

Alternatives Considered

1. [Recommended] REST requests with polling

In this approach we submit the graph build request via REST request and then use REST requests to poll the vector build service for completion status.

Pros:

  • Simplest implementation
  • Vector build service itself could still implement a queue to receive the build requests

Cons:

  • Higher request count to the remote vector build service; we need to configure a polling interval and smart retry logic
  • A state machine would be required to keep track of the build progress

2. Persistent connection (gRPC, websockets)

In this approach we open a persistent connection between the OpenSearch cluster and the remote vector build service and keep the connection open until the graph construction is complete.

Pros:

  • Lower request count to remote vector build service

Cons:

  • Need to build robust reconnect logic. How do we handle the case where vector build completes or fails during a disconnect?
  • The vector build service is multi-tenant, so we may be bottlenecked on the number of concurrent persistent connections to the remote vector build service. Specifically, any request queueing may lead to unexpected outcomes.
  • If we design a stateless system then it would be difficult to retry only specific actions, for example retrying only the graph upload part if that were to fail
  • There is very little data transfer happening through this client

3. REST callback

In this approach we submit a graph build request via REST request and expose a REST callback endpoint that the remote vector build service notifies when graph construction is complete.

Pros:

  • No need to maintain a persistent connection or poll for results

Cons:

  • Very difficult to pass a notification from the transport layer down to the index writer in the middle of segment merge operations, especially as GPU builds are per-segment rather than per-shard
  • The REST callback would require a coordination layer to figure out which node / shard / segment the callback is associated with, and this would have to exist outside of the segment merge
  • No way to get intermediate status, so we would have to heuristically determine how long to wait for the notification. Since graph builds may take on the order of hours, this could waste a lot of time if we need to fall back to the CPU build path

4. Queue based mechanism

In this approach we submit a graph build request to a queue rather than directly to the remote vector build service. We also consume graph build completion notifications through a separate queue.

Pros:

  • Same as [3]
  • Load shedding / balancing is easier with the queue in front of the remote vector build service

Cons:

  • A queue-based implementation can make it difficult to prioritize tasks, as we would not know the priority of any task until it is consumed from the queue
  • The additional queue infrastructure adds both cost and complexity
  • Same as [3]

Workflow

In addition to performing status checks, we also need to fall back to the local CPU build in remote failure scenarios. Below is a high-level workflow overview with highlighted components representing usage of the remote vector service client. In this diagram we do not make distinctions between failure statuses and failed HTTP requests; this will be clarified in the sections further below.

[Image: high-level workflow diagram]

Polling Based Client

We will implement a simple HTTP client that performs POST / GET requests against a configurable remote vector service endpoint.
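
To make the shape of this client concrete, below is a minimal sketch of what its surface could look like. The interface, method, and type names are illustrative assumptions, not a final API.

```java
import java.io.IOException;
import java.util.Map;

// Hypothetical client surface for the polling-based design; all names here are illustrative.
public interface RemoteVectorBuildServiceClient {

    // POST /build: submit a graph build request for an uploaded vector blob; returns the job ID.
    String triggerBuild(Map<String, Object> buildRequestBody) throws IOException;

    // GET /status: poll the current task status
    // (RUNNING_GRAPH_BUILD / FAILED_GRAPH_BUILD / COMPLETED_GRAPH_BUILD).
    String getStatus(String jobId) throws IOException;

    // POST /cancel: cancel an in-progress build; used for operational support.
    void cancelBuild(String jobId) throws IOException;
}
```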

Vector Build Service Client Configurations

This section covers the configurations we will expose so that a user can point the client at their remote vector build service.

  1. Cluster setting to store remote vector service endpoint
    1. We can also consider an index setting override for the remote vector service endpoint, as we may find certain types of indices perform better on certain types of specialized hardware. This is also logic that the vector build service itself could handle though. We are not planning on this for now.
  2. Cluster settings to store auth header information. For now we will support the following auth headers:
    1. Basic Auth
    2. API Keys

ml-commons connector docs for reference
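
As a rough illustration of how these could be registered, here is a minimal sketch using the OpenSearch `Setting` API. The setting keys, properties, and the choice of where auth material lives (plain cluster settings vs. keystore-backed secure settings) are assumptions for illustration only.

```java
import org.opensearch.common.settings.Setting;

// Hypothetical cluster settings; keys, defaults, and properties are illustrative, not final.
public final class RemoteVectorBuildSettings {

    // Endpoint of the remote vector build service, updatable dynamically at the cluster level.
    public static final Setting<String> REMOTE_BUILD_ENDPOINT = Setting.simpleString(
        "knn.remote.vector.build.service.endpoint",
        Setting.Property.NodeScope,
        Setting.Property.Dynamic
    );

    // Selects the auth scheme (e.g. "basic" or "api_key"); the credentials themselves would
    // more likely live in the keystore as secure settings rather than plain cluster settings.
    public static final Setting<String> REMOTE_BUILD_AUTH_TYPE = Setting.simpleString(
        "knn.remote.vector.build.service.auth.type",
        Setting.Property.NodeScope,
        Setting.Property.Dynamic
    );

    private RemoteVectorBuildSettings() {}
}
```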

Trigger Vector Build

POST /build

Input:
- type: The remote object store type (s3 / azure / gcs, etc)
- container: The name of the container (s3 bucket, azure container, gcs bucket)
- Vector file: Full file path to the vector file, including the container base path
- Index parameters: JSON object including all required graph parameters
- Tenant ID: Unique identifier for the cluster making the request. This can be used for billing, authorization, etc.

Output:
- Job ID: Unique identifier both the vector engine and remote vector build service will use to associate the vector build task.

This API needs to be idempotent in order to support retries.

Additionally, we do not create a separate task ID to track the vector build status, because this would require the vector build service to internally maintain a mapping between the task ID and the graph file being built, and this mapping would need to be persisted after graph construction is complete in order to signal the vector engine to download the graph file from the object store. In failure scenarios, this makes it complicated for the vector build service to determine how long to persist task IDs after graph construction is complete.

The key invariant is that the vector blob path is unique to the specific segment being worked on, so that path can be used to associate a status request with a given graph build request. Moreover, since there is a 1:1 mapping between the constructed graph file and the vector blob, any status request could simply check for the existence of a graph file to determine whether a graph build is complete (whether to do so is left up to the implementation of the remote vector build service).
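
For illustration, a /build call could look roughly like the following. The endpoint, JSON field names, and file path are placeholders assumed for this example; the real client would reuse a shared HTTP client with the configured auth headers.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class TriggerBuildExample {
    public static void main(String[] args) throws Exception {
        // Request body mirroring the input spec above; all values are placeholders.
        String body = """
            {
              "type": "s3",
              "container": "my-vector-bucket",
              "vector_file": "base/path/segment_0_vectors.knnvec",
              "index_parameters": { "algorithm": "hnsw", "m": 16, "ef_construction": 100 },
              "tenant_id": "cluster-uuid-1234"
            }
            """;

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://remote-vector-build-service.example.com/build"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        // The response is expected to carry the job ID used by subsequent /status and /cancel calls.
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Build response: " + response.body());
    }
}
```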

Get Vector Build Status

For the vector build status, the key design decision is the verbosity of the status outputs and the subsequent state machine implementation:

1. [Recommended] Low verbosity for maximum compatibility

The possible task statuses in this solution would look like:

  1. RUNNING_GRAPH_BUILD -- Graph build task is in progress. This state represents all time between when the build request is submitted and when the graph upload is complete.
  2. FAILED_GRAPH_BUILD -- Graph build task has failed
  3. COMPLETED_GRAPH_BUILD -- Graph build task has completed, including graph upload

Pros

  1. Fewest states, for simplicity
  2. Specific retry implementation logic would be determined by the vector build service rather than implicitly defined by the client (see: Status Request Failure Response)
  3. Fewer configurations needed for retries and failure scenarios

Cons

  1. No granular visibility into remote vector build service components, such as vector download time, graph build time, graph upload time

GET /status

Input:
- Job ID: Unique identifier both the vector engine and remote vector build service will use to associate the vector build task.

Output:
- Task Status:
    1. RUNNING_GRAPH_BUILD -- Graph build task is in progress. This state represents all time between when the build request is submitted and when the graph upload is complete.
    2. FAILED_GRAPH_BUILD -- Graph build task has failed
    3. COMPLETED_GRAPH_BUILD -- Graph build task has completed, including graph upload
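
Under the recommended low-verbosity option, the client-side handling reduces to mapping three statuses onto three actions. A minimal sketch follows; the enum name and parsing behavior are assumptions, not the final implementation.

```java
// Hypothetical client-side representation of the low-verbosity statuses.
public enum RemoteBuildStatus {
    RUNNING_GRAPH_BUILD,    // keep polling
    FAILED_GRAPH_BUILD,     // re-submit /build or fall back to the CPU build path
    COMPLETED_GRAPH_BUILD;  // download the graph file from the object store

    // Parse the task status field from a /status response; unknown or missing values are
    // treated as failures so the merge/flush path falls back rather than hanging.
    public static RemoteBuildStatus fromResponse(String taskStatus) {
        try {
            return RemoteBuildStatus.valueOf(taskStatus);
        } catch (IllegalArgumentException | NullPointerException e) {
            return FAILED_GRAPH_BUILD;
        }
    }
}
```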

2. High verbosity for granular visibility

The possible task statuses in this solution would look like:

  1. PENDING_VECTOR_DOWNLOAD
  2. RUNNING_VECTOR_DOWNLOAD
  3. FAILED_VECTOR_DOWNLOAD
  4. PENDING_GRAPH_BUILD
  5. RUNNING_GRAPH_BUILD
  6. FAILED_GRAPH_BUILD
  7. PENDING_GRAPH_UPLOAD
  8. RUNNING_GRAPH_UPLOAD
  9. COMPLETED_GRAPH_UPLOAD
  10. FAILED_GRAPH_UPLOAD

Pros

  1. Granular visibility into remote vector build service components, such as vector download time, graph build time, graph upload time, etc.

Cons

  1. Additional complexity involved in managing state transitions between all the success and failure states
  2. More complexity in designing the remote vector build service component as the client is strictly dictating the states the vector build service needs to maintain.
  3. More tightly coupled client/service
  4. Retry logic (in state machine) will need to be handled client side

Cancel Vector Build

We also provide a cancellation API so that operators can cancel specific graph build tasks.

POST /cancel

Input:
- Job ID: Unique identifier both the vector engine and remote vector build service will use to associate the vector build task.

Output:
- Request acknowledgment

Internal State Machine

Because we want to proceed with less verbose statuses and leave the more specific retry implementation up to the remote vector build service itself, we do not need to (and do not want to) maintain a complicated state machine for each remote vector build. The following diagram contains the internal state machine for each remote vector build task as well as the state transitions based on remote vector service client responses.

[Image: internal state machine diagram]

Since the states and transitions are very straightforward, we will not maintain this state machine as a DAG or any other data structure from within the segment merge/flush operation; instead, we use the term “state machine” as a way to formalize the expected outcomes of each API response.
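
Below is a minimal sketch of the per-build polling loop, reusing the hypothetical client interface and status enum sketched above. The interval, timeout, and error handling are assumptions and would be driven by cluster settings rather than method parameters.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Minimal sketch of the per-build polling loop; not the final implementation.
public class RemoteBuildPoller {

    // Returns true if the remote build completed, false if it failed or timed out,
    // in which case the caller re-submits /build or falls back to the local CPU build.
    public boolean waitForCompletion(RemoteVectorBuildServiceClient client, String jobId,
                                     long pollIntervalSeconds, long timeoutSeconds) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(timeoutSeconds);
        while (System.nanoTime() < deadline) {
            try {
                switch (RemoteBuildStatus.fromResponse(client.getStatus(jobId))) {
                    case COMPLETED_GRAPH_BUILD:
                        return true;   // caller proceeds to download the graph file
                    case FAILED_GRAPH_BUILD:
                        return false;  // caller retries /build or falls back to the CPU build
                    case RUNNING_GRAPH_BUILD:
                        break;         // keep polling
                }
            } catch (IOException e) {
                // Transport-level failures are governed by the request retry policy (see Request Failures).
            }
            TimeUnit.SECONDS.sleep(pollIntervalSeconds);
        }
        return false;  // timed out
    }
}
```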

Failure scenarios, including retry logic, are discussed in the next section below: Status Request Failure Response.

Failure Scenarios

This section covers the various failure scenarios related to the client and how we would handle each one; in particular, we need to distinguish between retriable and non-retriable results.

Status Request Failure Response

This covers the scenario where we do not receive any request failures, but the /status API indicates that the graph build failed. To retry in this scenario, we will need to submit another /build request to the remote vector build service.

The number of times we will re-submit the /build request will be controlled by a cluster setting, and the specific failure retry implementation will be left up to the remote vector build service. For example, if a failure happens in the graph upload step, we will leave it up to the remote vector build service to decide whether to retry only the graph upload to the object store or to start from scratch and rebuild the graph. This type of retry across nodes is naturally unsynchronized, as it is up to the job scheduler of the remote vector build service to schedule the graph build jobs.

Request Failures

This covers any failure responses received when calling the /build and the /status APIs. From the remote vector service client perspective, the main information available to us to make a determination on whether to retry or not is the HTTP status code, and for that we should follow the AWS SDK retry standards on transient errors (source). This means the following status codes will be eligible for retry:

  • 429
  • 500
  • 502
  • 503
  • 504
  • 509

It will be up to the specific remote vector build service to implement these status codes — for example it’s left up to a specific service to choose whether to throw 403 or 404 when a request is received for a non-existent vector blob.

For this type of failure scenario we will provide a separate client retry/backoff + jitter configuration (a cluster setting separate from the one described in Status Request Failure Response) to retry failed HTTP requests. To mitigate the unlikely scenario of synchronized retries across nodes, we will implement retries with exponential backoff + jitter.
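
A minimal sketch of what such a retry policy could look like is below. The retriable code set mirrors the list above, while the base delay, cap, and attempt limit are assumed to come from cluster settings rather than the constants shown here.

```java
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;

// Minimal sketch of exponential backoff + full jitter for failed HTTP requests.
public final class HttpRetryPolicy {

    // Status codes eligible for retry, per the list above.
    private static final Set<Integer> RETRIABLE_STATUS_CODES = Set.of(429, 500, 502, 503, 504, 509);

    public static boolean isRetriable(int httpStatusCode) {
        return RETRIABLE_STATUS_CODES.contains(httpStatusCode);
    }

    // Full jitter: sleep a random duration in [0, min(cap, base * 2^attempt)].
    public static void backoff(int attempt, long baseMillis, long capMillis) throws InterruptedException {
        long ceiling = Math.min(capMillis, baseMillis * (1L << Math.min(attempt, 30)));
        long sleepMillis = ThreadLocalRandom.current().nextLong(ceiling + 1);
        TimeUnit.MILLISECONDS.sleep(sleepMillis);
    }

    private HttpRetryPolicy() {}
}
```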

Metrics

This section covers metrics specific to the remote vector service client and its usage. Other metrics emitted by the remote vector build service itself will be handled in a separate document.

  1. Build Request Success/Failure Count
  2. Build Request Retry Count
  3. Status Request Success/Failure Count
  4. Overall Graph Build Success/Failure Count
  5. Overall Graph Build Retry Count

Today the k-NN stats API only supports cluster- and node-level stats, so we can gather these metrics at the cluster/node level and expose them via the k-NN stats API.

As a separate item, we should explore supporting index/shard-level k-NN stats, as it would be valuable to see which indices are using and benefiting the most from the remote vector build service.

Future Improvements

Although a polling-based client will be the simplest implementation in the first iteration, we may encounter scaling problems as adoption of the feature increases. In a future low-level design we will further explore how to design the state machine and state transitions so that they are forward compatible with any client architecture changes. For now we are keeping the number of statuses as small as possible to make doing so easier in the future.
