
[BUG] Handling heap usage exceed error #711

Closed
sandeshkr419 opened this issue Nov 2, 2023 · 3 comments

@sandeshkr419
Contributor

What is the bug?
This phenomenon occurs when a detector is searching documents (via the Alerting plugin) and OpenSearch rejects the percolate query search request.

Sample error message:

"error_message" : "IllegalStateException[Failed to run percolate search for sourceIndex [log-aws-cloudtrail-2023-08] and queryIndex [.opensearch-sap-cloudtrail-detectors-queries-000001] for 10000 document(s)]; 
nested: SearchPhaseExecutionException[all shards failed]; 
nested: [cancelled task with reason: heap usage exceeded [45.9mb >= 9.2mb]]; 
nested: OpenSearchRejectedExecutionException[cancelled task with reason: heap usage exceeded [45.9mb >= 9.2mb]];

The reasons for this are:

  1. The detector run frequency is set too low - running the detector more frequently would already keep document batches smaller than 10k, but the most frequent interval allowed is 1 minute. The total size of those 10k documents is also constrained by the heap available at that time.
  2. Using instances with lower RAM/heap - one of the biggest contributing factors is simply having less heap available in the first place, so this is more likely to happen on smaller instance types.

Possible solutions:

  1. Batching the documents - however, the open question is how to choose the batch size and the maximum number of batches.
  2. Scaling the number of documents processed in a single batch as a function of instance heap size. This would likely require the batch size to be configurable - something along the lines of 1k documents for a 1GB heap, 2k docs for a 2GB heap, ... up to 10k documents for 8GB and higher - and it could be tuned up or down depending on how well the cluster handles the documents (see the sketch below).
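
For illustration, here is a minimal Java sketch of option 2. The class name, constants, and the docs-per-GB scaling are assumptions made up for this example, not the plugin's actual code:

```java
// Hypothetical sketch only: the class name, constants, and docs-per-GB scaling
// are assumptions for illustration, not the security-analytics plugin's code.
import java.util.ArrayList;
import java.util.List;

public final class PercolateBatchSizer {

    private static final int DOCS_PER_GB_HEAP = 1_000;    // assumed scaling: ~1k docs per 1GB of heap
    private static final int MAX_DOCS_PER_BATCH = 10_000; // cap at the current 10k per-request size

    /** Derives a per-batch document count from the JVM max heap. */
    public static int batchSizeFromHeap() {
        long maxHeapBytes = Runtime.getRuntime().maxMemory();
        long heapGb = Math.max(1L, maxHeapBytes / (1024L * 1024L * 1024L));
        return (int) Math.min(heapGb * DOCS_PER_GB_HEAP, MAX_DOCS_PER_BATCH);
    }

    /** Splits the pending documents into batches no larger than the heap-derived size. */
    public static <T> List<List<T>> toBatches(List<T> docs) {
        int batchSize = batchSizeFromHeap();
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            batches.add(new ArrayList<>(docs.subList(i, Math.min(i + batchSize, docs.size()))));
        }
        return batches;
    }
}
```

Deriving the count from Runtime.getRuntime().maxMemory() keeps the sketch self-contained; in practice the scaling factor would be the configurable, tunable setting described above.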

Related issue: opensearch-project/OpenSearch#2818
How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

What is the expected behavior?
A clear and concise description of what you expected to happen.

What is your host/environment?

  • OS: [e.g. iOS]
  • Version [e.g. 22]
  • Plugins

Do you have any screenshots?
If applicable, add screenshots to help explain your problem.

Do you have any additional context?
Add any other context about the problem.

@sandeshkr419 sandeshkr419 added bug Something isn't working untriaged labels Nov 2, 2023
@eirsep eirsep removed the untriaged label Dec 22, 2023
@eirsep eirsep self-assigned this Dec 22, 2023
@eirsep
Member

eirsep commented Dec 22, 2023

@eirsep
Member

eirsep commented Jan 2, 2024

Using the number of documents is not the right parameter, IMO.
Rather, just use heap size and set a threshold (we can start with x% of heap; x should be a cluster setting whose default is derived from the right benchmarking) to batch the available docs in memory, perform the percolate query, and then fetch documents for the remaining shards.
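
A rough sketch of that idea, assuming a hypothetical cluster setting expressed as a fraction of max heap (the setting, class name, and raw byte[] payloads are assumptions for illustration, not the plugin's actual implementation):

```java
// Illustrative sketch of the heap-threshold idea above. The setting, class name,
// and raw byte[] payloads are assumptions, not the plugin's actual implementation.
import java.util.ArrayList;
import java.util.List;

public final class HeapThresholdBatcher {

    // Hypothetical cluster setting: fraction of max heap allowed per percolate batch (e.g. 0.10).
    private final double heapFractionThreshold;

    public HeapThresholdBatcher(double heapFractionThreshold) {
        this.heapFractionThreshold = heapFractionThreshold;
    }

    /** Groups document payloads so each batch stays under the heap-derived byte budget. */
    public List<List<byte[]>> batch(List<byte[]> docs) {
        long budgetBytes = (long) (Runtime.getRuntime().maxMemory() * heapFractionThreshold);
        List<List<byte[]>> batches = new ArrayList<>();
        List<byte[]> current = new ArrayList<>();
        long currentBytes = 0L;
        for (byte[] doc : docs) {
            // Start a new batch once adding this doc would exceed the byte budget.
            if (!current.isEmpty() && currentBytes + doc.length > budgetBytes) {
                batches.add(current);
                current = new ArrayList<>();
                currentBytes = 0L;
            }
            current.add(doc);
            currentBytes += doc.length;
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }
}
```

Each batch would then be percolated before the next is assembled, keeping the in-memory footprint under the configured fraction of heap.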

riysaxen-amzn pushed a commit to riysaxen-amzn/security-analytics that referenced this issue Mar 25, 2024
…rch-project#705) (opensearch-project#711)

Signed-off-by: Ashish Agrawal <[email protected]>

Signed-off-by: Ashish Agrawal <[email protected]>
(cherry picked from commit 41265f86c371a1bea697376b51816ab495bdbe96)

Co-authored-by: Ashish Agrawal <[email protected]>
@engechas
Collaborator

engechas commented Apr 9, 2024

This was resolved with the recent performance enhancements. The number of docs submitted in each percolate request now considers the available heap.

@engechas engechas closed this as completed Apr 9, 2024