[Speculation] Unexpectedly high memory usage when using bloom filters & monotonic keys #63

Closed
marvin-j97 opened this issue Sep 30, 2024 · 0 comments · Fixed by #72

marvin-j97 commented Sep 30, 2024

L0 and L1 segments are written with a very low false positive (FP) rate for their bloom filters, which means more bits per key. This is normally fine because L0 segments tend to be short-lived. In a monotonic key series, however, those segments are never rewritten, so once there are hundreds or thousands of disjoint segments, the cost adds up and results in unexpectedly high memory usage. In those scenarios one could disable bloom filters, but the behavior is still unexpected.
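To get a feel for how the filter cost scales with segment count, here is a rough sketch based on the textbook sizing formula for an optimally configured Bloom filter, bits per key = -ln(p) / (ln 2)^2. The function names and the FP rates used below (0.0001 and 0.01) are illustrative assumptions, not lsm-tree's actual API or settings.

```rust
/// Approximate bits per key for an optimally configured Bloom filter
/// with target false positive rate `fp_rate`: -ln(p) / (ln 2)^2.
/// (Textbook formula; real implementations round and may differ.)
fn bloom_bits_per_key(fp_rate: f64) -> f64 {
    let ln2 = std::f64::consts::LN_2;
    -fp_rate.ln() / (ln2 * ln2)
}

/// Approximate total filter memory in bytes across `segment_count`
/// segments that each hold `keys_per_segment` keys.
/// Both parameters are hypothetical inputs for illustration.
fn total_filter_bytes(segment_count: u64, keys_per_segment: u64, fp_rate: f64) -> f64 {
    let total_keys = (segment_count * keys_per_segment) as f64;
    total_keys * bloom_bits_per_key(fp_rate) / 8.0
}
```

Under these assumptions, a filter at an FP rate of 0.0001 needs about 19.2 bits per key, twice the roughly 9.6 bits per key needed at 0.01. For 1,000 segments of 1,000,000 keys each, that is roughly 2.4 GB versus 1.2 GB of filter data, which is why segments that are never rewritten at a higher level keep paying the more expensive rate.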

For very large datasets, Ribbon filters may be desirable: they use less memory at the cost of more CPU time. See https://github.com/fjall-rs/lsm-tree/issues/64

In addition, if compression is skipped on L0 (#37), these segments would never be compressed either.

And with #51, L0/L1 segments will have a high memory impact because of the full block index, so the higher memory usage should be even more noticeable.

Possible solution, in the Leveled compaction strategy:
When moving from L1 to L2, if the "bloom" feature OR any of (lz4, miniz) is enabled, rewrite the segments instead of doing a trivial move.
This increases write amplification by 1, but should be worth it. To do so, Leveled compaction needs to be improved so that it can trivially move into L1 without having to wait for an ongoing compaction (if the key ranges don't overlap, as they don't in a disjoint workload).
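The proposed rule could be sketched roughly as follows. This is only an illustration of the decision described above; the types `RewriteBenefits` and `MoveKind` and the function `choose_move` are hypothetical and do not correspond to lsm-tree's actual compaction code. Feature flags are modelled as plain booleans here rather than Cargo features.

```rust
/// Hypothetical summary of what a rewrite would gain for a segment.
#[derive(Clone, Copy, Debug)]
struct RewriteBenefits {
    /// Stands in for the "bloom" feature being enabled.
    bloom_enabled: bool,
    /// Stands in for any compression feature (lz4, miniz) being enabled.
    compression_enabled: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum MoveKind {
    /// Re-link the segment into the next level without touching its data.
    TrivialMove,
    /// Read and rewrite the segment (new filter, compression, block index).
    Rewrite,
}

/// Decide how to move segments from `source_level` to `dest_level`.
fn choose_move(
    source_level: u8,
    dest_level: u8,
    overlaps_dest: bool,
    benefits: RewriteBenefits,
) -> MoveKind {
    // Overlapping key ranges always need a real merge.
    if overlaps_dest {
        return MoveKind::Rewrite;
    }
    // Proposed rule: pay one extra rewrite on the L1 -> L2 transition
    // if it lets the segment pick up cheaper filters or compression.
    let rewrite_helps = benefits.bloom_enabled || benefits.compression_enabled;
    if source_level == 1 && dest_level == 2 && rewrite_helps {
        return MoveKind::Rewrite;
    }
    // Everything else (including disjoint L0 -> L1) stays a trivial move.
    MoveKind::TrivialMove
}
```

In a disjoint, monotonic workload this keeps L0 to L1 a trivial move, forces exactly one rewrite at L1 to L2, and leaves deeper moves trivial, which matches the "write amplification + 1" cost mentioned above.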

marvin-j97 changed the title from "[Speculation] High memory usage when using bloom filters & monotonic keys" to "[Speculation] Unexpectedly high memory usage when using bloom filters & monotonic keys" Oct 1, 2024
marvin-j97 added the enhancement, performance, and test labels and removed the enhancement label Oct 1, 2024
marvin-j97 pinned this issue Oct 18, 2024
marvin-j97 unpinned this issue Oct 21, 2024
marvin-j97 added a commit that referenced this issue Oct 22, 2024