Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failing to create table with sstable_compression=ZstdCompressor #22444

Open
fruch opened this issue Jan 22, 2025 · 8 comments · May be fixed by #22451
Open

Failing to create table with sstable_compression=ZstdCompressor #22444

fruch opened this issue Jan 22, 2025 · 8 comments · May be fixed by #22451
Assignees
Labels
area/source available Issues related to the migration to source available area/sstable backport/6.1 should be backported to 6.1 backport/6.2 should be backported to 6.2 status/regression
Milestone

Comments

@fruch
Copy link
Contributor

fruch commented Jan 22, 2025

Scylla version 2025.1.0~dev-0.20250122.a8805c4fc184 with build-id 3555be198546b70f36699ff8733c67420ba66881

creating a table with WITH compression = {'sstable_compression': 'ZstdCompressor'} fails on master:

E   cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 127.0.6.1:9042 datacenter1>: <Error from server: code=0000 [Server error] message="unable to find class 'org.apache.cassandra.io.compress.ZstdCompressor'">, <Host: 127.0.6.2:9042 datacenter1>: <Error from server: code=0000 [Server error] message="unable to find class 'org.apache.cassandra.io.compress.ZstdCompressor'">, <Host: 127.0.6.3:9042 datacenter1>: <Error from server: code=0000 [Server error] message="unable to find class 'org.apache.cassandra.io.compress.ZstdCompressor'">})

reproducer with dtest (seems like we don't have such test anywhere):

diff --git a/putget_test.py b/putget_test.py
index 2ed62b92..9f97820a 100644
--- a/putget_test.py
+++ b/putget_test.py
@@ -42,6 +42,10 @@ class TestPutGet(Tester):
         """Simple put/get on a single row, but hitting multiple sstables (with deflate compression)"""
         self._putget(compression="Deflate")

+    def test_putget_zstd(self):
+        """Simple put/get on a single row, but hitting multiple sstables (with snappy compression)"""
+        self._putget(compression="Zstd")
+
     # Simple queries, but with flushes in between inserts to make sure we hit
     # sstables (and more than one) on reads
     def _putget(self, compression=None):
# fails
./scripts/run_test.sh --scylla-version=unstable/master:latest putget_test.py::TestPutGet::test_putget_zstd
...
=========================== short test summary info ============================
FAILED putget_test.py::TestPutGet::test_putget_zstd - cassandra.cluster.NoHos...
============================== 1 failed in 33.47s ==============================
Docker exitcode: 1

# works on release 6.2 and 2024.2

./scripts/run_test.sh --scylla-version=release:6.2 putget_test.py::TestPutGet::test_putget_zstd
./scripts/run_test.sh --scylla-version=release:2024.2 putget_test.py::TestPutGet::test_putget_zstd

...
putget_test.py::TestPutGet::test_putget_zstd PASSED

============================== 1 passed in 46.65s ==============================
Docker exitcode: 0
@fruch fruch added status/regression area/sstable triage/master Looking for assignee area/source available Issues related to the migration to source available labels Jan 22, 2025
@fruch
Copy link
Contributor Author

fruch commented Jan 22, 2025

I think it's related to change come in as part of source available, but I could track it down myself as why ZstdCompressor isn't registered

@mykaul
Copy link
Contributor

mykaul commented Jan 22, 2025

@fruch - can we do a bisect to find out? It may be from the advanced compression, but it is strange (that was for networking, not for storage)
@michoecho - any idea?

@tchaikov tchaikov self-assigned this Jan 22, 2025
@tchaikov
Copy link
Contributor

i am able to reproduce with release build, but not with debug build. looking.

@michoecho
Copy link
Contributor

michoecho commented Jan 22, 2025

ZstdCompressor is looked up in a class_registry of compressors, by the compressor_registry::create() call in compress.cc. (N.b. this registry contains only ZstdCompressor, other compressors are looked up directly in compressor::create).

This means that ZstdCompressor becomes visible only after the static initialization (the const class_registrator<compressor, zstd_processor, const compressor::opt_getter&> registrator(COMPRESSOR_NAME);) in zstd.o runs, and registers ZstdCompressor in the registry.

But no symbol in zstd.o is explicitly used by the rest of the codebase. They are only used implicitly, via this static initialization code which causes it to register itself at runtime. So it's prone to being garbage-collected (/ignored) by the linker.

Hypothesis: zstd.o is not picked up by the linker. Likely solution: link it with --whole-archive. (Long-term solution: get rid of the class_registrator, and look up ZstdCompressor directly. ZstdCompressor is not special enough to have its own fancy linking mechanism).

@tchaikov
Copy link
Contributor

it's related to the order of static variables. cooking a patch now.

@tchaikov
Copy link
Contributor

Hypothesis: zstd.o is not picked up by the linker. Likely solution: link it with --whole-archive.

it is linked w/ whole-archive.

(Long-term solution: get rid of the class_registrator, and look up ZstdCompressor directly. ZstdCompressor is not special enough to have its own fancy linking mechanism).

yeah, i concur with you. but i see there were some back and forth in this area.

tchaikov added a commit to tchaikov/scylladb that referenced this issue Jan 22, 2025
…fix a function

Fixes a race condition where COMPRESSOR_NAME in zstd.cc could be
initialized before compressor::namespace_prefix due to undefined
global variable initialization order across translation units. This
was causing ZstdCompressor to be unregistered in release builds,
making it impossible to create tables with Zstd compression.

Replace the global namespace_prefix variable with a function that
returns the fully qualified compressor name. This ensures proper
initialization order and fixes the registration of the ZstdCompressor.

Fixes scylladb#22444
Signed-off-by: Kefu Chai <[email protected]>
@github-actions github-actions bot added backport/6.1 should be backported to 6.1 backport/6.2 should be backported to 6.2 labels Jan 22, 2025
@mykaul mykaul removed the triage/master Looking for assignee label Jan 22, 2025
@mykaul mykaul added this to the 2025.1.0 milestone Jan 22, 2025
@fruch
Copy link
Contributor Author

fruch commented Jan 22, 2025

I've found this issue with:
scylladb/scylla-cluster-tests#9902

it's a side test, use to check SCT machinery, and cassandra-stress working as expected. and we aren't running it regularly.

we should have a test for this feature in test.py or in dtest

@mykaul
Copy link
Contributor

mykaul commented Jan 22, 2025

I've found this issue with: scylladb/scylla-cluster-tests#9902

it's a side test, use to check SCT machinery, and cassandra-stress working as expected. and we aren't running it regularly.

we should have a test for this feature in test.py or in dtest

I'd be very surprised if we don't and we indeed need.

tchaikov added a commit to tchaikov/scylladb that referenced this issue Jan 22, 2025
…fix a function

Fixes a race condition where COMPRESSOR_NAME in zstd.cc could be
initialized before compressor::namespace_prefix due to undefined
global variable initialization order across translation units. This
was causing ZstdCompressor to be unregistered in release builds,
making it impossible to create tables with Zstd compression.

Replace the global namespace_prefix variable with a function that
returns the fully qualified compressor name. This ensures proper
initialization order and fixes the registration of the ZstdCompressor.

Fixes scylladb#22444
Signed-off-by: Kefu Chai <[email protected]>
tchaikov added a commit to tchaikov/scylladb that referenced this issue Jan 22, 2025
…fix a function

Fixes a race condition where COMPRESSOR_NAME in zstd.cc could be
initialized before compressor::namespace_prefix due to undefined
global variable initialization order across translation units. This
was causing ZstdCompressor to be unregistered in release builds,
making it impossible to create tables with Zstd compression.

Replace the global namespace_prefix variable with a function that
returns the fully qualified compressor name. This ensures proper
initialization order and fixes the registration of the ZstdCompressor.

Fixes scylladb#22444
Signed-off-by: Kefu Chai <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/source available Issues related to the migration to source available area/sstable backport/6.1 should be backported to 6.1 backport/6.2 should be backported to 6.2 status/regression
Projects
None yet
4 participants