Access to data transforms lost when a broker is unreachable #1566
Comments
What's the Admin API config? The Transforms menu item on the left is controlled via Console's "endpoint compatibility" functionality. Console checks various Kafka and Redpanda APIs for version and capabilities. For Transforms specifically, the recommended way is to attempt to list transforms; if that fails, the broker or cluster is assumed not to support it. Similarly, the actual listing of transforms is done via the Admin API's list transforms functionality.
In most cases, the Admin API client issues a request against a random broker within the cluster. The client is not fully dynamic: it builds a map of the brokers available in the cluster when it is created, but it does not refresh that map on every Admin API request. What I believe is happening is that the original capability check was performed while the cluster was healthy, or was issued against a healthy broker node. Then one of the nodes is removed, and Console's Redpanda Admin API client is now out of date. When a subsequent list transforms request is issued, the Admin client may or may not send it to the stale broker node that is no longer functioning. If it does, the request fails. A later request that happens to go to a different node that is up succeeds. The same mechanism explains why Transforms appears enabled and disabled at random.
I think we could improve the situation by refreshing Console's Admin API client, perhaps on a timer.
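To make the timer-based refresh idea concrete, here is a minimal Python sketch. It is not Console's code: `RefreshingBrokerPool`, `fetch_brokers`, and the `b0:9644`-style addresses are hypothetical names chosen for illustration. It assumes some function exists that returns the current broker list, and it re-reads that list once a time-to-live has passed instead of keeping the map built at creation forever.

```python
import random
import time
from typing import Callable, List, Optional


class RefreshingBrokerPool:
    """Hypothetical broker pool that re-reads cluster membership after a TTL."""

    def __init__(
        self,
        fetch_brokers: Callable[[], List[str]],
        ttl_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
        rng: Optional[random.Random] = None,
    ) -> None:
        self._fetch_brokers = fetch_brokers  # assumed: returns current broker addresses
        self._ttl = ttl_seconds
        self._clock = clock
        self._rng = rng or random.Random()
        self._brokers = list(fetch_brokers())
        self._fetched_at = clock()

    def brokers(self) -> List[str]:
        """Return the known brokers, refreshing the list if it is older than the TTL."""
        if self._clock() - self._fetched_at >= self._ttl:
            try:
                fresh = list(self._fetch_brokers())
            except Exception:
                fresh = []  # keep the old list if the refresh itself fails
            if fresh:
                self._brokers = fresh
            self._fetched_at = self._clock()
        return list(self._brokers)

    def pick(self) -> str:
        """Pick a random broker, mirroring the random selection described above."""
        return self._rng.choice(self.brokers())
```

With a list refreshed this way, a removed broker stays in rotation for at most one TTL rather than for the lifetime of the client. A shorter TTL narrows that window at the cost of more membership lookups.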
Thanks for the detailed response. I can get this to happen consistently (tried three times). Here's the admin config for Console
So we have a cluster with 3 brokers. Can you confirm that everything works correctly when all brokers are up? Then, once one or more brokers are brought down, the Console UX starts misbehaving and Transforms is sometimes enabled and sometimes not? Is that correct? This is exactly due to the way we determine whether Transforms is enabled, as I described. Console may issue the Admin API request to one of the live brokers, or it may randomly issue it to one of the down brokers. In the latter case the request errors out, which causes Transforms to be disabled.
As for the stale debug info on the Overview page, I am not sure what could cause that. Console just issues an Admin API request to the Get Cluster Health Overview API. So there may be some lag in the cluster's own view of its overall health after a broker is brought back up and before the update is reflected in that API.
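One way to keep a down broker from flipping the Transforms menu item is to separate "the broker did not answer" from "the broker answered that transforms are unsupported". The Python sketch below illustrates that idea only. `Capability`, `BrokerUnreachable`, `NotSupported`, and `check_transforms` are hypothetical names, and `list_transforms` stands in for whatever call actually lists transforms on a given broker.

```python
from enum import Enum
from typing import Callable, Iterable


class Capability(Enum):
    SUPPORTED = "supported"
    UNSUPPORTED = "unsupported"
    UNKNOWN = "unknown"  # no broker could be reached, so nothing was learned


class BrokerUnreachable(Exception):
    """Hypothetical error: the request never got an answer from the broker."""


class NotSupported(Exception):
    """Hypothetical error: the broker answered, but has no transforms endpoint."""


def check_transforms(
    brokers: Iterable[str],
    list_transforms: Callable[[str], object],
) -> Capability:
    """Try brokers in order until one gives a definitive answer."""
    for broker in brokers:
        try:
            list_transforms(broker)
        except BrokerUnreachable:
            continue  # a down broker says nothing about cluster capabilities
        except NotSupported:
            return Capability.UNSUPPORTED
        return Capability.SUPPORTED
    return Capability.UNKNOWN
```

A caller could then keep the previous menu state when the result is `UNKNOWN`, rather than disabling Transforms because one request landed on an unreachable node.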
Yes, that’s right.
When a broker goes down, the Transforms item in the nav tree is sometimes disabled. Other times, you can click it, but the page doesn't load. When you bring the broker back up, the page is available again.
Also, it takes a while for the debug info on the Overview page to disappear after the broker returns online. The status changes to Running, but the debug status still says that a broker is unreachable.
2024-12-19_10-20-32.mp4