Access to data transforms lost when a broker is unreachable #1566

Open
JakeSCahill opened this issue Dec 19, 2024 · 4 comments

@JakeSCahill
Contributor

JakeSCahill commented Dec 19, 2024

When a broker goes down, the Transforms item in the nav tree is sometimes disabled. Other times, you can click it, but the page doesn't load. When you bring the broker back up, the page is available again.

Also, it takes a while for the debug info on the Overview page to disappear after the broker returns online. The status changes to Running, but the debug status still says that a broker is unreachable.

2024-12-19_10-20-32.mp4
@bojand
Member

bojand commented Dec 19, 2024

What's the Admin API config?

The Transforms menu item on the left is controlled by Console's "endpoint compatibility" functionality. Console checks the different Kafka and Redpanda APIs for their versions and capabilities. For Transforms specifically, the recommended way is to attempt to list transforms; if that fails, it means the broker or cluster does not support it.
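A minimal sketch of that kind of check, in Python for illustration only (this is not Console's actual code). The names `transforms_supported`, `healthy_broker`, and `unreachable_broker` are hypothetical stand-ins. The sketch shows why a connection failure and a genuinely unsupported feature look the same to such a check:

```python
from typing import Callable, List


def transforms_supported(list_transforms: Callable[[], List[str]]) -> bool:
    """Return True if listing transforms succeeds, False if the call raises."""
    try:
        list_transforms()
    except Exception:
        # Any failure is read as "not supported", including the case where
        # the broker that received the request is simply unreachable.
        return False
    return True


def healthy_broker() -> List[str]:
    # Hypothetical stand-in for a successful list-transforms call.
    return ["my-transform"]


def unreachable_broker() -> List[str]:
    # Hypothetical stand-in for a request sent to a broker that is down.
    raise ConnectionError("broker unreachable")


print(transforms_supported(healthy_broker))      # True
print(transforms_supported(unreachable_broker))  # False
```

Under this assumption, the menu item would be disabled whenever the probe happens to hit a broker that is down, even though the cluster supports transforms.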

Similarly, the actual listing of the transforms is done via the Admin API's list transforms functionality...

The way the Admin API client works is that, in most cases, it issues the Admin API request against a random broker within the cluster.

The Admin API client is not fully dynamic: it builds a map of the brokers available in the cluster at creation, but it does not keep that data up to date on every Admin API request.

What I believe is happening is that the original capability check was performed while the cluster was healthy, or was issued against a healthy broker node. Then one of the nodes is removed, and Console's Redpanda Admin API client is now out of date. When the subsequent list transforms request is issued to the Redpanda Admin API, the Admin client may or may not send the request to the (now outdated) broker node that is no longer functioning, in which case the request fails. A subsequent request gets sent to a different random node that is up, and the request works.
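The behavior described above can be simulated with a small Python sketch. It is an illustration under stated assumptions, not the real client: `StaleAdminClient` and `count_outcomes` are hypothetical names, broker selection is assumed to be uniformly random, and the broker names are taken from the config posted in this thread.

```python
import random
from typing import Dict, List, Tuple


class StaleAdminClient:
    """Hypothetical client that snapshots the broker list once, at creation."""

    def __init__(self, brokers: List[str], rng: random.Random) -> None:
        self._brokers = list(brokers)  # never refreshed afterwards
        self._rng = rng

    def list_transforms(self, cluster_state: Dict[str, bool]) -> List[str]:
        # Pick a random broker from the (possibly outdated) snapshot.
        broker = self._rng.choice(self._brokers)
        if not cluster_state[broker]:
            raise ConnectionError(f"{broker} is unreachable")
        return ["my-transform"]


def count_outcomes(
    client: StaleAdminClient, cluster_state: Dict[str, bool], attempts: int
) -> Tuple[int, int]:
    """Issue `attempts` requests and return (succeeded, failed)."""
    ok = failed = 0
    for _ in range(attempts):
        try:
            client.list_transforms(cluster_state)
            ok += 1
        except ConnectionError:
            failed += 1
    return ok, failed


brokers = ["redpanda-0", "redpanda-1", "redpanda-2"]
client = StaleAdminClient(brokers, random.Random(0))

# redpanda-1 goes down after the client was created.
state = {"redpanda-0": True, "redpanda-1": False, "redpanda-2": True}
ok, failed = count_outcomes(client, state, attempts=300)
print(f"succeeded={ok} failed={failed}")
```

With one of three brokers down and uniform selection, roughly a third of the requests would be expected to fail, which would match the intermittent behavior reported here.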

Same issue for Transforms being enabled and disabled randomly.

I think we could improve the situation by refreshing Console's Admin API client on a timer, perhaps.
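One possible shape for such a refresh, sketched in Python under assumptions: `RefreshingBrokerList`, `discover`, and `refresh_interval_s` are hypothetical names, and the refresh is checked lazily on access rather than driven by a background timer. The clock is injectable so the behavior can be shown without waiting.

```python
import time
from typing import Callable, List


class RefreshingBrokerList:
    """Hypothetical broker view that re-discovers brokers once it is too old."""

    def __init__(
        self,
        discover: Callable[[], List[str]],
        refresh_interval_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._discover = discover
        self._interval = refresh_interval_s
        self._clock = clock
        self._brokers = discover()
        self._fetched_at = clock()

    def brokers(self) -> List[str]:
        # Re-run discovery only when the cached view is older than the interval.
        if self._clock() - self._fetched_at >= self._interval:
            self._brokers = self._discover()
            self._fetched_at = self._clock()
        return list(self._brokers)


# Usage with a fake clock and a mutable list standing in for the live cluster.
now = [0.0]
live = ["redpanda-0", "redpanda-1", "redpanda-2"]
view = RefreshingBrokerList(lambda: list(live), refresh_interval_s=30.0, clock=lambda: now[0])

live.remove("redpanda-1")  # broker goes down
print(view.brokers())      # still lists all three brokers (stale view)
now[0] = 31.0              # interval has elapsed
print(view.brokers())      # refreshed: redpanda-1 is no longer listed
```

The trade-off in this sketch is that the view can be stale for up to one interval, so requests may still hit a down broker until the next refresh.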

@JakeSCahill
Contributor Author

Thanks for the detailed response. I can get this to happen consistently (tried three times).

Here's the Admin API config for Console:

        redpanda:
          adminApi:
            enabled: true
            urls: ["http://redpanda-0:9644","http://redpanda-1:9644","http://redpanda-2:9644"]
            username: superuser
            password: secretpassword

@bojand
Member

bojand commented Jan 7, 2025

So we have a cluster with 3 brokers.

Can you confirm that everything works correctly when all brokers are up?

Then, once one or more brokers are brought down, the Console UX starts misbehaving, and Transforms is sometimes enabled and sometimes not. Is that correct?

This is exactly due to the way we determine whether Transforms is enabled, as I described. Console may issue the Admin API request to one of the live brokers, or it may randomly issue it to one of the down brokers, which errors out and causes Transforms to be disabled.

> Also, it takes a while for the debug info on the Overview page to disappear after the broker returns online. The status changes to Running, but the debug status still says that a broker is unreachable.

I am not sure what could cause that. Console just issues an Admin API request to the Get Cluster Health Overview API. So there may be some lag in the cluster's view of its overall health after the broker is brought back up and before the update is reflected in that API.

@JakeSCahill
Contributor Author

> Can you confirm that everything works correctly when all brokers are up?
>
> Then, once one or more brokers are brought down, the Console UX starts misbehaving, and Transforms is sometimes enabled and sometimes not. Is that correct?

yes that’s right
