Bad bridge HTTP responses when Kafka cluster is not running/not reachable #488
Comments
I do not think being healthy and having the HTTP port open is necessarily wrong. I would just fix the HTTP return codes so that they return the corresponding errors.
I don't see it as a problem that the healthy and ready endpoints return 200 OK. If Kafka is down that's not a problem with the bridge. Restarting the bridge won't help. I do think we need to look carefully at what the actual API methods are doing in this situation. My gut feeling is that the vertx client could be hiding certain errors. E.g. what does
From my point of view, it really depends on what users expect from the healthy and ready endpoints of the bridge.
So it sounds like you agree for I kinda agree with you about
But "up" is not well defined, at all. You're having to add some extra code to poll metadata just to decide whether it's ready, and actually there are loads of ways you can get that metadata but still not be able to service the client (e.g. it wants to produce to a partition which lacks a leader, there are not enough replicas, or the client's not authorized for the topic). So this definition of ready doesn't seem to be achieving very much for the user except hiding what could be a more useful status message. So I think
Just making the point that the bridge is supposed to run outside of Kubernetes as well, where there is no concept of probes or service selection.
It turned out that the HTTP bridge behaves quite badly when the Kafka cluster is not running or is not reachable by the bridge itself.
Since the admin client endpoint was added, the bridge tries to establish an admin client connection with the Kafka cluster, which of course fails if the cluster is not running or not reachable.
In this scenario, if an HTTP client sends requests to the bridge, the following responses are returned:
Of course, they are wrong. The bridge is not working properly because it has no connection to the Kafka cluster.
I would say that if this happens, the HTTP server should either not start accepting requests, so that it is not reachable by HTTP clients, or at least return proper error codes and, above all, an error on the healthy and ready endpoints.
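To make the "return proper error codes" option concrete, here is a minimal sketch in Python of how a request handler could translate a Kafka connectivity failure into an HTTP error instead of a misleading success response. This is not the bridge's actual code: the exception classes, `BridgeResponse`, `handle_request`, and the `error_code`/`message` body shape are all hypothetical names chosen for illustration, and the choice of 503 and 504 is an assumption about what "proper" would mean here.

```python
from dataclasses import dataclass
from http import HTTPStatus


class KafkaUnreachableError(Exception):
    """Hypothetical error: no broker in the cluster could be contacted."""


class KafkaTimeoutError(Exception):
    """Hypothetical error: the cluster did not answer in time."""


@dataclass
class BridgeResponse:
    status: int  # HTTP status code sent back to the client
    body: dict   # JSON-serializable response body


def error_response(status: HTTPStatus, message: str) -> BridgeResponse:
    # Assumed error body shape: numeric code plus human-readable message.
    return BridgeResponse(int(status), {"error_code": int(status), "message": message})


def handle_request(operation) -> BridgeResponse:
    """Run a Kafka-backed operation and map failures to HTTP error codes."""
    try:
        result = operation()
    except KafkaUnreachableError as exc:
        # The bridge itself is up, but its upstream dependency is not.
        return error_response(HTTPStatus.SERVICE_UNAVAILABLE, str(exc))
    except KafkaTimeoutError as exc:
        return error_response(HTTPStatus.GATEWAY_TIMEOUT, str(exc))
    return BridgeResponse(int(HTTPStatus.OK), result)


def list_topics_when_down():
    # Stand-in for an admin client call made while Kafka is down.
    raise KafkaUnreachableError("Kafka cluster is not reachable")


if __name__ == "__main__":
    print(handle_request(list_topics_when_down))
```

With this approach the HTTP server can stay up, which matches the view in the comments that healthy and ready need not fail, while each API call still tells the client that Kafka is the problem.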