Workload Identity with KEDA #4597
Comments
Hi,
|
So the configuration is correct. I pulled the oidc url out of the error message on purpose and everything looks correct. I have had several people double check it. The keda-operator has the 4 workload identity env variables injected into it. The managed identity does have the federated credentials in it. I can use the same managed identity with a different federated credential in another application and it works just fine. |
Could you have any login restriction policy at AAD level? For example: at work, we have some restrictions for limiting MSI/ServicePrincipal logins from some locations |
I mean, if the |
@tomkerkhove , do you know who could help us here? |
So I asked the person who manages that for my company and set up most of our subscriptions, and he said there are no restrictions set up for MSIs or SPs. We have also submitted an Azure Help Desk ticket, since our company has full MS Support, but we have not heard back yet. I just finished setting up a test case where a KeyVault CSI SecretProviderClass and KEDA use the same Managed Identity and the same service account, and thus the same federated credentials. The KeyVault CSI works just fine and KEDA is getting the above error message. So I know the cluster is successfully pulling the OIDC for that Federated Credential. In the pod from the Deployment with the ScaledObject I see a constant stream of error messages that are not from my code. I am assuming they are from the KEDA ScaledObject, since they match the event there. But it does not give me any more info than the above. |
What info would you like to have? I mean, we will release the next version in a few weeks and any improvement is welcome. Would you be willing to contribute this extra info? |
Yeah, I am not sure what info I would like... This type of problem is really hard to solve. I can tell you that I have now isolated the issue to the managed identity we are using with the KEDA install. Today I was able to hand-create a Managed Identity in the same subscription that the cluster is running in, and it allowed KEDA to work properly. The one I was trying to use was generated by our Bicep file that installs the cluster and lots of other resources. I now need to start digging through the Managed Identity to figure out why the one I hand-created worked and the one Bicep generated does not. As far as I can tell right now, they both have the exact same RBAC roles and the same federated credentials set up. |
Do you think that printing things like the clientId being used could be useful? As KEDA supports using clientIds other than the default (just by providing them in the TriggerAuthentication), maybe it could help 🤔 |
Yes, I agree. I am not sure what KEDA has that would help from the KEDA side. The chaining adds a level of complexity that makes debugging somewhat harder. When I see the error message, I always wonder which one it is talking about: the managed identity associated with the keda-operator or the managed identity associated with the workload. I am still confused about why the managed identity works for the Secrets Store CSI but does not work for KEDA, and I do not understand why KEDA works on one cluster and then does not work at all on a second cluster with the exact same setup, just a different subscription. All of our setups use Bicep to install the Azure resources and ArgoCD to install the Kubernetes resources. So the only difference between the KEDA installs is basically the Managed Identity Client ID, which I have verified is correct for each cluster. |
Yeah, I agree about the extra complexity of the chained credential. I'd like to add Okteto to improve the development experience and troubleshooting, but until then, the only way I have found to debug scalers with identities is that chained credential :/ |
Hi @kstedman9 |
@JorTurFer We are still waiting on Microsoft Tech Support to figure it out. On one of the clusters, I was able to just recreate the Managed Identity using the exact same Bicep and it worked the second time. On a different cluster I have tried recreating the managed identity several times and that still has not fixed it there. I will update here when I finally get a resolution from them. |
It'd be nice! |
So we finally figured it out. The workload's managed identity must have the federated credential for the KEDA service account in it, along with the federated credentials the workload itself requires. Not sure why this is the case. I just happened to have it in all my test cases, but when it came to using it on the production clusters I did not do that. |
Oh, I thought that you had already federated them. This is mandatory because workload identity federation works by federating the k8s OIDC issuer with Azure AD. This means that you have to register the cluster's OIDC issuer in Azure so that Azure trusts it, but you also have to identify the service account, because it's part of the token from k8s (it's the subject). |
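To make the point about the subject concrete, here is a minimal Python sketch of the matching idea described in the comment above. It is only a simplified model, not Azure AD's implementation: the `FederatedCredential` class and both helper functions are hypothetical names invented for illustration, the issuer URL is a placeholder, and the `keda` namespace and `keda-operator` service account name are assumptions based on a default install. The subject format `system:serviceaccount:<namespace>:<name>` is the one Kubernetes uses for service account tokens.

```python
from dataclasses import dataclass

# Audience commonly used for Azure AD workload identity token exchange.
DEFAULT_AUDIENCE = "api://AzureADTokenExchange"


@dataclass(frozen=True)
class FederatedCredential:
    """Hypothetical model of one federated credential on a managed identity."""
    issuer: str    # the cluster's OIDC issuer URL
    subject: str   # the service account the Kubernetes token was issued for
    audience: str = DEFAULT_AUDIENCE


def service_account_subject(namespace: str, name: str) -> str:
    """Build the subject claim of a Kubernetes service account token."""
    return f"system:serviceaccount:{namespace}:{name}"


def token_matches(cred: FederatedCredential, issuer: str, subject: str,
                  audience: str = DEFAULT_AUDIENCE) -> bool:
    """Simplified check: issuer, subject and audience must all match exactly."""
    return (cred.issuer, cred.subject, cred.audience) == (issuer, subject, audience)


# Placeholder issuer; a real cluster exposes its own OIDC issuer URL.
issuer = "https://example-oidc-issuer/"
keda_subject = service_account_subject("keda", "keda-operator")
workload_subject = service_account_subject("default", "my-workload")

# A credential federated only for the workload's service account.
workload_only = FederatedCredential(issuer=issuer, subject=workload_subject)

print(token_matches(workload_only, issuer, workload_subject))  # True
print(token_matches(workload_only, issuer, keda_subject))      # False
```

In this model, a token issued for KEDA's service account is rejected by an identity that is federated only for the workload's service account, which is the situation the thread was debugging.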
Actually no. Because of the chaining of identities this should not be required, correct? The managed identity associated with the KEDA service account has the federated credential in it. Why would I then need to add that same federated credential to every workload's managed identity as well? I do not have to do this for other tools like KeyVault CSI. |
KEDA's service account has to be federated with all the identities that KEDA has to use. During the "login" process in AAD, KEDA uses its own k8s token, which is generated for KEDA's service account. Federating only the workloads is not enough, because KEDA doesn't have access to those workloads' tokens (they are mounted directly into those pods). If you are using a single identity for KEDA, and that identity is the one with the access, you just need to federate KEDA with it. But if you are using different managed identities (overriding the default value in the TriggerAuthentication) to reuse the workloads' managed identities for KEDA, KEDA's service account must be federated there too. It depends on how you are managing the identities and the access, but KEDA must be federated with any managed identity that is used during KEDA operation.
Let's look at some examples.
Case 1
You have an identity for KEDA, with access to ServiceBus A, ServiceBus B and ServiceBus C, and you also have other identities for the workloads, so you have:
In this case, KEDA has to be federated only with the KEDA msi.
Case 2
To avoid stacking too many permissions on the KEDA identity, you have an identity for KEDA with any access (not related, for example, to the Key Vault), and you also have other identities for the workloads, so you have:
In this case, you are overriding the default identity set during the installation thanks to the option in TriggerAuthentication (
Is it clearer now? I guess that your case is the case 2, am I right? |
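The rule from the two cases above ("KEDA must be federated with any managed identity that is used during KEDA operation") can be sketched as a small Python helper. This is only an illustration of the reasoning, not KEDA code: the function name and the identity names (`keda-msi`, `workload-1-msi`, `workload-2-msi`) are hypothetical, and an override of `None` stands for a TriggerAuthentication that keeps the default identity set at install time.

```python
from typing import Iterable, List, Optional


def identities_needing_keda_federation(
    default_identity: str,
    trigger_overrides: Iterable[Optional[str]],
) -> List[str]:
    """Return the managed identities that KEDA actually uses.

    Each entry in trigger_overrides represents one TriggerAuthentication:
    None means "use the default identity", a string means the default is
    overridden with that identity. Every identity in the result needs a
    federated credential for KEDA's own service account.
    """
    used = {override or default_identity for override in trigger_overrides}
    return sorted(used)


# Case 1: every trigger uses the default KEDA identity.
case_1 = identities_needing_keda_federation("keda-msi", [None, None, None])
print(case_1)  # ['keda-msi']

# Case 2: each trigger overrides the default with a workload identity.
case_2 = identities_needing_keda_federation(
    "keda-msi", ["workload-1-msi", "workload-2-msi"]
)
print(case_2)  # ['workload-1-msi', 'workload-2-msi']
```

In Case 2 the workloads' own federated credentials are still needed for the workloads themselves; the list above only covers what KEDA's service account must be federated with.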
Yes, Case 2 is what we are using. Yes, I understand more now. Is there any way this can be documented better in the KEDA docs? |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions. |
This issue has been automatically closed due to inactivity. |
@JorTurFer would be nice to add this to documentation. |
We could add this to the docs as an example, but honestly, I'd not do it, because it's how workload identity itself works; if MSFT changes something, we would have to update our docs too. This isn't specific to KEDA but applies to any workload in general (at least in my mind, it's directly related to the federation concept). But as I said, we can add a reference to this comment or directly copy and paste it into the docs. |
Gotcha. Can we at least link the relevant MSFT docs? |
Yeah, we can add the whole example from my comment too. I mean, the concept is clear without going into implementation details, but this is strongly attached to how Azure Workload Identity works. For example, the same idea on the AWS side works in a totally different way |
Yeah, but I mean the more docs the better :) So I think that adding this with a note that it works with Azure Workload Identity isn't going to do any harm. |
Can this be reopened? I'm trying to understand a bit more about how this works with workload identity, especially when using the AKS add-on that's now GA. Am I safe to assume that for this to work as you're describing (using this scenario for example), we still need to install KEDA via the Helm chart, as we then have an identity to use for role assignments (as noted in the above Service Bus scenario)? I think something's just not computing with me on how the role assignments should be configured, either with the Helm chart or the AKS add-on |
You have to assign the role, via UI, az-cli, or however you prefer: https://github.com/kedacore/sample-dotnet-worker-servicebus-queue/blob/main/workload-identity.md#creating-a-new-azure-service-bus-namespace--queue
About the AKS add-on, I have no idea how it works. I assume that it creates a Managed Identity for KEDA somehow, but that has to be asked there, as we don't have any control over or information about it. It's provided by Microsoft. If you have any questions about the concept, you can ask here (or open another issue/discussion) |
Report
No response
Expected Behavior
We have KEDA set up on several clusters and are trying to use workload identity with it. We also have other tools set up on these same clusters that use workload identity just fine. One of our clusters is working just fine with KEDA and workload identity, but all the others are not. We install KEDA exactly the same way on all clusters. If I install my KEDA test setup on another cluster, we get the error message below in both the ScaledObject events and the logs for the keda-operator.
Does anybody have any idea how to debug this?
I have gone through the managed identities comparing everything I can find between the two setups and I can find no difference and the Federated credentials all match.
Actual Behavior
KEDA is throwing ERRORs in the log and ScaledObject. KEDA is not scaling up the workload.
Steps to Reproduce the Problem
Unknown how to have someone else reproduce since I do not even understand the problem in the first place. Everything has been checked by multiple people on the team and no one can see any issues.
Logs from KEDA operator
KEDA Version
2.10.1
Kubernetes Version
1.25
Platform
Microsoft Azure
Scaler Details
Azure Service Bus
Anything else?
No response