Workload Identity with KEDA #4597

Closed
kstedman9 opened this issue May 31, 2023 · 30 comments
Labels
bug Something isn't working stale All issues that are marked as stale due to inactivity

Comments

@kstedman9

kstedman9 commented May 31, 2023

Report

No response

Expected Behavior

We have KEDA set up on several clusters and are trying to use workload identity with it. Other tools on these same clusters use workload identity without problems. KEDA with workload identity works on one of our clusters but not on any of the others, even though we install KEDA exactly the same way everywhere. If I install my KEDA test setup on another cluster, we get the error message below in both the ScaledObject events and the keda-operator logs.

Does anybody have any idea how to debug this?

I have gone through the managed identities comparing everything I can find between the two setups and I can find no difference and the Federated credentials all match.

Actual Behavior

KEDA is throwing ERRORs in the log and ScaledObject. KEDA is not scaling up the workload.

Steps to Reproduce the Problem

Unknown how to have someone else reproduce since I do not even understand the problem in the first place. Everything has been checked by multiple people on the team and no one can see any issues.

Logs from KEDA operator

(combined from similar events): ChainedTokenCredential authentication failed
POST https://login.microsoftonline.com/<TENANT ID>/oauth2/v2.0/token
--------------------------------------------------------------------------------
RESPONSE 401 Unauthorized
--------------------------------------------------------------------------------
{
  "error": "unauthorized_client",
  "error_description": "AADSTS70021: No matching federated identity record found for presented assertion. Assertion Issuer: '<OIDC-URL>'. Assertion Subject: 'system:serviceaccount:keda:keda-operator'. Assertion Audience: 'api://AzureADTokenExchange'. https://docs.microsoft.com/en-us/azure/active-directory/develop/workload-identity-federation\r\nTrace ID: 2c9b6265-8984-4d03-b502-cabf8aab5d01\r\nCorrelation ID: 64bd287f-80ee-45ea-b6ab-087abb31ab53\r\nTimestamp: 2023-05-31 21:43:25Z",
  "error_codes": [ 70021 ],
  "timestamp": "2023-05-31 21:43:25Z",
  "trace_id": "2c9b6265-8984-4d03-b502-cabf8aab5d01",
  "correlation_id": "64bd287f-80ee-45ea-b6ab-087abb31ab53",
  "error_uri": "https://login.microsoftonline.com/error?code=70021"
}
--------------------------------------------------------------------------------

KEDA Version

2.10.1

Kubernetes Version

1.25

Platform

Microsoft Azure

Scaler Details

Azure Service Bus

Anything else?

No response

@kstedman9 kstedman9 added the bug Something isn't working label May 31, 2023
@JorTurFer
Member

Hi,
This looks like a misconfiguration. Is the managed identity federated with the correct OIDC URL? Is the KEDA service account annotated with the correct clientId, and has that clientId been federated with that cluster's OIDC URL?
Based on the error, KEDA is trying to use the provided managed identity, but the provided clientId doesn't have the federation registered. That could have several causes, but I'd check for typos/misconfigurations in:

  • subject in the federation section in azure
  • oidc url in the federation section in azure
  • clientId in the service account
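A minimal sketch of how those three values could be pulled side by side for comparison. It assumes the `az` and `kubectl` CLIs are installed and logged in; the function names and all arguments (resource group, cluster name, identity name) are hypothetical placeholders. The `keda` namespace and `keda-operator` service account come from the error message above.

```shell
# Subject that Azure AD expects for a Kubernetes service account token.
# Args: namespace, service account name.
expected_subject() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# OIDC issuer URL the AKS cluster reports.
# Args: resource group, cluster name (both placeholders).
cluster_issuer() {
  az aks show --resource-group "$1" --name "$2" \
    --query 'oidcIssuerProfile.issuerUrl' --output tsv
}

# Issuer/subject/audiences of every federated credential on a managed identity.
# Args: resource group, managed identity name (both placeholders).
identity_federations() {
  az identity federated-credential list \
    --resource-group "$1" --identity-name "$2" \
    --query '[].{issuer:issuer, subject:subject, audiences:audiences}' \
    --output json
}

# clientId annotated on a service account.
# Args: namespace, service account name.
annotated_client_id() {
  kubectl get serviceaccount "$2" --namespace "$1" \
    --output jsonpath='{.metadata.annotations.azure\.workload\.identity/client-id}'
}
```

For example, `expected_subject keda keda-operator` prints the subject string that should appear in one of the entries returned by `identity_federations`, next to the URL printed by `cluster_issuer`.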

@kstedman9
Author

So the configuration is correct. I pulled the oidc url out of the error message on purpose and everything looks correct. I have had several people double check it. The keda-operator has the 4 workload identity env variables injected into it. The managed identity does have the federated credentials in it.

I can use the same managed identity with a different federated credential in another application and it works just fine.
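One way to see exactly which issuer, subject and audience are presented to Azure AD is to decode the claims of a service account token. The helper below is a sketch that only decodes the token payload and does not verify the signature; `jwt_payload` is a hypothetical name. The usage comment assumes `kubectl create token` is available (Kubernetes 1.24 or later).

```shell
# Print the JSON payload (second dot-separated segment) of a JWT.
# This decodes only; it does NOT validate the token.
jwt_payload() {
  local seg
  # Take the payload segment and map base64url characters to standard base64.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits padding; restore it so `base64 -d` accepts the input.
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do
    seg="${seg}="
  done
  printf '%s' "$seg" | base64 -d
}

# Usage sketch (needs cluster access):
#   token=$(kubectl create token keda-operator --namespace keda \
#             --audience api://AzureADTokenExchange)
#   jwt_payload "$token"
```

The `iss`, `sub` and `aud` claims in the output are the values that have to match a federated credential on the managed identity being used.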

@kstedman9
Author

kstedman9 commented Jun 1, 2023

Picture of the Federated Credential from the managed Identity
image

Picture of the Running keda-operator POD Description:
image

@JorTurFer
Member

Could you have any login restriction policy at AAD level? For example: at work, we have some restrictions for limiting MSI/ServicePrincipal logins from some locations

@JorTurFer
Member

I mean, if the subject, the oidc, the clientId and the audience are correct, the login restrictions are the only thing left that I can imagine

@JorTurFer
Member

@tomkerkhove , do you know who could help us here?

@kstedman9
Author

I mean, if the subject, the oidc, the clientId and the audience are correct, the login restrictions are the only thing left that I can imagine

So I asked the person who manages that for my company and set most of our subscriptions up, and he said there are no restrictions set up for MSI or SPs.

We have also submitted an Azure Help Desk ticket since our company has full MS Support but we have not heard back yet.

I just got done setting up a Test case where I have KeyVault CSI SecretProviderClass and Keda using the same Managed Identity and both using the same serviceaccount and thus the same federated credentials. The KeyVault CSI works just fine and KEDA is getting the above error message. So I know the cluster is successfully pulling the OIDC for that Federated Credential.

In my POD that is from the Deployment with the ScaledObject I see a constant stream of error messages not from my code. I am assuming they are from the KEDA scaledobject since it matches the event in there. But it does not provide me any more info than above.

@JorTurFer
Member

In my POD that is from the Deployment with the ScaledObject I see a constant stream of error messages not from my code. I am assuming they are from the KEDA scaledobject since it matches the event in there. But it does not provide me any more info than above.

What info would you like to have? I mean, we will release the next version in a few weeks and any improvement is welcome. Would you be willing to contribute this extra info?

@kstedman9
Author

Yeah I am not sure what info I would like... This type of problem is a really hard problem to solve.

I will tell you that I have now isolated the issue to the managed identity we are using with the KEDA install. Today I was able to hand-create a Managed Identity in the same subscription that the cluster is running in, and it allowed KEDA to work properly. The one I was trying to use was generated by our Bicep file that installs the cluster and lots of other resources. I now need to start digging through the Managed Identity to figure out why the one I hand-created worked and the one Bicep generated does not. As far as I can tell right now, they both have exactly the same RBAC roles and the same federated credentials set up.

@JorTurFer
Member

Do you think that printing things like the clientId in use could be useful? Since KEDA supports using clientIds other than the default (by providing them in the TriggerAuthentication), maybe it could help 🤔
Apart from that, I'm not totally sure what info we have on the KEDA side that could help during troubleshooting, because we are already printing the whole error

@kstedman9
Author

Yes, I agree. I am not sure what KEDA has that would help from the KEDA side. The chaining adds a level of complexity that makes debugging somewhat harder. When I see the error message I am always questioning which identity it is talking about: the managed identity associated with the keda-operator or the managed identity associated with the workload.

I am still confused about why the managed identity works for the secrets CSI driver but does not work for KEDA. I also do not understand why KEDA works on one cluster and then does not work at all on a second cluster with the exact same setup, just a different subscription. All of our setups use Bicep to install the Azure resources and ArgoCD to install the Kubernetes resources. So the only difference between the KEDA installs is basically the Managed Identity Client ID, which I have verified is correct for each cluster.

@JorTurFer
Member

JorTurFer commented Jun 2, 2023

Yeah, I agree about the extra complexity of the chained credential. I'd like to add Okteto to improve the development experience and troubleshooting, but until then, the chained credential is the only way I have found to debug scalers with identities :/
Otherwise, you can't debug that authentication method during local development/troubleshooting

@JorTurFer
Member

Hi @kstedman9
Have you solved the issue? If so, would you like to share what was happening?

@JorTurFer JorTurFer moved this from To Triage to Pending End-User Feedback in Roadmap - KEDA Core Jun 14, 2023
@kstedman9
Author

@JorTurFer We are still waiting on Microsoft Tech Support to figure it out. On one of the clusters, I was able to just recreate the Managed Identity using the exact same Bicep, and it worked the second time. On a different cluster I have tried recreating the managed identity several times and that still has not fixed it. I will update here when I finally get a resolution from them.

@JorTurFer
Member

It'd be nice!
As KEDA maintainer I'm interested, but as KEDA user who is integrating managed identities and workload identity everywhere, I'm really interested xD

@kstedman9
Author

kstedman9 commented Aug 1, 2023

So we finally figured it out. The workload's managed identity being used must have the federated credential for the KEDA service account in it, along with the federated credentials the workload itself requires. Not sure why this is the case. I just happened to have it in all my test cases, but when it came to using it on the production clusters I did not do that.

@JorTurFer
Member

Oh, I thought you had already federated them. This is mandatory because workload identity federation works by federating the k8s OIDC issuer with Azure AD: you have to register the cluster's OIDC issuer in Azure so that Azure trusts it, and you also have to identify the service account, because it is part of the k8s token (it's the subject).
KEDA can only access its own service account token (with its own subject). If you don't add KEDA's service account to the managed identity, KEDA can't log in as that identity, because AAD doesn't trust KEDA for it.
Is it now clearer?

@kstedman9
Author

kstedman9 commented Aug 1, 2023

Actually no. Because of the chaining of identities, this should not be required, correct? The managed identity associated with the KEDA service account has the federated credential in it. Why would I then need to add that same federated credential to every workload's managed identity as well?

I do not have to do this for other ones like KeyVault CSI.

@JorTurFer
Member

KEDA's service account has to be federated with every identity that KEDA has to use. During the "login" process in AAD, KEDA uses its own k8s token, which is generated for KEDA's service account. Federating only the workloads is not enough, because KEDA doesn't have access to the workloads' tokens (they are mounted directly into those pods).

If you use a single identity for KEDA and that identity holds the access, you only need to federate KEDA with that identity. But if you use different managed identities (overriding the default value in the TriggerAuthentication) to reuse the workloads' managed identities for KEDA, KEDA's service account must be federated with those too.

It depends on how you manage identities and access, but KEDA must be federated with every managed identity that is used during KEDA operation.
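As a rough sketch, adding KEDA's service account as a federated credential on a workload's managed identity could look like the function below. It assumes the `az` CLI is logged in to the right subscription; `federate_keda_with`, the credential name `keda-operator`, and all arguments are hypothetical. The subject and audience are the ones shown in the error message earlier in this thread.

```shell
# Register KEDA's service account as a federated credential on a managed identity.
# Args: resource group, managed identity name, cluster OIDC issuer URL (all placeholders).
federate_keda_with() {
  local rg="$1" identity="$2" issuer="$3"
  az identity federated-credential create \
    --name keda-operator \
    --resource-group "$rg" \
    --identity-name "$identity" \
    --issuer "$issuer" \
    --subject system:serviceaccount:keda:keda-operator \
    --audiences api://AzureADTokenExchange
}

# Usage sketch: repeat for every identity KEDA is expected to assume.
#   for identity in workload-a-identity workload-b-identity; do
#     federate_keda_with my-resource-group "$identity" "$OIDC_ISSUER_URL"
#   done
```

The workload's own federated credential (for the workload's service account) stays in place; this one is added next to it.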

Let's put some examples

Case 1

You have an identity for KEDA, with access to ServiceBus A, ServiceBus B and ServiceBus C, also you have other identities for the workloads, so you have:

  • KEDA identity with access to ServiceBus A, ServiceBus B and ServiceBus C (identity set during installation and not overridden)
  • Workload A identity with access to Service Bus A
  • Workload B identity with access to Service Bus B
  • Workload C identity with access to Service Bus C

In this case, KEDA only has to be federated with the KEDA managed identity (MSI).

Case 2

To avoid stacking too many permissions on the KEDA identity, you have an identity for KEDA without any Service Bus access (it may have unrelated access, for example to the Key Vault), and you also have other identities for the workloads, so you have:

  • KEDA identity without access to any Service Bus
  • Workload A identity with access to Service Bus A
  • Workload B identity with access to Service Bus B
  • Workload C identity with access to Service Bus C

In this case, you override the default identity set during installation using the option in TriggerAuthentication (.spec.podIdentity.identityId): each ScaledObject uses its own TriggerAuthentication, and each TriggerAuthentication specifies an override (the TriggerAuthentication for workload A sets the identityId of workload A, the one for B sets B's, etc.).
Thanks to this, you don't need to stack a lot of access on the KEDA identity, but KEDA has to be federated with every identity that it will try to assume.

  • TriggerAuthentication without overrides will use KEDA identity (for accessing the Key Vault for example)
  • TriggerAuthentications with overrides will use the identity set as part of TriggerAuthentication (so KEDA's service account must be federated on them).

Is it clearer now? I guess that your case is the case 2, am I right?
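A minimal sketch of a Case 2 override, assuming the `azure-workload` pod identity provider and using the workload identity's client ID as `identityId`. The helper name `render_trigger_auth`, the resource name and the namespace in the usage comment are hypothetical.

```shell
# Print a TriggerAuthentication manifest that overrides the default KEDA identity.
# Args: TriggerAuthentication name, client ID of the managed identity to assume.
render_trigger_auth() {
  cat <<EOF
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: $1
spec:
  podIdentity:
    provider: azure-workload
    identityId: $2
EOF
}

# Usage sketch (needs cluster access):
#   render_trigger_auth workload-a-auth "$WORKLOAD_A_CLIENT_ID" \
#     | kubectl apply --namespace my-namespace -f -
```

The managed identity behind that client ID still needs a federated credential for KEDA's service account, otherwise the token exchange fails with the AADSTS70021 error from the original report.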

@kstedman9
Author

Yes, Case 2 is what we are using, and I understand it better now. Is there any way this can be documented better in the KEDA docs?

@stale

stale bot commented Sep 30, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale All issues that are marked as stale due to inactivity label Sep 30, 2023
@stale

stale bot commented Oct 7, 2023

This issue has been automatically closed due to inactivity.

@stale stale bot closed this as completed Oct 7, 2023
@github-project-automation github-project-automation bot moved this from Pending End-User Feedback to Ready To Ship in Roadmap - KEDA Core Oct 7, 2023
@zroubalik
Member

@JorTurFer would be nice to add this to documentation.

@JorTurFer
Member

JorTurFer commented Oct 9, 2023

We could add this to the docs as an example, but honestly, I wouldn't, because this is how workload identity itself works; if MSFT changes something, our docs would need updating. This isn't specific to KEDA but applies to any workload in general (at least in my mind, it's directly related to the federation concept). But as I said, we can add a reference to this comment or copy and paste it directly into the docs.
Would this be enough @zroubalik ?

@zroubalik
Member

Gotcha. Can we at least link the relevant MSFT docs?

@JorTurFer
Member

JorTurFer commented Oct 9, 2023

Yeah, we can add the whole example from my comment too. I mean, the concept is clear without going into implementation details, but it is strongly tied to how Azure Workload Identity works. For example, the same idea on the AWS side works in a totally different way

@zroubalik
Member

Yeah, but I mean the more docs the better :) So I think that adding this with a note that it works with Azure Workload Identity isn't going to do any harm.

@JorTurFer
Member

kedacore/keda-docs#1247

@thepaulmacca

thepaulmacca commented Nov 8, 2023

Can this be reopened?

I'm trying to understand a bit more about how this works with workload identity, especially when using the AKS add-on that's now GA

Am I safe to assume that for this to work as you're describing (using this scenario for example), we still need to install KEDA via the helm chart - as we then have an identity to use for role assignments (as noted in the above service bus scenario)?

I think something's just not computing with me on how the role assignments should be configured, either with the helm chart or AKS add-on

@JorTurFer
Member

JorTurFer commented Nov 21, 2023

You have to assign the role, via UI, az-cli, or however you prefer: https://github.com/kedacore/sample-dotnet-worker-servicebus-queue/blob/main/workload-identity.md#creating-a-new-azure-service-bus-namespace--queue

❯ az role assignment create --role 'Azure Service Bus Data Receiver' --assignee --scope /subscriptions//resourceGroups//providers/Microsoft.ServiceBus/namespaces/

As for the AKS add-on, I have no idea how it works. I assume that it creates a Managed Identity for KEDA somehow, but that has to be asked there, as we don't have any control over or information about it. It's provided by Microsoft

If you have any questions about the concept, you can ask here (or open another issue/discussion)
