-
Notifications
You must be signed in to change notification settings - Fork 454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementation of the Prometheus endpoint #1669
Conversation
264caad
to
ddff1c9
Compare
a76f424
to
48d59b5
Compare
f90262d
to
e2a7304
Compare
a963833
to
ba25511
Compare
Co-authored-by: Herman Slatman <[email protected]>
Co-authored-by: Herman Slatman <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a few comments.
We should replace the current signing code with internal methods and do a minimal wrapper on those with the metrics. See an example in one of my comments.
For Sign and SignSSH we do not consider errors in authority.Authorize
as errors.
authority/authority.go
Outdated
func (a *Authority) incrProvisionerCounter(prov *provisioner.Interface, err *error, count func(Meter, string, bool)) { | ||
var name string | ||
if p := *prov; p != nil { | ||
name = p.GetName() | ||
} | ||
|
||
count(a.meter, name, *err == nil) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The call to this method in Authority.Sign
and Authority.SignSSH
does not consider all the error cases. A POST to /sign will call first authority.Authorize
and then, if there's not error authority.Sign
. The authority.Authorize
can fail for multiple reasons, for example, it will fail if the token is not valid. The same will happen for /ssh/sign
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These would be authorization errors and not errors that happen while signing the certificate. Happy to consider adding more instruments in the future or reviewing a PR that adds them.
authority/ssh.go
Outdated
func (a *Authority) SignSSH(_ context.Context, key ssh.PublicKey, opts provisioner.SignSSHOptions, signOpts ...provisioner.SignOption) (*ssh.Certificate, error) { | ||
func (a *Authority) SignSSH(_ context.Context, key ssh.PublicKey, opts provisioner.SignSSHOptions, signOpts ...provisioner.SignOption) (cert *ssh.Certificate, err error) { | ||
var prov provisioner.Interface | ||
defer a.incrProvisionerCounter(&prov, &err, Meter.SSHSigned) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we change this and similar methods, it will be very easy to miss this and return a different error. A better way might be to pass a pointer to the certificate instead, the certificate is probably only set in one place, and we can report an error if cert == nil
. We will need to different methods for x509 and ssh and we will have to put set it to nil if store*Certificate
fails, but that's gonna be harder to miss. Most of the errors will happen way before we set the certificate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also clean this up by moving the signing code to internal methods and getting the error from the return values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're already setting the error to nil
when storeCertificate
fails and the errors.Is(err, db.ErrNotImplemented)
.
If we base our decision to instrument based on whether the returned chain is nil, then we lose the ability (in other places, outside the scope of open source) to instrument in more detail (perhaps according to the underlying type of error). We can look into changing the relevant authority.Meter
signatures to be (provisioner.Interface, error)
instead of (string, bool)
in that case. In any case, the error is more important that the value of the returned chain in this case, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whether (*Authority).storeCertificate
should return a nil
error, where it now returns db.ErrNotImplemented
(so that we can avoid checking for the particular exception), is something to consider, in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, I've changed the signature of meter functions to directly accept the provisioner & the error.
authority/ssh.go
Outdated
@@ -146,22 +146,24 @@ func (a *Authority) GetSSHBastion(ctx context.Context, user, hostname string) (* | |||
} | |||
|
|||
// SignSSH creates a signed SSH certificate with the given public key and options. | |||
func (a *Authority) SignSSH(_ context.Context, key ssh.PublicKey, opts provisioner.SignSSHOptions, signOpts ...provisioner.SignOption) (*ssh.Certificate, error) { | |||
func (a *Authority) SignSSH(_ context.Context, key ssh.PublicKey, opts provisioner.SignSSHOptions, signOpts ...provisioner.SignOption) (cert *ssh.Certificate, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After looking at this package's dependents, I am willing to accept a breaking change that accepts a provisioner as a parameter or Sign
and SignSSH
. We can add the provisioner as a return value in authority.Authorize
, and just pass it.
I'm not worried about breaking changes, but an option that won't cause less breaking changes would be to store the provisioner in the context, although we will need to add a context to Sign
or just create the SignContext
method.
In any case, I'm ok to clean these, and have always a context and a provisioner parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do something like this we can kill a few issues at once:
func (a *Authority) SignSSH(_ context.Context, key ssh.PublicKey, opts provisioner.SignSSHOptions, signOpts ...provisioner.SignOption (*ssg.Certificate, err error) {
certs, prov, err = a.signSSH(...)
a.incrProvisionerCounter(prov, err, Meter.SSHSigned)
return certs, err
}
We can keep the current interface, although, as I said, I'm not opposed to changing it. But especially, it will be harder to cause bugs in the future because a PR adds something like if err := foo(); err != nil { ... }
, well and we won't need the pointer to an interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleaning up sounds nice and I hope that we will. There's quite a few things that could benefit from being worked on in many places.
Having said that, named returns and 'defer' statements aren't uncommon when dealing with instrumented code and I don't see how adding yet another level of indirection is helpful.
If we instrument SignSSH
and not signSSH
then a caller of signSSH
would need to remember to instrument the call.
The pointer to a pointer (interface), could be solved by refactoring the defer statement to be:
defer func() { incrProvisionerCounter(prov, err, a.Meter.SSHSigned) }()
instead of
defer a.incrProvisionerCounter(prov, err, Meter.SSHSigned)
Other than verbosity, there's not a lot of difference to these signatures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"github.com/prometheus/client_golang/prometheus" | ||
"github.com/prometheus/client_golang/prometheus/promhttp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would using OpenTelemetry be more flexible and will allow us to add more things in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was discussed and decided against.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing it without OT means we don't have to keep up with all the changes happening on that side. I can see us revisiting this sometime in the future when OT is more stabilized, though. It could also be a possibility to support both, if needed.
Want to chime in and say that, while I agree with the sentiment of doing a deeper rewrite of certificates to enable easier instrumentation, I think it's outside the scope of this feature. Let's set aside some time in the future to plan and execute some deeper changes that will make the APIs easier and more intuitive. |
This PR split the main metered methods in two to make it harder to miss metrics in the future.
This PR implements the
http://{metricsAddress}/metrics
endpoint.