/stats and /metrics API calls do not return data for functions that have not been called #368

nigeldeakin · 2017-09-27T08:07:26Z

The /stats API call returns a map containing queued/running/completed/failed stats for each function, as well as global totals.

However, this does not include functions that have not yet been called. This is not what the user (or the UI tool) would expect. If a function has been created then /stats should return queued/running/completed/failed stats for it. Obviously the values would all be zero until the function was actually called.

nigeldeakin · 2017-09-27T08:07:57Z

Required for fnproject/ui#18

nigeldeakin · 2017-10-06T09:00:19Z

This issue also affects Prometheus metrics.

treeder · 2017-10-06T17:40:10Z

I believe this is because async calls go directly on the queue without hitting the database first. @rdallman can confirm. If that's the case, it would probably make sense to hit the database with "queued" state, or at least send a queued event to stats.

rdallman · 2017-10-06T17:56:00Z

yea, related to #281 and #155

just moving the stats.Queued() call to agent.GetCall will probably do the trick (and then our Stats struct thing doesn't have to leak into the front end as much...)

nigeldeakin · 2017-10-09T13:28:47Z

This isn't about the stats.Queued() call. The problem is that neither the stats structure, nor the data held in the Prometheus client, know anything about functions/routes until they have been called.

To fix this issue we need a new function that is called, once for every route in the database, when the server is started and subsequently whenever a new route is created. This function would update stats and the Prometheus client with initial metrics for that route, with queued, running, completed and failed all set to zero. Writing that function is straightforward, but where would we call it from? I'm looking for some existing code which is executed at startup which reads all the routes from the database.
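A minimal sketch of what such a function could look like, using only the standard library. The `Stats` and `Counts` types, the `RegisterRoute` name, and the route paths are all hypothetical stand-ins and are not the actual Fn server types. It only shows the idea of zero-initialising an entry per route so that it appears in the output before the first call.

```go
package main

import (
	"fmt"
	"sync"
)

// Counts holds the four per-route values that /stats reports.
type Counts struct {
	Queued, Running, Completed, Failed uint64
}

// Stats is a hypothetical stand-in for the server's stats structure.
type Stats struct {
	mu     sync.Mutex
	routes map[string]*Counts
}

func NewStats() *Stats {
	return &Stats{routes: make(map[string]*Counts)}
}

// RegisterRoute makes a route visible with all values at zero.
// It is idempotent: a route that already exists keeps its counts.
func (s *Stats) RegisterRoute(path string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.routes[path]; !ok {
		s.routes[path] = &Counts{}
	}
}

// Complete records one completed call, creating the entry if needed.
func (s *Stats) Complete(path string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	c, ok := s.routes[path]
	if !ok {
		c = &Counts{}
		s.routes[path] = c
	}
	c.Completed++
}

// Snapshot returns a copy of the per-route counts.
func (s *Stats) Snapshot() map[string]Counts {
	s.mu.Lock()
	defer s.mu.Unlock()
	out := make(map[string]Counts, len(s.routes))
	for p, c := range s.routes {
		out[p] = *c
	}
	return out
}

func main() {
	s := NewStats()
	// Hypothetical route list; in the server it would be read from the database
	// at startup, and RegisterRoute would also run whenever a route is created.
	for _, p := range []string{"/myapp/hello", "/myapp/bye"} {
		s.RegisterRoute(p)
	}
	s.Complete("/myapp/hello")
	fmt.Println(s.Snapshot()) // "/myapp/bye" is present with all zeros
}
```

On the Prometheus side, my understanding is that the Go client creates a labelled series at zero the first time `WithLabelValues` is called on a vector, so the same registration hook could probably do that per route; I have not checked how the Fn metrics are currently declared.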

rdallman · 2017-10-09T19:38:26Z

> To fix this issue we need a new function that is called, once for every route in the database, when the server is started and subsequently whenever a new route is created.

since we are planning to use an external aggregator service, i don't think we need to add too much machinery here (at the cost of slight precision loss around fn server failures, which I don't think matters so much).

if we call stats.Queued() from agent.GetCall then in theory that data gets sent out or pulled from statsd / prometheus (respectively) within some polling interval, so I don't think it's worth checking the db really. also, since we're running distributed, on startup we can't really have every fn server add up every queued call in the db, otherwise if there are e.g. 100 queued calls and 3 fn servers restart, prom would pull that 300 are queued, and i don't think we can mix gauge and counter very easily.
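to spell out the arithmetic in that restart example, here is a tiny sketch. `scrapedTotal` is a made-up helper, and it assumes the scraper simply sums the value each fn server reports:

```go
package main

import "fmt"

// scrapedTotal models an aggregator that adds up the value reported
// by every server. Hypothetical helper, for illustration only.
func scrapedTotal(perServer []int) int {
	total := 0
	for _, v := range perServer {
		total += v
	}
	return total
}

func main() {
	queuedInDB := 100 // calls actually sitting in the queue
	servers := 3      // fn servers that restart and each read the db

	reports := make([]int, servers)
	for i := range reports {
		reports[i] = queuedInDB // every server reports the full db count
	}
	fmt.Println(scrapedTotal(reports)) // 300, although only 100 calls are queued
}
```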

nigeldeakin · 2017-10-10T08:51:26Z

This issue isn't about "queued calls". It's whether the Prometheus scraper should receive metrics about routes that exist but which haven't been called (or queued).

It sounds (from discussion here and elsewhere) that the current behaviour is considered OK: Prometheus should only receive information about things that happened since the server was started. So I can close this issue. Thanks for the feedback.

As for calling stats.Queued() from agent.GetCall: why is that better than calling it just once as now, when the call is enqueued? Currently when a call is enqueued the Prometheus client is notified so it can increase its counter. The Prometheus server can then scrape the value of this counter (by calling /metrics) any time it likes. Or is the suggestion that we change this from a counter to a gauge whose value is maintained within the Fn server itself? In any case that change is not related to this issue.

rdallman · 2017-11-16T03:49:30Z

> As for calling stats.Queued() from agent.GetCall: why is that better than calling it just once as now, when the call is enqueued?

well, just the positioning I think. GetCall is called before queueing the call to the MQ, so while it sits on the MQ, prometheus will have a counter incremented for it. whereas right now it's in Submit, so only after the call gets picked off the MQ (could be seconds, minutes, hours after it was actually queued) will the counter get incremented. there is some consideration for calls that may get pulled off the MQ multiple times (for reasons of failing previously/timeouts/etc), this is an issue in the current spot as well as in GetCall without certain care. I think originally this is how I interpreted this issue, though now I understand it's something else. in any event, this is also going on.
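a rough sketch of the redelivery problem and one possible kind of "care", counting each call ID only once. `DedupCounter` and the call IDs are made up for illustration, this is not how Fn does it today, and a real version would need locking and some way to forget old IDs:

```go
package main

import "fmt"

// DedupCounter counts each call ID at most once, however many times
// the message queue delivers it. Hypothetical helper, not part of Fn.
// Not safe for concurrent use, and the seen set grows without bound.
type DedupCounter struct {
	seen  map[string]struct{}
	total int
}

func NewDedupCounter() *DedupCounter {
	return &DedupCounter{seen: make(map[string]struct{})}
}

// Observe records callID and reports whether it was newly counted.
func (d *DedupCounter) Observe(callID string) bool {
	if _, dup := d.seen[callID]; dup {
		return false
	}
	d.seen[callID] = struct{}{}
	d.total++
	return true
}

// Total returns the number of distinct call IDs observed.
func (d *DedupCounter) Total() int { return d.total }

func main() {
	// call-1 is delivered twice, e.g. after a timeout on the first attempt.
	deliveries := []string{"call-1", "call-2", "call-1"}

	naive := 0
	dedup := NewDedupCounter()
	for _, id := range deliveries {
		naive++ // incrementing on every delivery counts call-1 twice
		dedup.Observe(id)
	}
	fmt.Println("naive:", naive, "deduplicated:", dedup.Total())
}
```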

rdallman · 2018-02-06T01:19:11Z

closing, don't think we need to have zeroed stats for routes that have not yet been invoked, if i understand correctly

nigeldeakin mentioned this issue Sep 27, 2017

Charts don't show functions which have not been called yet fnproject/ui#18

Open

nigeldeakin changed the title ~~/stats API call does not return data for functions that have not been called~~ /stats and /metrics API calls does not return data for functions that have not been called Oct 9, 2017

nigeldeakin changed the title ~~/stats and /metrics API calls does not return data for functions that have not been called~~ /stats and /metrics API calls do not return data for functions that have not been called Oct 9, 2017

rdallman closed this as completed Feb 6, 2018
