Several Testnet metrics missing #14

Open
3 tasks done
ShadowFM opened this issue Mar 30, 2024 · 5 comments

Comments

@ShadowFM

Did you read the documentation and guides?

  • I have inspected the documentation.

Is there an existing issue?

  • I have searched the existing issues.

Description of the problem

Since the last Testnet update r-13.0, I have noticed that metrics such as "aleph_Imported", "aleph_Ordering", "aleph_Ordered", "aleph_Aggregating" or "aleph_Finalized" are no longer available.

As a result, the graphs for "Block Time from Import to Finalized" and "AZERO Transactions per Second" are no longer displayed in the Grafana Dashboard.

Which metrics should now be used to display these two graphs for the testnet? Mainnet does not seem to be affected.

For example, this is the query that was previously used to display the "Block Time from Import to Finalized" graph:
aleph_Imported + aleph_Ordering + aleph_Ordered + aleph_Aggregating + aleph_Finalized

I'm not sure whether you need it, but here is the log of the Testnet validator that is currently running:
logs.txt

Information on your setup.

  1. Running on testnet or mainnet?
    It affects the Testnet environment

  2. Version of aleph-node
    0.13.2

  3. How do you run aleph-node - directly, docker, aleph-node-runner?
    Through docker

  4. Is it a validator node or RPC-node?
    It's a validator node

  5. What flags do you run aleph-node with?
    I'm just using -n $VALIDATOR_NAME --ip $IP

  6. Operating system
    Ubuntu 20.04

  7. Hardware
    AMD Ryzen 9 3900 12-Core Processor, 128GB RAM, 2TB NVMe (RAID1)

Steps to reproduce

Start a Testnet validator node running version r-13.0 or later; when evaluating the metrics, the ones mentioned above are no longer exposed.

Did you attach relevant logs?

  • I have attached logs (if relevant).
@ggawryal

ggawryal commented Apr 2, 2024

Hi, thanks for opening the issue.
Indeed, in release 13 these metrics are no longer present: they were reorganized slightly as part of the preparation for the SLO metrics. Specifically:

  • aleph_Imported is now aleph_timing_imported and is a histogram instead of a gauge. In particular, to approximate its most recent value (the average over the last rate interval), one can use the following query in Grafana:
    rate(aleph_timing_imported_sum[$__rate_interval]) / rate(aleph_timing_imported_count[$__rate_interval])
  • aleph_Ordering was replaced with the aleph_timing_proposed histogram.
  • aleph_Ordered was replaced with the aleph_timing_ordered histogram.
  • aleph_Aggregating was removed entirely, as it was always equal or negligibly close to zero.
  • aleph_Finalized became the aleph_timing_finalized histogram.
  • Additionally, we've exposed the aleph_timing_imported_to_finalized histogram, which measures the time from import to finalization even when a node is not in the validator set for a given session.
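In PromQL, `rate(x_sum[w]) / rate(x_count[w])` divides the per-second increase of a histogram's running total of observed values by the per-second increase of its observation count. The per-second factors cancel, so the result is the mean of the observations recorded inside the window. The Python sketch below reproduces that arithmetic for two scrapes of a histogram's `_sum` and `_count` series. `HistogramScrape`, `simple_rate` and `windowed_mean` are hypothetical helpers, the sample values are made up, and the simplified rate ignores counter resets and the boundary extrapolation that Prometheus' `rate()` performs.

```python
from dataclasses import dataclass


@dataclass
class HistogramScrape:
    """One scrape of a Prometheus histogram's cumulative _sum and _count series."""

    timestamp: float  # scrape time in seconds
    total: float  # value of <metric>_sum: running total of all observed values
    count: float  # value of <metric>_count: running number of observations


def simple_rate(old: float, new: float, dt: float) -> float:
    """Per-second increase between two samples (assumes no counter reset, no extrapolation)."""
    return (new - old) / dt


def windowed_mean(first: HistogramScrape, last: HistogramScrape) -> float:
    """Mimic rate(x_sum[w]) / rate(x_count[w]): the mean of observations made inside w."""
    dt = last.timestamp - first.timestamp
    sum_rate = simple_rate(first.total, last.total, dt)
    count_rate = simple_rate(first.count, last.count, dt)
    if count_rate == 0:
        # No new observations in the window; PromQL evaluates 0/0 to NaN as well.
        return float("nan")
    # The 1/dt factors cancel, leaving (sum increase) / (count increase).
    return sum_rate / count_rate


if __name__ == "__main__":
    # Illustrative values only, not taken from a real aleph-node.
    earlier = HistogramScrape(timestamp=0.0, total=1200.0, count=1000)
    later = HistogramScrape(timestamp=60.0, total=1350.0, count=1060)
    print(windowed_mean(earlier, later))  # 2.5
```

This is why the query yields a recent average rather than the latest single observation: a gauge reports one value, while a histogram only accumulates totals that have to be differenced over a window.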

Therefore, the two mentioned graphs should be fixed as follows:

  1. Block Time from Import to Finalized should be replaced with rate(aleph_timing_imported_sum[$__rate_interval]) / rate(aleph_timing_imported_count[$__rate_interval]) + rate(aleph_timing_imported_to_finalized_sum[$__rate_interval]) / rate(aleph_timing_imported_to_finalized_count[$__rate_interval])
  2. AZERO Transactions per Second should simply be
    substrate_proposer_number_of_transactions / rate(aleph_timing_imported_to_finalized_count[$__rate_interval]).
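The first query above adds two windowed means, that of `aleph_timing_imported` and that of `aleph_timing_imported_to_finalized`, each computed as `rate(..._sum) / rate(..._count)` over the same `$__rate_interval`. The Python sketch below reproduces only that arithmetic from the increase of each histogram's `_sum` and `_count` over one window. `window_mean` and `import_to_finalized_time` are hypothetical helpers, the numbers are illustrative rather than real node data, and the simplified rate ignores counter resets and Prometheus' boundary extrapolation.

```python
def window_mean(sum_increase: float, count_increase: float, window_seconds: float) -> float:
    """rate(x_sum[w]) / rate(x_count[w]) given each counter's increase over the window w."""
    sum_rate = sum_increase / window_seconds  # simplified rate(x_sum[w])
    count_rate = count_increase / window_seconds  # simplified rate(x_count[w])
    if count_rate == 0:
        return float("nan")  # PromQL also yields NaN for 0/0
    return sum_rate / count_rate


def import_to_finalized_time(
    imported: tuple[float, float],
    imported_to_finalized: tuple[float, float],
    window_seconds: float,
) -> float:
    """Arithmetic of the suggested 'Block Time from Import to Finalized' query.

    Each tuple holds (increase of <metric>_sum, increase of <metric>_count) over the window,
    for aleph_timing_imported and aleph_timing_imported_to_finalized respectively.
    """
    return window_mean(*imported, window_seconds) + window_mean(
        *imported_to_finalized, window_seconds
    )


if __name__ == "__main__":
    # Illustrative increases over a 60 s window, not real node data.
    print(import_to_finalized_time((30.0, 60.0), (90.0, 60.0), 60.0))  # 2.0
```

Because the window length cancels in each ratio, each term is simply the increase of `_sum` divided by the increase of `_count` over the window.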

@ShadowFM
Author

Thanks for the breakdown of the new metrics!
Adapting the queries to the new metrics on the testnet node produced the desired result.

I assume that with the mainnet 13.x release, the metrics will also be replaced by the new ones?

@ggawryal

Correct, these metrics will change in exactly the same way with the mainnet 13.x release.

@k3vmcd

k3vmcd commented Apr 22, 2024

Will the dashboard template linked from the documentation be updated?

https://grafana.com/grafana/dashboards/16691-aleph-metrics/

@ggawryal

Updated; it should now contain the latest version, compatible with node release 13.x.
