new(scap,pman): add new per-CPU driver metrics #1998

Andreagit97 · 2024-08-07T16:17:12Z

What type of PR is this?

/kind feature

Any specific area of the project related to this PR?

/area libscap-engine-bpf

/area libscap-engine-kmod

/area libscap-engine-modern-bpf

/area libscap

/area libpman

/area tests

Does this PR require a change in the driver versions?

No

What this PR does / why we need it:

This PR introduces new per-CPU stats for our drivers. When collecting metrics about drops in our drivers, it is useful to understand where the drops happen: how drops and events are distributed across our CPUs, and whether just one CPU is under pressure or the whole system is having a hard time.

This is an example of the output on 8 CPUs with scap-open:

[1] n_evts: 88939
[1] n_drops: 0
[1] n_evts_cpu_0: 10326
[1] n_drops_cpu_0: 0
[1] n_evts_cpu_1: 12517
[1] n_drops_cpu_1: 0
[1] n_evts_cpu_2: 11418
[1] n_drops_cpu_2: 0
[1] n_evts_cpu_3: 11937
[1] n_drops_cpu_3: 0
[1] n_evts_cpu_4: 10960
[1] n_drops_cpu_4: 0
[1] n_evts_cpu_5: 6675
[1] n_drops_cpu_5: 0
[1] n_evts_cpu_6: 11195
[1] n_drops_cpu_6: 0
[1] n_evts_cpu_7: 13911
[1] n_drops_cpu_7: 0
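The per-CPU counters in this sample can be cross-checked against the aggregate one. Below is a minimal sketch, assuming the per-CPU values have already been copied into a plain array; the `sum_per_cpu` helper and the `sample_evts` array are hypothetical and not part of libscap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a libscap API): sums n_cpus per-CPU counters. */
static uint64_t sum_per_cpu(const uint64_t *per_cpu, size_t n_cpus) {
	assert(per_cpu != NULL || n_cpus == 0);
	uint64_t total = 0;
	for(size_t i = 0; i < n_cpus; i++) {
		total += per_cpu[i];
	}
	return total;
}

/* The n_evts_cpu_* values copied from the scap-open output above. */
static const uint64_t sample_evts[8] = {10326, 12517, 11418, 11937, 10960, 6675, 11195, 13911};
```

For the sample above, `sum_per_cpu(sample_evts, 8)` yields 88939, which matches the reported `n_evts`.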

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

Signed-off-by: Andrea Terzolo <[email protected]>

github-actions · 2024-08-07T16:17:40Z

Please double check driver/API_VERSION file. See versioning.

/hold

Andreagit97 · 2024-08-07T16:19:31Z

While I was there, I tried to unify the metric collection across drivers at the scap level.

github-actions · 2024-08-07T16:21:32Z

Perf diff from master - unit tests

     1.94%     -1.10%  [.] sinsp_evt::get_ts
     1.27%     +1.04%  [.] std::_Hashtable<long, std::pair<long const, std::shared_ptr<sinsp_threadinfo> >, std::allocator<std::pair<long const, std::shared_ptr<sinsp_threadinfo> > >, std::__detail::_Select1st, std::equal_to<long>, std::hash<long>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::_M_find_before_node
     4.91%     -0.88%  [.] sinsp_parser::process_event
     5.86%     -0.80%  [.] next
     0.05%     +0.75%  [.] std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, libsinsp::state::dynamic_struct::field_info>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, libsinsp::state::dynamic_struct::field_info> >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_find_before_node
     3.15%     -0.54%  [.] sinsp_thread_manager::get_thread_ref
     5.85%     -0.53%  [.] sinsp_evt::get_type
     1.82%     -0.44%  [.] sinsp::fetch_next_event
     0.91%     +0.44%  [.] 0x00000000000e8380
     0.72%     -0.40%  [.] sinsp_parser::parse_clone_exit_child

Perf diff from master - scap file

    15.02%     -7.55%  [.] sinsp_filter_check::extract_nocache
    12.52%     -4.48%  [.] sinsp_evt_formatter::tostring_withformat
     7.32%     -3.65%  [.] sinsp_filter_check_thread::extract_single
     7.33%     -3.65%  [.] std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char const*>
     2.57%     +3.63%  [.] sinsp_parser::reset
     2.56%     +2.32%  [.] rawstring_check::extract_single
     2.56%     +2.28%  [.] main
     5.08%     -2.11%  [.] sinsp_evt::load_params
     5.01%     -2.06%  [.] sinsp_filter_check::tostring
     5.01%     -2.06%  [.] sinsp_evt::get_type

Heap diff from master - unit tests

peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

Heap diff from master - scap file

peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 0B
total memory leaked: 0B

codecov · 2024-08-07T16:36:12Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.08%. Comparing base (5fa87bb) to head (f1d52cc).
Report is 13 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #1998   +/-   ##
=======================================
  Coverage   74.08%   74.08%           
=======================================
  Files         253      253           
  Lines       30766    30766           
  Branches     5395     5388    -7     
=======================================
+ Hits        22793    22794    +1     
+ Misses       7949     7944    -5     
- Partials       24       28    +4

| Flag | Coverage Δ |
|---|---|
| libsinsp | `74.08% <ø> (+<0.01%)` ⬆️ |


incertum · 2024-08-07T17:01:58Z

This is very nice @Andreagit97 🚀. Looking forward to gathering more insights into issues caused by bursts of events that lead to higher drops. In such cases I would expect spikes of drops on a subset of CPUs, i.e. not a uniform distribution.
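One way to check for that kind of non-uniform distribution is to look at how much of the total a single CPU accounts for. This is a rough sketch under stated assumptions: the `drops_are_concentrated` helper and the percentage threshold are hypothetical and do not exist in libscap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a libscap API): returns true when a single CPU
 * accounts for at least `pct` percent of all drops, i.e. the drops are
 * concentrated rather than spread evenly. Returns false when there are
 * no drops at all. Multiplication overflow is ignored for brevity.
 */
static bool drops_are_concentrated(const uint64_t *drops_per_cpu, size_t n_cpus, unsigned pct) {
	assert(drops_per_cpu != NULL || n_cpus == 0);
	uint64_t total = 0;
	uint64_t max = 0;
	for(size_t i = 0; i < n_cpus; i++) {
		total += drops_per_cpu[i];
		if(drops_per_cpu[i] > max) {
			max = drops_per_cpu[i];
		}
	}
	if(total == 0) {
		return false;
	}
	return max * 100 >= total * pct;
}
```

For example, with per-CPU drops of `{0, 0, 900, 0, 50, 50, 0, 0}` and `pct = 50`, the function returns true because CPU 2 holds 90% of the drops, while an even spread such as `{10, 10, 10, 10}` returns false.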

incertum · 2024-08-08T06:28:31Z

userspace/libpman/src/lifecycle.c

@@ -48,18 +47,6 @@ static void pman_save_attached_progs()
 	g_state.attached_progs_fds[7] = bpf_program__fd(g_state.skel->progs.pf_kernel);
 #endif
 	g_state.attached_progs_fds[8] = bpf_program__fd(g_state.skel->progs.signal_deliver);
-
-	for(int j = 0; j < MODERN_BPF_PROG_ATTACHED_MAX; j++)


@Andreagit97 mind getting me up to speed wrt the reason for changing the logic above to

for(int j = 0; j < MODERN_BPF_PROG_ATTACHED_MAX; j++) { g_state.attached_progs_fds[j] = -1; }

and below we have

for(int j = 0; j < MODERN_BPF_PROG_ATTACHED_MAX; j++) { if(g_state.attached_progs_fds[j] != -1) { nprogs_attached++; } }

Besides this question, LGTM!

Andreagit97 · 2024-08-08

Hey! The idea here was to move all the logic inside pman_get_metrics_v2, so that we can avoid creating a global variable g_state.n_attached_progs and just use a local variable. In the end, the only place where we need this information about attached_progs is inside pman_get_metrics_v2. Now the modern eBPF and the legacy one do the same loop in the same place, so it should be easier to maintain in the future.
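The pattern discussed in this thread can be sketched as follows. This is an illustration only, not the actual libpman code: `PROG_ATTACHED_MAX`, `reset_attached_fds`, and `count_attached_progs` are hypothetical stand-ins, and the array size is an assumed value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MODERN_BPF_PROG_ATTACHED_MAX; the value is assumed. */
#define PROG_ATTACHED_MAX 9

/* Marks every slot as "not attached" before the real fds are saved. */
static void reset_attached_fds(int *fds, size_t n) {
	assert(fds != NULL || n == 0);
	for(size_t j = 0; j < n; j++) {
		fds[j] = -1;
	}
}

/*
 * Counts the attached programs on demand, so the caller can keep the
 * result in a local variable instead of a global counter.
 */
static int count_attached_progs(const int *fds, size_t n) {
	assert(fds != NULL || n == 0);
	int nprogs_attached = 0;
	for(size_t j = 0; j < n; j++) {
		if(fds[j] != -1) {
			nprogs_attached++;
		}
	}
	return nprogs_attached;
}
```

With this split, the code that saves the file descriptors only needs to start from an array filled with -1, and the metrics code derives the count when it needs it.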

incertum

/approve

poiana · 2024-08-08T16:35:25Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Andreagit97, incertum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [Andreagit97,incertum]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

poiana · 2024-08-08T16:35:25Z

LGTM label has been added.

Git tree hash: 16dee91caa9c259317c0279508cc70080cf0afed

incertum · 2024-08-08T16:35:36Z

/milestone 0.18.0

Andreagit97 added 2 commits August 7, 2024 18:09

new(scap,pman): add new per-CPU metrics

5b55b59

Signed-off-by: Andrea Terzolo <[email protected]>

test: add tests for the new per-CPU metrics

f1d52cc

Signed-off-by: Andrea Terzolo <[email protected]>

poiana added release-note-none kind/feature New feature or request dco-signoff: yes area/libscap-engine-bpf area/libscap-engine-kmod area/libscap-engine-modern-bpf area/libscap area/libpman area/tests size/XL approved labels Aug 7, 2024

poiana requested review from hbrueckner and Molter73 August 7, 2024 16:17

poiana added the do-not-merge/hold label Aug 7, 2024

incertum reviewed Aug 8, 2024

incertum approved these changes Aug 8, 2024

poiana assigned incertum Aug 8, 2024

poiana added the lgtm label Aug 8, 2024

poiana added this to the 0.18.0 milestone Aug 8, 2024

jasondellaluce approved these changes Aug 19, 2024

poiana assigned jasondellaluce Aug 19, 2024

Andreagit97 removed the do-not-merge/hold label Aug 19, 2024

poiana merged commit 18de8ce into falcosecurity:master Aug 19, 2024
46 of 49 checks passed
