Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new(driver): add ppc64le support to old bpf and kmod plus CI job #1497

Merged
merged 12 commits into from
Nov 29, 2023

Conversation

mdafsanhossain
Copy link
Contributor

What type of PR is this?

/kind feature

Any specific area of the project related to this PR?

/area build

/area CI

/area driver-kmod

/area driver-bpf

/area driver-modern-bpf

/area libsinsp

/area tests

What this PR does / why we need it:

This PR adds GHA job to build kmod,bpf,modern-bpf on ppc64le (using QEMU) and also run libsinsp unit tests. It is similar to what is currently enabled for s390x here.

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

@@ -15,6 +15,7 @@ if(BUILD_BPF)
set(bpf_min_kver_map_x86_64 4.14)
set(bpf_min_kver_map_aarch64 4.17)
set(bpf_min_kver_map_s390x 5.5)
set(bpf_min_kver_map_ppc64le 4.18)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pretty sure that eBPF support will also need same fixes as modern eBPF, ie:

I also noticed that we skipped an include here: https://github.com/falcosecurity/libs/blob/master/userspace/libscap/engine/gvisor/parsers.cpp#L35.
I am not sure whether it is needed, but i guess so given we added aarch64 and s390x too (btw gvisor should only be supported on x86_64 atm, since the option is only available on it: https://github.com/falcosecurity/libs/blob/master/cmake/modules/engine_config.cmake#L26)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay. I was able to build bpf and kmod out of the box without any changes, so I assumed I did not need to make any other modifications. I will add the fixes you suggested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FedeDP Btw, I noticed these 2 lines point to incorrect links for each architecture.

* https://github.com/libbpf/libbpf/blob/master/src/bpf_tracing.h#L166-L178

* https://github.com/libbpf/libbpf/blob/master/src/bpf_tracing.h#L132-L144

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think libbpf touched those files (we should've linked a fixed hash instead of master). No problem anyway!

@@ -26,6 +27,7 @@ else()
set(kmod_min_kver_map_aarch64 3.16)
set(kmod_min_kver_map_s390x 2.6)
set(kmod_min_kver_map_riscv64 5.0)
set(kmod_min_kver_map_ppc64le 2.6)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@FedeDP I was going through this

#if defined(CONFIG_ARM64) || defined(CONFIG_S390) || defined(CONFIG_RISCV)

We need this patch for ppc64 too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure! That depends if ppc64le is hit by the kernel behavior of not sending the clone exit event for the child; at the moment, all supported architectures but x86_64 are hit.
We have drivers tests for it though, eg: https://github.com/falcosecurity/libs/blob/master/test/drivers/test_suites/syscall_exit_suite/clone3_x.cpp#L191
So, if this test is passing fine running on ppc64le, we should not need the CAPTURE_SCHED_PROC_FORK enabled.
The same goes for CAPTURE_SCHED_PROC_EXEC.

cc @Andreagit97

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uhm it seems like ppc64le behaves like x86 since looking at failed tests here #1475 (comment) execve and clone test are passing.

BTW always looking at the tests there is another issue:

[  FAILED  ] SyscallExit.execveatX_correct_exit
[  FAILED  ] SyscallExit.execveatX_execve_exit
[  FAILED  ] SyscallExit.execveatX_success_memfd

it seems that ppc64le correctly returns an execveat exit event when an execveat syscall is called (like s390x and riscv)

#if defined(__s390x__) || defined(__riscv)

Other architectures like x86 instead return an execve instead of an execveat exit event.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @FedeDP @Andreagit97. I reran the test and yes clone3_x is passing so we won't need the patch.

[ RUN      ] SyscallExit.clone3X_father
[       OK ] SyscallExit.clone3X_father (2780 ms)
[ RUN      ] SyscallExit.clone3X_child
[       OK ] SyscallExit.clone3X_child (1139 ms)
[ RUN      ] SyscallExit.clone3X_create_child_with_2_threads
[       OK ] SyscallExit.clone3X_create_child_with_2_threads (1659 ms)
[ RUN      ] SyscallExit.clone3X_child_clone_parent_flag
[       OK ] SyscallExit.clone3X_child_clone_parent_flag (899 ms)
[ RUN      ] SyscallExit.clone3X_child_new_namespace_from_child
[       OK ] SyscallExit.clone3X_child_new_namespace_from_child (1389 ms)
[ RUN      ] SyscallExit.clone3X_child_new_namespace_from_caller
[       OK ] SyscallExit.clone3X_child_new_namespace_from_caller (3079 ms)
[ RUN      ] SyscallExit.clone3X_child_new_namespace_create_thread
[       OK ] SyscallExit.clone3X_child_new_namespace_create_thread (2649 ms)
[ RUN      ] SyscallExit.cloneX_father
[       OK ] SyscallExit.cloneX_father (2619 ms)
[ RUN      ] SyscallExit.cloneX_child
[       OK ] SyscallExit.cloneX_child (1279 ms)

For the execveat , I will add a condition for ppc64le but should that be done with a separate PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried it. That seems to be the offending line

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the extensive tests!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm it seems strange

	struct list_head *head = &(task->children);
	struct list_head *next_child = (struct list_head *)_READ(head->next);
	if(next_child != head)
	{
		/* We have at least one child, so we need a reaper for it */
		reaper_pid = find_new_reaper_pid(data, task);
	}

if head!= null, next_child cannot be null because in the worst case, it would be next_child==head, BTW we never dereference next_child so it shouldn't be an issue at all 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep the issue is within find_new_reaper_pid :/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened the issue to track the legacy bpf verifier failure: #1521

@@ -362,6 +362,44 @@ jobs:
KERNELDIR=/lib/modules/$(ls /lib/modules)/build make scap-open driver bpf unit-test-libsinsp -j6
./libsinsp/test/unit-test-libsinsp

build-libs-ppc64le:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice idea! Can we use a matrix to merge together this job with the build-libs-s390x one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes we can. I will try it out.

Copy link
Member

@Andreagit97 Andreagit97 Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a great point, moreover, I just thought that the s390x job takes almost 1 h to run but in the end, we want to check the build of our drivers (modern_bpf, kmod, bpf) why do we need to build sinsp and run tests? In order to not slow down our CI too much I would propose just building the drivers in this CI job, and maybe we can do the same for the s390x, WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big +1 from me!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you recommend a separate job for testing sinsp or do we stick to the driver builds for now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think sticking with the drvers build for now is fine; higher level code rarely needs adjustments for new architecture, therefore we are pretty confident that we cannot break the build of scap/sinsp for a given architecture.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep let's say that breaking the build of the userspace part should never happen and a 1h job at every build seems a high price to pay

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I will modify for both archs. @hbrueckner let me know if this is ok for s390x

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HI @mdafsanhossain

Makes sense. I will modify for both archs. @hbrueckner let me know if this is ok for s390x

@Andreagit97 @FedeDP let's reduce the build time on s390x by focusing on the drivers only.

@FedeDP FedeDP changed the title new(CI): Job to build libs on ppc64le arch new(CI): add ppc64le support to old bpf and kmod plus CI job Nov 24, 2023
@FedeDP
Copy link
Contributor

FedeDP commented Nov 24, 2023

I updated the title to be more precise!
Let's try this for
/milestone next-driver

@poiana poiana added this to the next-driver milestone Nov 24, 2023
@poiana poiana added size/L and removed size/M labels Nov 28, 2023
@FedeDP
Copy link
Contributor

FedeDP commented Nov 28, 2023

Failing tests for each driver:
kmod:

[  FAILED  ] 9 tests, listed below:
[  FAILED  ] SyscallExit.getrlimitX_success
[  FAILED  ] SyscallExit.prlimit64X
[  FAILED  ] SyscallExit.socketcall_sendmsgX_fail
[  FAILED  ] SyscallExit.socketcall_wrong_code_socketcall_interesting
[  FAILED  ] SyscallExit.socketcall_wrong_code_socketcall_not_interesting
[  FAILED  ] SyscallExit.socketcall_null_pointer_and_wrong_code_socketcall_interesting
[  FAILED  ] SyscallEnter.socketcall_wrong_code_socketcall_interesting
[  FAILED  ] SyscallEnter.socketcall_wrong_code_socketcall_not_interesting
[  FAILED  ] SyscallEnter.socketcall_null_pointer_and_wrong_code_socketcall_interesting

bpf: (without reaper_pid part of procexit filler though...)

[  FAILED  ] 8 tests, listed below:
[  FAILED  ] SyscallExit.getrlimitX_success
[  FAILED  ] SyscallExit.prlimit64X
[  FAILED  ] SyscallExit.socketcall_sendmsgX_fail
[  FAILED  ] SyscallExit.socketcall_wrong_code_socketcall_not_interesting
[  FAILED  ] SyscallEnter.socketcall_wrong_code_socketcall_not_interesting
[  FAILED  ] GenericTracepoints.sched_proc_exit_prctl_subreaper
[  FAILED  ] GenericTracepoints.sched_proc_exit_child_namespace_reaper
[  FAILED  ] GenericTracepoints.sched_proc_exit_reaper_in_the_same_group

modern-bpf:

[  FAILED  ] 14 tests, listed below:
[  FAILED  ] SyscallExit.getrlimitX_success
[  FAILED  ] SyscallExit.getsockoptX_SO_RCVTIMEO
[  FAILED  ] SyscallExit.getsockoptX_SO_PASSCRED
[  FAILED  ] SyscallExit.io_uring_setupX
[  FAILED  ] SyscallExit.mlockallX
[  FAILED  ] SyscallExit.prlimit64X
[  FAILED  ] SyscallExit.sendmsgX_fail
[  FAILED  ] SyscallExit.setsockoptX_SO_RCVTIMEO
[  FAILED  ] SyscallExit.setsockoptX_SO_PASSCRED
[  FAILED  ] SyscallExit.socketcall_sendmsgX_fail
[  FAILED  ] SyscallExit.socketcall_getsockoptX_SO_RCVTIMEO
[  FAILED  ] SyscallExit.socketcall_getsockoptX_SO_PASSCRED
[  FAILED  ] SyscallExit.socketcall_setsockoptX_SO_RCVTIMEO
[  FAILED  ] SyscallExit.socketcall_setsockoptX_SO_PASSCRED

@FedeDP
Copy link
Contributor

FedeDP commented Nov 28, 2023

The syscall 'getrlimit' must be successful. Errno: 38 err_message: Function not implemented

Getrlimit is not implemented and test should be skipped. Weird thing is that the __NR_ code exists though.
prlimit64 has the same issue because it calls getrlimit.
I think we should find a way to dynamically skip tests for errno=38 (ENOSYS); i am trying a patch:

--- a/test/drivers/event_class/event_class.cpp
+++ b/test/drivers/event_class/event_class.cpp
@@ -51,6 +51,12 @@ void assert_syscall_state(int syscall_state, const char* syscall_name, long sysc
 {
        bool match = false;

+       if (errno == ENOSYS)
+       {
+               GTEST_SKIP() << "Syscall " << syscall_name << " not implemented"  << std::endl;
+               return;
+       }
+

@FedeDP
Copy link
Contributor

FedeDP commented Nov 28, 2023

Seems to work reliably; added the PR: #1517

@mdafsanhossain
Copy link
Contributor Author

mdafsanhossain commented Nov 28, 2023

Seems to work reliably; added the PR: #1517

Thanks for the patch

@FedeDP
Copy link
Contributor

FedeDP commented Nov 29, 2023

So, what we need here is the CI part, right? Ie: the matrix merging s390x and ppc64le (and stopping at drivers build without testing libsinsp unit tests).

@mdafsanhossain
Copy link
Contributor Author

So, what we need here is the CI part, right? Ie: the matrix merging s390x and ppc64le (and stopping at drivers build without testing libsinsp unit tests).

Yes. I am testing it. I will commit in some time

@FedeDP
Copy link
Contributor

FedeDP commented Nov 29, 2023

Yes. I am testing it. I will commit in some time

No problem, it was just to recap :) take your time! And thank you!

@poiana
Copy link
Contributor

poiana commented Nov 29, 2023

LGTM label has been added.

Git tree hash: 0aebaf20cc5a04cab1334f71d39b1a8dfc11e95f

Andreagit97
Andreagit97 previously approved these changes Nov 29, 2023
Copy link
Member

@Andreagit97 Andreagit97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unfortunately, the build is really slow also without the sinsp tests :/ BTW we can think of something

/approve

@Andreagit97
Copy link
Member

/hold

@Andreagit97
Copy link
Member

waiting for the CI to complete

@mdafsanhossain
Copy link
Contributor Author

Let me add fail-fast: false as the job strategy before merging

@FedeDP
Copy link
Contributor

FedeDP commented Nov 29, 2023

While you are here @mdafsanhossain care to update naming of the jobs from "build-libs-" to build-drivers-? Thanks :D
Wasn't going to bother you with this little change, but since you are already there...

@mdafsanhossain
Copy link
Contributor Author

While you are here @mdafsanhossain care to update naming of the jobs from "build-libs-" to build-drivers-? Thanks :D Wasn't going to bother you with this little change, but since you are already there...

Will do it. No issues :)

@mdafsanhossain mdafsanhossain dismissed stale reviews from Andreagit97 and FedeDP via 5079dae November 29, 2023 14:40
@poiana poiana removed the lgtm label Nov 29, 2023
@poiana poiana requested review from Andreagit97 and FedeDP November 29, 2023 14:40
Copy link
Contributor

@FedeDP FedeDP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@poiana poiana added the lgtm label Nov 29, 2023
@poiana
Copy link
Contributor

poiana commented Nov 29, 2023

LGTM label has been added.

Git tree hash: 92174aa30ca74ec5bdcc889534d01267c54e8b7c

@FedeDP
Copy link
Contributor

FedeDP commented Nov 29, 2023

/unhold

@FedeDP
Copy link
Contributor

FedeDP commented Nov 29, 2023

/hold

@FedeDP
Copy link
Contributor

FedeDP commented Nov 29, 2023

@Andreagit97 didn't see your comment, sorry!

@FedeDP
Copy link
Contributor

FedeDP commented Nov 29, 2023

/unhold

Copy link
Member

@Andreagit97 Andreagit97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

@poiana
Copy link
Contributor

poiana commented Nov 29, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Andreagit97, FedeDP, mdafsanhossain

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@poiana poiana merged commit 9677b66 into falcosecurity:master Nov 29, 2023
30 checks passed
@Andreagit97 Andreagit97 changed the title new(CI): add ppc64le support to old bpf and kmod plus CI job new(driver): add ppc64le support to old bpf and kmod plus CI job Jan 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants