Reenable switch profile. #25
We may be running out of memory while trying to build the switch profile. This could be a regression in the compiler.
@jafingerhut Can you try this locally?
Yes, I will try this locally today on a VM with 2 vCPUs and 24 GB of RAM. If it fails due to insufficient RAM, that would be pretty bad. I am not sure I will be able to distinguish that cause from whatever other failure might appear. Will let you know.
FYI, it seems occasionally useful to have commands like these in the CI scripts to tell us what basic hardware resources the CI VMs have:
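The commands posted with this comment did not survive in the thread. As a minimal sketch of the idea, assuming a Linux runner with coreutils and `/proc/meminfo` available, something like the following reports CPU count, total RAM, and free disk space. The function name `report_hw` is hypothetical and is not taken from the project's CI scripts.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the project's CI scripts):
# print basic hardware resources of the machine running the job.
report_hw() {
    # Number of processing units available to this process.
    echo "CPU count: $(nproc)"
    # MemTotal is reported in kB in /proc/meminfo (Linux only).
    awk '/^MemTotal:/ {printf "Total RAM: %.1f GiB\n", $2 / 1024 / 1024}' /proc/meminfo
    # Free space on the filesystem holding the current directory.
    echo "Disk: $(df -h . | tail -n 1)"
}

report_hw
```

Running this as an early CI step would put the runner's size in the log, which helps when comparing a local VM against the CI VMs.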
With a VM with 2 vCPUs and 24 GB of RAM, I was able to build successfully with this branch. I was also able to start bf_switchd and press TAB at the prompt, and saw […]. I would recommend that in CI we try something like the code I just added to batch-install.sh here: https://github.com/p4lang/open-p4studio/pull/27/files to limit the number of parallel tasks, in hopes of avoiding using too much RAM. Perhaps CI could even run the batch-install.sh script as is?
(force-pushed from 7da79e7 to ad74dc5)
Hmmm, the problem is not the compiler build but bf-p4c itself, which is single-threaded. Its memory usage appears to have increased massively.
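One way to put a number on a single process's memory use is GNU time's verbose mode, which reports the peak resident set size. The sketch below assumes GNU `time` is installed at `/usr/bin/time`; the helper name `peak_rss_kib` is hypothetical, and `true` is only a stand-in for the real compiler invocation, which is not shown in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the peak RSS (in KiB) from the output of
# GNU `time -v`, which prints a line such as
#   Maximum resident set size (kbytes): 123456
peak_rss_kib() {
    awk -F': ' '/Maximum resident set size/ {print $2}'
}

# `true` is a placeholder; substitute the actual command to be measured.
# GNU time writes its report to stderr, so merge it into stdout.
if [ -x /usr/bin/time ]; then
    { /usr/bin/time -v true 2>&1 || true; } | peak_rss_kib
fi
```

Comparing this figure across compiler versions for the same P4 program would show whether, and roughly when, memory use grew.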
We can tweak my changes to limit the parallelism to as low as 1 parallel job for the entire build run. Not great, in that it increases the elapsed time of the rest of the build, but if the build always fails, that is effectively infinite build time, so worse :-)
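As an illustration of bounding parallelism by available memory, here is a minimal sketch. It is not the code from batch-install.sh; the function name `compute_jobs` and the 4 GiB-per-job budget are assumptions chosen for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the batch-install.sh implementation):
# choose a job count limited by both CPU count and RAM.
# Usage: compute_jobs <cpus> <ram_gib> <gib_per_job>
compute_jobs() {
    local cpus=$1 ram_gib=$2 gib_per_job=$3
    # How many jobs fit in memory at the assumed per-job budget.
    local by_mem=$(( ram_gib / gib_per_job ))
    local jobs=$(( cpus < by_mem ? cpus : by_mem ))
    # Always allow at least one job, even on very small machines.
    if [ "$jobs" -lt 1 ]; then
        jobs=1
    fi
    echo "$jobs"
}

cpus=$(nproc)
ram_gib=$(awk '/^MemTotal:/ {print int($2 / 1024 / 1024)}' /proc/meminfo)
# 4 GiB per job is an assumed budget, not a measured value.
echo "would run: make -j$(compute_jobs "$cpus" "$ram_gib" 4)"
```

For example, `compute_jobs 2 24 4` yields 2 (CPU-bound), while `compute_jobs 8 16 4` yields 4 (memory-bound). This only helps when the memory pressure comes from several jobs running at once; it cannot help if a single process exceeds the available RAM on its own.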
(force-pushed from 6b693f6 to 8fe0e64)
I reduced the number of jobs to 1, but the tests still fail. The problem is not related to parallelism but to increased memory usage in bf-p4c. Also, the runners only have 16 GB available. It is possible that we had a regression in P4C at some point. @asl pointed out the absurd memory usage with https://github.com/p4lang/p4c/issues?q=is%3Aissue%20state%3Aopen%20label%3Acompiler-performance. Maybe the changes to bdwgc are at fault... let me check.
@fruffy I do not see a particular error message in the log; do you see how the problem is reported? I doubt BDWGC is the issue. It is just bad memory-use practices overall in the compiler...
If it is an out-of-memory issue, unless GitHub CI has special mechanisms to report them, they tend not to be reported unless you look specifically for them. Maybe we could add some commands like these at the end of a CI run, if there is a way to predictably run bash commands after one of the earlier build steps fails?
That is from reading some SO answers here [1], but if you know of better ways to look for these events, let me know. [1] https://stackoverflow.com/questions/624857/finding-which-process-was-killed-by-linux-oom-killer
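The commands posted with this comment did not survive in the thread. A minimal sketch of the approach, assuming a Linux runner where the kernel log is readable (possibly only with `sudo`), is to search that log for OOM-killer messages. The function name `find_oom_events` is hypothetical. In GitHub Actions, a step marked `if: always()` runs even when an earlier step has failed, which is one way to make sure such a check executes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: filter kernel log text for OOM-killer messages.
# Reads from stdin so it can be fed from `dmesg` or `journalctl -k`.
find_oom_events() {
    # grep exits non-zero when nothing matches; treat that as success.
    grep -i -E 'out of memory|oom-killer|killed process' || true
}

# Reading the kernel log may require elevated privileges on some
# runners; errors are ignored so this step never fails the job.
{ dmesg -T 2>/dev/null || true; } | find_oom_events
```

An empty result does not prove there was no OOM kill, since the log may be unreadable or may have rotated, but a matching line names the killed process directly.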
The reason I suspect BDWGC is simply that I do not remember the compiler consuming this much memory in the past for the basic switch programs, and the only change I can think of that could affect allocation behavior at this scale is the garbage collector.
(force-pushed from 8fe0e64 to 0374fd5)
Signed-off-by: fruffy <[email protected]>
(force-pushed from 0374fd5 to b43147c)
Check what breaks.