Second set of lhe patches (towards random color/helicity): streamline VECSIZE_USED in upstream MG5aMC and rerun performance tests #562
Merged
Conversation
…eam novec generation is ok)
…s.f for simplicity
…ded, remove its definition in a COMMON. Replace "IF (VECSIZE_USED.LE.1)" by "IF (VECSIZE_MEMMAX.LE.1)" to detect no-vector mode.
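A minimal sketch of the no-vector-mode check described in this commit, assuming a compile-time parameter VECSIZE_MEMMAX as in vector.inc; the value 16384 and the WRITE statements are illustrative placeholders, not the actual MG5aMC code:

```fortran
C     Sketch only: VECSIZE_MEMMAX stands for the compile-time maximum
C     vector size (as defined in vector.inc); the value and the two
C     branches are illustrative, not the actual MG5aMC code paths.
      PROGRAM EXAMPLE_NOVEC_CHECK
      IMPLICIT NONE
      INTEGER VECSIZE_MEMMAX
      PARAMETER (VECSIZE_MEMMAX=16384)
      IF (VECSIZE_MEMMAX.LE.1) THEN
C       No-vector mode: take the scalar code path (one event at a time)
        WRITE(*,*) 'no-vector mode'
      ELSE
C       Vector mode: process events in batches of up to VECSIZE_MEMMAX
        WRITE(*,*) 'vector mode, max batch size', VECSIZE_MEMMAX
      ENDIF
      END
```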
… was building but gave a segfault)
… VECSIZE_USED is a function argument everywhere. This seems to complete the modifications that are necessary to turn VECSIZE into an argument! This is how VECSIZE_USED is currently passed throughout the application (a minimal sketch of the pattern follows this list):
- in driver.f, program DRIVER reads VECSIZE_USED from user inputs
- in driver.f, program DRIVER calls FBRIDGECREATE with argument VECSIZE_USED
  == in fbridge.inc, subroutine FBRIDGECREATE receives it as argument nevt in the cudacpp bridge
- in driver.f, program DRIVER calls SAMPLE_FULL with argument VECSIZE_USED
  > in dsample.f, subroutine SAMPLE_FULL calls SAMPLE_INIT (in dsample.f) with argument VECSIZE_USED
    == in dsample.f, subroutine SAMPLE_INIT uses argument VECSIZE_USED
  > in dsample.f, subroutine SAMPLE_FULL calls SELECT_GROUPING with argument VECSIZE_USED
    == in auto_dsig.f, subroutine SELECT_GROUPING uses argument VECSIZE_USED
  > in dsample.f, subroutine SAMPLE_FULL calls DSIG_VEC with argument VECSIZE_USED
    > in auto_dsig.f, subroutine DSIG_VEC calls UPDATE_SCALE_COUPLING_VEC with argument VECSIZE_USED
      == in reweight.f, subroutine UPDATE_SCALE_COUPLING_VEC uses argument VECSIZE_USED
    > in auto_dsig.f, subroutine DSIG_VEC calls DSIG_PROC_VEC with argument VECSIZE_USED
      > in auto_dsig.f, subroutine DSIG_PROC_VEC calls DSIG1_VEC with argument VECSIZE_USED
        > in auto_dsig1.f, subroutine DSIG1_VEC calls SMATRIX1_MULTI with argument VECSIZE_USED
          == in auto_dsig1.f, subroutine SMATRIX1_MULTI uses argument VECSIZE_USED
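A minimal sketch of this pattern, with hypothetical EXAMPLE_* names standing in for DRIVER, SAMPLE_FULL and DSIG_VEC: VECSIZE_USED is read once at the top level and then passed down explicitly as a dummy argument, rather than living in a COMMON block:

```fortran
C     Sketch only: subroutine names and the loop body are illustrative
C     placeholders for the real DRIVER -> SAMPLE_FULL -> DSIG_VEC chain.
      PROGRAM EXAMPLE_DRIVER
      IMPLICIT NONE
      INTEGER VECSIZE_USED
C     Read the runtime vector size from user inputs (as in driver.f)
      READ(*,*) VECSIZE_USED
      CALL EXAMPLE_SAMPLE_FULL(VECSIZE_USED)
      END

      SUBROUTINE EXAMPLE_SAMPLE_FULL(VECSIZE_USED)
      IMPLICIT NONE
      INTEGER VECSIZE_USED
C     Pass the runtime vector size further down the call chain
      CALL EXAMPLE_DSIG_VEC(VECSIZE_USED)
      END

      SUBROUTINE EXAMPLE_DSIG_VEC(VECSIZE_USED)
      IMPLICIT NONE
      INTEGER VECSIZE_USED, IVEC
      DO IVEC = 1, VECSIZE_USED
C       Placeholder for per-event work on event IVEC of the batch
        WRITE(*,*) 'processing event', IVEC
      ENDDO
      END
```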
…from upstream vecsize2 codegen)
…d it to patch.common instead (most changes have been backported to upstream vecsize2)
… value in driver.f
…ector.inc
./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/dsample.f gg_tt.mad/Source/genps.inc gg_tt.mad/Source/vector.inc gg_tt.mad/SubProcesses/makefile > CODEGEN/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
STARTED AT Fri Dec 9 20:32:57 CET 2022
ENDED AT Sat Dec 10 00:35:04 CET 2022
*** NB! A large performance speedup appears in Fortran MEs! madgraph5#561 ***
valassi changed the title from "Second set of lhe patches (towards random color/helicity): streamline VECSIZE_USED in upstream MG5aMC and rerun performance tests" to "WIP: Second set of lhe patches (towards random color/helicity): streamline VECSIZE_USED in upstream MG5aMC and rerun performance tests" on Dec 10, 2022.
…ans 1 thread" feature in fortran madevent. Previously this was hardcoded only inside the body of check_sa.cc; move it to ompnumthreads.h/cc. This should remove the ~factor x4 speedup observed in fortran between nuvecMLM and vecMLM madgraph5#561.
…link in P1 (as for timer.h)
…(excluding patch.*)
./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/dsample.f gg_tt.mad/Source/genps.inc gg_tt.mad/Source/vector.inc gg_tt.mad/SubProcesses/makefile > CODEGEN/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
…y it starts complaining now...)
./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/dsample.f gg_tt.mad/Source/genps.inc gg_tt.mad/Source/vector.inc gg_tt.mad/SubProcesses/makefile > CODEGEN/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
…l to retrieve the number of good helicities
… from fortran matrix1.f
… from cudacpp Bridge in auto_dsig1.f. To implement this, add the retrieval of nTotHel from the Bridge. Also add a sanity check that this is equal to NCOMB from Fortran.
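A minimal sketch of the sanity check described in this commit. The getter EXAMPLE_GET_NTOTHEL is a hypothetical placeholder for the actual cudacpp Bridge call (whose real name is not shown on this page), and NCOMB=16 with a stub body are illustrative values only:

```fortran
C     Sketch only: EXAMPLE_GET_NTOTHEL stands in for the cudacpp Bridge
C     retrieval of nTotHel; NCOMB is the number of helicity combinations
C     known to the Fortran code (as in matrix1.f).
      PROGRAM EXAMPLE_CHECK_NTOTHEL
      IMPLICIT NONE
      INTEGER NCOMB
      PARAMETER (NCOMB=16)
      INTEGER NTOTHEL
C     Retrieve the total number of helicities from the (stubbed) bridge
      CALL EXAMPLE_GET_NTOTHEL(NTOTHEL)
C     Sanity check: the bridge and Fortran must agree on the helicities
      IF (NTOTHEL.NE.NCOMB) THEN
        WRITE(*,*) 'ERROR! nTotHel mismatch:', NTOTHEL, NCOMB
        STOP
      ENDIF
      WRITE(*,*) 'nTotHel check OK:', NTOTHEL
      END

      SUBROUTINE EXAMPLE_GET_NTOTHEL(NTOTHEL)
      IMPLICIT NONE
      INTEGER NTOTHEL
C     Stub standing in for the cudacpp Bridge call
      NTOTHEL = 16
      END
```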
./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/dsample.f gg_tt.mad/Source/genps.inc gg_tt.mad/Source/vector.inc gg_tt.mad/SubProcesses/makefile > CODEGEN/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
This was linked to issues on Dec 11, 2022.
I am running all tests again, but this is essentially ready to be merged. I have also added the printout of helicities #563 from both fortran and cudacpp.
valassi changed the title from "WIP: Second set of lhe patches (towards random color/helicity): streamline VECSIZE_USED in upstream MG5aMC and rerun performance tests" to "Second set of lhe patches (towards random color/helicity): streamline VECSIZE_USED in upstream MG5aMC and rerun performance tests" on Dec 12, 2022.
This is now ready to be merged. I will wait for the CI to succeed and then self-merge.
All checks passed, self-merging.
This is a second set of patches towards random color/helicity and towards a backport of my patches upstream. It is a follow-up to #559, which I already merged.
It is based on the second set of patches that I opened upstream, mg5amcnlo/mg5amcnlo#24.
It is still WIP because a few checks are pending.

In particular, concerning performance tests, I observed a bizarre feature: Fortran MEs are now a factor 4 faster, see #561. I strongly suspect that this is due to some regression in how I integrated cudacpp, which resurrected issue #419 about LIMHEL. In practice, I suspect that in Fortran I am running four times fewer helicities than in cudacpp, possibly because the LIMHEL thresholds differ. To be checked; a sketch of the suspected filtering logic follows below.
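To make the suspicion concrete, here is an illustrative sketch of LIMHEL-style helicity filtering. The names, numbers and threshold logic are hypothetical, not the actual MG5aMC implementation; the point is only that a higher LIMHEL on one side (Fortran vs cudacpp) keeps fewer helicities and makes the MEs on that side correspondingly faster:

```fortran
C     Sketch only: a helicity combination is kept when its sampled
C     contribution TS(IHEL) exceeds the LIMHEL threshold. All values
C     below are illustrative.
      PROGRAM EXAMPLE_LIMHEL
      IMPLICIT NONE
      INTEGER NCOMB, IHEL, NGOOD
      PARAMETER (NCOMB=4)
      DOUBLE PRECISION TS(NCOMB), LIMHEL
      DATA TS /0.5D0, 1.0D-7, 0.3D0, 2.0D-9/
      LIMHEL = 1.0D-8
      NGOOD = 0
      DO IHEL = 1, NCOMB
C       Keep this helicity only if its contribution is above threshold
        IF (TS(IHEL).GT.LIMHEL) NGOOD = NGOOD + 1
      ENDDO
C     With LIMHEL=1D-8 this keeps 3 of 4; with LIMHEL=1D-6 only 2
      WRITE(*,*) 'helicities kept:', NGOOD, 'of', NCOMB
      END
```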
Anyway, I keep this WIP for now. I will rerun tput perf tests, and I will investigate the performance changes. Note that this performance regression (Fortran speedup) was already present in #559, which I merged.
When these issues are fixed, I will work on a third "lhe" patch to actually integrate random color/helicity.