Second set of lhe patches (towards random color/helicity): streamline VECSIZE_USED in upstream MG5aMC and rerun performance tests #562
Merged
Conversation
…eam novec generation is ok)
…s.f for simplicity
…ded, remove its definition in a COMMON. Replace "IF (VECSIZE_USED.LE.1)" by "IF (VECSIZE_MEMMAX.LE.1)" to detect no-vector mode.
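A minimal sketch of the no-vector-mode check described in this commit, assuming a compile-time parameter VECSIZE_MEMMAX as in vector.inc; the value 16384 and the WRITE statements are illustrative placeholders, not the actual MG5aMC code:

```fortran
C     Sketch only: VECSIZE_MEMMAX stands for the compile-time maximum
C     vector size (as defined in vector.inc); the value and the two
C     branches are illustrative, not the actual MG5aMC code paths.
      PROGRAM EXAMPLE_NOVEC_CHECK
      IMPLICIT NONE
      INTEGER VECSIZE_MEMMAX
      PARAMETER (VECSIZE_MEMMAX=16384)
      IF (VECSIZE_MEMMAX.LE.1) THEN
C       No-vector mode: take the scalar code path (one event at a time)
        WRITE(*,*) 'no-vector mode'
      ELSE
C       Vector mode: process events in batches of up to VECSIZE_MEMMAX
        WRITE(*,*) 'vector mode, max batch size', VECSIZE_MEMMAX
      ENDIF
      END
```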
… was building but gave a segfault)
… VECSIZE_USED is a function argument everywhere. This seems to complete the modifications that are necessary to turn VECSIZE into an argument! This is how VECSIZE_USED is currently passed throughout the application (a minimal sketch of the pattern follows this list):
- in driver.f, program DRIVER reads VECSIZE_USED from user inputs
- in driver.f, program DRIVER calls FBRIDGECREATE with argument VECSIZE_USED
  == in fbridge.inc, subroutine FBRIDGECREATE receives it as argument nevt in the cudacpp bridge
- in driver.f, program DRIVER calls SAMPLE_FULL with argument VECSIZE_USED
  > in dsample.f, subroutine SAMPLE_FULL calls SAMPLE_INIT (in dsample.f) with argument VECSIZE_USED
    == in dsample.f, subroutine SAMPLE_INIT uses argument VECSIZE_USED
  > in dsample.f, subroutine SAMPLE_FULL calls SELECT_GROUPING with argument VECSIZE_USED
    == in auto_dsig.f, subroutine SELECT_GROUPING uses argument VECSIZE_USED
  > in dsample.f, subroutine SAMPLE_FULL calls DSIG_VEC with argument VECSIZE_USED
    > in auto_dsig.f, subroutine DSIG_VEC calls UPDATE_SCALE_COUPLING_VEC with argument VECSIZE_USED
      == in reweight.f, subroutine UPDATE_SCALE_COUPLING_VEC uses argument VECSIZE_USED
    > in auto_dsig.f, subroutine DSIG_VEC calls DSIG_PROC_VEC with argument VECSIZE_USED
      > in auto_dsig.f, subroutine DSIG_PROC_VEC calls DSIG1_VEC with argument VECSIZE_USED
        > in auto_dsig1.f, subroutine DSIG1_VEC calls SMATRIX1_MULTI with argument VECSIZE_USED
          == in auto_dsig1.f, subroutine SMATRIX1_MULTI uses argument VECSIZE_USED
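A minimal sketch of this pattern, with hypothetical EXAMPLE_* names standing in for DRIVER, SAMPLE_FULL and DSIG_VEC: VECSIZE_USED is read once at the top level and then passed down explicitly as a dummy argument, rather than living in a COMMON block:

```fortran
C     Sketch only: subroutine names and the loop body are illustrative
C     placeholders for the real DRIVER -> SAMPLE_FULL -> DSIG_VEC chain.
      PROGRAM EXAMPLE_DRIVER
      IMPLICIT NONE
      INTEGER VECSIZE_USED
C     Read the runtime vector size from user inputs (as in driver.f)
      READ(*,*) VECSIZE_USED
      CALL EXAMPLE_SAMPLE_FULL(VECSIZE_USED)
      END

      SUBROUTINE EXAMPLE_SAMPLE_FULL(VECSIZE_USED)
      IMPLICIT NONE
      INTEGER VECSIZE_USED
C     Pass the runtime vector size further down the call chain
      CALL EXAMPLE_DSIG_VEC(VECSIZE_USED)
      END

      SUBROUTINE EXAMPLE_DSIG_VEC(VECSIZE_USED)
      IMPLICIT NONE
      INTEGER VECSIZE_USED, IVEC
      DO IVEC = 1, VECSIZE_USED
C       Placeholder for per-event work on event IVEC of the batch
        WRITE(*,*) 'processing event', IVEC
      ENDDO
      END
```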
…from upstream vecsize2 codegen)
…d it to patch.common instead (most changes have been backported to upstream vecsize2)
… value in driver.f
…ector.inc
./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/dsample.f gg_tt.mad/Source/genps.inc gg_tt.mad/Source/vector.inc gg_tt.mad/SubProcesses/makefile > CODEGEN/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
STARTED AT Fri Dec 9 20:32:57 CET 2022
ENDED AT Sat Dec 10 00:35:04 CET 2022
*** NB! A large performance speedup appears in Fortran MEs! madgraph5#561 ***
valassi changed the title from "Second set of lhe patches (towards random color/helicity): streamline VECSIZE_USED in upstream MG5aMC and rerun performance tests" to "WIP: Second set of lhe patches (towards random color/helicity): streamline VECSIZE_USED in upstream MG5aMC and rerun performance tests" on Dec 10, 2022.
…ans 1 thread" feature in fortran madevent. Previously this was hardcoded only inside the body of check_sa.cc; move it to ompnumthreads.h/cc. This should remove the ~factor x4 speedup observed in fortran between nuvecMLM and vecMLM madgraph5#561.
…link in P1 (as for timer.h)
…(excluding patch.*)
./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/dsample.f gg_tt.mad/Source/genps.inc gg_tt.mad/Source/vector.inc gg_tt.mad/SubProcesses/makefile > CODEGEN/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
…y it starts complaining now...)
./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/dsample.f gg_tt.mad/Source/genps.inc gg_tt.mad/Source/vector.inc gg_tt.mad/SubProcesses/makefile > CODEGEN/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
…l to retrieve the number of good helicities
… from fortran matrix1.f
… from cudacpp Bridge in auto_dsig1.f. To implement this, add the retrieval of nTotHel from the Bridge. Also add a sanity check that this is equal to NCOMB from Fortran.
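A minimal sketch of the sanity check described in this commit. The getter EXAMPLE_GET_NTOTHEL is a hypothetical placeholder for the actual cudacpp Bridge call (whose real name is not shown on this page), and NCOMB=16 with a stub body are illustrative values only:

```fortran
C     Sketch only: EXAMPLE_GET_NTOTHEL stands in for the cudacpp Bridge
C     retrieval of nTotHel; NCOMB is the number of helicity combinations
C     known to the Fortran code (as in matrix1.f).
      PROGRAM EXAMPLE_CHECK_NTOTHEL
      IMPLICIT NONE
      INTEGER NCOMB
      PARAMETER (NCOMB=16)
      INTEGER NTOTHEL
C     Retrieve the total number of helicities from the (stubbed) bridge
      CALL EXAMPLE_GET_NTOTHEL(NTOTHEL)
C     Sanity check: the bridge and Fortran must agree on the helicities
      IF (NTOTHEL.NE.NCOMB) THEN
        WRITE(*,*) 'ERROR! nTotHel mismatch:', NTOTHEL, NCOMB
        STOP
      ENDIF
      WRITE(*,*) 'nTotHel check OK:', NTOTHEL
      END

      SUBROUTINE EXAMPLE_GET_NTOTHEL(NTOTHEL)
      IMPLICIT NONE
      INTEGER NTOTHEL
C     Stub standing in for the cudacpp Bridge call
      NTOTHEL = 16
      END
```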
./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/dsample.f gg_tt.mad/Source/genps.inc gg_tt.mad/Source/vector.inc gg_tt.mad/SubProcesses/makefile > CODEGEN/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
This was linked to issues on Dec 11, 2022.
I am running all tests again, but this is essentially ready to be merged. I have also added the printout of helicities #563 from both fortran and cudacpp.
valassi changed the title from "WIP: Second set of lhe patches (towards random color/helicity): streamline VECSIZE_USED in upstream MG5aMC and rerun performance tests" to "Second set of lhe patches (towards random color/helicity): streamline VECSIZE_USED in upstream MG5aMC and rerun performance tests" on Dec 12, 2022.
This is now ready to be merged. I will wait for the CI to succeed and then self-merge.
All checks passed, self-merging.
This is a second set of patches towards random color/helicity and towards a backport of my patches upstream. It is a follow-up to #559, which I already merged.
It is based on the second set of patches that I opened upstream, mg5amcnlo/mg5amcnlo#24.
It is still WIP because a few checks are pending.

In particular, concerning performance tests, I observed a bizarre feature: Fortran MEs are now a factor 4 faster, see #561. I strongly suspect that this is due to some regression in how I integrated cudacpp, which resurrected issue #419 about LIMHEL. In practice, I suspect that in Fortran I am running four times fewer helicities than in cudacpp, possibly because the LIMHEL thresholds differ. To be checked; a sketch of the suspected filtering logic follows below.
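To make the suspicion concrete, here is an illustrative sketch of LIMHEL-style helicity filtering. The names, numbers and threshold logic are hypothetical, not the actual MG5aMC implementation; the point is only that a higher LIMHEL on one side (Fortran vs cudacpp) keeps fewer helicities and makes the MEs on that side correspondingly faster:

```fortran
C     Sketch only: a helicity combination is kept when its sampled
C     contribution TS(IHEL) exceeds the LIMHEL threshold. All values
C     below are illustrative.
      PROGRAM EXAMPLE_LIMHEL
      IMPLICIT NONE
      INTEGER NCOMB, IHEL, NGOOD
      PARAMETER (NCOMB=4)
      DOUBLE PRECISION TS(NCOMB), LIMHEL
      DATA TS /0.5D0, 1.0D-7, 0.3D0, 2.0D-9/
      LIMHEL = 1.0D-8
      NGOOD = 0
      DO IHEL = 1, NCOMB
C       Keep this helicity only if its contribution is above threshold
        IF (TS(IHEL).GT.LIMHEL) NGOOD = NGOOD + 1
      ENDDO
C     With LIMHEL=1D-8 this keeps 3 of 4; with LIMHEL=1D-6 only 2
      WRITE(*,*) 'helicities kept:', NGOOD, 'of', NCOMB
      END
```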
Anyway, I keep this WIP for now. I will rerun tput perf tests, and I will investigate the performance changes. Note that this performance regression (Fortran speedup) was already present in #559, which I merged.
When these issues are fixed, I will work on a third "lhe" patch to actually integrate random color/helicity.