Second set of lhe patches (towards random color/helicity): streamline VECSIZE_USED in upstream MG5aMC and rerun performance tests #562

Merged
merged 43 commits into from
Dec 12, 2022

Conversation

valassi
Member

@valassi valassi commented Dec 10, 2022

This is a second set of patches towards random color/helicity and towards a backport of my patches upstream. It is a followup to #559, that I already merged.

It is based on the second set of patches that I opened upstream, mg5amcnlo/mg5amcnlo#24.

It is still in WIP because

  • the upstream MR is still WIP, as it has some cosmetic things I want to fix
  • I am running performance tests on madgraph4gpu with these latest changes

Concerning performance tests in particular, I observed a bizarre feature: Fortran MEs are now a factor 4 faster, see #561. I strongly suspect that this is due to some regression in how I integrated cudacpp, which resurrected issue #419 about LIMHEL. In practice, I suspect that in Fortran I am running four times fewer helicities than in cudacpp, possibly because the LIMHEL thresholds differ. To be checked.

Anyway, I keep this in WIP. I will rerun the tput performance tests and investigate the performance changes. Note that this performance regression (Fortran speedup) was already present in #559, which I merged.

When these issues are fixed, I will work on a third "|lhe" patch to actually integrate random color/helicity.

…ded, remove its definition in a COMMON

Replace "IF (VECSIZE_USED.LE.1)" by "IF (VECSIZE_MEMMAX.LE.1)" to detect no-vector mode
… VECSIZE_USED is a function argument everywhere

This seems to complete the modifications that are necessary to turn VECSIZE into an argument!

This is how VECSIZE_USED is currently passed throughout the application:
- in driver.f, program DRIVER reads VECSIZE_USED from user inputs
- in driver.f, program DRIVER calls FBRIDGECREATE with argument VECSIZE_USED
 == in fbridge.inc, subroutine FBRIDGECREATE receives it as argument nevt in the cudacpp bridge
- in driver.f, program DRIVER calls SAMPLE_FULL with argument VECSIZE_USED
 > in dsample.f, subroutine SAMPLE_FULL calls SAMPLE_INIT (in dsample.f) with argument VECSIZE_USED
  == in dsample.f, subroutine SAMPLE_INIT uses argument VECSIZE_USED
 > in dsample.f, subroutine SAMPLE_FULL calls SELECT_GROUPING with argument VECSIZE_USED
  == in auto_dsig.f, subroutine SELECT_GROUPING uses argument VECSIZE_USED
 > in dsample.f, subroutine SAMPLE_FULL calls DSIG_VEC with argument VECSIZE_USED
  > in auto_dsig.f, subroutine DSIG_VEC calls UPDATE_SCALE_COUPLING_VEC with argument VECSIZE_USED
   == in reweight.f, subroutine UPDATE_SCALE_COUPLING_VEC uses argument VECSIZE_USED
  > in auto_dsig.f, subroutine DSIG_VEC calls DSIG_PROC_VEC with argument VECSIZE_USED
   > in auto_dsig.f, subroutine DSIG_PROC_VEC calls DSIG1_VEC with argument VECSIZE_USED
    > in auto_dsig1.f, subroutine DSIG1_VEC calls SMATRIX1_MULTI with argument VECSIZE_USED
     == in auto_dsig1.f, subroutine SMATRIX1_MULTI uses argument VECSIZE_USED
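
The call chain above threads the vector size through every routine as an explicit argument instead of sharing it through a COMMON block, while no-vector mode is detected from the compile-time capacity. The following is only a minimal C++ analogy of that pattern, not the Fortran code from the patch: `kVecSizeMemMax`, `sumWeights`, `sampleFull` and `isNoVectorMode` are hypothetical names chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical compile-time capacity, standing in for VECSIZE_MEMMAX.
constexpr std::size_t kVecSizeMemMax = 16;

// Fixed-capacity buffer, analogous to a Fortran array dimensioned by VECSIZE_MEMMAX.
using WeightBuffer = std::array<double, kVecSizeMemMax>;

// Leaf routine: receives the runtime vector size as an argument and
// touches only the first vecSizeUsed slots of the buffer.
double sumWeights(const WeightBuffer& weights, std::size_t vecSizeUsed)
{
  assert(vecSizeUsed <= kVecSizeMemMax); // the used size can never exceed the capacity
  double sum = 0.0;
  for (std::size_t i = 0; i < vecSizeUsed; ++i) sum += weights[i];
  return sum;
}

// Intermediate routine: forwards the argument unchanged, with no global state.
double sampleFull(const WeightBuffer& weights, std::size_t vecSizeUsed)
{
  return sumWeights(weights, vecSizeUsed);
}

// No-vector mode is decided from the capacity, not from the runtime size,
// mirroring the switch from VECSIZE_USED.LE.1 to VECSIZE_MEMMAX.LE.1.
constexpr bool isNoVectorMode() { return kVecSizeMemMax <= 1; }
```

One reason to test the capacity rather than the used size is that a vectorized build may legitimately be run with a used size of 1; the capacity says how the code was built, the argument says how it is being run.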
…d it to patch.common instead

(most changes have been backported to upstream vecsize2)
…ector.inc

./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/dsample.f gg_tt.mad/Source/genps.inc gg_tt.mad/Source/vector.inc gg_tt.mad/SubProcesses/makefile > CODEGEN/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
STARTED AT Fri Dec  9 20:32:57 CET 2022
ENDED   AT Sat Dec 10 00:35:04 CET 2022

*** NB! A large performance speedup appears in Fortran MEs! madgraph5#561 ***
@valassi valassi self-assigned this Dec 10, 2022
@valassi valassi marked this pull request as draft December 10, 2022 08:40
@valassi valassi changed the title Second set of lhe patches (towards random color/helicity): streamline VECSIZE_USED in upstream MG5aMC and rerun performance tests WIP: Second set of lhe patches (towards random color/helicity): streamline VECSIZE_USED in upstream MG5aMC and rerun performance tests Dec 10, 2022
…ans 1 thread" feature in fortran madevent

Previously this was hardcoded only inside the body of check_sa.cc, move it to ompnumthreads.h/cc

This should remove the ~factor x4 speedup observed in fortran between nuvecMLM and vecMLM madgraph5#561
./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/dsample.f gg_tt.mad/Source/genps.inc gg_tt.mad/Source/vector.inc gg_tt.mad/SubProcesses/makefile > CODEGEN/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
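
The "OMP_NUM_THREADS not set means 1 thread" rule mentioned in the commit above can be sketched as follows. This is an illustration under stated assumptions, not the contents of ompnumthreads.h/cc: the function names are hypothetical, and the fallback to 1 thread for empty or unparsable values is an assumption of this sketch rather than something the commit message states.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper: decide the thread count from the raw value of
// OMP_NUM_THREADS (nullptr when the variable is not set).
int numThreadsFromValue(const char* value)
{
  if (value == nullptr || *value == '\0') return 1; // not set (or empty): 1 thread
  char* end = nullptr;
  const long n = std::strtol(value, &end, 10);
  if (end == value || n < 1) return 1; // assumption: invalid values also mean 1 thread
  if (n > 1024) return 1024;           // assumption: arbitrary cap to keep the cast safe
  return static_cast<int>(n);
}

// Hypothetical wrapper: read the environment once and apply the rule.
int numThreads()
{
  const int n = numThreadsFromValue(std::getenv("OMP_NUM_THREADS"));
  assert(n >= 1); // the rule never yields zero threads
  return n;
}
```

In an OpenMP build, the result would typically be passed to `omp_set_num_threads` before the first parallel region. Keeping the rule in one shared helper means the standalone executable and the Fortran-driven madevent executable apply the same default.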
./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/dsample.f gg_tt.mad/Source/genps.inc gg_tt.mad/Source/vector.inc gg_tt.mad/SubProcesses/makefile > CODEGEN/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
…l to retrieve the number of good helicities
… from cudacpp Bridge in auto_dsig1.f

To implement this, add the retrieval of nTotHel from the Bridge
Also add a sanity check that this is equal to NCOMB from Fortran
./CODEGEN/generateAndCompare.sh gg_tt --mad --nopatch
git diff --no-ext-diff -R gg_tt.mad/Source/dsample.f gg_tt.mad/Source/genps.inc gg_tt.mad/Source/vector.inc gg_tt.mad/SubProcesses/makefile > CODEGEN/MG5aMC_patches/PROD/patch.common
git diff --no-ext-diff -R gg_tt.mad/SubProcesses/P1_gg_ttx/auto_dsig1.f gg_tt.mad/SubProcesses/P1_gg_ttx/driver.f gg_tt.mad/SubProcesses/P1_gg_ttx/matrix1.f > CODEGEN/MG5aMC_patches/PROD/patch.P1
git checkout gg_tt.mad
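
The sanity check described in the commit above (the total number of helicities reported by the Bridge must equal NCOMB from Fortran) might look roughly like this. It is a sketch only: `HelicityCounts` and `checkHelicityCounts` are hypothetical names, and the real Bridge interface is not shown in this PR page.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical container for what the bridge reports after helicity filtering.
struct HelicityCounts
{
  int nGoodHel; // helicities that survive the filter
  int nTotHel;  // all helicity combinations known to the cudacpp side
};

// Throws if the cudacpp total disagrees with the Fortran NCOMB,
// or if the number of good helicities is outside [0, nTotHel].
void checkHelicityCounts(const HelicityCounts& fromBridge, int ncombFortran)
{
  assert(ncombFortran > 0); // precondition of this sketch
  if (fromBridge.nTotHel != ncombFortran)
    throw std::runtime_error("nTotHel=" + std::to_string(fromBridge.nTotHel) +
                             " differs from NCOMB=" + std::to_string(ncombFortran));
  if (fromBridge.nGoodHel < 0 || fromBridge.nGoodHel > fromBridge.nTotHel)
    throw std::runtime_error("nGoodHel out of range");
}
```

Failing loudly on a mismatch is useful here because a silent disagreement in helicity counts between the two implementations is exactly the kind of discrepancy suspected in #561.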
@valassi
Member Author

valassi commented Dec 11, 2022

I am rerunning all tests, but this is essentially ready to be merged.

I have also added the printout of helicities (#563) from both Fortran and cudacpp.

@valassi valassi changed the title WIP: Second set of lhe patches (towards random color/helicity): streamline VECSIZE_USED in upstream MG5aMC and rerun performance tests Second set of lhe patches (towards random color/helicity): streamline VECSIZE_USED in upstream MG5aMC and rerun performance tests Dec 12, 2022
@valassi valassi marked this pull request as ready for review December 12, 2022 10:14
@valassi
Member Author

valassi commented Dec 12, 2022

This is now ready to be merged. I will wait for the CI to succeed and then self merge.

@valassi
Member Author

valassi commented Dec 12, 2022

All checks passed, self merging.

@valassi valassi merged commit 085f022 into madgraph5:master Dec 12, 2022