Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change of signature for cudacpp #18

Open
oliviermattelaer opened this issue Jun 8, 2022 · 9 comments
Open

change of signature for cudacpp #18

oliviermattelaer opened this issue Jun 8, 2022 · 9 comments

Comments

@oliviermattelaer
Copy link
Owner

current signature is

sigmakin(momenta, coupling, answer, multi_channel_num, multi_channel_denom, channelid, IEVT)
calculate_wavefunctions(ihel, momenta, coupling, answer, multi_channel_num, multi_channel_denom, channelid, iEVT)

I do have a couple of question for you @valassi, @roiser:

  1. why sigmakin has the " multi_channel_num, multi_channel_denom" in the signature? I guess that we can remove those (they should not be passed from the cpu/gpu and should be useless after sigmaKin.
  2. Do you mind if I change (also in the fortran side) channelID to be the "real" channel id (not the diagram id)? This will simplify the handling of the color (and reduce memory footprint)
  3. What do you think about the following signature:

sigmakin(momenta, coupling, answer, channelid,random_for_helicity [in],selected_hel[out], random_for_color [in], selected_color [out], IEVT)
calculate_wavefunctions(ihel, momenta, coupling, answer, multi_channel_num, multi_channel_denom, channelid, jamp2, iEVT)

   -  random_for_helicity  and random_for_color would be a (list) of random number to do the selection
   - selected_hel/selected_color would be the one selected (selection done within sigmaKin
   - jamp2 would be a running sum over the helicity which correspond to the probability of a given color. (so init to zero in sigmakin)

@valassi
Copy link
Collaborator

valassi commented Jun 8, 2022

Hi Olivier, thanks, useful questions

current signature is

sigmakin(momenta, coupling, answer, multi_channel_num, multi_channel_denom, channelid, IEVT)
calculate_wavefunctions(ihel, momenta, coupling, answer, multi_channel_num, multi_channel_denom, channelid, iEVT)

I do have a couple of question for you @valassi, @roiser:

1. why sigmakin has the " multi_channel_num, multi_channel_denom" in the signature? I guess that we can remove those (they should not be passed from the cpu/gpu and should be useless after sigmaKin.

Ok this is something I must have added these days for multichannel in madgraph5/madgraph4gpu#465

I was also a bit surprised that I had to do this, but I found this to be the most consistent to the rest of the code

  • the memory buffers for numerators and denominators are allocated (once, at the beginning of the application) by MatrixElementKernels.cc
  • then they are passed around as arguments in every call

This is consistent to what is done for other memory buffers, namely for couplings that are calculated from Gs

  • Buffers that must be visible externally (eg in the Bridge) are allocated even outside MatrixElementKernels, and then they are passed as references to MEK
  • Buffers that are "internal" (this is what you are saying esssentially) would ideally be allocated even within sigmakin for instance or within calculate wavefunction. The problem is that this would give a malloc/free at every iteration, which is not ideal. Or alternatively one has to statically allocate them as in Fortran, but then you need to know the dimensions at build time, while we have several things in cudacpp that we allocate dynamically at runtime.

I am not saying this is the optimal solution. We can certainly reconsider this. But for the moment I would keep it like that...

(Incidentally, this is very vaguely related to madgraph5/madgraph4gpu#356).

2. Do you mind if I change (also in the fortran side) channelID to be the "real" channel id (not the diagram id)? This will simplify the handling of the color (and reduce memory footprint)

Not a problem! Just let me know when you do it, so I need to synchronise the code generation of cudacpp.

I think I actually like it better in the new way you propose!

3. What do you think about the following signature:

sigmakin(momenta, coupling, answer, channelid,random_for_helicity [in],selected_hel[out], random_for_color [in], selected_color [out], IEVT) calculate_wavefunctions(ihel, momenta, coupling, answer, multi_channel_num, multi_channel_denom, channelid, jamp2, iEVT)

  • random_for_helicity and random_for_color would be a (list) of random number to do the selection
  • selected_hel/selected_color would be the one selected (selection done within sigmaKin
  • jamp2 would be a running sum over the helicity which correspond to the probability of a given color. (so init to zero in sigmakin)

Two points

Currently it is
https://github.com/madgraph5/madgraph4gpu/blob/c2c57cd456b3d1555f04ac7f77f78398f5a8180e/epochX/cudacpp/gg_tt.mad/SubProcesses/P1_gg_ttx/CPPProcess.h#L131

#ifdef __CUDACC__ /* clang-format off */
  __global__ void
  sigmaKin( const fptype* allmomenta,       // input: momenta[nevt*npar*4]
            const fptype* allcouplings,     // input: couplings[nevt*ndcoup*2]
            fptype* allMEs                  // output: allMEs[nevt], |M|^2 final_avg_over_helicities
#ifdef MGONGPU_SUPPORTS_MULTICHANNEL
            , fptype* allNumerators         // output: multichannel numerators[nevt], running_sum_over_helicities
            , fptype* allDenominators       // output: multichannel denominators[nevt], running_sum_over_helicities
            , const unsigned int channelId  // input: multichannel channel id (1 to #diagrams); 0 to disable channel enhancement
#endif
            );
#else
  __global__ void
  sigmaKin( const fptype* allmomenta,     // input: momenta[nevt*npar*4]
            const fptype* allcouplings,   // input: couplings[nevt*ndcoup*2]
            fptype* allMEs,               // output: allMEs[nevt], |M|^2 final_avg_over_helicities
#ifdef MGONGPU_SUPPORTS_MULTICHANNEL
            fptype* allNumerators,        // output: multichannel numerators[nevt], running_sum_over_helicities
            fptype* allDenominators,      // output: multichannel denominators[nevt], running_sum_over_helicities
            const unsigned int channelId, // input: multichannel channel id (1 to #diagrams); 0 to disable channel enhancement
#endif
            const int nevt );             // input: #events (for cuda: nevt == ndim == gpublocks*gputhreads)
#endif /* clang-format on */

In both cases I would add after channelId (and before nevt when present) the extra arguments you suggest,
random_for_helicity [in],selected_hel[out], random_for_color [in], selected_color [out]

sigmakin(momenta, coupling, answer, channelid,random_for_helicity [in],selected_hel[out], random_for_color [in], selected_color [out], IEVT) calculate_wavefunctions(ihel, momenta, coupling, answer, multi_channel_num, multi_channel_denom, channelid, jamp2, iEVT)

  • random_for_helicity and random_for_color would be a (list) of random number to do the selection
  • selected_hel/selected_color would be the one selected (selection done within sigmaKin
  • jamp2 would be a running sum over the helicity which correspond to the probability of a given color. (so init to zero in sigmakin)

For calculate_wavefunctions, I am rather confused about the algorithm here (and the physics). I need to think about it more. For the moment, in practice I would say I just have this vague thought:

  • you propose to extract from calculate_wavefunction the jamp2 for each helicity, then it is sigmakin that computes a random color
  • I was kind of wondering whether instead one can first toss a random helicity, and then inside calculate_wavefunction for that one give helicity toss a random color (so there would be no need to extract the jamp2 for each helicity, instead you would compute a random color for each helicity and export that, and then depending on the helicity you chose you would also hav ethe choice of color)...

So, summary is, I think you have a much better idea what to do with calculate_wavefunction! Just go ahead, write the algorithm, thehn I will vectorize it when porting it to cudacpp...

[ As mentioned, I realise I need to study a bit more the color algebra. A couple of references fr myself

@oliviermattelaer
Copy link
Owner Author

oliviermattelaer commented Jun 9, 2022

ok thanks i will move in that direction then.

Concerning the color, the madevent algorithms pick the color and helicity independently.
(i.e. we do miss the correlation between the color and the helicity picked if any)

so therefore the helicity "h" is picked with probability

|M_h|^2/(\sum_i |M_i|^2)

and the color "c" is picked according to

Jamp2(c)/(\sum Jamp2)

where

Jamp2(c) = \sum_h Jamp2_h(c) # here sum over helicity

So Jamp2 is at the exact same level as the multichannel numerator/denominator since this is a running sum over helicity.
(but Jamp2 is a list per event and not a number per event)

@oliviermattelaer
Copy link
Owner Author

Now another method (also supported in madevent with nhel=1 in the run_card but not compatible with helicity recycling)
is to evaluate only ONE helicity per event in that case, the helicity picked for each event is determined a priori via a vegas (auto-optimized) grid. The bias in under-picking/over-picking will be automatically adjusted by the weight of the events.

Interestingly:

  • This change the definition of the multi-channel weight (since the weight is helicity dependent now) and therefore the efficiency of the phase-space integrator (no clue in which direction).
  • helicity filtering (with threshold) is not required here (they will be toss-away anyway)
  • This reduces the complexity of the "kernel" but should increase the number of kernel call.
  • We have never really study the efficiency of one method compare to the other within the code. (This mode was implemented for loop-induced where the sum over helicity is very expansive to do)

@valassi
Copy link
Collaborator

valassi commented Jun 9, 2022

ok thanks i will move in that direction then.

Concerning the color, the madevent algorithms pick the color and helicity independently. (i.e. we do miss the correlation between the color and the helicity picked if any)

so therefore the helicity "h" is picked with probability

|M_h|^2/(\sum_i |M_i|^2)

and the color "c" is picked according to

Jamp2(c)/(\sum Jamp2)

where

Jamp2(c) = \sum_h Jamp2_h(c) # here sum over helicity

So Jamp2 is at the exact same level as the multichannel numerator/denominator since this is a running sum over helicity. (but Jamp2 is a list per event and not a number per event)

Hi Olivier, thanks.

What confuses me here is the fact that jamp2(c) is the sum over helicity of jamp2(h,c).

I assume here by the way that jamp2 means this is the square of jamp - or rather, the real square of the norm of the complex number jamp, ie jamp2(h,c)=conj(jamp(h,c))*jamp(h,c) for a given helicity h and color c. Maybe I am wrong here?

Let me try to vaguely clarify. What we do is

  • for each helicity h and color c we calculate jamp(h,c) which is a complex number
  • for each helicity, inc calculate_wavefunction, we calculate ME(h) as sum(c1,c2) conj(jamp(h,c1))*colormatrix(c1,c2)*jamp(h,c2)
  • then the total matrix element is ME = sum(h) ME(h)
  • in other words, ME = sum(c1,c2) colormatrix(c1,c2) [sum(h)conj(jamp(h,c1)*jamp(h,c2)]

What confuses me is that

  • if the colormatrix were the identity delta(c1,c2), then ME would be sum(c) sum(h) conj(jamp(h,c))*jamp(h,c) = sum(h) sum(c) jamp2(h,c) = sum(c) [sum(h) jamp2{h,c)]: however, the color matrix is not delta, so this is not true!
  • the algorithm you described above however does seem to be based on this formula: normalization is based on total probability sum(c) [ sum(h) jamp2(h,c) ], and the probability of one color c is taken to be [ sum(h) jamp2(h,c) ] divided by the previous sum
  • so in a way I have the impression that the color matrix is completely ignored in calculating the probabilities to pick a color?

It is not just "missing the correlation between the color and the helicity picked", it is really that I have the impression that the color picking by itself is based on some approximation? But again, probably I misunderstand...

There is the following discussion in one paper which is maybe related, but probably I do not fully understand that either...
https://arxiv.org/abs/hep-ph/0209271
"The color-flow decomposition nicely lends itself to merging with a shower Monte Carlo,
such as HERWIG [18, 19] or Pythia [20], which is based on the color flow of a given hard-
scattering subprocess. A given color assignment typically has several color flows that con-
tribute. One of these color flows is randomly chosen to be associated with the event, weighted
by the square of the partial amplitude corresponding to that color flow. The weight does
not include a color coefficient (since it is unity), unlike the fundamental-representation de-
composition [10]. The event is then evolved with a shower Monte Carlo. This neglects the
interference between different color flows, but this interference is suppressed by a power of
1/N2. This is not a deficiency of the color-flow decomposition, but rather is an inherent
feature of the shower Monte-Carlo approximation for soft radiation [21]."

@valassi
Copy link
Collaborator

valassi commented Jun 9, 2022

Or maybe actually that paragraph actually is exactly what I am saying above? I mean, the color matrix(c1,c2) is NOT delta(c1,c2), but the color picking algorithm treats it as if it was delta(c1,c2) and neglects the off-diagonal terms, which are suppressed by a power of 1/9 (I guess N=3 here)?

For instance
https://github.com/madgraph5/madgraph4gpu/blob/c2c57cd456b3d1555f04ac7f77f78398f5a8180e/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/CPPProcess.cc#L2369

      static constexpr fptype cf[ncolor][ncolor] = {
        { 512, -64, -64, 8, 8, 80, -64, 8, 8, -1, -1, -10, 8, -1, 80, -10, 71, 62, -1, -10, -10, 62, 62, -28 },
        { -64, 512, 8, 80, -64, 8, 8, -64, -1, -10, 8, -1, -1, -10, -10, 62, 62, -28, 8, -1, 80, -10, 71, 62 },
        { -64, 8, 512, -64, 80, 8, 8, -1, 80, -10, 71, 62, -64, 8, 8, -1, -1, -10, -10, -1, 62, -28, -10, 62 },
        { 8, 80, -64, 512, 8, -64, -1, -10, -10, 62, 62, -28, 8, -64, -1, -10, 8, -1, -1, 8, 71, 62, 80, -10 },
        { 8, -64, 80, 8, 512, -64, -1, 8, 71, 62, 80, -10, -10, -1, 62, -28, -10, 62, -64, 8, 8, -1, -1, -10 },
        { 80, 8, 8, -64, -64, 512, -10, -1, 62, -28, -10, 62, -1, 8, 71, 62, 80, -10, 8, -64, -1, -10, 8, -1 },
        { -64, 8, 8, -1, -1, -10, 512, -64, -64, 8, 8, 80, 80, -10, 8, -1, 62, 71, -10, 62, -1, -10, -28, 62 },
        { 8, -64, -1, -10, 8, -1, -64, 512, 8, 80, -64, 8, -10, 62, -1, -10, -28, 62, 80, -10, 8, -1, 62, 71 },
        { 8, -1, 80, -10, 71, 62, -64, 8, 512, -64, 80, 8, 8, -1, -64, 8, -10, -1, 62, -28, -10, -1, 62, -10 },
        { -1, -10, -10, 62, 62, -28, 8, 80, -64, 512, 8, -64, -1, -10, 8, -64, -1, 8, 71, 62, -1, 8, -10, 80 },
        { -1, 8, 71, 62, 80, -10, 8, -64, 80, 8, 512, -64, 62, -28, -10, -1, 62, -10, 8, -1, -64, 8, -10, -1 },
        { -10, -1, 62, -28, -10, 62, 80, 8, 8, -64, -64, 512, 71, 62, -1, 8, -10, 80, -1, -10, 8, -64, -1, 8 },
        { 8, -1, -64, 8, -10, -1, 80, -10, 8, -1, 62, 71, 512, -64, -64, 8, 8, 80, 62, -10, -28, 62, -1, -10 },
        { -1, -10, 8, -64, -1, 8, -10, 62, -1, -10, -28, 62, -64, 512, 8, 80, -64, 8, -10, 80, 62, 71, 8, -1 },
        { 80, -10, 8, -1, 62, 71, 8, -1, -64, 8, -10, -1, -64, 8, 512, -64, 80, 8, -28, 62, 62, -10, -10, -1 },
        { -10, 62, -1, -10, -28, 62, -1, -10, 8, -64, -1, 8, 8, 80, -64, 512, 8, -64, 62, 71, -10, 80, -1, 8 },
        { 71, 62, -1, 8, -10, 80, 62, -28, -10, -1, 62, -10, 8, -64, 80, 8, 512, -64, -1, 8, -10, -1, -64, 8 },
        { 62, -28, -10, -1, 62, -10, 71, 62, -1, 8, -10, 80, 80, 8, 8, -64, -64, 512, -10, -1, -1, 8, 8, -64 },
        { -1, 8, -10, -1, -64, 8, -10, 80, 62, 71, 8, -1, 62, -10, -28, 62, -1, -10, 512, -64, -64, 8, 8, 80 },
        { -10, -1, -1, 8, 8, -64, 62, -10, -28, 62, -1, -10, -10, 80, 62, 71, 8, -1, -64, 512, 8, 80, -64, 8 },
        { -10, 80, 62, 71, 8, -1, -1, 8, -10, -1, -64, 8, -28, 62, 62, -10, -10, -1, -64, 8, 512, -64, 80, 8 },
        { 62, -10, -28, 62, -1, -10, -10, -1, -1, 8, 8, -64, 62, 71, -10, 80, -1, 8, 8, 80, -64, 512, 8, -64 },
        { 62, 71, -10, 80, -1, 8, -28, 62, 62, -10, -10, -1, -1, 8, -10, -1, -64, 8, 8, -64, 80, 8, 512, -64 },
        { -28, 62, 62, -10, -10, -1, 62, 71, -10, 80, -1, 8, -10, -1, -1, 8, 8, -64, 80, 8, 8, -64, -64, 512 } }; // 2-D array[24][24]

In a way, the algorithm proceeds as if the matrix above only had the 512's on the diagonal, and neglects all other terms (which indeed are one order of magnitude smaller)?

@oliviermattelaer
Copy link
Owner Author

yes that's the point.
The parton-shower wants/need the leading color information (which is the same that N is very large --even if it is only 3--) and therefore you only got a (subset) of the diagonal to consider.

Interference between color is actually something that can not be written on the lhef (due to format limitation)

@valassi
Copy link
Collaborator

valassi commented Jun 9, 2022

ok very good if that's the algorithm then I agree also with your calculate_wavefunction API... essentially you need to add a jamp2 to the signature, which is a pointer to an array with ncolors real numbers per event, and calling it just adds the contribution for the given helicity.

Now
https://github.com/madgraph5/madgraph4gpu/blob/c2c57cd456b3d1555f04ac7f77f78398f5a8180e/epochX/cudacpp/gg_ttgg.mad/SubProcesses/P1_gg_ttxgg/CPPProcess.cc#L91

 // Evaluate |M|^2 for each subprocess
  // NB: calculate_wavefunctions ADDS |M|^2 for a given ihel to the running sum of |M|^2 over helicities for the given event(s)
  // (similarly, it also ADDS the numerator and denominator for a given ihel to their running sums over helicities)
  __device__ INLINE void /* clang-format off */
  calculate_wavefunctions( int ihel,
                           const fptype* allmomenta,      // input: momenta[nevt*npar*4]
                           const fptype* allcouplings,    // input: couplings[nevt*ndcoup*2]
                           fptype* allMEs                 // output: allMEs[nevt], |M|^2 running_sum_over_helicities
#ifdef MGONGPU_SUPPORTS_MULTICHANNEL
                           , fptype* allNumerators        // output: multichannel numerators[nevt], running_sum_over_helicities
                           , fptype* allDenominators      // output: multichannel denominators[nevt], running_sum_over_helicities
                           , const unsigned int channelId // input: multichannel channel id (1 to #diagrams); 0 to disable channel enhancement
#endif
#ifndef __CUDACC__
                           , const int nevt               // input: #events (for cuda: nevt == ndim == gpublocks*gputhreads)
#endif
                           )

In practice you need to add one variable, something like

 // Evaluate |M|^2 for each subprocess
  // NB: calculate_wavefunctions ADDS |M|^2 for a given ihel to the running sum of |M|^2 over helicities for the given event(s)
  // (similarly, it also ADDS the numerator and denominator for a given ihel to their running sums over helicities)
  __device__ INLINE void /* clang-format off */
  calculate_wavefunctions( int ihel,
                           const fptype* allmomenta,      // input: momenta[nevt*npar*4]
                           const fptype* allcouplings,    // input: couplings[nevt*ndcoup*2]
                           fptype* allMEs,                // output: allMEs[nevt], |M|^2 running_sum_over_helicities
#ifdef MGONGPU_SUPPORTS_MULTICHANNEL
                           fptype* allNumerators,        // output: multichannel numerators[nevt], running_sum_over_helicities
                           fptype* allDenominators,      // output: multichannel denominators[nevt], running_sum_over_helicities
                           const unsigned int channelId, // input: multichannel channel id (1 to #diagrams); 0 to disable channel enhancement
#endif
                           fptype* allJamp2      // output: [nevt*ncolor], running_sum_over_helicities
#ifndef __CUDACC__
                           , const int nevt               // input: #events (for cuda: nevt == ndim == gpublocks*gputhreads)
#endif
                           )

Then internally I would yet another memory buffer and memory access functions for this new type of data structure...

@oliviermattelaer
Copy link
Owner Author

Hi @valassi, I'm a bit confused about all the assignment in the cudacpp part.
I have branch your plugin (new branch color) where I have started the work to add support for the color.
This is certainly not fully working (and require the latest 3.1.1_lo_vectorization commit) but I guess that either you can start from there for the cpp part and/or that we need to discuss more.
(I think that, it is a bit pointless to continue to support the "builtin" standalone_gpu now...)

I will now concentrate on the change on the madgraph side (including fortran) for the channelID) where I will likely be more usefull than in the cpp part.

@valassi
Copy link
Collaborator

valassi commented Jun 9, 2022

Hi @oliviermattelaer that's fine for me! I think I understood what I need to do on the cudacpp part.

I am a bit behind with a few minor issues though (the move to github, and later the makefile cleanup - maybe I can postpone the latter, but I want to move to github and I need to test other bits of my scripts first).

Go ahead and I will try to catch up
Thanks!
Andrea

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants