Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tweaks #5

Merged
merged 6 commits into from
Apr 14, 2022
Merged

Tweaks #5

merged 6 commits into from
Apr 14, 2022

Conversation

mohawk2
Copy link
Collaborator

@mohawk2 mohawk2 commented Apr 10, 2022

While pursuing a segfault caused by latest PDL, I've discovered a couple of possible improvements. The second commit switches to initialise missingVal before it gets used.

@coveralls
Copy link

coveralls commented Apr 10, 2022

Pull Request Test Coverage Report for Build 2146365966

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 13 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.2%) to 50.198%

Files with Coverage Reduction New Missed Lines %
/home/runner/work/PDL-CCS/PDL-CCS/t/common.plt 4 72.41%
/home/runner/work/PDL-CCS/PDL-CCS/CCS/t/05_binops.t 9 89.41%
Totals Coverage Status
Change from base Build 1865402927: -0.2%
Covered Lines: 1013
Relevant Lines: 2018

💛 - Coveralls

Makefile.PL Show resolved Hide resolved
@moocow-the-bovine moocow-the-bovine merged commit ef9df4f into moocow-the-bovine:master Apr 14, 2022
@moocow-the-bovine
Copy link
Owner

Thanks for PR -- especially the extensive test cleanup! Merged to master & uploaded to CPAN as v1.23.19

@mohawk2 mohawk2 deleted the tweaks branch April 15, 2022 08:33
@mohawk2
Copy link
Collaborator Author

mohawk2 commented Apr 15, 2022

Very welcome! As mentioned above, I was investigating a segfault caused by a PDL change, and I'd be interested in your thoughts, since it's triggered (as a segfault at least) only by this module.

The issue there is triggered by very specific circumstances: the output ndarrays' badflag getting set, when the output ndarray was a converted one (using converttypei_new, a dataflowing transform) from a passed-in ndarray (in the case of this module, a zeroes one, being passed in as a way to communicate an index size, instead of using either RedoDimsCode or simply an OtherPars). This all works due to ccs_accum_nbad writing its output into the converted ndarray, which then gets changed called on it, which triggers a writeback to the passed-in output ndarray.

PDL, when destroying a transform, currently destroys all associated ndarrays that don't have parents or children or, crucially, a true sv member. A null (which is the output end of the converttypei_new transform) currently gets connected to a "mortal" SV, which means on the next Perl garbage-collect (returning from the XS ccs_accum_nbad) it will then get destroyed, and no memory is leaked. However, and this is a sign of a real but rarely-triggered current problem, (see also PDLPorters/pdl#217), that output ndarray has two real parents (the convert, and the CCS transform) but only registers one (since currently PDL only has a concept of one trans_parent). This reveals a minor weakness of the idea of a "parent" when it comes to data-flowing transforms.

The aim of the PDL change, pursuing the aim in PDLPorters/pdl#358 (specifically point 5), is to stop using makescratchhash which attaches the mortal SV as above. It wasn't clear to me before this investigation its particular role in this. Without an attached SV, on leaving make_trans_mutual, the unattached ndarrays get destroyed, including their struct pdl actually being freed. The rest of the function, before returning, updates badflag in output ndarrays, and if they have been freed this is incorrect and can cause a segfault. To illustrate this with a concrete example, ccs_accum_nbad signature:

Signature: (
  indx ixIn(Ndims,NnzIn);
  nzvalsIn(NnzIn);
  missing();
  indx N();
  indx [o]ixOut(Ndims,NnzOut);
  int+ [o]nzvalsOut(NnzOut);
  indx [o]nOut();
  )

The inputs passed in to the PP transform:

(PDL: Indx D [1,16])
(PDL: Indx D [16]) # pdl 1 (it's 0-based), type Indx = 6
(PDL: Double D [1])
(a Perl scalar) # pdl 3, gets turned into type SByte = 0 since the value of 5 will fit in a signed byte
(PDL: Indx D [1,16]) # first output ndarray
(PDL: Indx D [16]) # pdl 5, type Indx = 6 
(PDL: Indx D [])

Annotated excerpt from the output of perl -Mblib CCS/t/03_ufuncs.t when PDL::Core::set_debugging(1) is in effect:

pdl_type_coerce (PDL::CCS::Ufunc::ccs_accum_nbad) pdl=1 from 6 to 10 # 10 is a Double
pdl_type_coerce (PDL::CCS::Ufunc::ccs_accum_nbad) pdl=3 from 0 to 6 # 6 is an Indx
pdl_type_coerce (PDL::CCS::Ufunc::ccs_accum_nbad) pdl=5 from 6 to 10

The problem here arises because ndarray 5 (nzValsOut) is int+, which might be a mistake (without having dug into this yet, it looks to me like it should be identical in type to nzValsIn) but let us consider the practical outcome as it is. An x+ specifier means if the transform's "type" is less than x, it will be x. If the transform is more (and here, it is - it's a double), then the ndarray must be upgraded to the transform's type (implemented with this, if one is interested: https://github.com/PDLPorters/pdl/blob/dde57890aff9b52325cd5436dfc20bdba6d3f135/Basic/Gen/PP/PdlParObj.pm#L177-L187) - C in general, and in particular the C generated by PDL, cannot accommodate the ndarray's values (and specifically, the data pointer) being "somewhere between the minimum of int and the current type of double" - that would need a further embedded "generic switch", leading to a ridiculous explosion of generated code.

Options within PDL:

  • create a sort of double-headed modification of converttypei_new such that the output ndarray of ccs_accum_nbad was an input to it, rather being an output of two things which is fairly obviously a conceptual mistake
  • make it be an error to pass as an output ndarray something with a trans_parent (it probably should be anyway) - with that, it would be possible insert a normal flowing convert transform as the parent of a passed-in output ndarray if needed, though the badflag-updating situation would still need thought
  • re-reading the generated C code of pdl_ccs_accum_nbad_run (which calls make_trans_mutual, then does the badflag updating), the C variable used to update the badflag is assigned to e.g. the [5] member of the trans.pdls after type_coerce, but before make_trans_mutual; this means the badflag-setting (which is done by twiddling a bit in the struct pdl, it's very low-level) is done on the ndarray that gets destroyed immediately after, not on the one passed in; this is a bug that can be fixed without breaking ABI by allocating extra space in the trans.pdls pointer-array for the (potentially converted and therefore ephemeral) outputs, and putting the convert-ees in that extra space, and putting the input ndarrays in the expected places so their badflags can be correctly updated - no ABI change needed
  • have more formal transform and ndarray lifecycle "hooks" (would need an ABI update which I like to avoid)
  • make badflag propagation more automatic, and thereby making this manual updating in the generated C code unnecessary (ABI update again)

Probably desirable actions for PDL::CCS, but only to be done after the PDL issues are fixed (which I don't want to mask by fixing the symptom here first):

  • pass in null as the output params where possible (such as where @nnzOut has a value - zeroes is not necessary)
  • investigate whether nzvalsOut even needs the type-specifier at all, or if it does (for the quantity ones like nbad) it should be indx+
  • investigate whether it ever makes sense to have NnzOut different from NnzIn, which if not would allow dropping more code by just setting nulls as output params where not given

@moocow-the-bovine
Copy link
Owner

moocow-the-bovine commented Apr 16, 2022

Whoof. Looks like you had some detective work tracking that down -- sorry if PDL::CCS if causing undue trouble. Without fully understanding all of what's going on, I'm not sure how meaningful my input will be. One thing in particular jumps out at me immediately though:

Signature: (
  indx ixIn(Ndims,NnzIn);
  nzvalsIn(NnzIn);
  missing();
  indx N();
  indx [o]ixOut(Ndims,NnzOut);
  int+ [o]nzvalsOut(NnzOut);
  indx [o]nOut();
  )

The inputs passed in to the PP transform:

(PDL: Indx D [1,16])
(PDL: Indx D [16]) # pdl 1 (it's 0-based), type Indx = 6
(PDL: Double D [1])

This is probably wrong (bad test design, my fault): pdl 2 (a.k.a. $missing()) as used by PDL::CCS::Nd is always a slice of the same parent as $nzvalsIn ($nzvalsIn=$vals->slice("0:-2"); $missing=$vals->slice("-1");). This is klunky, but it ensures that PDL's builtin dim-checking matches up NnzIn correctly, since I can't (easily) specify a signature like:

  indx ixIn(Ndims,NnzIn);
  valsIn(NnzIn+1);

While I can imagine cases where $missing and $nzvalsIn do not share a common parent, I don't think it ever makes sense for them to be of distinct types.

(a Perl scalar) # pdl 3, gets turned into type SByte = 0 since the value of 5 will fit in a signed byte
(PDL: Indx D [1,16]) # first output ndarray
(PDL: Indx D [16]) # pdl 5, type Indx = 6 
(PDL: Indx D [])

Annotated excerpt from the output of perl -Mblib CCS/t/03_ufuncs.t when PDL::Core::set_debugging(1) is in effect:

pdl_type_coerce (PDL::CCS::Ufunc::ccs_accum_nbad) pdl=1 from 6 to 10 # 10 is a Double
pdl_type_coerce (PDL::CCS::Ufunc::ccs_accum_nbad) pdl=3 from 0 to 6 # 6 is an Indx
pdl_type_coerce (PDL::CCS::Ufunc::ccs_accum_nbad) pdl=5 from 6 to 10

I think the coercions from indx to double (6->10) on pdls 1 and 5 ($nzvalsIn, $nzvalsOut) shouldn't be happening here at all. What I'd really like to be able do for this (and other) methods is to specify something like type-variables in the signature, and let some cunning & devious PDL magic take care of matching them up & promoting as necessary (unification a la PROLOG, ML, or Haskell). As far as I know, that currently works only for a single (implicit) type, which is fine for arithmetic operations, but breaks down when (as here) we have 2 different (variable) types in our signature. Concretely, what this function wants are type variables IndxType and ValueType, leading to a signature like:

 Signature: (
   IndxType     ixIn      (Ndims, NnzIn);
   ValueType    nzvalsIn  (       NnzIn);
   ValueType    missing   ();
   IndxType     N         ();
   IndxType  [o]ixOut     (Ndims, NnzOut);
   IndxType  [o]nzvalsOut (       NnzOut);
   IndxType  [o]nOut      ();
   )

Lacking that kind of functionality (and I do realize that it's a tall order which is probably overkill for 99% of PDL applications: I certainly wouldn't want to write it), I settled for nailing down IndxType (currently to indx, originally to long, inbetween to ccs_indx which was either long or indx depending on compile-time parameters).

The problem here arises because ndarray 5 (nzValsOut) is int+, which might be a mistake (without having dug into this yet, it looks to me like it should be identical in type to nzValsIn)

nzValsOut is a counter here, so it should always be an integer type. Since its value can never exceed the dimension size, it can be safely pegged to IndxType, i.e. indx (and probably should). I think the int+ specifier was modeled on an older version of PDL's nbadover(), the idea being that PDL::CCS::Nd objects signatures should be the same as those of the corresponding methods for dense PDLs. I see that PDL::nbadover() currently has Signature: (a(n); indx [o] b()), so the CCS version should be updated accordingly too.

... so I see 2 concrete things going wrong already in PDL::CCS here:

  1. type-mismatch between (indx != double) input ndarrays $nzValsIn and $missing
  2. over-general type (int+ != indx) for output ndarray $nzValsOut

I'm not sure to what extent (if any) fixing these would help to avoid the segfault behavior you're seeing, but I'm certainly willing to fix them.

[...] that would need a further embedded "generic switch", leading to a ridiculous explosion of generated code.

I suspect my vision of "type variables" would lead to a similarly ridiculous explosion of code, and should probably be avoided for that reason.

Options within PDL:
[...]

I'm still struggling to get my head around these, and don't have any immediate intuitions. I do agree that it's preferable to avoid ABI changes if possible.

Probably desirable actions for PDL::CCS, but only to be done after the PDL issues are fixed (which I don't want to mask by fixing the symptom here first):

Understood. Please let me know when it's OK for me to fix the type-mismatch in ccs_accum_nbad() and its tests.

  • pass in null as the output params where possible (such as where @nnzOut has a value - zeroes is not necessary)

Probably a good thing, but I'm hesitant. I'm certain™ that I originally introduced the zeroes() calls into PMCode for a reason - if the "usual trick" of using nulls had worked, I would have left things that way. Whatever that original reason was (maybe it was finding/defaulting NnzOut?), it may have been fixed by now so that some or all of the zeroes() calls can now be omitted. The idea of (not) finding out which ones those are (or aren't) makes me nervous (touching running systems & all).

  • investigate whether nzvalsOut even needs the type-specifier at all, or if it does (for the quantity ones like nbad) it should be indx+

It needs it here for nbadover(), but as noted above it should now just be indx. Most other (arithmetic) CCS ufuncs omit the output type specifier (actually the out_type keyword argument to ccs_accum_hash(), and handled in the generated PMCode) so that nzvalsOut gets the same type as nzvalsIn.

  • investigate whether it ever makes sense to have NnzOut different from NnzIn, which if not would allow dropping more code by just setting nulls as output params where not given

For the high-level PDL::CCS::Nd wrappers, I think it's always the case that NNzOut==NnzIn, so it doesn't make sense there. I can imagine cases with structured data where it does make sense though, and it could be wasteful (especially for ufuncs) to force NNzOut==NnzIn there, so I'm inclined to leave those dimensions distinct.

Thanks again for the detective work!

@moocow-the-bovine
Copy link
Owner

Created #6 so that the CCS-specific bugs don't get buried here.

@mohawk2
Copy link
Collaborator Author

mohawk2 commented Apr 16, 2022

As noted on the relevant PDL commit (PDLPorters/pdl@bcd4ea3), reproduced here:

The tests here (particularly in t/bad.t) capture the error found in the discussion of #5 (and fix the most-important problem described therein) - the passed-in float-typed output was (as in current PDL::CCS with currently-released PDL) not getting its badflag set due to the wrong output ndarray getting its badflag set (and not propagated) by the CopyBadStatusCode in the ccs_accum_* operations.

Spelling the point out again (because there are many points under discussion here!): PDL::CCS was doing something a bit funky, but exposed a genuine problem with propagation of badflags, and also revealed the very clever (but tricky) feature of PDL that you can pass in a differently-typed ndarray to an operation, and get the results back, correctly converted (including bad values). If I'd thought it was only a problem in CCS (and there are a couple of improvable things), I would just have fixed them in this PR :-)

The motivation behind all of this, as touched on above, is to smooth out all the quirks and wrinkles in the current PDL implementation, to allow loop-fusion (PDLPorters/pdl#349), which I think will actually be possible fairly soon. This will still need eliminating more of those quirks, but without breaking downstream modules more than the absolute minimum (and hopefully only where they had genuine mistakes).

@mohawk2
Copy link
Collaborator Author

mohawk2 commented Apr 16, 2022

Options within PDL (changed to numbered for clarity):

  1. create a sort of double-headed modification of converttypei_new such that the output ndarray of ccs_accum_nbad was an input to it, rather being an output of two things which is fairly obviously a conceptual mistake

Very tricky, and with the single-parent model we have, might be impossible.

  1. make it be an error to pass as an output ndarray something with a trans_parent (it probably should be anyway) - with that, it would be possible insert a normal flowing convert transform as the parent of a passed-in output ndarray if needed, though the badflag-updating situation would still need thought

This would break $slice++, so is not a valid option. The restriction in pdlapi.c is to disallow output ndarrays with flow, or output ndarrays with any parent with dataflow. The first condition triggers in a similar context to this bit of t/thread.t:

my $pa = pdl 2,3,4;
$pa->doflow;
my $pb = $pa + $pa;
$pa->set(0,50);

This operates correctly and changes $pb - note, specifically set is used (implemented in Core.xs's set_c) which is a sort of micro-transform that calls PDL.set to mutate the data, then PDL.changed to trigger flow. However, if one tries to use a normal mutating transform on $pa such as $pa++ (which obviously uses the ndarray as an output as well as an input), the condition triggers. This is the stuff described in PDL::Dataflow, but not really implemented (as alluded to in PDLPorters/pdl#324). The restriction allows non-flowing operations on slices, so does not impede the idea of loop-fusion, which requires lazy evaluation, not dataflow.

  1. re-reading the generated C code of pdl_ccs_accum_nbad_run (which calls make_trans_mutual, then does the badflag updating), the C variable used to update the badflag is assigned to e.g. the [5] member of the trans.pdls after type_coerce, but before make_trans_mutual; this means the badflag-setting (which is done by twiddling a bit in the struct pdl, it's very low-level) is done on the ndarray that gets destroyed immediately after, not on the one passed in; this is a bug that can be fixed without breaking ABI by allocating extra space in the trans.pdls pointer-array for the (potentially converted and therefore ephemeral) outputs, and putting the convert-ees in that extra space, and putting the input ndarrays in the expected places so their badflags can be correctly updated - no ABI change needed

This was implemented the linked commit above, and together with making PDL.propagate_badflag actually set the badflag as well, makes a very nice, comprehensible API. When the badflag-setting is controlled by a state flag within the Code, and only done outside the broadcastloop, it really renders the whole CopyBadStatusCode idea almost irrelevant. The only exception to this is revealed by CCS, which wants the badflag-setting of two different output ndarrays to depend on the badflag value of two different input ndarrays (and it doesn't call PDL.propagate_badflag, so doesn't work properly).

Luckily, there is an easy solution to this:

  • set the badflag as desired in the Code with $PDLSTATESETBAD(ndarrayname)
  • make the CopyBadStatusCode just call the PDL.propagate_badflag with the set values, i.e. to do the same as the new API does

With that, when-not-if I delete the CopyBadStatusCode mechanism, it will all still work right and be backwards-compatible.

  1. have more formal transform and ndarray lifecycle "hooks" (would need an ABI update which I like to avoid)

Deferred for now.

  1. make badflag propagation more automatic, and thereby making this manual updating in the generated C code unnecessary (ABI update again)

Done - no ABI update required! But see notes to point 3 (which are largely needed anyway due to the CopyBadStatusCode usually not propagating badflag).

@mohawk2
Copy link
Collaborator Author

mohawk2 commented Nov 14, 2024

After some time...

The pdlapi.c checks regarding dataflow was updated in 2.090 (PDLPorters/pdl#485) to disallow output to ndarrays that have input-only dataflow. This still allows output to ones that have two-way dataflow, including converts and slices, since that still makes sense as writing to something calls changed which with backwards flow will trigger immediately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants