-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(feat): refactor base AnnData
class to use a AnnDataBase
abstract class
#949
Conversation
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #949 +/- ##
==========================================
- Coverage 83.08% 82.95% -0.14%
==========================================
Files 34 35 +1
Lines 5404 5502 +98
==========================================
+ Hits 4490 4564 +74
- Misses 914 938 +24
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks interesting! Put some comments down, but have some additional thoughts:
__getitem__(self, idx) -> AbstractAnnData:
abstract method?write_h5ad
/write_zarr
abstract methods?
The obs_names and var_names (which are basically indices for the AnnData object at the moment) are coupled to the existence of obs and var, so it's a bit of a dog eating it's tail. This gets back to the first point - if we had an easy/uniform was to generate "empty" versions of the wrapping classes like DataFrame, this wouldn't be such an issue.
For pandas dataframes, this is easy: pd.DataFrame({}, index=index)
But one could imagine using a pyarrow Table
, which doesn't have an index
.
There are A LOT of abstract methods, which may be confusing.
Yeah. It would be good to think about including more concrete implementations, instead of just a list of things to implement.
I'm wondering if we could even just explicitly say layers
, obsm
, varm
, obsp
, varp
are aligned mappings.
How do we determine what contract obs, obsm, etc. should fulfill to work with AnnData's indexing/view system? Should we be particular about this?
For sure on things like shape. Which would be addressed by my point above.
Other things (like indexing behavior) are difficult to specify/ determine statically and can be expensive to ensure dynamically.
for more information, see https://pre-commit.ci
No need after #1247 |
This PR refactors the
AnnData
class to inherit from anAnnDataBase
class so that we can develop new candidateAnnData
classes more easily and in a structured. At the moment there a number of issues I see here:AxisArray
forobsm
andDataFrame
forvar
) is not particularly uniform, so we can't just set upAnnData
classes with different classes as attributes - instead we have these_assign_XXXX
methodsAnnData
class performs a series of checks that are not easily decoupled from the initialization process. I've taken a stab at this but it will require more refinement.obs_names
andvar_names
(which are basically indices for theAnnData
object at the moment) are coupled to the existence ofobs
andvar
, so it's a bit of a dog eating it's tail. This gets back to the first point - if we had an easy/uniform was to generate "empty" versions of the wrapping classes likeDataFrame
, this wouldn't be such an issue. I'm not really sure this is an issue, though. We could set up a check on theobs_names
property/setter to just look for the existence foobs
first and handle the case where it doesn't exist._init_as_actual
that are both somehwat logical and based on the currentAnnData
method but there is probably a cleaner way of doing this. My hope would be that people could fill in these methods without touching_init_as_actual
_init_as_view
needs to be abstracted. It's not so long.There might be other things, but these are my first thoughts.