diff --git a/README.md b/README.md
index 329e69a..62646c4 100644
--- a/README.md
+++ b/README.md
@@ -23,6 +23,17 @@ It allows you to:
 3. create a more complex single cell dataset
 4. extend it to your need
 
+## About
+
+The idea is to use it to train models like scGPT / Geneformer (and soon, scPrint ;)). It works by:
+
+1. loading from lamin
+2. doing some dataset-specific preprocessing if needed
+3. creating a dataset object on top of .mapped() (which is needed for mapping genes, cell labels, etc.)
+4. passing it to a dataloader object that can work with it correctly
+
+Currently, one has to use the preprocess function to make the dataset fit different tools like scGPT / Geneformer. I would like to enable this through different Collators instead. This is still missing and a WIP (please do contribute!)
+
 ## Install it from PyPI
 
 ```bash