Derive a sample dataset #12

Open
ns-rse opened this issue Jan 19, 2024 · 5 comments
Labels
data: Issues pertaining to sample data

Comments

@ns-rse
Contributor

ns-rse commented Jan 19, 2024

We require a sample dataset for this tutorial so that there is a standard set of images for users to process as they work through the different stages.

Whilst working on #11 I realised that we probably don't want to copy over the files and directory structure used in the Unix Shell Software Carpentry lesson. We would be better served by a set of AFM images in a hierarchical structure, which we can use to demonstrate directory structure and navigation in the Command Line sections, how TopoStats scans for images across the base_dir, and how that in turn influences the structure of the output.
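
As a rough illustration of the idea (this is not TopoStats' actual code), a recursive scan of a base directory and an output layout that mirrors it could look like the sketch below. The function names, the `file_ext` and `output_dir` parameters, and the folder and file names are all hypothetical; only `base_dir` comes from the discussion above.

```python
import tempfile
from pathlib import Path


def find_images(base_dir: Path, file_ext: str = ".spm") -> list[Path]:
    """Recursively collect files under base_dir whose name ends with file_ext."""
    return sorted(p for p in base_dir.rglob(f"*{file_ext}") if p.is_file())


def mirrored_output_dir(image: Path, base_dir: Path, output_dir: Path) -> Path:
    """Give each image an output folder at the same relative path under output_dir."""
    return output_dir / image.relative_to(base_dir).parent / image.stem


# Build a throwaway hierarchy (hypothetical names) and show where outputs would go.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "data"
    for rel in ("dna/relaxed/a.spm", "dna/supercoiled/b.spm", "protein/c.spm", "notes.txt"):
        target = base / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    for image in find_images(base):
        print(image.relative_to(base), "->", mirrored_output_dir(image, base, Path("output")))
```

A nested sample dataset would let the tutorial show this kind of mapping from input folders to output folders directly.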

For this I feel we need more than just minicircles.spm that is used in the TopoStats test suite.

It would also be worthwhile including an image that doesn't process so that we can demonstrate what happens when this arises and how to handle such cases.

If anyone has images that are not under Non-Disclosure Agreements and is able to share them, please provide links in reply to this issue (or email me the Google Drive link; there is no need to send the image itself).

@derollins

Images from the NDP52 project have now all been published in a repository, so they can probably be used for this purpose.

e.g.
20220617_NDP52_FL_40ng_DNA_339LIN_100xGFbuffer_0 2uL_HEPES_20mM_NaCl_50mM 0_00032_height_thresholded
SPM file: https://drive.google.com/file/d/15pfSigAkha5VWlaIVDgSA4pZ_F8ETm49/view?usp=drive_link

More files can be found here:
https://drive.google.com/drive/folders/1oVpwfpvVGQNnL6XBGl45hEjA0dGAVBV9?usp=drive_link

@derollins

derollins commented Jan 19, 2024

Additionally, there are 'dirty' blanks, which all experimentalists will have a good collection of and which could be used for this purpose (they are useless otherwise!).

UPDATE (using edit ;) ): here is a folder of "interesting" blanks: https://drive.google.com/drive/folders/1-4rucSQqm1aY40rxd63e8nGEuAcHnRiA?usp=drive_link

@ns-rse
Contributor Author

ns-rse commented Jan 19, 2024

Brilliant, thanks @derollins. I'll have a look through next week when I'm back on training development.

Useless GitHub Tip: You can edit posts (pen icon at the top right of the gray bar above posts).

@ns-rse
Contributor Author

ns-rse commented Jan 23, 2024

Another source of material is the images available from the Minicircle work, see figshare.

This would be useful if it has both "super-coiled" (-3) and "relaxed" molecules imaged, which would help show differences in the distribution of the statistics calculated for each.

@MaxGamill-Sheffield
Collaborator

MaxGamill-Sheffield commented Jan 23, 2024

Looking at minicircle ∆Lk 0 & -3 for a DNA example. Trying to replicate the following results (primarily looking at aspect ratio to show greater compaction as the DNA is under-twisted).
[Image: Screenshot 2024-02-05 at 12 16 54]
The config file for this shouldn't be too fiddly as the sample is "nice".

Looking at NDP52 for a protein example. Trying to replicate the following results (primarily looking at the ferets, maybe volume too, to visualise the difference between monomers and dimers). "Peaks in KDE plots were used to determine particle size (KDE max ± SD) minimum = 13 ± 6 nm, maximum = 20 ± 12. N = 1365 particles".
[Image: Screenshot 2024-02-05 at 12 30 26]
The config files for this dataset are a little more fiddly:

  • Gaussian filter (σ = 1) of 1.5 pixels (1–2 nm)
  • std masking of 0.7σ for CNDP52, 0.57σ for NDP52-FL, and 1.5σ for the selective masking of terminal domains in NDP52-FL
  • min_size of 50 nm²
  • Large and small masks were removed based on the median size of the masked molecules: objects outside a customisable size range around this median were removed (no info on what range was used)
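
The size-based steps in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the published pipeline or TopoStats' implementation: the function name, `lower_factor` and `upper_factor` are hypothetical (no range is given in the description), and applying the minimum size before taking the median is also an assumption.

```python
from statistics import median


def filter_by_median_size(areas, lower_factor=0.5, upper_factor=2.0, min_size=50.0):
    """Keep areas (nm^2) that pass min_size and lie in a range scaled by the median.

    lower_factor and upper_factor are hypothetical placeholders; the description
    of the method gives no values for the size range.
    """
    # Drop objects below the absolute minimum size first (assumed ordering).
    candidates = [a for a in areas if a >= min_size]
    if not candidates:
        return []
    # Scale the accepted range by the median size of the remaining objects.
    med = median(candidates)
    lower, upper = lower_factor * med, upper_factor * med
    return [a for a in candidates if lower <= a <= upper]


# Made-up areas purely for illustration; these are not measurements.
print(filter_by_median_size([10, 60, 100, 110, 120, 900]))
```

Tying the range to the median means the same settings adapt to images with different typical particle sizes, which is presumably why the range was made customisable.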
