idr0125 NGFF import #4
Comments
Updated bulk import yaml to import I went to run bulk import on
but found that this failed because every Plate.zarr directory is empty.
Could this be due to the update of symlinks in ManagedRepo? Commands are at https://github.com/IDR/idr0125-way-cellpainting/blob/d883ec3a88178b3aaf207acbf9612ef0e5e3f840/scripts/symlinks.bash E.g.
But this shouldn't affect what's under |
Instead, let's simply download and import the NB: this loop repeats for every
|
Needed to rename each dir to remove
|
|
Seems like import got stopped at some point (maybe due to server restart)...
Moved the imported dirs out of
and restarted import in for loop as above, from idr0125/ |
Moved
|
For running
Let's run mkngff on this Plate/Filesets... as regular
as omero-server...
Viewing... http://localhost:1040/webclient/?show=well-2339987 EDIT: after 3 days the image(s) are still not viewable. ZarrReader probably just takes too long to read a BIG plate when all the data is remote in the USA. |
All METADATA.ome.xml imported above. 94 Plates imported in about 90 hours. 1 hour per plate! |
Since running mkngff above, and attempting to view image at http://localhost:1040/webclient/?show=well-2339987 (which still isn't visible) we now have ZarrReader update on idr0125-pilot. Try another plate... 2nd plate: Fileset ID: As wmoore user on idr0125-pilot...
As omero-server user...
View image at http://localhost:1040/webclient/?show=image-14847759 ... |
This image is also still not viewable. I think we have to conclude that this approach isn't going to work. There are probably just too many files to load remotely (from the US). |
Since the size of a "metadata-only" plate appears to be moderate, we can look again at the workflow tested earlier:
|
With idr0125_plates.csv:
...completed 18:54. Used
|
Started import of the metadata-only plates downloaded to
|
51 plates imported so far: ~ 10 per day. |
Test import of first Plate on idr-testing with ZarrReader: Using
Moved data to
|
Also on idr0125-pilot, wanted to test whether import with chunks is working with current ZarrReader (which is likely 0.3.1 and Canonical paths PR)... EDIT - probably this is the Performance PR before getUsedFiles - this works for import and for viewing mkngff Filesets after memo is generated - but doesn't work for generating memo file after mkngff. Import idr0054 data directly from goofys mount of BIA bucket...
Or from local copy...
|
Since import from #4 (comment) got interrupted (server restarts above), 61 plates were imported, so deleted 61 rows from
EDIT: oops - forgot to use |
Importing metadata-only plates is still in progress on idr0125-pilot. Testing symlinking etc... with IDR/idr-utils#54 Moved some imported Plates into temp
A couple of issues:
|
The 2nd issue (bullet point) above seems to be due to the path in However, the usual behaviour elsewhere is for the path/to/plate.zarr truncated to just give the plate.zarr, which is added to the Fileset prefix. This is probably because I didn't use Stopped import (restarted the server and updated ZarrReader to use 77 plates imported now. Last one was Restart...
|
For the first 77 plates imported, we need to use a TEMP commit of IDR/idr-utils@2888b21 to add in the
|
Something strange going on... ALL 3 of the images rendered above (one for each fileset) are orphans!
Images are found in that script with:
|
It also appears that all plates imported above (without |
Seems that with the first 2 batches imported above (without in-place import) each Fileset creates lots of extra images:
Will need to re-import all those 77 plates! |
Checking last import logs
Edited
EDIT... 3 failed imports re-tried above ALL failed with |
EDIT - Kinda slow to render Images and create symlinks: took 75 mins to do 17 Plates |
Import done.
As omero-server... 10:38...
EDIT: 11:00... leave running..
Tried browsing webclient in preparation for thumbnailing... Got
Can't login - need 12:41...
completed (last link created at) 13:53. 70 mins for 43 symlinks |
Browsing webclient to check links before running render...
This is without running anything else just now... Restart server again... Browsing a different plate on webclient - again...
|
After leaving the server for a few days...
for i in {11053..11058}; do echo "$i"; python idr-utils/scripts/managed_repo_symlinks.py Plate:$i /cellpainting-gallery/cpg0004-lincs/broad/images/2016_04_01_a549_48hr_batch1/images_zarr --repo /data/OMERO/ManagedRepository --report; done
Tried again viewing an Image from the 3rd Plate: Plate: 11053,
However, images from the Plate are not viewable.
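For re-running the symlink step over other Plate ranges, the shell loop in this comment could be sketched in Python roughly as follows. The script path, NGFF directory, `--repo` and `--report` values are copied from that loop; the helper names `symlink_commands` and `run_all` are hypothetical, and the dry-run default is an assumption added here so the commands can be inspected before anything touches the ManagedRepository.

```python
import subprocess

# Values below are copied from the shell loop in this comment.
SCRIPT = "idr-utils/scripts/managed_repo_symlinks.py"
NGFF_DIR = (
    "/cellpainting-gallery/cpg0004-lincs/broad/images/"
    "2016_04_01_a549_48hr_batch1/images_zarr"
)
REPO = "/data/OMERO/ManagedRepository"


def symlink_commands(first_plate, last_plate):
    """Build one command per Plate ID in the inclusive range."""
    return [
        ["python", SCRIPT, f"Plate:{plate_id}", NGFF_DIR, "--repo", REPO, "--report"]
        for plate_id in range(first_plate, last_plate + 1)
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops the loop at the first failing Plate.
            subprocess.run(command, check=True)


commands = symlink_commands(11053, 11058)
run_all(commands)  # dry run: prints the six commands without executing them
```

Stopping at the first failure (rather than continuing as the shell loop does) is a design choice of this sketch; it makes a half-processed Plate easier to spot.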
|
Also still seeing ZarrReader logs...
|
Returning to idr0125-pilot again (since updating the ZarrReader to test idr0015 NGFF data) it looks like all the idr0125 data imported above is not viewable, possibly due to memo file regeneration. Since ZarrReader-0.4.0 has now been released with the NGFF performance improvements, let's install that and start a fresh import of all data...
Renamed previous idr0125 Screen in webclient, so fresh one should get created...
|
Import failed...
Looking at Blitz logs around then doesn't show much...
|
Dom saw the same error for idr0138 NGFF import at https://idr-redmine.openmicroscopy.org/issues/266#note-20 |
4 months later... We now have s3-support in ZarrReader, using a
Test import on
Want to install aws..
Download metadata-only plates as omero-server...
Import...
log... First plate...
|
Let's try to create a new bfoptions file for each Fileset as at IDR/idr-metadata#684 (comment) with the appropriate data We want to generate .bfoptions like this for first Plate
First Plate Fileset ID is
As omero-server on pilot-idrngff For ZarrReader, the
Try to view image... 10:40... |
Saving memo was very fast (300 ms!?):
Images are viewable! - approx 6 seconds to render a plane (we only have a single resolution level - added comment at broadinstitute/lincs-cell-painting#54 (comment))
|
To run the above bfoptions generation for "all" plates... Need to generate csv with:
Used https://github.com/IDR/idr0125-way-cellpainting/blob/0a59dab08898e0deb4e75d5b6a80f36e94ac9315/scripts/get_filesets_to_csv.py to generate idr0125_filesets.csv
Plate images are viewable 👍 |
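The "CSV of Filesets, then one bfoptions file per Fileset" step could be sketched as below. This is only an illustration under stated assumptions: the two-column CSV layout (plate name, Fileset ID), the target paths, and the option keys `example.option` and `example.store` are all placeholders, since the real values are not shown in this thread. The only thing taken as given is that a Bio-Formats options file is a plain list of `key=value` lines.

```python
import csv
import tempfile
from pathlib import Path


def write_bfoptions(options_path, options):
    """Write one key=value line per option to the given options file."""
    path = Path(options_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(f"{key}={value}\n" for key, value in options.items()))
    return path


def bfoptions_from_csv(csv_path, options_path_for, options_for):
    """Write one options file per CSV row and return the paths written.

    options_path_for and options_for are caller-supplied callables taking
    (plate_name, fileset_id), because the real layout is not shown here.
    """
    written = []
    with open(csv_path, newline="") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue  # skip blank or malformed rows
            plate_name, fileset_id = row[0].strip(), row[1].strip()
            target = options_path_for(plate_name, fileset_id)
            written.append(write_bfoptions(target, options_for(plate_name, fileset_id)))
    return written


# Usage with made-up rows, paths and option keys, inside a temporary directory.
workdir = Path(tempfile.mkdtemp())
csv_file = workdir / "filesets.csv"
csv_file.write_text("plateA.ome.zarr,101\nplateB.ome.zarr,102\n")

paths = bfoptions_from_csv(
    csv_file,
    lambda name, fileset: workdir / f"Fileset_{fileset}" / f"{name}.bfoptions",
    lambda name, fileset: {"example.option": "true", "example.store": f"placeholder/{name}"},
)
print([str(p.relative_to(workdir)) for p in paths])
```

Passing the path and option builders in as callables keeps the loop reusable if the Fileset layout or the per-plate data location changes between import batches.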
Copy rendering settings from image viewed above (settings applied in webclient) to the 5 plates imported to date... Run in a screen...
|
Rendering of thumbnails took ~16 hours for the first plate - approx 16 secs per image on average. |
First plate took ~16 hours, 2nd plate took 24 hours, then last 3 plates took about 12 hours each.
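As a quick sanity check on those figures, the arithmetic can be written out. This uses only the numbers quoted in these two comments; the per-plate hours for the last three plates are the approximate "about 12 hours each".

```python
SECONDS_PER_HOUR = 3600

# Figures quoted in the comments above.
first_plate_hours = 16
seconds_per_image = 16
plate_hours = [16, 24, 12, 12, 12]  # first, second, then "about 12 hours each"

# 16 hours at ~16 seconds per image implies this many images on the first plate.
images_on_first_plate = first_plate_hours * SECONDS_PER_HOUR / seconds_per_image
total_hours = sum(plate_hours)

print(f"~{images_on_first_plate:.0f} images rendered on the first plate")
print(f"~{total_hours} hours (~{total_hours / 24:.1f} days) for all 5 plates")
```

So the first plate works out at roughly 3600 images, and the five plates together took about 76 hours of thumbnail rendering.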
Checking the
This is missing 2 + 7 images: Browsing the data I can see missing thumbnails for Wells:
These Images all have Channels named e.g. The images appear to have "black" 0-array chunks (rather than missing chunks) of 9.2 kb each: We can manually fix this small number of images by copy, paste & save of rendering settings in webclient. |
Sample plate at https://ome.github.io/ome-ngff-validator/?source=https://cellpainting-gallery.s3.amazonaws.com/cpg0004-lincs/broad/images/2016_04_01_a549_48hr_batch1/images_zarr_withdownscale8/SQ00014812__2016-05-23T20_44_31-Measurement1.ome.zarr/ has downsampled resolutions... Need to download and re-import metadata only plate etc...
|
Create bfoptions...
Looks faster!
EDIT: Done after approx 1.5 hours. |
After OME meeting - now we have pyramids for all plates. On
Import ALL metadata-only plates as above. Create
In a screen...
~6 hours later:
|
First import log
Takes approx 5 hours instead of under 4 hours, probably due to extra resolution level |
For first 5 Plates, put these into a Screen:3502 so we can update symlinks...
As omero-server
Took less than 10 mins for 5 plates. Rendering times (without bfoptions s3) in Preview panel is about 0.5 secs or less for low-resolution image plane (3 channels) and 2-3 seconds for full-resolution tiles. Try setting rendering settings and thumbnails without bfoptions, for first 5 plates imported above...
Logs show 2-3 secs per Thumbnail...
...and 3 hours per plate:
|
Import still running from 3 weeks ago #4 (comment) Now we have 90 newly-imported Plates (in addition to the 5 above).
|
Previously we didn't need to add the Test on one Plate...
Need to create .bfoptions at
Delete memo...
Taking a while to regenerate (viewing in webclient)... Eventually done - logs claim 414 milli-secs - although this completed at
Previously, when bfoptions file was added to the Fileset, memo file regeneration was very fast (300 ms)! #4 (comment)
Create bfoptions at
with
Delete memo...
View in webclient.... again, memo file generation is not quick.... |
Copy rendering settings from first Image to the rest of first Plate...
|
Seems that not all 90 plates in Screen above have had symlinks created correctly at #4 (comment) Try re-running:
While browsing webclient with this still running, got DatabaseBusyException
|
Re-ran
Looks good this time - All 94 Plates completed... in ~3 hours
Repeat with remaining 37 plates in temp Screen:3504 - add to same log...
EDIT (7th Aug) - Mistake in previous command - missed the leading /
|
Trying to view images in webclient to trigger memo file regeneration: Taking a very long time, with occasional DatabaseBusy exceptions -> restart server... Checking logs for memo saving times for one plate gives 4 results, all starting within a couple of seconds and taking 5 hours! I wonder if clicking in webclient can trigger multiple memo generations in parallel? Probably better to use a script.
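One way to test the "multiple memo generations in parallel" suspicion is to pull the timing entries out of the log and check whether their time windows overlap. The sketch below assumes perf4j-style lines of the form `start[<epoch ms>] time[<elapsed ms>] tag[<name>]`; that format, the tag name `example.memo.tag`, and the sample lines are all assumptions, so the pattern should be checked against a real log line before use.

```python
import re

# Assumed perf4j-style line: "... start[<epoch ms>] time[<elapsed ms>] tag[<name>]"
TIMING = re.compile(r"start\[(\d+)\] time\[(\d+)\] tag\[([^\]]+)\]")


def timings_for_tag(lines, tag):
    """Return (start_ms, elapsed_ms) for every timing line with the given tag."""
    found = []
    for line in lines:
        match = TIMING.search(line)
        if match and match.group(3) == tag:
            found.append((int(match.group(1)), int(match.group(2))))
    return found


def has_overlap(timings):
    """True if any two timed operations were running at the same moment."""
    ordered = sorted(timings)
    # After sorting by start time, any overlap shows up between neighbours.
    return any(
        next_start < start + elapsed
        for (start, elapsed), (next_start, _) in zip(ordered, ordered[1:])
    )


# Made-up sample: four operations starting within two seconds, five hours each.
FIVE_HOURS_MS = 5 * 60 * 60 * 1000
sample = [
    f"INFO start[{1_000_000 + offset}] time[{FIVE_HOURS_MS}] tag[example.memo.tag]"
    for offset in (0, 500, 1200, 1900)
]
timings = timings_for_tag(sample, "example.memo.tag")
print(len(timings), "timings; overlapping:", has_overlap(timings))
```

If real log entries for one plate overlap like this sample, that would support triggering memo generation from a script, one Fileset at a time, rather than by clicking in webclient.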
|
Memo file generation:
Remove first row, then
Memo generation took about an hour for each Plate (first 3 were already done):
and the last one completed about 7am on the 9th (today). Took approx 13 hours. 👍 |
Copy rendering settings to 131 plates...
Let's try to use parallel...
Then... using demo login...
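For comparison, the same fan-out idea can be sketched in Python with a small worker pool. The per-plate function here is a placeholder, since the real rendering-settings command is not shown; the Plate ID range 10562..10692 is the one that appears in the loop later in this thread, and `max_workers=4` is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_settings_to_plate(plate_id):
    """Placeholder for the real per-plate work, which is not shown in this thread."""
    return f"Plate:{plate_id} done"


def run_in_parallel(plate_ids, worker, max_workers=4):
    """Apply worker to each Plate ID using a small pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, plate_ids))


plate_ids = range(10562, 10693)  # Plate IDs 10562..10692 inclusive
results = run_in_parallel(plate_ids, copy_settings_to_plate)
print(len(results), "plates processed")
```

Keeping the pool small is deliberate: the DatabaseBusy exceptions reported earlier in this thread suggest the server does not cope well with many concurrent requests.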
|
This issue has been mentioned on Image.sc Forum. There might be relevant details there: |
for i in {10562..10692}; do Not 34 in the following Plates: 10568 (33) - SQ00015043 P9 Field 1 |
Annotations...
omero-server
|
On idr0125-pilot, going to test import of METADATA.ome.xml and then use mkngff to replace the fileset with full NGFF plate mounted on s3.
Running mkngff to replace OME.METADATA.ome.xml single-file Fileset with NGFF plate...