Non-ASCII output for data files (egsdat, 3ddose, etc.) #824
Replies: 2 comments
-
I agree, let's start looking at binary data I/O formats for EGSnrc. Off the bat, I would suggest looking at a standard scientific columnar format such as Apache Arrow; any other suggestions? We could devise our own, but Arrow claims the following very useful features, especially from a simulation perspective: 1. O(1) (constant-time) random access
-
It also looks like it can natively handle binary I/O, and the IPC output it provides makes it seem fairly straightforward to implement. Reading and writing CSV and JSON files might also be useful for auxiliary data like spectra or histograms. From a quick glance, it seems pretty good.
-
Follow up on this comment: clrp-code/egs_brachy#22 (comment)
Although it is something I've been thinking about for close to a decade now, my work with egs_brachy and the new egs_mird over the last two years has really started to show the inefficiency of the standard EGSnrc 3ddose and egsphant files, and my simulations have experienced massive slowdowns from outputting very large egsdat files. Based on my studies, even with nbatch*nchunk reduced to 1, in simulations that read in an egsphant, run a quick simulation, and output a 3ddose file, the simulation time is often on par with the I/O time.
I think something that would really help ameliorate the situation would be adding a suite of binary output options: pure data files that hold regions and media as ints, and boundaries and values as doubles. In some cases this might lead to a small increase in file size (i.e., the ASCII media assignment in egsphants currently uses 1-byte chars, and an int implementation would be 2-8 bytes), but most inputs would be slightly more future-proofed and would read and write faster. For decimal values, the speed gain would be much larger, since no mantissa has to be parsed from text, and the files would likely be smaller too: a double occupies 8 bytes, the same space as the string "1.63E-12", which carries only 3 significant digits.
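To make the size and precision argument concrete, here is a minimal sketch using only Python's standard `struct` module. The sample values and the three-digit text formatting are assumptions chosen for illustration; they are not taken from an actual 3ddose file.

```python
import struct

# Hypothetical dose values; the first has more digits than the text form keeps.
values = [1.6345678e-12, 4.07e-11, 9.99e-13]

# Text form: 3 significant digits per value, space-separated ASCII.
text = " ".join(f"{v:.2E}" for v in values).encode("ascii")

# Binary form: little-endian IEEE 754 doubles, always 8 bytes per value.
binary = struct.pack(f"<{len(values)}d", *values)

# Reading back: text must be tokenized and parsed, binary is a direct unpack.
text_roundtrip = [float(token) for token in text.split()]
binary_roundtrip = list(struct.unpack(f"<{len(values)}d", binary))

print(len(text), len(binary))         # byte counts of the two encodings
print(text_roundtrip[0] == values[0])    # text has dropped digits
print(binary_roundtrip == values)        # binary is exact
```

Each text token here is 8 characters plus a separator, so the binary form is no larger while keeping full double precision.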
In the past, I've implemented begsphant and b3ddose equivalents to egsphant and 3ddose files: https://github.com/MartinMartinov/3ddose_tools/tree/master/source
These are almost exact equivalents of their text versions, just stored as a series of chars, ints (I hope I made sure they have a set size), and doubles. The begsphant files are a little rough, because I maintained the value indices of the text file (so one could easily swap back and forth), but they might be an okay jumping-off point.
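As a rough illustration of what a fixed-width layout of ints and doubles buys, the sketch below packs a small dose grid and reads a single voxel back by computing its byte offset. The layout (three 4-byte ints, then boundaries, then doses with x varying fastest) and the names `pack_grid` and `dose_at` are hypothetical; this is not the actual b3ddose format from the repository above.

```python
import struct

def pack_grid(bounds_x, bounds_y, bounds_z, dose):
    """Pack a grid as: 3 int32 counts, boundary doubles, dose doubles."""
    nx, ny, nz = len(bounds_x) - 1, len(bounds_y) - 1, len(bounds_z) - 1
    if len(dose) != nx * ny * nz:
        raise ValueError("dose length must equal nx*ny*nz")
    # "<i" is always 4 bytes, so the int size does not depend on the platform.
    out = struct.pack("<3i", nx, ny, nz)
    for bounds in (bounds_x, bounds_y, bounds_z):
        out += struct.pack(f"<{len(bounds)}d", *bounds)
    out += struct.pack(f"<{len(dose)}d", *dose)
    return out

def dose_at(buf, i, j, k):
    """Read one voxel by offset, without parsing the rest of the buffer."""
    nx, ny, nz = struct.unpack_from("<3i", buf, 0)
    header = 12 + 8 * (nx + ny + nz + 3)  # counts plus all boundary doubles
    index = i + nx * (j + ny * k)         # x varies fastest (assumed ordering)
    return struct.unpack_from("<d", buf, header + 8 * index)[0]

# A 2 x 3 x 1 grid whose dose values equal their own flat index.
buf = pack_grid([0.0, 1.0, 2.0], [0.0, 1.0, 2.0, 3.0], [0.0, 1.0],
                [float(n) for n in range(6)])
print(len(buf), dose_at(buf, 1, 2, 0))
```

Because every field has a known width, the position of any voxel follows from arithmetic on the header, which is the same property behind the constant-time random access mentioned earlier in the thread.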