At the moment the dump processor appears to run sequentially, using only one of the 60 cores on our server. This can be seen with htop.
It would be worth researching how we can parallelize the dump-processing code. Some questions I would like to answer:

- Is it possible to parallelize the processing of gzip files?
- The processor currently extends EntityTimerProcessor. Can that processor work in parallel?
- Could we use another approach for gzipped files?
Ideally, I would like to use cats.effect.IO or fs2, but I am not sure whether this has been tried before, or even whether it is possible.
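One thing worth noting: decompressing a single gzip stream is inherently sequential, but once the bytes are decoded, the per-entity work can be fanned out across cores. A minimal fs2 sketch of that idea (assuming fs2 3.x and cats-effect 3; `processEntity` and the dump path are hypothetical placeholders, not the project's actual API):

```scala
import cats.effect.{IO, IOApp}
import fs2.text
import fs2.io.file.{Files, Path}
import fs2.compression.Compression

object ParallelDumpSketch extends IOApp.Simple {

  // Hypothetical per-entity work; this is where the logic currently
  // driven by EntityTimerProcessor would go.
  def processEntity(line: String): IO[Unit] = IO.unit

  val run: IO[Unit] =
    Files[IO].readAll(Path("dump.json.gz"))
      .through(Compression[IO].gunzip())   // sequential: gzip is one stream
      .flatMap(_.content)                  // decompressed bytes
      .through(text.utf8.decode)
      .through(text.lines)                 // one JSON entity per line
      .parEvalMapUnordered(
        Runtime.getRuntime.availableProcessors()
      )(processEntity)                     // parallel: per-entity processing
      .compile
      .drain
}
```

The key step is `parEvalMapUnordered`, which evaluates the effect for up to N lines concurrently; if the results must preserve dump order, `parEvalMap` can be used instead at some throughput cost.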
Another option, discussed during the biohackathon, would be to split the source dump into as many parts as there are CPUs, run the Docker containers in parallel on the different CPUs, and join the results. @nilshoffmann
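That split/run/join approach can be sketched in plain shell. This is only an illustration under stated assumptions: GNU `split` (for line-aligned chunking with `-n l/N`), and a hypothetical wrapper script that launches one container per chunk; none of these names come from the project itself.

```shell
# split_process_join FILE NPARTS CMD...: split FILE into NPARTS line-aligned
# chunks, run CMD on each chunk in parallel, and concatenate the outputs.
split_process_join() {
  file=$1; nparts=$2; shift 2
  split -n l/"$nparts" "$file" part-   # GNU split: chunks never cut a line
  for f in part-*; do
    "$@" "$f" > "$f.out" &             # one background job per chunk
  done
  wait                                 # block until every job finishes
  cat part-*.out                       # join the per-chunk results in order
}

# Hypothetical usage (the command receives each chunk path as its last
# argument; a small wrapper script would map it into the container's mount):
#   zcat dump.json.gz > dump.json
#   split_process_join dump.json "$(nproc)" ./run-processor-container.sh
```

Note that this joins results by simple concatenation; if the processor produces aggregates rather than per-line output, the join step would need a real merge.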