Commit
Built site for gh-pages
Quarto GHA Workflow Runner committed Apr 30, 2024
1 parent e66b038 commit fcac7bd
Showing 44 changed files with 4,978 additions and 1,134 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
0e655726
1abc1ad1
Binary file added _tex/figures/cost_vs_cpus.png
Binary file modified _tex/figures/csdm_efficency.png
Binary file modified _tex/figures/csdm_speedup.png
Binary file added _tex/figures/efficency_vs_cpus.png
Binary file modified _tex/figures/fwdtim_efficency.png
Binary file modified _tex/figures/fwdtim_speedup.png
128 changes: 75 additions & 53 deletions _tex/index.tex
@@ -189,8 +189,30 @@ \section{Abstract}\label{abstract}

\section{Introduction}\label{introduction}

Dipy is a popular open-source Python library used for the analysis of
diffusion imaging data. It provides tools for preprocessing,
reconstruction, and analysis of MRI data. Here we focused on three
reconstruction models included in Dipy (XXX need to finish testing of
sfm model): constrained spherical deconvolution, the free water
diffusion tensor model, and the sparse fascicle model. These
reconstruction models, along with several others not tested here, are
good candidates for parallel computing, because they are independent at
the voxel level. While in theory parallelizing these workloads should be
a fairly simple task, Python's GIL (global interpreter lock) can make it
more difficult in practice. To work around the GIL we used the Ray
library, a system for parallelizing Python code
(https://arxiv.org/abs/1712.05889). In preliminary testing we evaluated
three libraries for this task, Joblib, Dask, and Ray; Ray quickly proved
to be the most performant, user-friendly, and reliable option of the
three. Ray's approach to serialization, the process of converting Python
objects into a format that can be stored and transmitted between
processes (XXX improve definition of serialization), also proved to be
the least error-prone for our use case.

(XXX this was written as a word dump; some of it might need to be moved
to the discussion or methods.)
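
As a minimal sketch of the pattern (not the exact code used here; the
generic \texttt{model} object and the chunk count are illustrative
assumptions), a voxel-wise model fit can be dispatched to Ray roughly as
follows:

\begin{verbatim}
import numpy as np
import ray

ray.init()

@ray.remote
def fit_chunk(model, data_chunk):
    # Each chunk of voxels is fit independently of every other chunk.
    return model.fit(data_chunk)

def parallel_fit(model, data, n_chunks):
    # Flatten the spatial dimensions so the 4D volume becomes a
    # (n_voxels, n_directions) array that can be split evenly.
    voxels = data.reshape(-1, data.shape[-1])
    chunks = np.array_split(voxels, n_chunks)
    # ray.put stores the model once in the shared object store instead
    # of serializing it separately for every task.
    model_ref = ray.put(model)
    futures = [fit_chunk.remote(model_ref, chunk) for chunk in chunks]
    # ray.get blocks until every chunk has been fit.
    return ray.get(futures)
\end{verbatim}

Because the chunks share no state, Ray can schedule them across all
available CPUs in separate worker processes, sidestepping the GIL.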

\section{Methods}\label{methods}

@@ -200,9 +222,9 @@ \section{Methods}\label{methods}
encapsulate the test and allow for easy reproducibility of the tests.
The testing program computes each model 5 times for each set of unique
parameters. We then iterate across chunk sizes exponentially, from 1-15,
where the number of chunks is $2^{x}$ for exponent $x$; for example,
$x = 4$ splits the data into 16 chunks (XXX explain better). We ran the
tests with the following arguments on docker instances with CPU counts
of 8, 16, 32, 48, and 72:

\begin{verbatim}
--models csdm fwdtim --min_chunks 1 --max_chunks 15 --num_runs 5
@@ -211,67 +233,49 @@
\section{Results}\label{results}

Parallelization with \texttt{ray} provided considerable speedups over
serial execution for both the constrained spherical deconvolution and
free water models. We saw a much greater speedup for the free water
model, which is possibly explained by the fact that it is much more
computationally expensive per voxel. This would mean that the overhead
from parallelizing the model would have a smaller effect on the runtime.
Interestingly, 48 and 72 core instances performed slightly worse than
the 32 core instances on the csdm model, which may indicate that there is
some increased overhead for each core, separate from the overhead for
each task sent to ray.
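
(Here ``speedup'' and ``efficiency'' are assumed to carry their standard
meanings: with $T_{1}$ the serial runtime and $T_{p}$ the runtime on $p$
CPUs,
\[
S(p) = \frac{T_{1}}{T_{p}}, \qquad E(p) = \frac{S(p)}{p},
\]
so that $E(p) = 1$ corresponds to perfect linear scaling.)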

\includegraphics[width=0.8\textwidth,height=0.8\textheight]{figures/csdm_speedup.png}
\includegraphics[width=0.8\textwidth,height=0.8\textheight]{figures/fwdtim_speedup.png}

Efficiency decreases as a function of number of CPUs, but is still
rather high in many configurations. Efficiency is also considerably
higher for the free water tensor model, which is consistent with our
expectations given that it is more computationally expensive per voxel
and therefore ray overhead would have less effect. The high efficiency
of 8 core machines suggests that the most cost-effective configuration
for processing may be relatively cheap low-core machines.
\includegraphics[width=0.8\textwidth,height=0.8\textheight]{figures/csdm_efficency.png}
\includegraphics[width=0.8\textwidth,height=0.8\textheight]{figures/fwdtim_efficency.png}

We can also look at peak efficiency per core (efficiency at the optimal
number of chunks for the given parameters), relative to the number of
cores for both models. What's interesting is that we see a very similar
relationship between both models, with the fwdti model being higher by
almost the same amount for all core counts. This suggests that models
such as fwdti that are more computationally expensive per voxel will see
better speedups, because the overhead of parallelization is lower
relative to the total cost. Interestingly, increasing the core count
does not further increase the benefit of parallelization relative to
overhead, which suggests that ray overhead may scale roughly linearly
with the number of cores.

\includegraphics{figures/efficency_vs_cpus.png}
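
The peak efficiency shown above can be computed from the benchmark
timings along the following lines (a sketch with hypothetical variable
names, not the actual analysis code):

\begin{verbatim}
def peak_efficiency(chunk_runtimes, serial_runtime, n_cpus):
    # chunk_runtimes: {n_chunks: mean_parallel_runtime_in_seconds} for
    # one (model, n_cpus) configuration.
    efficiencies = [
        (serial_runtime / runtime) / n_cpus  # speedup divided by cores
        for runtime in chunk_runtimes.values()
    ]
    # Peak efficiency is the efficiency at the best-performing chunk
    # count for this configuration.
    return max(efficiencies)
\end{verbatim}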

Ray tends to spill a large amount of data to disk and does not clean up
afterward. This can quickly become problematic when running multiple
consecutive models. Within just an hour or two of running, Ray could
easily spill over 500 GB to disk. We have implemented a quick fix for
this within our model as follows:

\begin{Shaded}
\begin{Highlighting}[]
@@ -299,6 +303,24 @@ \section{Results}\label{results}
\end{Highlighting}
\end{Shaded}
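
As an illustrative sketch only, and not necessarily the fix implemented
in our code, cleanup of this kind can shut Ray down between models and
remove the spill directory (this assumes Ray's default
\texttt{/tmp/ray} temporary location, which may differ per machine):

\begin{verbatim}
import shutil
import ray

def run_with_spill_cleanup(fit_fn, *args, spill_dir="/tmp/ray"):
    # fit_fn is any callable that uses Ray internally.
    try:
        return fit_fn(*args)
    finally:
        # Tear down the Ray session so its files are released...
        ray.shutdown()
        # ...then delete whatever Ray spilled to disk during the run.
        shutil.rmtree(spill_dir, ignore_errors=True)
\end{verbatim}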

There seems to be an inverse relationship between the computational
cost per voxel and how much parallelization overhead limits the speedup:
the cheaper the model is per voxel, the sooner the overhead catches up.
This is why the CSD speedup is maximal at 32 cores.

We have also made a rough approximation of the total cost of
computation relative to the number of CPUs. Because all tests were run
on a ``c5.18xlarge'' machine, with the docker container simply limited
in its access to cores, this approximation estimates the cost of using
smaller machines by assuming that the only factor differentiating the
performance of AWS c5 machines is the number of CPUs; this may not hold
for several reasons, such as total memory available, memory bandwidth,
and single-core performance. With this approximation, we see that cost
increases as a function of CPUs. This suggests that using the smallest
machine that still computes in a reasonable amount of time is likely the
best option.

\includegraphics[width=0.8\textwidth,height=0.8\textheight]{figures/cost_vs_cpus.png}
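
As a rough sketch of this calculation (the hourly rate below is a
placeholder rather than a quoted AWS price, and the runtimes are
hypothetical), the estimate assumes cost scales linearly with vCPU
count:

\begin{verbatim}
def estimated_cost(runtime_hours, n_cpus, price_per_vcpu_hour=0.0425):
    # Placeholder rate encoding the assumption that c5 cost scales
    # linearly with the number of vCPUs.
    return runtime_hours * n_cpus * price_per_vcpu_hour

# Hypothetical runtimes, for illustration only:
print(estimated_cost(0.5, 32))  # ~ $0.68
print(estimated_cost(0.3, 72))  # ~ $0.92
\end{verbatim}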

\section{Discussion}\label{discussion}

\subsection{Acknowledgments}\label{acknowledgments}
Binary file added figures/cost_vs_cpus.png
Binary file modified figures/csdm_efficency.png
Binary file modified figures/csdm_speedup.png
Binary file added figures/efficency_vs_cpus.png
Binary file modified figures/fwdtim_efficency.png
Binary file modified figures/fwdtim_speedup.png
496 changes: 0 additions & 496 deletions figures/graphs.ipynb

This file was deleted.

