IBM cloudFPGA Distributed Operator Set Architectures (DOSA) [version gradatim]
The computational requirements of artificial intelligence workloads are growing exponentially. In addition, more and more compute is moved towards the edge due to latency or localization constraints. At the same time, Dennard scaling has ended and Moore’s law is winding down. These trends have created an opportunity for specialized accelerators including field-programmable gate arrays (FPGAs), but the poor support and usability of today’s tools prevent FPGAs from being deployed at scale for deep neural network (DNN) inference applications.
Therefore, we propose an organic compiler — DOSA — that drastically lowers the barrier for deploying FPGAs. DOSA builds on the operation set architecture concept and integrates the DNN accelerator components generated by existing DNN-to-FPGA frameworks to produce an overall efficient solution. DOSA starts from DNNs represented in the community standard ONNX and automatically implements model- and data-parallelism, based on the performance targets and resource footprints provided by the user.
This repository contains the enhanced proof-of-concept implementation of this organic compiler principle, which can compile and partition an ONNX model across multiple FPGAs with a single command. Currently, the gradatim version of DOSA supports the hls4ml and VHDL4CNN libraries for building, and additionally VTA for analysis. ZRLMPI is used as the hardware-agnostic communication protocol. The FPGA binaries are built using the cFDK.
Depending on the selected target device, the deployment of the FPGA binaries requires access to the IBM cloudFPGA platform.
DOSA supports two input file formats: ONNX and torchscript.
For the onnx flow, DOSA assumes that the weights in the ONNX file are already fully quantized (e.g. by tools such as Brevitas or Aimet). The corresponding number representation must be configured in the input constraints.
However, if the torchscript flow is used in combination with --calibration-data, DOSA performs the post-training quantization automatically using Brevitas.
More details of the supported libraries and flows are described in ./doc/DOSA_flow.md. A detailed description of concepts and research behind DOSA can be found here (Chapter 4). More publications around DOSA are listed below. Please also note the known limitations.
Basically:
git clone --recurse-submodules https://github.com/cloudFPGA/DOSA.git
cd DOSA
virtualenv venv -p /usr/bin/python3.8
source venv/bin/activate
pip install -r requirements.txt --no-dependencies
Besides this, DOSA requires the python3 and llvm development environments and a local installation of our TVM fork.
The detailed requirements as well as all steps to set up DOSA are described in ./doc/Install.md.
Alternatively, DOSA can also be run inside a Docker container; see the Docker section in ./doc/Install.md.
General usage:
Usage:
./gradatim.sh onnx <path-to-dosa_config.json> <path-to-model.file> <path-to-constraints.json> <path-to-build_dir> [--no-roofline|--no-build|--only-stats|--only-coverage]
./gradatim.sh torchscript <path-to-dosa_config.json> <path-to-model.file> <path-to-constraints.json> <path-to-build_dir> [--calibration-data <path-to-calibration_data.npy>] [--no-roofline|--no-build|--only-stats|--only-coverage]
Commands:
onnx Uses the ONNX flow (this excludes post-training quantization).
torchscript Uses the torchscript flow.
Options:
-h --help Show this screen.
-v --version Show version.
<path-to-dosa_config.json> Path to the DOSA config JSON.
<path-to-model.file> Path to the model to compile (either ONNX or torchscript).
<path-to-constraints.json> Path to the constraints JSON.
<path-to-build_dir> Path to the output build folder.
--calibration-data <path-to-calibration_data.npy> If the torchscript flow is used, post-training quantization
is possible using the specified numpy array as calibration data
(i.e. training data without labels).
--no-roofline Disables the display of Roofline plots.
--no-build Disables the generation of build files (just Roofline plots are shown).
--only-stats Just print the architecture statistics (and disables all other outputs).
--only-coverage Just print the OSG coverage (and disable all other outputs).
The mandatory arguments are:
- the flow: onnx or torchscript
- dosa_config.json: JSON file containing the general configuration of DOSA. In most cases the default configuration in ./config/dosa_config_default.json is sufficient.
- model.file: The ONNX or torchscript of the DNN that should be compiled.
- constraint.json: The JSON file containing the target constraints. See examples in the ./examples/ folder.
- path/to/build_dir/: The path to the directory where the FPGA build files should be emitted to. How to handle non-empty build directories can be configured in the dosa_config.json.
The optional arguments to change the output are:
- --no-roofline: Deactivates the display of the Roofline analysis (could be up to 30 windows).
- --no-build: Deactivates the generation of build files; only the Roofline analysis is shown.
- --only-stats: DOSA emits only the architecture draft including its characteristics. No build files are generated and no Roofline analysis is shown.
- --only-coverage: DOSA emits only the coverage of each OSG of the given ONNX. No build files are generated, no Roofline analysis is shown, and no architecture draft is generated.
Only one of these optional arguments is allowed! By default, DOSA shows the Roofline analysis, generates the build files, and prints the high-level architecture draft.
Additional optional arguments are:
- --calibration-data: To provide calibration data for the post-training quantization, if the torchscript flow is used.
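The argument rules above can be sketched as a small Python helper that assembles the command line before it is handed to a shell or to subprocess. This is only an illustration: the function build_gradatim_command and its parameter names are hypothetical and not part of DOSA; only the flow names and flags are taken from the usage text. Accepting a single output_flag reflects the rule that only one of the output options may be given.

```python
import shlex

# Flow names and flags as listed in the usage text of gradatim.sh.
FLOWS = ("onnx", "torchscript")
OUTPUT_FLAGS = ("--no-roofline", "--no-build", "--only-stats", "--only-coverage")


def build_gradatim_command(flow, config, model, constraints, build_dir,
                           output_flag=None, calibration_data=None):
    """Assemble the argument list for ./gradatim.sh (hypothetical helper)."""
    if flow not in FLOWS:
        raise ValueError(f"flow must be one of {FLOWS}, got {flow!r}")
    # At most one output option is allowed, so only a single value is accepted.
    if output_flag is not None and output_flag not in OUTPUT_FLAGS:
        raise ValueError(f"unknown output flag {output_flag!r}")
    # Calibration data is only documented for the torchscript flow.
    if calibration_data is not None and flow != "torchscript":
        raise ValueError("--calibration-data applies to the torchscript flow only")

    cmd = ["./gradatim.sh", flow, config, model, constraints, build_dir]
    if calibration_data is not None:
        cmd += ["--calibration-data", calibration_data]
    if output_flag is not None:
        cmd.append(output_flag)
    return cmd


if __name__ == "__main__":
    cmd = build_gradatim_command(
        "onnx",
        "./config/dosa_config_default.json",
        "./examples/PTTCNN_int8.onnx",
        "./examples/PTTCNN_meta.json",
        "./my_build_dirs/pttcnn/",
        output_flag="--no-build",
    )
    # Print the command as a single shell-quoted line.
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run from the repository root, with the virtual environment activated.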
See more details:
./gradatim.sh -h
After DOSA has finished, the build of all FPGA binaries can be invoked with
cd path/to/build_dir/
# source build tools requirements, e.g. Vivado/Vitis/...
./dosa_build.sh
# the build processes are started in a tmux, to view them
tmux at
The deployment is platform specific, of course. However, if supported by the target platform, DOSA wraps the necessary commands in the script dosa_deploy.py in the build_dir.
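Before starting a long FPGA build, it can be useful to check which of the scripts mentioned above DOSA actually emitted. The following is a minimal sketch, assuming only the two file names given in this README; the helper inspect_build_dir is hypothetical and not part of DOSA.

```python
from pathlib import Path

# File names as given in this README; everything else here is illustrative.
BUILD_SCRIPT = "dosa_build.sh"
DEPLOY_SCRIPT = "dosa_deploy.py"


def inspect_build_dir(build_dir):
    """Report which of the generated scripts are present in build_dir."""
    root = Path(build_dir)
    return {
        "exists": root.is_dir(),
        "can_build": (root / BUILD_SCRIPT).is_file(),
        # May be absent if the target platform does not support deployment.
        "can_deploy": (root / DEPLOY_SCRIPT).is_file(),
    }


if __name__ == "__main__":
    print(inspect_build_dir("./my_build_dirs/pttcnn/"))
```

A missing dosa_deploy.py does not necessarily indicate an error, since the script is only provided for target platforms that support it.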
The ./examples/ folder contains some ONNX files and their corresponding constraints.
For example, to show the Roofline analysis of the PTTCNN example (CNN from the PyTorch tutorial) without generating build files, execute the following command:
. venv/bin/activate
./gradatim.sh onnx ./config/dosa_config_default.json ./examples/PTTCNN_int8.onnx ./examples/PTTCNN_meta.json ./my_build_dirs/pttcnn/ --no-build
# maybe `export PYTHONPATH=.` is necessary before
This is a research project and therefore a proof-of-concept prototype! So, naturally, there are some limitations regarding the features and supported use cases:
- While the architecture of DOSA is flexible, there is right now only one supported build tool: cFBuild for the cloudFPGA project. However, another build tool could be implemented by simply inheriting from HwBuildTopVhdl in gradatim/backend/buildTools/BaseBuild.py.
- Likewise, DOSA could support many communication libraries, but right now only the support for ZRLMPI is implemented. To add a new communication library, implement a new class that inherits from BaseCommLib in gradatim/backend/commLibs/BaseCommLib.py. The corresponding wrapper for the hardware and software cores must then implement the CommunicationWrapper class in gradatim/backend/codeGen/CommunicationWrapper.py.
- The post-training quantization feature of DOSA depends on a custom TVM version, as explained in ./doc/Install.md.
- The OSG hls4ml depends on Vivado 2019.2, due to incompatibilities between Vivado versions and a major change in the internal architecture of hls4ml. Also, hls4ml cannot generate all sizes of the multi_threshold operation required for post-training quantized networks.
On one hand, the limitations listed above highlight the difficulty of creating and maintaining an open-source framework within the FPGA community. Two reasons for this (among many others) are the lack of commonly used APIs between IP cores or other components (e.g. Shell and Role) and the limited commitment to true open-source tool chains. We discussed this at multiple workshops with the community (cf. cFDevOps20, cFDevOps21, cFDevOps22) and also tried to push for common "POSIX-like" interfaces within the FPGA. On the other hand, we conclude that the concept of Operation Set Architectures did overcome many typical "road blocks" of FPGA tool chains and exhibits great flexibility and efficiency.
If you use this software in a publication, please cite our two papers introducing Operation Set Architectures and explaining DOSA:
@Article{CAL_OSA,
author = {Ringlein, Burkhard and Abel, Francois and Diamantopoulos, Dionysios and Weiss, Beat and Hagleitner, Christoph and Fey, Dietmar},
journal = {IEEE Computer Architecture Letters},
title = {{Advancing Compilation of DNNs for FPGAs using Operation Set Architectures}},
year = {2023},
issn = {1556-6064},
month = jan,
number = {1},
pages = {9--12},
volume = {22},
doi = {10.1109/LCA.2022.3227643},
url = {https://ieeexplore.ieee.org/document/9984183/},
}
@InProceedings{EDGE_DOSA,
author = {Ringlein, Burkhard and Abel, Francois and Diamantopoulos, Dionysios and Weiss, Beat and Hagleitner, Christoph and Fey, Dietmar},
booktitle = {Proceedings of the 2023 IEEE International Conference On Edge Computing \& Communications (EDGE 2023)},
title = {{DOSA: Organic Compilation for Neural Network Inference on Distributed FPGAs}},
year = {2023},
address = {Chicago, Illinois},
month = jul,
pages = {43--50},
publisher = {IEEE},
date = {2-8 July 2023},
doi = {10.1109/EDGE60047.2023.00019},
}
DOSA is released under the Apache 2.0 License.
- config/: Contains the default configurations for DOSA.
- db/: Contains resource databases needed by some OSGs.
- gradatim/: The python package containing the DOSA compiler (version gradatim).
- doc/: Contains some documentation.
- examples/: Contains some example DNNs with their constraint files.
- scripts/: Contains further scripts helping to use DOSA.
- setup/: Contains files required during installation.
- gradatim.sh: A script invoking DOSA.
- DOSA: Organic Compilation for Neural Network Inference on Distributed FPGAs, 2023 IEEE International Conference On Edge Computing & Communications (EDGE 2023).
- recording of the conference presentation can be found here
- Advancing Compilation of DNNs for FPGAs Using Operation Set Architectures, IEEE Computer Architecture Letters, 2023.
- video introduction can be found here
- Compiling DNNs to Distributed FPGAs Using Operator Set Architectures (Chapter 4), PhD Thesis, Friedrich-Alexander University Erlangen-Nürnberg, 2022.
This research was supported in part by the Horizon 2020 EU Research & Innovation programme under GA No 957269 (EVEREST project).
Trivia: The second version of DOSA is named after the Latin word for "one step after the other". (The first version of DOSA was named after one popular exoplanet while at the same time meaning "half" in Latin.)