DOSA

IBM cloudFPGA Distributed Operator Set Architectures (DOSA) [version gradatim]

[Figure: OSA concept (simplified)]

About

The computational requirements of artificial intelligence workloads are growing exponentially. In addition, more and more compute is moved towards the edge due to latency or localization constraints. At the same time, Dennard scaling has ended and Moore’s law is winding down. These trends have created an opportunity for specialized accelerators, including field-programmable gate arrays (FPGAs), but the poor support and usability of today’s tools prevent FPGAs from being deployed at scale for deep neural network (DNN) inference applications.

Therefore, we propose an organic compiler — DOSA — that drastically lowers the barrier for deploying FPGAs. DOSA builds on the operation set architecture concept and integrates the DNN accelerator components generated by existing DNN-to-FPGA frameworks to produce an overall efficient solution. DOSA starts from DNNs represented in the community standard ONNX and automatically implements model- and data-parallelism, based on the performance targets and resource footprints provided by the user.

This repository contains the enhanced proof-of-concept implementation of this organic compiler principle, which can compile and partition an ONNX model across multiple FPGAs with a single command. Currently, the gradatim version of DOSA supports the hls4ml and VHDL4CNN libraries for building, and additionally VTA for analysis. ZRLMPI is used as the hardware-agnostic communication protocol. The FPGA binaries are built using the cFDK. Depending on the selected target device, the deployment of the FPGA binaries requires access to the IBM cloudFPGA platform.

DOSA supports two input file formats: ONNX and torchscript. For the ONNX flow, DOSA assumes that the weights in the ONNX file are already fully quantized (by tools such as Brevitas or Aimet). The corresponding number representation must be configured in the input constraints. If the torchscript flow is used in combination with --calibration-data, however, DOSA performs the post-training quantization automatically using Brevitas.

More details of the supported libraries and flows are described in ./doc/DOSA_flow.md. A detailed description of concepts and research behind DOSA can be found here (Chapter 4). More publications around DOSA are listed below. Please also note the known limitations.

Installation

Basically:

git clone --recurse-submodules https://github.com/cloudFPGA/DOSA.git
cd DOSA
virtualenv venv -p /usr/bin/python3.8
source venv/bin/activate
pip install -r requirements.txt --no-dependencies

Besides this, DOSA requires Python 3, an LLVM development environment, and a local installation of our TVM fork. The detailed requirements as well as all steps to set up DOSA are described in ./doc/Install.md.

Alternatively, DOSA can also be run inside a Docker container; see the Docker section in ./doc/Install.md.

Usage

Compilation

General usage:

Usage: 
    ./gradatim.sh onnx <path-to-dosa_config.json> <path-to-model.file> <path-to-constraints.json> <path-to-build_dir> [--no-roofline|--no-build|--only-stats|--only-coverage]
    ./gradatim.sh torchscript <path-to-dosa_config.json> <path-to-model.file> <path-to-constraints.json> <path-to-build_dir> [--calibration-data <path-to-calibration_data.npy>] [--no-roofline|--no-build|--only-stats|--only-coverage]

Commands:
    onnx            Uses the ONNX flow (this excludes post-training quantization).
    torchscript     Uses the torchscript flow.

Options:
    -h --help       Show this screen.
    -v --version    Show version.

    <path-to-dosa_config.json>              Path to the DOSA config JSON.
    <path-to-model.file>                    Path to the model to compile (either ONNX or torchscript).
    <path-to-constraints.json>              Path to the constraints JSON.
    <path-to-build_dir>                     Path to the output build folder.

    --calibration-data <path-to-calibration_data.npy>   If the torchscript flow is used, post-training quantization 
                                                        is possible using the specified numpy array as calibration data 
                                                        (i.e. training data without labels). 

    --no-roofline                           Disables the display of Roofline plots.
    --no-build                              Disables the generation of build files (just Roofline plots are shown).
    --only-stats                            Just print the architecture statistics (and disables all other outputs).
    --only-coverage                         Just print the OSG coverage (and disable all other outputs).

The mandatory arguments are:

  • the flow onnx or torchscript
  • dosa_config.json: JSON file containing the general configuration of DOSA. In most cases the default configuration in ./config/dosa_config_default.json is sufficient.
  • model.file: The ONNX or torchscript of the DNN that should be compiled.
  • constraints.json: The JSON file containing the target constraints. See examples in the ./examples/ folder.
  • path/to/build_dir/: The path to the directory where the FPGA build files should be emitted to. How to handle non-empty build directories can be configured in the dosa_config.json.
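
Since three of the mandatory arguments are files, a quick pre-flight check can catch a wrong path or a malformed JSON file before a compile run starts. The following is only a minimal sketch: `check_inputs` is a hypothetical helper that is not part of DOSA, and it only verifies that the files exist and that the two JSON files parse. It does not validate them against whatever schema DOSA expects.

```python
import json
from pathlib import Path
from typing import List


def check_inputs(config: str, model: str, constraints: str) -> List[str]:
    """Return a list of problems found; an empty list means the inputs look usable.

    Hypothetical helper, not part of DOSA. It checks existence and JSON syntax only.
    """
    problems = []
    labelled = (("config", config), ("model", model), ("constraints", constraints))

    # All three mandatory inputs must be existing files.
    for label, path in labelled:
        if not Path(path).is_file():
            problems.append(f"{label}: {path} is not a file")

    # The DOSA config and the constraints are JSON files, so they must at least parse.
    for label, path in (("config", config), ("constraints", constraints)):
        if Path(path).is_file():
            try:
                with open(path, encoding="utf-8") as fh:
                    json.load(fh)
            except ValueError as err:  # covers json.JSONDecodeError and decoding errors
                problems.append(f"{label}: {path} is not valid JSON ({err})")
    return problems
```

An empty return value only means that the files are present and syntactically valid; the content is still checked by DOSA itself.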

The optional arguments to change the output are:

  • --no-roofline: Deactivates the display of the Roofline analysis (could be up to 30 windows).
  • --no-build: Deactivates the generation of build files and just the Roofline analysis is shown.
  • --only-stats: DOSA emits only the architecture draft including its characteristics. No build files are generated and no Roofline analysis is shown.
  • --only-coverage: DOSA emits only the coverage of each OSG of the given ONNX. No build files are generated, no Roofline analysis is shown, and no architecture draft is generated.

Only one of these optional arguments is allowed at a time. By default, DOSA shows the Roofline analysis, generates the build files, and prints the high-level architecture draft.

Additional optional arguments are:

  • --calibration-data: To provide calibration data for the post-training quantization, if the torchscript flow is used.

See more details:

./gradatim.sh -h
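
When DOSA is driven from a script, the argument rules described above can be encoded in a small wrapper. The sketch below is an illustration, not part of DOSA: `build_gradatim_command` and its parameter names are hypothetical. It accepts at most one output flag, so the "only one allowed" rule holds by construction. It also rejects `--calibration-data` for the onnx flow, based on the usage text listing that option only for torchscript; whether `gradatim.sh` itself rejects that combination is not stated here.

```python
from typing import List, Optional

FLOWS = ("onnx", "torchscript")
OUTPUT_FLAGS = ("--no-roofline", "--no-build", "--only-stats", "--only-coverage")


def build_gradatim_command(
    flow: str,
    config: str,
    model: str,
    constraints: str,
    build_dir: str,
    output_flag: Optional[str] = None,
    calibration_data: Optional[str] = None,
    script: str = "./gradatim.sh",
) -> List[str]:
    """Assemble the argument list for gradatim.sh (hypothetical helper)."""
    if flow not in FLOWS:
        raise ValueError(f"flow must be one of {FLOWS}, got {flow!r}")
    # A single optional parameter means two output flags can never be combined.
    if output_flag is not None and output_flag not in OUTPUT_FLAGS:
        raise ValueError(f"unknown output flag {output_flag!r}")
    # Assumption taken from the usage text: calibration data belongs to the torchscript flow.
    if calibration_data is not None and flow != "torchscript":
        raise ValueError("--calibration-data is only used by the torchscript flow")

    # Positional arguments in the order given in the usage text.
    cmd = [script, flow, config, model, constraints, build_dir]
    if calibration_data is not None:
        cmd += ["--calibration-data", calibration_data]
    if output_flag is not None:
        cmd.append(output_flag)
    return cmd
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root, with the virtual environment already activated.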

Build

After DOSA has finished, the build of all FPGA binaries can be invoked with

cd path/to/build_dir/
# source build tools requirements, e.g. Vivado/Vitis/...
./dosa_build.sh
# the build processes are started in a tmux, to view them
tmux at
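
Because the build script lives in the build directory and starts its processes in tmux, it can be useful to check both before launching a long build. The following is a minimal sketch under stated assumptions: `missing_build_prerequisites` is a hypothetical helper, and the `tools` list is a guess that should be extended with whatever executables your sourced build tools provide (for example `vivado`).

```python
import shutil
from pathlib import Path
from typing import Callable, List, Optional, Sequence


def missing_build_prerequisites(
    build_dir: str,
    tools: Sequence[str] = ("tmux",),
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of missing prerequisites; empty means the build can be started.

    Hypothetical helper, not part of DOSA. `which` is injectable to ease testing.
    """
    missing = []
    # dosa_build.sh is emitted by DOSA into the build directory.
    if not (Path(build_dir) / "dosa_build.sh").is_file():
        missing.append("dosa_build.sh not found in " + build_dir)
    # Each required executable must be resolvable on PATH.
    for tool in tools:
        if which(tool) is None:
            missing.append(tool + " not found on PATH")
    return missing
```

For example, `missing_build_prerequisites("path/to/build_dir/", tools=("tmux", "vivado"))` would also report a Vivado environment that has not been sourced, assuming `vivado` is the executable name on your system.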

Deployment

The deployment is platform specific, of course. However, if supported by the target platform, DOSA wraps the necessary commands in the script dosa_deploy.py in the build_dir.

Examples

The ./examples/ folder contains some ONNX files and their corresponding constraints.

For example, to show the Roofline analysis of the PTTCNN example (the CNN from the PyTorch tutorial) without generating build files, execute the following command:

. venv/bin/activate
# `export PYTHONPATH=.` may be necessary before running the next command
./gradatim.sh onnx ./config/dosa_config_default.json ./examples/PTTCNN_int8.onnx ./examples/PTTCNN_meta.json ./my_build_dirs/pttcnn/ --no-build
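
The same example can be launched from Python, which makes the `PYTHONPATH` setting explicit instead of relying on the calling shell. This is only a sketch: `env_with_pythonpath` and `run_pttcnn_example` are hypothetical names, and the code assumes it is run from the repository root with the virtual environment already activated.

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def env_with_pythonpath(
    repo_root: str, base_env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Return a copy of the environment with repo_root prepended to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    # Keep any existing entries, but let the repository root take precedence.
    env["PYTHONPATH"] = repo_root if not existing else repo_root + os.pathsep + existing
    return env


def run_pttcnn_example(repo_root: str = ".") -> None:
    """Run the PTTCNN Roofline example from the README without generating build files."""
    cmd = [
        "./gradatim.sh",
        "onnx",
        "./config/dosa_config_default.json",
        "./examples/PTTCNN_int8.onnx",
        "./examples/PTTCNN_meta.json",
        "./my_build_dirs/pttcnn/",
        "--no-build",
    ]
    # check=True raises CalledProcessError if gradatim.sh exits with a non-zero status.
    subprocess.run(cmd, cwd=repo_root, env=env_with_pythonpath("."), check=True)
```

Copying the environment rather than modifying `os.environ` keeps the change local to the child process.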

Known Limitations

This is a research project and therefore a proof-of-concept prototype. Naturally, there are some limitations regarding the features and supported use cases:

  • While the architecture of DOSA is flexible, there is currently only one supported build tool: cFBuild for the cloudFPGA project. However, another build tool could be implemented by inheriting from HwBuildTopVhdl in gradatim/backend/buildTools/BaseBuild.py.
  • Likewise, DOSA could support many communication libraries, but currently only ZRLMPI is implemented. To add a new communication library, implement a new class that inherits from BaseCommLib in gradatim/backend/commLibs/BaseCommLib.py. The corresponding wrapper for the hardware and software cores must then implement the CommunicationWrapper class in gradatim/backend/codeGen/CommunicationWrapper.py.
  • The post-training quantization feature of DOSA depends on a custom TVM version, as explained in ./doc/Install.md.
  • The OSG hls4ml depends on Vivado 2019.2, due to incompatibilities between Vivado versions and a major internal architecture change in hls4ml. Also, hls4ml cannot generate all sizes of the multi_threshold operation required for post-training quantized networks.

On the one hand, the limitations listed above highlight the difficulty of creating and maintaining an open-source framework within the FPGA community. Two reasons for this (among many others) are the lack of commonly used APIs between IP cores or other components (e.g. Shell and Role) and a limited commitment to true open-source tool chains. We discussed this at multiple workshops with the community (cf. cFDevOps20, cFDevOps21, cFDevOps22) and also tried to push for common "POSIX-like" interfaces within the FPGA. On the other hand, we conclude that the concept of Operation Set Architectures did overcome many typical "road blocks" of FPGA tool chains and exhibits great flexibility and efficiency.

Citation

If you use this software in a publication, please cite our two papers introducing Operation Set Architectures and explaining DOSA:

@Article{CAL_OSA,
  author   = {Ringlein, Burkhard and Abel, Francois and Diamantopoulos, Dionysios and Weiss, Beat and Hagleitner, Christoph and Fey, Dietmar},
  journal  = {IEEE Computer Architecture Letters},
  title    = {{Advancing Compilation of DNNs for FPGAs using Operation Set Architectures}},
  year     = {2023},
  issn     = {1556-6064},
  month    = jan,
  number   = {1},
  pages    = {9--12},
  volume   = {22},
  doi      = {10.1109/LCA.2022.3227643},
  url      = {https://ieeexplore.ieee.org/document/9984183/},
}

@InProceedings{EDGE_DOSA,
  author   = {Ringlein, Burkhard and Abel, Francois and Diamantopoulos, Dionysios and Weiss, Beat and Hagleitner, Christoph and Fey, Dietmar},
  booktitle = {Proceedings of the 2023 IEEE International Conference On Edge Computing & Communications (EDGE 2023)},
  title     = {{DOSA: Organic Compilation for Neural Network Inference on Distributed FPGAs}},
  year      = {2023},
  address   = {Chicago, Illinois},
  month     = jul,
  pages     = {43--50},
  publisher = {IEEE},
  date      = {2-8 July 2023},
  doi       = {10.1109/EDGE60047.2023.00019},
}

License

DOSA is released under the Apache 2.0 License.

Structure of this repository

  • config/: Contains the default configurations for DOSA.
  • db/: Contains resource databases needed by some OSGs.
  • gradatim/: The python package containing the DOSA compiler (version gradatim).
  • doc/: Contains some documentation.
  • examples/: Contains some example DNNs with their constraint files.
  • scripts/: Contains further scripts helping to use DOSA.
  • setup/: Contains files required during installation.
  • gradatim.sh: A script invoking DOSA.

Publications

Funding

This research was supported in part by the Horizon 2020 EU Research & Innovation programme under GA No 957269 (EVEREST project).


Trivia: The second version of DOSA is named after the Latin word for "one step after the other". (The first version of DOSA was named after one popular exoplanet while at the same time meaning "half" in Latin.)