Balar mmio with Vanadis #2428 (Open)
William-An wants to merge 62 commits into sstsimulator:devel from William-An:balar-mmio-vanadis-llvm
Changes from all commits (62 commits, all by William-An):
- b7f5568 balar: update readme links to official repos
- 89b8255 balar-mmio: add balar to vanadis device list
- 7ec7c66 balar-mmio: update custom cuda lib to map balar to vanadis's VM
- a492c36 balar-mmio: separate data and command interfaces for balar
- 2bc0f37 balar-mmio: finish config script for balar+vanadis via mmap
- a977443 balar-mmio: refactor config script to use builder class
- 7e956ff balar-mmio: add vanadis test to testsuite
- 0384dc8 balar-mmio: update testsuite ref files
- e9e6cce balar-mmio: update dist files
- 5243525 balar-mmio: add cuda files for llvm
- aa0d52b balar-mmio: add support for LLVM CUDA binary
- 9b23271 balar-mmio: make scratch mem aligned to cache block, temp fix
- 89d5af3 balar-mmio: make a real vecadd summing sin^2 and cos^2
- 7f582a7 balar-mmio: encode CUDA version information in vanadis binary
- 9cbea2f balar-mmio: use DMA engine for read/write cuda packets
- 8c933dc balar-mmio: modify launch scripts for use of DMA engine
- 87b6b7a balar-mmio: update refFile for testcpu
- ba247be balar-mmio: update riscv-cuda cxxflags to avoid macro conflict
- 9768256 balar-mmio: add support to append app args
- 7cb8eeb balar-mmio: add support for unaligned cudamemcpy
- 84854b0 balar-mmio: update unittest to run rodinia benchmark
- b0c6346 balar-mmio: update readme to cover balar+vanadis and new unittest
- 9ba38f9 balar-mmio: adding placeholder apis
- e678f13 balar-mmio: move CUDA packet definition to a single file
- 7ca59d3 balar-mmio: update readme
- db6b4f0 balar-mmio: add additional CUDA APIs for rodinia benchmark
- 94862fd balar-mmio: fix some formatting warnings
- f8d4955 balar-mmio: update README on GPU_ARCH env
- 9792086 balar-mmio: remove include to sst/core/simulation.h
- 5c78d6a balar-mmio: add mprotect for balar's addr
- 002910c balar-mmio: use munmap to unload balar page
- 4e1841e balar-mmio: fix a memory out of bound access bug
- 638e747 balar-mmio: add more logging statements
- dac9d18 balar-mmio: specify vaddr for dma access
- 34a97b1 balar-mmio: use inline syscalls to avoid speculative loads to balar's…
- fa465bf bala-mmio: add more testcases
- 61db38b balar-mmio: add ref files
- 2c29339 balar-mmio: change default isa to riscv and change compiler for hands…
- 98a41ac balar-mmio: update reffile since we are using riscv64
- 61c20d2 balar-mmio: add more rodinia benchmarks
- b1e0b15 balar-mmio: add rodinia hotspot reffile
- c71ee0a balar-mmio: update test time limits
- 4b12eab balar-mmio: fix not returning value for cudaMallocHost
- 58f9098 balar-mmio: update time limit and reffile for lud 256
- 94f4262 balar-mmio: add rodinia pathfinder and srad reffiles
- f3fcc21 balar-mmio: split tests into different testsuites
- 3342466 balar-mmio: fix args passing in testcases
- d725066 balar-mmio: increase test run time limit
- 0d0554d balar-mmio: limit nproc when making rodinia
- 6fbe7a8 balar-mmio: restructure testsuite
- b4b03dc balar-mmio: add support for cudaThreadSynchronize
- 66bab78 balar-mmio: create subdirectory for each testcase
- df5a6a8 balar-mmio: add refFiles
- 33b012f balar-mmio: add support for cudaMemcpyToSymbol
- f1efc17 balar-mmio: add rodinia heartwall
- 65b5917 balar-mmio: add texture api support
- 6526407 balar-mmio: add heartwell ref file
- 8459025 balar-mmio: remove some comments
- cf077bc balar-mmio: clean up unused files and update dist
- b9f9964 balar-mmio: remove --disable-mpi and --disable-mem-pools
- 3aef532 balar-mmio: remove doc headers
- ef89fe0 balar-mmio: create constant macros to keep things consistent
William-An File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,37 +1,35 @@ | ||
# Balar

SST GPGPU-Sim integration component.
Supports running CUDA programs via trace-driven mode (requires a GPU for trace generation) and direct-execution mode (the CUDA program runs on the Vanadis RISC-V CPU component).

## Installation

### Prerequisites

Balar is tested with the following settings:

- gcc 11.4.0
- Ubuntu 22.04.4 LTS
- CUDA 11.7

The following components are needed to run Balar:
### General

#### [`sst-core`](https://github.com/sstsimulator/sst-core)

- Tested on commit `0f358dda178f96db3b0da88b2b965492c4be187d`
- Use `./configure --prefix=$SST_CORE_HOME` for the sst-core config
#### [`sst-elements`](https://github.com/sstsimulator/sst-elements)

- Use `./configure --prefix=$SST_ELEMENTS_HOME --with-sst-core=$SST_CORE_HOME --with-cuda=$CUDA_INSTALL_PATH --with-gpgpusim=$GPGPUSIM_ROOT` for the sst-elements config
- `$CUDA_INSTALL_PATH` should point to the CUDA toolkit path
- `$GPGPUSIM_ROOT` is set when sourcing the `setup_environment` script in `GPGPU-Sim`; it should point to that repo's folder
#### [`GPGPU-Sim`](https://github.com/accel-sim/gpgpu-sim_distribution)

```sh
# Pull the GPGPU-Sim repo
git clone git@github.com:accel-sim/gpgpu-sim_distribution.git
cd gpgpu-sim_distribution

# Make sure $CUDA_INSTALL_PATH is set
# …
make -j
```

> ```sh
> # GPGPU-Sim dependencies
> sudo apt-get install build-essential xutils-dev bison zlib1g-dev flex libglu1-mesa-dev
> ```
### Trace-driven

#### [`cudaAPITracer`](https://github.com/accel-sim/accel-sim-framework)

We put the CUDA API tracer tool inside the [Accel-Sim](https://github.com/accel-sim/accel-sim-framework) framework, in the folder `ACCEL-SIM/util/tracer_nvbit/others/cuda_api_tracer_tool`. To install it:

```shell
# Get the Accel-Sim framework
git clone git@github.com:accel-sim/accel-sim-framework.git

# cd into the tracer tool folder
cd accel-sim-framework/util/tracer_nvbit

# …
```
This will generate the following files when exiting:

> Note: the API tracer needs a machine with a GPU.
### Direct-execution

In order to run a CUDA program directly, the CUDA source code needs to be recompiled with the LLVM and RISC-V GCC toolchains.

#### LLVM and RISC-V GCC toolchain

```bash
# Set up environment vars for LLVM and RISC-V GCC first,
# since the install prefixes below refer to them
mkdir llvm-install riscv-gnu-install
export LLVM_INSTALL_PATH=$(pwd)/llvm-install
export RISCV_TOOLCHAIN_INSTALL_PATH=$(pwd)/riscv-gnu-install
# Match the GPU config file we have
export GPU_ARCH=sm_70

## Build LLVM with RISC-V, x86, and CUDA support from source
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
mkdir build && cd build
cmake -DLLVM_TARGETS_TO_BUILD="RISCV;X86;NVPTX" -DLLVM_DEFAULT_TARGET_TRIPLE=riscv64-unknown-linux-gnu \
      -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang;lld" -DCMAKE_INSTALL_PREFIX=$LLVM_INSTALL_PATH ../llvm
cmake --build . -j30
cmake --build . --target install
cd ../..

## Build the RISC-V GCC toolchain
git clone https://github.com/riscv-collab/riscv-gnu-toolchain.git
cd riscv-gnu-toolchain
./configure --prefix=$RISCV_TOOLCHAIN_INSTALL_PATH
make linux -j
cd ..
```
#### Compiling CUDA for Balar + Vanadis

To run CUDA on Balar + Vanadis (direct-execution), aside from the compilers, the custom CUDA library `libcudart_vanadis` (inside `vanadisLLVMRISCV`) is also needed; it intercepts CUDA API calls and forwards them to Balar's MMIO address. Build it with `make -C vanadisLLVMRISCV vanadis_cuda`, which generates `libcudart_vanadis.a/.so`.

For compiler and linker flags, refer to the `vecadd` target in `vanadisLLVMRISCV/Makefile`.
#### GPU Application Collection

We are working on getting a collection of GPU apps to run with Balar + Vanadis. The benchmarks come from the [gpu-app-collection](https://github.com/accel-sim/gpu-app-collection) repo.

```bash
git clone git@github.com:accel-sim/gpu-app-collection.git
cd gpu-app-collection
git checkout sst_support

# Set up environment vars for the apps; the env vars
# LLVM_INSTALL_PATH and RISCV_TOOLCHAIN_INSTALL_PATH must be set.
# If you plan to compile the apps directly, also set
# SST_CUSTOM_CUDA_LIB_PATH to the directory of the custom CUDA library,
# normally SST-ELEMENT-SOURCE/src/sst/elements/balar/tests/vanadisLLVMRISCV
source ./src/setup_environment sst
```
## Usage

### Trace-driven Mode

After successful compilation and installation of SST core and SST elements (with GPGPU-Sim and CUDA), run:

```bash
# cd into balar
cd SST_ELEMENTS_SRC/src/sst/elements/balar

# balar tests
cd tests/

# …
make -C vanadisHandshake/

# Run the handshake binary with the vanadis core
sst testBalar-vanadis.py --model-options='-c gpu-v100-mem.cfg'
```
### Vanadis Mode

The CUDA executable should be passed in `VANADIS_EXE` and `BALAR_CUDA_EXE_PATH`. If the program takes arguments, pass them with `VANADIS_EXE_ARGS`.

```bash
# cd into balar tests
cd SST_ELEMENTS_SRC/src/sst/elements/balar/tests/

# Compile test programs
make -C vanadisLLVMRISCV

# Run a CPU-only program
VANADIS_EXE=./vanadisLLVMRISCV/helloworld VANADIS_ISA=RISCV64 sst testBalar-vanadis.py --model-options='-c gpu-v100-mem.cfg'

# Run the sample vecadd
VANADIS_EXE=./vanadisLLVMRISCV/vecadd VANADIS_ISA=RISCV64 BALAR_CUDA_EXE_PATH=./vanadisLLVMRISCV/vecadd sst testBalar-vanadis.py --model-options='-c gpu-v100-mem.cfg'
```
### Running GPU Benchmark

Here is an example of running Rodinia 2.0 BFS with the SampleGraph.txt input using CUDA 11.7. For a different CUDA version, the binary path will differ in the version number.

```bash
# Let the GPU apps know about the custom CUDA lib
export SST_CUSTOM_CUDA_LIB_PATH=SST_ELEMENTS_SRC/src/sst/elements/balar/tests/vanadisLLVMRISCV

# Make Rodinia 2.0
cd gpu-app-collection
make rodinia_2.0-ft -i -j -C ./src
make data -C ./src
cd ..

# Run BFS with the sample graph input
cd SST_ELEMENTS_SRC/src/sst/elements/balar/tests
VANADIS_EXE=$GPUAPPS_ROOT/bin/11.7/release/bfs-rodinia-2.0-ft \
VANADIS_EXE_ARGS=$GPUAPPS_ROOT/data_dirs/cuda/rodinia/2.0-ft/bfs-rodinia-2.0-ft/data/SampleGraph.txt \
VANADIS_ISA=RISCV64 \
BALAR_CUDA_EXE_PATH=$GPUAPPS_ROOT/bin/11.7/release/bfs-rodinia-2.0-ft \
sst testBalar-vanadis.py --model-options='-c gpu-v100-mem.cfg'
```
### Running Unittest

Balar's unit test suites will automatically compile the GPU app collection with the LLVM and RISC-V toolchains and run the tests:

```bash
sst-test-elements -w "*balar*"
```
@gvoskuilen @feldergast Do we want to keep the prerequisites in this readme, or remove them in favor of the list that we test against? We already discussed what testing we want in the nightlies versus the weeklies.