Balar mmio with Vanadis #2428

Open
wants to merge 62 commits into base: devel

62 commits
b7f5568
balar: update readme links to official repos
William-An Apr 7, 2023
89b8255
balar-mmio: add balar to vanadis device list
William-An Dec 5, 2023
7ec7c66
balar-mmio: update custom cuda lib to map balar to vanadis's VM
William-An Dec 5, 2023
a492c36
balar-mmio: separate data and command interfaces for balar
William-An Dec 19, 2023
2bc0f37
balar-mmio: finish config script for balar+vanadis via mmap
William-An Jan 11, 2024
a977443
balar-mmio: refactor config script to use builder class
William-An Jan 11, 2024
7e956ff
balar-mmio: add vanadis test to testsuite
William-An Jan 11, 2024
0384dc8
balar-mmio: update testsuite ref files
William-An Jan 11, 2024
e9e6cce
balar-mmio: update dist files
William-An Jan 11, 2024
5243525
balar-mmio: add cuda files for llvm
William-An Jan 22, 2024
aa0d52b
balar-mmio: add support for LLVM CUDA binary
William-An Feb 1, 2024
9b23271
balar-mmio: make scratch mem aligned to cache block, temp fix
William-An Feb 1, 2024
89d5af3
balar-mmio: make a real vecadd summing sin^2 and cos^2
William-An Feb 1, 2024
7f582a7
balar-mmio: encode CUDA version information in vanadis binary
William-An Feb 22, 2024
9cbea2f
balar-mmio: use DMA engine for read/write cuda packets
William-An Mar 20, 2024
8c933dc
balar-mmio: modify launch scripts for use of DMA engine
William-An Mar 20, 2024
87b6b7a
balar-mmio: update refFile for testcpu
William-An Mar 20, 2024
ba247be
balar-mmio: update riscv-cuda cxxflags to avoid macro conflict
William-An May 7, 2024
9768256
balar-mmio: add support to append app args
William-An Jun 11, 2024
7cb8eeb
balar-mmio: add support for unaligned cudamemcpy
William-An Jun 12, 2024
84854b0
balar-mmio: update unittest to run rodinia benchmark
William-An Aug 7, 2024
b0c6346
balar-mmio: update readme to cover balar+vanadis and new unittest
William-An Aug 19, 2024
9ba38f9
balar-mmio: adding placeholder apis
William-An Aug 20, 2024
e678f13
balar-mmio: move CUDA packet definition to a single file
William-An Aug 20, 2024
7ca59d3
balar-mmio: update readme
William-An Aug 21, 2024
db6b4f0
balar-mmio: add additional CUDA APIs for rodinia benchmark
William-An Aug 21, 2024
94862fd
balar-mmio: fix some formatting warnings
William-An Aug 21, 2024
f8d4955
balar-mmio: update README on GPU_ARCH env
William-An Aug 23, 2024
9792086
balar-mmio: remove include to sst/core/simulation.h
William-An Aug 25, 2024
5c78d6a
balar-mmio: add mprotect for balar's addr
William-An Aug 31, 2024
002910c
balar-mmio: use munmap to unload balar page
William-An Sep 1, 2024
4e1841e
balar-mmio: fix a memory out of bound access bug
William-An Sep 1, 2024
638e747
balar-mmio: add more logging statements
William-An Sep 1, 2024
dac9d18
balar-mmio: specify vaddr for dma access
William-An Sep 2, 2024
34a97b1
balar-mmio: use inline syscalls to avoid speculative loads to balar's…
William-An Sep 2, 2024
fa465bf
bala-mmio: add more testcases
William-An Sep 2, 2024
61db38b
balar-mmio: add ref files
William-An Sep 2, 2024
2c29339
balar-mmio: change default isa to riscv and change compiler for hands…
William-An Sep 4, 2024
98a41ac
balar-mmio: update reffile since we are using riscv64
William-An Sep 4, 2024
61c20d2
balar-mmio: add more rodinia benchmarks
William-An Sep 4, 2024
b1e0b15
balar-mmio: add rodinia hotspot reffile
William-An Sep 4, 2024
c71ee0a
balar-mmio: update test time limits
William-An Sep 7, 2024
4b12eab
balar-mmio: fix not returning value for cudaMallocHost
William-An Sep 7, 2024
58f9098
balar-mmio: update time limit and reffile for lud 256
William-An Sep 7, 2024
94f4262
balar-mmio: add rodinia pathfinder and srad reffiles
William-An Sep 7, 2024
f3fcc21
balar-mmio: split tests into different testsuites
William-An Sep 8, 2024
3342466
balar-mmio: fix args passing in testcases
William-An Sep 11, 2024
d725066
balar-mmio: increase test run time limit
William-An Sep 12, 2024
0d0554d
balar-mmio: limit nproc when making rodinia
William-An Sep 12, 2024
6fbe7a8
balar-mmio: restructure testsuite
William-An Sep 17, 2024
b4b03dc
balar-mmio: add support for cudaThreadSynchronize
William-An Sep 23, 2024
66bab78
balar-mmio: create subdirectory for each testcase
William-An Sep 24, 2024
df5a6a8
balar-mmio: add refFiles
William-An Sep 24, 2024
33b012f
balar-mmio: add support for cudaMemcpyToSymbol
William-An Sep 24, 2024
f1efc17
balar-mmio: add rodinia heartwall
William-An Sep 25, 2024
65b5917
balar-mmio: add texture api support
William-An Sep 27, 2024
6526407
balar-mmio: add heartwell ref file
William-An Dec 12, 2024
8459025
balar-mmio: remove some comments
William-An Dec 12, 2024
cf077bc
balar-mmio: clean up unused files and update dist
William-An Dec 12, 2024
b9f9964
balar-mmio: remove --disable-mpi and --disable-mem-pools
William-An Jan 11, 2025
3aef532
balar-mmio: remove doc headers
William-An Jan 11, 2025
ef89fe0
balar-mmio: create constant macros to keep things consistent
William-An Jan 12, 2025
4 changes: 3 additions & 1 deletion src/sst/elements/balar/.gitignore
@@ -1,4 +1,3 @@
*.out
*.tmp
*.log
sst_test_outputs/
@@ -12,3 +11,6 @@ _app_cuda_version_*
gpgpu_inst_stats.txt
*.dot
*.png
*.objdump
stderr-*
stdout-*
35 changes: 33 additions & 2 deletions src/sst/elements/balar/Makefile.am
@@ -12,9 +12,11 @@ comp_LTLIBRARIES = libbalar.la

libbalar_la_SOURCES = \
balar_event.h \
balar_packet.h \
cuda_runtime_api.h \
util.cc \
util.h \
balar_consts.h \
balarMMIO.cc \
balarMMIO.h \
dmaEngine.cc \
@@ -23,18 +25,47 @@ libbalar_la_SOURCES = \
testcpu/balarTestCPU.cc

EXTRA_DIST = \
tests/gpgpusim.config \
tests/gpu-v100-mem.cfg \
tests/testBalar_testsuite_util.py \
tests/testBalar-vanadis.py \
tests/testBalar-testcpu.py \
tests/gpgpusim.config \
tests/testsuite_default_balar.py \
tests/testsuite_default_balar_long.py \
tests/testsuite_default_balar_medium.py \
tests/testsuite_default_balar_simple.py \
tests/utils.py \
tests/balarBlock.py \
tests/memory.py \
tests/vanadisBlock.py \
tests/vanadisOS.py \
tests/vectorAdd/vecAdd.cu \
tests/vectorAdd/Makefile \
tests/vanadisHandshake/vanadisHandshake.c \
tests/vanadisHandshake/cuda_runtime_api.c \
tests/vanadisHandshake/cuda_runtime_api.h \
tests/vanadisHandshake/Makefile \
tests/vanadisLLVMRISCV/balar_vanadis.h \
tests/vanadisLLVMRISCV/cuda_runtime_api_vanadis.cc \
tests/vanadisLLVMRISCV/helloworld.c \
tests/vanadisLLVMRISCV/vecadd.cu \
tests/vanadisLLVMRISCV/Makefile \
tests/refFiles/test_gpgpu_helloworld.out \
tests/refFiles/test_gpgpu_rodinia-2.0-backprop-1024.out \
tests/refFiles/test_gpgpu_rodinia-2.0-backprop-2048.out \
tests/refFiles/test_gpgpu_rodinia-2.0-backprop-short.out \
tests/refFiles/test_gpgpu_rodinia-2.0-bfs-graph4096.out \
tests/refFiles/test_gpgpu_rodinia-2.0-bfs-SampleGraph.out \
tests/refFiles/test_gpgpu_rodinia-2.0-heartwall-1.out \
tests/refFiles/test_gpgpu_rodinia-2.0-hotspot-30-6-40.out \
tests/refFiles/test_gpgpu_rodinia-2.0-lud-64.out \
tests/refFiles/test_gpgpu_rodinia-2.0-lud-256.out \
tests/refFiles/test_gpgpu_rodinia-2.0-nn-4-3-30-90.out \
tests/refFiles/test_gpgpu_rodinia-2.0-nw-128-10.out \
tests/refFiles/test_gpgpu_rodinia-2.0-pathfinder-1000-20-5.out \
tests/refFiles/test_gpgpu_rodinia-2.0-srad_v2-128x128.out \
tests/refFiles/test_gpgpu_rodinia-2.0-streamcluster-3_6_16_1024_1024_100_none_1.out \
tests/refFiles/test_gpgpu_vanadisHandshake.out \
tests/refFiles/test_gpgpu_vecadd.out \
tests/refFiles/test_gpgpu_vectorAdd.out

libbalar_la_LDFLAGS = \
150 changes: 130 additions & 20 deletions src/sst/elements/balar/README.md
Review comment (Member):
@gvoskuilen @feldergast Do we want to keep the prerequisites in this readme or remove them in favor of the list that we test against? Already discussed what testing we want in the nightlies versus weeklies.

@@ -1,37 +1,35 @@
# Balar

SST GPGPU Simulation Components
SST GPGPU-Sim integration component.
Supports running CUDA programs in trace-driven mode (requires a GPU for trace generation) and direct-execution mode (the CUDA program runs on the Vanadis RISC-V CPU component).

## Installation

(Noted some of the components are pointed to personal repositories and will be moved once PRs are done)
Balar tested with the following settings:

### Prerequisites

Balar is tested with the following settings:

- gcc 7.5.0
- Ubuntu 18.04
- CUDA 10.1
- gcc 11.4.0
- Ubuntu 22.04.4 LTS
- CUDA 11.7

The following components are needed to run Balar:

### [`sst-core`](https://github.com/sstsimulator/sst-core/tree/0f358dda178f96db3b0da88b2b965492c4be187d)
### General

#### [`sst-core`](https://github.com/sstsimulator/sst-core)

- Tested on commit `0f358dda178f96db3b0da88b2b965492c4be187d`
- Use `./configure --prefix=$SST_CORE_HOME --disable-mpi --disable-mem-pools` for sst-core config
- Use `./configure --prefix=$SST_CORE_HOME` for sst-core config

### [`sst-elements`](https://github.com/William-An/sst-elements/tree/balar-mmio)
#### [`sst-elements`](https://github.com/sstsimulator/sst-elements)

- Use `./configure --prefix=$SST_ELEMENTS_HOME --with-sst-core=$SST_CORE_HOME --with-cuda=$CUDA_INSTALL_PATH --with-gpgpusim=$GPGPUSIM_ROOT` for sst-elements config
- `$CUDA_INSTALL_PATH` should point to CUDA toolkit path
- `$GPGPUSIM_ROOT` will be set when sourcing the `setup_environment` script in `GPGPU-Sim`, which should point to its folder path

### [`GPGPU-Sim`](https://github.com/William-An/gpgpu-sim_distribution/tree/sst-integration)
#### [`GPGPU-Sim`](https://github.com/accel-sim/gpgpu-sim_distribution)

```sh
# Pull GPGPU-Sim repo
git clone git@github.com:William-An/gpgpu-sim_distribution.git
git clone git@github.com:accel-sim/gpgpu-sim_distribution.git
cd gpgpu-sim_distribution

# Make sure $CUDA_INSTALL_PATH is set
@@ -49,13 +47,15 @@ make -j
> # GPGPU-Sim dependencies
> sudo apt-get install build-essential xutils-dev bison zlib1g-dev flex libglu1-mesa-dev
> ```

### [`cudaAPITracer`](https://github.com/William-An/accel-sim-framework/tree/cuda_api_tracer)
### Trace-driven

We put the CUDA api tracer tool inside [Accel-Sim](https://github.com/William-An/accel-sim-framework/tree/cuda_api_tracer) framework in folder `ACCEL-SIM/util/tracer_nvbit/others/cuda_api_tracer_tool`, to install it:
#### [`cudaAPITracer`](https://github.com/accel-sim/accel-sim-framework)

We put the CUDA API tracer tool inside the [Accel-Sim](https://github.com/accel-sim/accel-sim-framework) framework, in the folder `ACCEL-SIM/util/tracer_nvbit/others/cuda_api_tracer_tool`. To install it:

```shell
# Get the Accel-Sim
git pull https://github.com/William-An/accel-sim-framework/tree/cuda_api_tracer
# Get the Accel-Sim framework
git clone git@github.com:accel-sim/accel-sim-framework.git

# cd into tracer tool folder
cd accel-sim-framework/util/tracer_nvbit
@@ -86,13 +86,74 @@ Which will generate the following files when exiting:

> Note that the API tracer needs a machine with a GPU

### Direct-execution

In order to run a CUDA program directly, the CUDA source code needs to be recompiled with the LLVM and RISC-V GCC toolchains.

#### LLVM and RISCV GCC toolchain

```bash
## Build LLVM with RISC-V, x86, and CUDA support from source
git clone https://github.com/llvm/llvm-project.git

# Set up environment vars for the LLVM and RISC-V GCC install locations
# (exported up front so the configure steps below can use them)
export LLVM_INSTALL_PATH=$(pwd)/llvm-install
export RISCV_TOOLCHAIN_INSTALL_PATH=$(pwd)/riscv-gnu-install
# Match with the GPU config file we have
export GPU_ARCH=sm_70

mkdir llvm-install
cd llvm-project
mkdir build && cd build
cmake -DLLVM_TARGETS_TO_BUILD="RISCV;X86;NVPTX" -DLLVM_DEFAULT_TARGET_TRIPLE=riscv64-unknown-linux-gnu \
-DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang;lld" -DCMAKE_INSTALL_PREFIX=$LLVM_INSTALL_PATH ../llvm
cmake --build . -j30
cmake --build . --target install
cd ../..

## Build RISC-V GCC toolchain
git clone https://github.com/riscv-collab/riscv-gnu-toolchain.git

mkdir riscv-gnu-install
cd riscv-gnu-toolchain
./configure --prefix=$RISCV_TOOLCHAIN_INSTALL_PATH
make linux -j
cd ..
```

#### Compiling CUDA for Balar + Vanadis

In order to run CUDA on Balar + Vanadis (direct-execution), aside from the compilers, the custom CUDA library `libcudart_vanadis` (inside `vanadisLLVMRISCV`) is also needed to intercept CUDA API calls and forward them to Balar's MMIO address. This custom CUDA lib can be built via `make -C vanadisLLVMRISCV vanadis_cuda`, which generates `libcudart_vanadis.a/.so`.

For compiler and linker flags, you can refer to the `vecadd` target in `vanadisLLVMRISCV/Makefile`.
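
As a rough, hedged sketch of what that compile step can look like (the authoritative recipe is the `vecadd` target in `vanadisLLVMRISCV/Makefile`; the exact flag set below is an assumption, not the PR's actual command):

```bash
# Hedged sketch only: flag choices and paths are assumptions; see the
# vecadd target in vanadisLLVMRISCV/Makefile for the real recipe.
$LLVM_INSTALL_PATH/bin/clang++ vecadd.cu \
    --cuda-path=$CUDA_INSTALL_PATH \
    --cuda-gpu-arch=$GPU_ARCH \
    --target=riscv64-unknown-linux-gnu \
    --gcc-toolchain=$RISCV_TOOLCHAIN_INSTALL_PATH \
    --sysroot=$RISCV_TOOLCHAIN_INSTALL_PATH/sysroot \
    -L$SST_CUSTOM_CUDA_LIB_PATH -lcudart_vanadis \
    -static -o vecadd
```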

#### GPU Application Collection

We are working on getting a collection of GPU apps to run with Balar+Vanadis. These benchmarks come from the [gpu-app-collection](https://github.com/accel-sim/gpu-app-collection) repo.

```bash
git clone git@github.com:accel-sim/gpu-app-collection.git
cd gpu-app-collection
git checkout sst_support

# Set up environment vars for the apps; requires LLVM_INSTALL_PATH and
# RISCV_TOOLCHAIN_INSTALL_PATH to be set.
# If you plan to compile the apps directly, you will also need to set
# SST_CUSTOM_CUDA_LIB_PATH to the directory of the custom CUDA library,
# which is normally `SST-ELEMENT-SOURCE/src/sst/elements/balar/tests/vanadisLLVMRISCV`
source ./src/setup_environment sst
```

## Usage

### Trace-driven Mode

After successful compilation and installation of SST core and SST elements (with GPGPU-Sim and CUDA), run:

```bash
# cd into balar
cd $SST_ELEMENTS_HOME/src/sst/balar
cd SST_ELEMENTS_SRC/src/sst/elements/balar

# balar tests
cd tests/
@@ -115,3 +176,52 @@ make -C vanadisHandshake/
# Run the handshake binary with vanadis core
sst testBalar-vanadis.py --model-options='-c gpu-v100-mem.cfg'
```

### Vanadis Mode

The CUDA executable should be passed via `VANADIS_EXE` and `BALAR_CUDA_EXE_PATH`. If the program takes arguments, they should be passed with `VANADIS_EXE_ARGS`.

```bash
# cd into balar tests
cd SST_ELEMENTS_SRC/src/sst/elements/balar/tests/

# Compile test programs
make -C vanadisLLVMRISCV

# Run CPU only program
VANADIS_EXE=./vanadisLLVMRISCV/helloworld VANADIS_ISA=RISCV64 sst testBalar-vanadis.py --model-options='-c gpu-v100-mem.cfg'

# Run sample vecadd
VANADIS_EXE=./vanadisLLVMRISCV/vecadd VANADIS_ISA=RISCV64 BALAR_CUDA_EXE_PATH=./vanadisLLVMRISCV/vecadd sst testBalar-vanadis.py --model-options='-c gpu-v100-mem.cfg'
```

### Running GPU Benchmark

Here is an example of running Rodinia 2.0 BFS with the SampleGraph.txt input using CUDA 11.7. For a different CUDA version, the binary path will differ in its version number.

```bash
# Let the GPU apps know about the custom CUDA lib
export SST_CUSTOM_CUDA_LIB_PATH=SST_ELEMENTS_SRC/src/sst/elements/balar/tests/vanadisLLVMRISCV

# Make Rodinia 2.0
cd gpu-app-collection
make rodinia_2.0-ft -i -j -C ./src
make data -C ./src
cd ..

# Run BFS with sample graph input
cd SST_ELEMENTS_SRC/src/sst/elements/balar/tests
VANADIS_EXE=$GPUAPPS_ROOT/bin/11.7/release/bfs-rodinia-2.0-ft \
VANADIS_EXE_ARGS=$GPUAPPS_ROOT/data_dirs/cuda/rodinia/2.0-ft/bfs-rodinia-2.0-ft/data/SampleGraph.txt \
VANADIS_ISA=RISCV64 \
BALAR_CUDA_EXE_PATH=$GPUAPPS_ROOT/bin/11.7/release/bfs-rodinia-2.0-ft sst \
testBalar-vanadis.py --model-options='-c gpu-v100-mem.cfg'
```

### Running Unittest

Balar's unittest suites will automatically compile the GPU app collection with the LLVM and RISC-V toolchains and run the resulting benchmarks.

```bash
sst-test-elements -w "*balar*"
```
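
The testsuites are split into simple, medium, and long variants (see `tests/testsuite_default_balar_*.py` in `Makefile.am`), so a narrower wildcard should, presumably, select just a subset:

```bash
# Assumed to match only the "simple" balar testsuite by name
sst-test-elements -w "*balar_simple*"
```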