Balar mmio with Vanadis #2428 (Open)
William-An wants to merge 62 commits into sstsimulator:devel from William-An:balar-mmio-vanadis-llvm
Changes from all commits (62 commits, all by William-An):
- b7f5568 balar: update readme links to official repos
- 89b8255 balar-mmio: add balar to vanadis device list
- 7ec7c66 balar-mmio: update custom cuda lib to map balar to vanadis's VM
- a492c36 balar-mmio: separate data and command interfaces for balar
- 2bc0f37 balar-mmio: finish config script for balar+vanadis via mmap
- a977443 balar-mmio: refactor config script to use builder class
- 7e956ff balar-mmio: add vanadis test to testsuite
- 0384dc8 balar-mmio: update testsuite ref files
- e9e6cce balar-mmio: update dist files
- 5243525 balar-mmio: add cuda files for llvm
- aa0d52b balar-mmio: add support for LLVM CUDA binary
- 9b23271 balar-mmio: make scratch mem aligned to cache block, temp fix
- 89d5af3 balar-mmio: make a real vecadd summing sin^2 and cos^2
- 7f582a7 balar-mmio: encode CUDA version information in vanadis binary
- 9cbea2f balar-mmio: use DMA engine for read/write cuda packets
- 8c933dc balar-mmio: modify launch scripts for use of DMA engine
- 87b6b7a balar-mmio: update refFile for testcpu
- ba247be balar-mmio: update riscv-cuda cxxflags to avoid macro conflict
- 9768256 balar-mmio: add support to append app args
- 7cb8eeb balar-mmio: add support for unaligned cudamemcpy
- 84854b0 balar-mmio: update unittest to run rodinia benchmark
- b0c6346 balar-mmio: update readme to cover balar+vanadis and new unittest
- 9ba38f9 balar-mmio: adding placeholder apis
- e678f13 balar-mmio: move CUDA packet definition to a single file
- 7ca59d3 balar-mmio: update readme
- db6b4f0 balar-mmio: add additional CUDA APIs for rodinia benchmark
- 94862fd balar-mmio: fix some formatting warnings
- f8d4955 balar-mmio: update README on GPU_ARCH env
- 9792086 balar-mmio: remove include to sst/core/simulation.h
- 5c78d6a balar-mmio: add mprotect for balar's addr
- 002910c balar-mmio: use munmap to unload balar page
- 4e1841e balar-mmio: fix a memory out of bound access bug
- 638e747 balar-mmio: add more logging statements
- dac9d18 balar-mmio: specify vaddr for dma access
- 34a97b1 balar-mmio: use inline syscalls to avoid speculative loads to balar's…
- fa465bf bala-mmio: add more testcases
- 61db38b balar-mmio: add ref files
- 2c29339 balar-mmio: change default isa to riscv and change compiler for hands…
- 98a41ac balar-mmio: update reffile since we are using riscv64
- 61c20d2 balar-mmio: add more rodinia benchmarks
- b1e0b15 balar-mmio: add rodinia hotspot reffile
- c71ee0a balar-mmio: update test time limits
- 4b12eab balar-mmio: fix not returning value for cudaMallocHost
- 58f9098 balar-mmio: update time limit and reffile for lud 256
- 94f4262 balar-mmio: add rodinia pathfinder and srad reffiles
- f3fcc21 balar-mmio: split tests into different testsuites
- 3342466 balar-mmio: fix args passing in testcases
- d725066 balar-mmio: increase test run time limit
- 0d0554d balar-mmio: limit nproc when making rodinia
- 6fbe7a8 balar-mmio: restructure testsuite
- b4b03dc balar-mmio: add support for cudaThreadSynchronize
- 66bab78 balar-mmio: create subdirectory for each testcase
- df5a6a8 balar-mmio: add refFiles
- 33b012f balar-mmio: add support for cudaMemcpyToSymbol
- f1efc17 balar-mmio: add rodinia heartwall
- 65b5917 balar-mmio: add texture api support
- 6526407 balar-mmio: add heartwell ref file
- 8459025 balar-mmio: remove some comments
- cf077bc balar-mmio: clean up unused files and update dist
- b9f9964 balar-mmio: remove --disable-mpi and --disable-mem-pools
- 3aef532 balar-mmio: remove doc headers
- ef89fe0 balar-mmio: create constant macros to keep things consistent
William-An File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,37 +1,35 @@ | ||
# Balar

SST GPGPU-Sim integration component.
Supports running CUDA programs via trace-driven mode (requires a GPU for trace generation) and direct-execution mode (the CUDA program runs on the Vanadis RISC-V CPU component).

## Installation

### Prerequisites

Balar is tested with the following settings:

- gcc 11.4.0
- Ubuntu 22.04.4 LTS
- CUDA 11.7

The following components are needed to run Balar:
### General

#### [`sst-core`](https://github.com/sstsimulator/sst-core)

- Tested on commit `0f358dda178f96db3b0da88b2b965492c4be187d`
- Use `./configure --prefix=$SST_CORE_HOME` for the sst-core config
#### [`sst-elements`](https://github.com/sstsimulator/sst-elements)

- Use `./configure --prefix=$SST_ELEMENTS_HOME --with-sst-core=$SST_CORE_HOME --with-cuda=$CUDA_INSTALL_PATH --with-gpgpusim=$GPGPUSIM_ROOT` for the sst-elements config
- `$CUDA_INSTALL_PATH` should point to the CUDA toolkit path
- `$GPGPUSIM_ROOT` is set when sourcing the `setup_environment` script in `GPGPU-Sim`; it should point to that repo's folder
#### [`GPGPU-Sim`](https://github.com/accel-sim/gpgpu-sim_distribution)

```sh
# Pull the GPGPU-Sim repo
git clone git@github.com:accel-sim/gpgpu-sim_distribution.git
cd gpgpu-sim_distribution

# Make sure $CUDA_INSTALL_PATH is set
# …
make -j
```

> ```sh
> # GPGPU-Sim dependencies
> sudo apt-get install build-essential xutils-dev bison zlib1g-dev flex libglu1-mesa-dev
> ```
### Trace-driven

#### [`cudaAPITracer`](https://github.com/accel-sim/accel-sim-framework)

We put the CUDA API tracer tool inside the [Accel-Sim](https://github.com/accel-sim/accel-sim-framework) framework, in the folder `ACCEL-SIM/util/tracer_nvbit/others/cuda_api_tracer_tool`. To install it:

```shell
# Get the Accel-Sim framework
git clone git@github.com:accel-sim/accel-sim-framework.git

# cd into the tracer tool folder
cd accel-sim-framework/util/tracer_nvbit

# …
```
This will generate the following files when exiting:

> Note: the API tracer needs a machine with a GPU.
### Direct-execution

In order to run a CUDA program directly, the CUDA source code needs to be recompiled with the LLVM and RISC-V GCC toolchains.

#### LLVM and RISC-V GCC toolchain

```bash
# Set up environment vars for LLVM and RISC-V GCC first,
# since the install prefixes below refer to them
mkdir llvm-install riscv-gnu-install
export LLVM_INSTALL_PATH=$(pwd)/llvm-install
export RISCV_TOOLCHAIN_INSTALL_PATH=$(pwd)/riscv-gnu-install
# Match the GPU config file we have
export GPU_ARCH=sm_70

## Build LLVM with RISC-V, x86, and CUDA support from source
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
mkdir build && cd build
cmake -DLLVM_TARGETS_TO_BUILD="RISCV;X86;NVPTX" -DLLVM_DEFAULT_TARGET_TRIPLE=riscv64-unknown-linux-gnu \
      -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang;lld" -DCMAKE_INSTALL_PREFIX=$LLVM_INSTALL_PATH ../llvm
cmake --build . -j30
cmake --build . --target install
cd ../..

## Build the RISC-V GCC toolchain
git clone https://github.com/riscv-collab/riscv-gnu-toolchain.git
cd riscv-gnu-toolchain
./configure --prefix=$RISCV_TOOLCHAIN_INSTALL_PATH
make linux -j
cd ..
```
#### Compiling CUDA for Balar + Vanadis

To run CUDA on Balar + Vanadis (direct-execution), aside from the compilers, the custom CUDA library `libcudart_vanadis` (inside `vanadisLLVMRISCV`) is also needed; it intercepts CUDA API calls and forwards them to Balar's MMIO address. Build it with `make -C vanadisLLVMRISCV vanadis_cuda`, which generates `libcudart_vanadis.a/.so`.

For compiler and linker flags, refer to the `vecadd` target in `vanadisLLVMRISCV/Makefile`.
#### GPU Application Collection

We are working on getting a collection of GPU apps to run with Balar + Vanadis. The benchmarks come from the [gpu-app-collection](https://github.com/accel-sim/gpu-app-collection) repo.

```bash
git clone git@github.com:accel-sim/gpu-app-collection.git
cd gpu-app-collection
git checkout sst_support

# Set up environment vars for the apps; the env vars
# LLVM_INSTALL_PATH and RISCV_TOOLCHAIN_INSTALL_PATH must be set.
# If you plan to compile the apps directly, also set
# SST_CUSTOM_CUDA_LIB_PATH to the directory of the custom CUDA library,
# normally SST-ELEMENT-SOURCE/src/sst/elements/balar/tests/vanadisLLVMRISCV
source ./src/setup_environment sst
```
## Usage

### Trace-driven Mode

After successful compilation and installation of SST core and SST elements (with GPGPU-Sim and CUDA), run:

```bash
# cd into balar
cd SST_ELEMENTS_SRC/src/sst/elements/balar

# balar tests
cd tests/

# …
make -C vanadisHandshake/

# Run the handshake binary with the vanadis core
sst testBalar-vanadis.py --model-options='-c gpu-v100-mem.cfg'
```
### Vanadis Mode

The CUDA executable should be passed in `VANADIS_EXE` and `BALAR_CUDA_EXE_PATH`. If the program takes arguments, pass them with `VANADIS_EXE_ARGS`.

```bash
# cd into balar tests
cd SST_ELEMENTS_SRC/src/sst/elements/balar/tests/

# Compile test programs
make -C vanadisLLVMRISCV

# Run a CPU-only program
VANADIS_EXE=./vanadisLLVMRISCV/helloworld VANADIS_ISA=RISCV64 sst testBalar-vanadis.py --model-options='-c gpu-v100-mem.cfg'

# Run the sample vecadd
VANADIS_EXE=./vanadisLLVMRISCV/vecadd VANADIS_ISA=RISCV64 BALAR_CUDA_EXE_PATH=./vanadisLLVMRISCV/vecadd sst testBalar-vanadis.py --model-options='-c gpu-v100-mem.cfg'
```
### Running GPU Benchmark

Here is an example of running Rodinia 2.0 BFS with the SampleGraph.txt input using CUDA 11.7. For a different CUDA version, the binary path will differ in the version number.

```bash
# Let the GPU apps know about the custom CUDA lib
export SST_CUSTOM_CUDA_LIB_PATH=SST_ELEMENTS_SRC/src/sst/elements/balar/tests/vanadisLLVMRISCV

# Make Rodinia 2.0
cd gpu-app-collection
make rodinia_2.0-ft -i -j -C ./src
make data -C ./src
cd ..

# Run BFS with the sample graph input
cd SST_ELEMENTS_SRC/src/sst/elements/balar/tests
VANADIS_EXE=$GPUAPPS_ROOT/bin/11.7/release/bfs-rodinia-2.0-ft \
VANADIS_EXE_ARGS=$GPUAPPS_ROOT/data_dirs/cuda/rodinia/2.0-ft/bfs-rodinia-2.0-ft/data/SampleGraph.txt \
VANADIS_ISA=RISCV64 \
BALAR_CUDA_EXE_PATH=$GPUAPPS_ROOT/bin/11.7/release/bfs-rodinia-2.0-ft \
sst testBalar-vanadis.py --model-options='-c gpu-v100-mem.cfg'
```
### Running Unittest

Balar's unit test suites will automatically compile the GPU app collection with the LLVM and RISC-V toolchains and run the tests:

```bash
sst-test-elements -w "*balar*"
```
@gvoskuilen @feldergast Do we want to keep the prerequisites in this readme, or remove them in favor of the list that we test against? We already discussed what testing we want in the nightlies versus the weeklies.