Final artifact evaluation scripts
Signed-off-by: Lazar Cvetković <[email protected]>
cvetkovic committed Aug 20, 2024
1 parent 3878b99 commit 48ecc17
Showing 14 changed files with 23 additions and 41 deletions.
3 changes: 2 additions & 1 deletion artifact_evaluation/.gitignore
@@ -1 +1,2 @@
-*.png
+*.png
+!sample_results/*.png
22 changes: 8 additions & 14 deletions artifact_evaluation/README.md
@@ -1,11 +1,11 @@
## Dirigent Artifact Evaluation Instructions

-The following experiments aim to repeat results from Figures 7, 9, and 10, i.e., main results from sections 5.3 and 5.2.1, which constitute the biggest contribution of the paper.
+The following experiments aim to repeat the results from Figures 7, 9, and 10, i.e., the main results from Sections 5.3 and 5.2.1, which constitute the biggest contribution of the paper. We chose not to describe the Firecracker experiments, as they are very complicated and time-consuming.

-Time burden: We expect you will need at most a day of active work to run all the experiments.
+Time burden: We expect you will need 3-5 hours of active work to run the experiments described below.

Prerequisites:
-- Cloudlab cluster of at least 20 xl170 machines instantiated using `maestro_sosp24ae` Cloudlab profile (`https://www.cloudlab.us/p/faas-sched/maestro_sosp24ae`).
+- Cloudlab cluster of at least 20 xl170 machines instantiated using the `maestro_sosp24ae` Cloudlab profile (`https://www.cloudlab.us/p/faas-sched/maestro_sosp24ae`). We recommend using a 27-node cluster.
- Chrome Cloudlab extension - install from https://github.com/eth-easl/cloudlab_extension

Order in which to run the experiments:
@@ -17,27 +17,20 @@ Order in which to run the experiments:
- **Reload the cluster through Cloudlab interface**


-- Cold start sweep - Firecracker (instructions in `cold_start_sweep/dirigent`)
-- Azure 500 - Firecracker (instructions in `azure_500/dirigent`)
-- **Plot the data and verify** (run `run_plotting_scripts.sh`)
-- **Reload the cluster through Cloudlab interface**
-
-
- Azure 500 - Knative/K8s (instructions in `azure_500/knative`)
- Cold start sweep - Knative/K8s (instructions in `cold_start_sweep/knative`)
- **Plot the data and verify** (run `run_plotting_scripts.sh`)

Notes:
- For simplicity, and because we cannot guarantee artifact evaluators a huge Cloudlab cluster, we will run all experiments in a mode where the Dirigent and Knative/K8s components are not replicated. We also ran the experiments in an environment where the components run in high-availability mode, but we did not notice significant performance differences. Our deployment scripts place the load generator on `node0`, the control plane on `node1`, and the data plane on `node2`, whereas all the other nodes serve as worker nodes.
- All the plotting scripts are configured to work out of the box, provided you placed the experiment results in the correct folders.
-- For Firecracker experiments on Cloudlab, we noticed some disk operation delays while creating Firecracker snapshots. First 10 minutes of experiments on a new cluster may experience a lot of timeouts. You should discard these measurements. The problems resolve after ~10 minutes on their own, assuming snapshots creation was triggered on each node. To make sure a snapshot was created on each node run Firecracker cold start sweep for a couple of minutes.
- Traces for the experiments described here are stored on Git LFS. Make sure you pull these files before proceeding further.
- The default Cloudlab shell should be `bash`. You can configure it when logged in at `https://www.cloudlab.us/myaccount.php`.

Instructions to set up a Dirigent cluster:
- Make sure the cluster is in a reloaded state, i.e., that neither Dirigent nor Knative is running.
- Clone Dirigent locally (`git clone https://github.com/eth-easl/dirigent.git`)
-- Set sandbox runtime (`containerd` or `firecracker`) by editing `WORKER_RUNTIME` in `./scripts/setup.cfg`
+- Set sandbox runtime (`containerd`) by editing `WORKER_RUNTIME` in `./scripts/setup.cfg`
- Open the Cloudlab experiment, open the Cloudlab extension, and copy the list of all addresses (RAW) using the extension. This puts the list of all nodes in your clipboard in the format requested by the scripts below.
- Run locally `./scripts/remote_install.sh`. The arguments should be the copied list of addresses from the previous step. For example, `./scripts/remote_install.sh user@node0 user@node1 user@node2`. This script should be executed only once.
- Run locally `./scripts/remote_start_cluster.sh user@node0 user@node1 user@node2`. After this step, the Dirigent cluster should be operational. This script can also be used to restart the Dirigent cluster if you experience issues, without reloading the Cloudlab cluster.
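[Editor's note] The Dirigent bring-up steps above condense into a short local shell session. A minimal sketch, not part of the committed README: it assumes `user@node0 user@node1 user@node2` stand in for the addresses copied from the Cloudlab extension, and the `sed` edit assumes `setup.cfg` contains a `WORKER_RUNTIME=...` assignment.

```bash
# Sketch: set up and start a Dirigent (containerd) cluster from a local machine.
git clone https://github.com/eth-easl/dirigent.git
cd dirigent
git lfs pull                                      # traces are stored on Git LFS

# Select the sandbox runtime used by the worker nodes (assumed key=value format).
sed -i 's/^WORKER_RUNTIME=.*/WORKER_RUNTIME=containerd/' scripts/setup.cfg

# One-time installation on every node, then cluster start.
./scripts/remote_install.sh user@node0 user@node1 user@node2
./scripts/remote_start_cluster.sh user@node0 user@node1 user@node2
```

If the cluster misbehaves later, rerunning `remote_start_cluster.sh` with the same node list restarts Dirigent without reloading the Cloudlab experiment.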
@@ -50,6 +43,7 @@ Instructions to set up Knative/K8s baseline cluster:
- After a couple of minutes, once the script has finished executing, the cluster should be running, and you can ssh into `node0`. Execute `kubectl get pods -A` and verify that the installation completed successfully by checking that all pods are in the `Running` or `Completed` state.

Results expectation/interpretation:
-- Since we cannot guarantee artifact evaluators access to a 100-node cluster over a 2-week artifact evaluation period, there will be some performance degradation than what we show in the paper.
-- For cold start sweep, the throughput we show in Figure 7 will be reduced, as worker nodes become the bottleneck. What you should verify is that the cold start throughput conforms to the following inequalities -- `Knative/K8s throughtput << Maestro - containerd throughtput < Maestro - Firecracker throughtput` and `Knative/K8s latency >> Maestro latency`.
-- For Azure 500 trace experiments, per-function slowdown of containerd and Firecracker should almost be identical. The workload on Knative/K8s should be worse, and should suffer from a long tail. Per-invocation scheduling latency for Dirigent should be better almost all the time, and the average per-function scheduling latency of Dirigent should be by a couple of orders of magnitude better than with Knative/K8s.
+- Since we cannot guarantee artifact evaluators access to a 100-node cluster over the 2-week artifact evaluation period, performance on the smaller cluster will be slightly degraded.
+- For the cold start sweep, the throughput we show in Figure 7 will be reduced, as worker nodes become the bottleneck. What you should verify is that the cold start performance conforms to the following inequalities: `Knative/K8s throughput << Dirigent - containerd throughput` and `Knative/K8s latency >> Dirigent - containerd latency`.
+- For the Azure 500 trace experiments, the workload on Knative/K8s should be worse and should suffer from a long tail. Per-invocation scheduling latency for Dirigent should be better almost all the time, and the average per-function scheduling latency of Dirigent should be a couple of orders of magnitude better than with Knative/K8s.
+- You can see the results we got on a 17-node cluster in the `sample_results` directory.
4 changes: 2 additions & 2 deletions artifact_evaluation/azure_500/dirigent/.gitignore
@@ -1,2 +1,2 @@
-results_azure_500
-results_azure_500_firecracker
+results_azure_500*
+results_azure_500_firecracker*
10 changes: 5 additions & 5 deletions artifact_evaluation/azure_500/dirigent/INSTRUCTIONS.md
@@ -2,16 +2,16 @@

Time required: 10 min to set up environment and 30 min per experiment

-Description: This experiment runs the downsampled Azure trace with 500 functions. Instructions for running trace with containerd and Firecracker are the same, except that the cluster is deployed differently, as given in the `README.md` in the main folder of artifact evaluation. For Firecracker, make sure to use the trace with suffix `_firecracker`. We recommend you follow the order of experiments as given in the `README.md`.
+Description: This experiment runs the downsampled Azure trace with 500 functions. We recommend you follow the order of experiments given in the `README.md`.

Instructions:
- Start the Dirigent cluster as per the instructions in the root folder of the artifact evaluation.
-- On the `node0` execute `mkdir -p ~/invitro/data/traces/azure_500` or `mkdir -p ~/invitro/data/traces/azure_500_firecracker`, depending on which runtime you use.
-- Copy traces from this folder to `node0` using `scp azure_500/* user@node0:~/invitro/data/traces/azure_500/` or `scp azure_500_firecracker/* user@node0:~/invitro/data/traces/azure_500_firecracker/`.
-- Make sure on `node0` `~/invitro` branch is `rps_mode`. With text editor open `~/invitro/cmd/config_dirigent_trace.json` and change TracePath to match `azure_500` or `azure_500_firecracker`.
+- On `node0`, execute `mkdir -p ~/invitro/data/traces/azure_500`.
+- Copy traces from this folder to `node0` using `scp azure_500/* user@node0:~/invitro/data/traces/azure_500/`.
+- Make sure the `~/invitro` branch on `node0` is `rps_mode`. With a text editor, open `~/invitro/cmd/config_dirigent_trace.json` and change `TracePath` to match `data/traces/azure_500`.
- On your local machine run `./scripts/start_resource_monitoring.sh user@node0 user@node1 user@node2`.
- Run the load generator in screen/tmux on `node0` with `cd ~/invitro; go run cmd/loader.go --config cmd/config_dirigent_trace.json`. Wait until the experiment completes (~30 minutes). There should be ~170K invocations, with a negligible failure rate.
- Gather experiment results. Make sure you do not overwrite data from the other experiment, and that you place results in the correct folders.
-- Create folders for storing results with `mkdir -p ./artifact_evaluation/azure_500/dirigent/results_azure_500` and `mkdir -p ./artifact_evaluation/azure_500/dirigent/results_azure_500_firecracker`
+- Create a folder for storing results with `mkdir -p ./artifact_evaluation/azure_500/dirigent/results_azure_500`.
- Copy load generator output with `scp user@node0:~/invitro/data/out/experiment_duration_30.csv results_azure_500/`
- Copy resource utilization data with `mkdir -p ./artifact_evaluation/azure_500/dirigent/results_azure_500/cpu_mem_usage && ./scripts/collect_resource_monitoring.sh ./artifact_evaluation/azure_500/dirigent/results_azure_500/cpu_mem_usage user@node0 user@node1 user@node2`.
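[Editor's note] Taken together, the steps above amount to the following session. A sketch only, not part of the committed file: it assumes the local working directory is the Dirigent checkout, `user@node0` is the load-generator node, and `TracePath` has already been set in `cmd/config_dirigent_trace.json` as described.

```bash
# Sketch: run the Azure-500 trace on a Dirigent (containerd) cluster.

# Stage the trace on node0.
ssh user@node0 'mkdir -p ~/invitro/data/traces/azure_500'
scp azure_500/* user@node0:~/invitro/data/traces/azure_500/

# Start resource monitoring on the relevant nodes.
./scripts/start_resource_monitoring.sh user@node0 user@node1 user@node2

# Launch the load generator on node0 (inside screen/tmux); takes ~30 minutes.
ssh user@node0 'cd ~/invitro && go run cmd/loader.go --config cmd/config_dirigent_trace.json'

# Collect results locally, keeping them in the expected folders.
RES=./artifact_evaluation/azure_500/dirigent/results_azure_500
mkdir -p "$RES/cpu_mem_usage"
scp user@node0:~/invitro/data/out/experiment_duration_30.csv "$RES/"
./scripts/collect_resource_monitoring.sh "$RES/cpu_mem_usage" user@node0 user@node1 user@node2
```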

(Four files were deleted in this commit; their names and contents are not shown.)
4 changes: 2 additions & 2 deletions artifact_evaluation/cold_start_sweep/dirigent/.gitignore
@@ -1,2 +1,2 @@
-results_containerd
-results_firecracker
+results_containerd*
+results_firecracker*
6 changes: 3 additions & 3 deletions artifact_evaluation/cold_start_sweep/dirigent/INSTRUCTIONS.md
@@ -2,12 +2,12 @@

Time required: 10 min to set up environment and 2-3 min per data point

-Description: This experiment triggers cold start in Maestro cluster. You should sweep the load until the cluster saturates, which will be visible on the latency plot. We suggest running experiments with 1, 10, 100, 250, 500, 750, 1000, ... RPS and observing the latency after conducting experiment for each data point. Low RPS (<10 RPS) rates should be run for 3-5 minutes, because of warmup, while any higher load can be run for just a minute. Always discard the results of the first experiment when starting a new cluster, as these measurements include image pull latency, which pollutes the measurements (can be seen as high p99 at low RPS). The instruction is for running experiments is the same for containerd and Firecracker, except the deployment method explained in `README.md` and `RpsImage` load generator field.
+Description: This experiment triggers cold starts in a Maestro cluster. You should sweep the load until the cluster saturates, which will be visible on the latency plot. We suggest running experiments with 1, 10, 100, 200, 300, ... RPS and observing the latency after conducting the experiment for each data point. Low RPS rates (<10 RPS) should be run for 3-5 minutes because of warmup, while any higher load can be run for just a minute. Always discard the results of the first experiment when starting a new cluster, as these measurements include image pull latency, which pollutes the measurements (visible as high p99 at low RPS).

Instructions:
- Start the Dirigent cluster according to the instructions in the root folder of the artifact evaluation (`README.md`). You can reuse an existing cluster running Dirigent with containerd.
-- On remote machine `node0` open `~/invitro/cmd/config_dirigent_rps.json`. Set `RpsColdStartRatioPercentage` to `100`, and sweep the load with `RpsTarget` while configuring `ExperimentDuration` according to instructions above. For higher RPS (>1000), it might be necessary to increase `RpsCooldownSeconds`, which controls the number of functions that are deployed in the cluster to achieve the requested RPS. Set `GRPCFunctionTimeoutSeconds` to `15`. For containerd experiments make sure `RpsImage` is set to `docker.io/cvetkovic/dirigent_empty_function:latest`, whereas for Firecracker experiments this field should be set to `empty`.
+- On remote machine `node0`, open `~/invitro/cmd/config_dirigent_rps.json`. Set `RpsColdStartRatioPercentage` to `100`, and sweep the load with `RpsTarget` while configuring `ExperimentDuration` according to the instructions above. For higher RPS (>1000), it might be necessary to increase `RpsCooldownSeconds`, which controls the number of functions deployed in the cluster to achieve the requested RPS. Set `GRPCFunctionTimeoutSeconds` to `15`. For containerd experiments, make sure `RpsImage` is set to `docker.io/cvetkovic/dirigent_empty_function:latest`.
- Start RPS experiment by running `cd ~/invitro; go run cmd/loader.go --config cmd/config_dirigent_rps.json`.
-- Create folder storing results with `mkdir -p ./artifact_evaluation/cold_start_sweep/dirigent/results_containerd` or `mkdir -p ./artifact_evaluation/cold_start_sweep/dirigent/results_firecracker`.
+- Create a folder for storing results with `mkdir -p ./artifact_evaluation/cold_start_sweep/dirigent/results_containerd`.
- Gather results located in `data/out/experiment_duration_X.csv` and copy them to your local machine in the format `rps_X.csv`, into the folder you created in the previous step.
- Repeat for different RPS values until the cluster saturates, which you can see by plotting the data with the provided script.
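[Editor's note] The sweep above lends itself to a small driver loop on `node0`. A sketch, not part of the committed file: it assumes `jq` is available for editing the config, that `ExperimentDuration` is given in minutes, and that the output CSV is named after the duration (all assumptions, not stated in the instructions).

```bash
# Sketch: cold-start sweep driver, run on node0.
CFG=~/invitro/cmd/config_dirigent_rps.json

for RPS in 1 10 100 200 300; do
    # Low RPS needs a longer run because of warmup; higher load can run ~1 min.
    if [ "$RPS" -lt 10 ]; then DUR=5; else DUR=1; fi

    # Rewrite the config fields named in the instructions above.
    jq ".RpsTarget = $RPS
        | .ExperimentDuration = $DUR
        | .RpsColdStartRatioPercentage = 100
        | .GRPCFunctionTimeoutSeconds = 15" "$CFG" > /tmp/cfg.json && mv /tmp/cfg.json "$CFG"

    (cd ~/invitro && go run cmd/loader.go --config cmd/config_dirigent_rps.json)

    # Keep one CSV per data point, named after the RPS value.
    cp ~/invitro/data/out/experiment_duration_${DUR}.csv ~/rps_${RPS}.csv
done
```

Remember to discard the first run on a fresh cluster, and to copy the `rps_X.csv` files into `results_containerd` on your local machine.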
(Three binary files changed in this commit; their contents cannot be displayed.)
3 changes: 1 addition & 2 deletions scripts/remote_install.sh
@@ -18,8 +18,7 @@ function AddSshKeys() {
}

function SetupNode() {
-    AddSshKeys $1
-    RemoteExec $1 'if [ ! -d ~/cluster_manager ];then git clone [email protected]:eth-easl/dirigent.git cluster_manager; fi'
+    RemoteExec $1 'if [ ! -d ~/cluster_manager ];then git clone https://github.com/eth-easl/dirigent.git cluster_manager; fi'
    RemoteExec $1 "bash ~/cluster_manager/scripts/setup_node.sh $2 $WORKER_RUNTIME"
    # LFS pull for VM kernel image and rootfs
    RemoteExec $1 'cd ~/cluster_manager; git pull; git lfs pull'
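[Editor's note] `RemoteExec` and `AddSshKeys` are defined earlier in `scripts/remote_install.sh` and are not part of this change. For readers seeing only this hunk, a hypothetical minimal shape of the helper, assuming it is a thin `ssh` wrapper; the repository's actual definition may differ:

```bash
# Hypothetical sketch of the helper used above, not the repository's exact code.
function RemoteExec() {
    # $1 is user@host; the remaining arguments form the command run remotely.
    ssh -o StrictHostKeyChecking=no "$1" "${@:2}"
}
```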
