Commit
Signed-off-by: Lazar Cvetković <[email protected]>
Showing 5 changed files with 42 additions and 29 deletions.
@@ -1,13 +1,16 @@
## Azure 500 on Dirigent

Time required: 10 min to set up environment and 30 min per experiment

-Description: This experiment runs the downsampled Azure trace with 500 functions. First run all the experiments with containerd, as given in the main `README.md`, and then deploy the cluster again, just that time with Firecracker. The procedure for running experiments is the same, just the trace with suffix `_firecracker` should be used.
+Description: This experiment runs the downsampled Azure trace with 500 functions. The instructions for running the trace with containerd and with Firecracker are the same, except that the cluster is deployed differently, as described in the `README.md` in the main folder of the artifact evaluation. For Firecracker, make sure to use the trace with the suffix `_firecracker`. We recommend following the order of experiments given in the `README.md`.

Instructions:
-- Start Dirigent cluster as per instructions located in the root folder of artifact evaluation instructions
-- On the `node0` execute `mkdir -p ~/invitro/data/traces/azure_500` and `mkdir -p ~/invitro/data/traces/azure_500_firecracker` Copy traces `scp azure_500/* user@node0:~/invitro/data/traces/azure_500/` and `scp azure_500_firecracker/* user@node0:~/invitro/data/traces/azure_500_firecracker/`
-- Make sure `~/invitro` branch is `rps_mode`. With text editor open `cmd/config_dirigent_trace.json` and change TracePath to match `azure_500` or `azure_500_firecracker`
-- Run locally `./scripts/start_resource_monitoring.sh user@node1 user@node2 user@node3`.
-- Run the load generator in `screen` on `node0` with `cd ~/invitro; go run cmd/loader.go --config cmd/config_dirigent_trace.json`. Wait for 30 minutes. There should be ~170K invocations, with a negligible failure rate.
-- Gather experiment results. Make sure you do not overwrite data from the other experiment.
+- Start the Dirigent cluster as per the instructions in the root folder of the artifact evaluation instructions.
+- On `node0`, execute `mkdir -p ~/invitro/data/traces/azure_500` or `mkdir -p ~/invitro/data/traces/azure_500_firecracker`, depending on which runtime you use.
+- Copy the traces to `node0` using `scp azure_500/* user@node0:~/invitro/data/traces/azure_500/` or `scp azure_500_firecracker/* user@node0:~/invitro/data/traces/azure_500_firecracker/`.
+- Make sure the `~/invitro` branch on `node0` is `rps_mode`. With a text editor, open `cmd/config_dirigent_trace.json` and change `TracePath` to match `azure_500` or `azure_500_firecracker`.
+- On your local machine, run `./scripts/start_resource_monitoring.sh user@node0 user@node1 user@node2`.
+- Run the load generator in `screen` on `node0` with `cd ~/invitro; go run cmd/loader.go --config cmd/config_dirigent_trace.json`. Wait until the experiment completes (~30 minutes). There should be ~170K invocations, with a negligible failure rate.
+- Gather the experiment results. Make sure you do not overwrite data from the other experiment and that you place the results in the correct folders.
+- Copy the load generator output with `scp user@node0:~/invitro/data/out/experiment_duration_30.csv results_azure_500/`.
+- Copy the resource utilization data with `mkdir -p ./artifact_evaluation/azure_500/dirigent/results_azure_500/cpu_mem_usage && ./scripts/collect_resource_monitoring.sh ./artifact_evaluation/azure_500/dirigent/results_azure_500/cpu_mem_usage user@node0 user@node1 user@node2`.
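For convenience, the setup steps in the updated instructions above can be scripted from the local machine. The following is a minimal sketch for the containerd variant, not part of the artifact itself; it assumes passwordless SSH to `node0` as `user`, that the `azure_500` trace directory sits next to the script, that `TracePath` is a single-line string field in `cmd/config_dirigent_trace.json`, and that it takes a path of the form `data/traces/<trace>`. Swap `azure_500` for `azure_500_firecracker` for the Firecracker run.

```bash
#!/usr/bin/env bash
# Minimal sketch of the Azure-500 setup on Dirigent (hypothetical helper, not part of the artifact).
set -euo pipefail

TRACE=azure_500            # use azure_500_firecracker for the Firecracker run
NODE=user@node0

# Create the trace directory on node0 and copy the trace files there.
ssh "$NODE" "mkdir -p ~/invitro/data/traces/$TRACE"
scp "$TRACE"/* "$NODE:invitro/data/traces/$TRACE/"

# Switch the load generator to the rps_mode branch and point TracePath at the chosen trace.
# The sed assumes TracePath appears as a single-line JSON string field.
ssh "$NODE" "cd ~/invitro && git checkout rps_mode && \
  sed -i 's|\"TracePath\": *\"[^\"]*\"|\"TracePath\": \"data/traces/$TRACE\"|' cmd/config_dirigent_trace.json"

# Start resource monitoring from the local machine.
./scripts/start_resource_monitoring.sh user@node0 user@node1 user@node2

# Run the load generator in a detached screen session on node0 (~30 minutes).
ssh "$NODE" "screen -dmS loader bash -c 'cd ~/invitro && go run cmd/loader.go --config cmd/config_dirigent_trace.json'"
```

After the run finishes, gather the results exactly as in the last two bullets above, keeping the containerd and Firecracker outputs in separate folders.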
@@ -1,13 +1,17 @@
## Azure 500 on Knative/K8s

Time required: 10 min to set up environment and 30-60 min for the experiment

Description: This experiment runs the downsampled Azure trace with 500 functions. Do not reuse the Knative/K8s cluster if you configured the cluster for the cold start sweep experiment.

+Important: Do not reuse the Knative/K8s cluster if you previously ran cold start sweep experiments, as the autoscaling configuration was changed and could affect the results severely.
+
Instructions:
-- SSH into `node0`and on that node clone the load generator repo. Then checkout to `rps_mode` branch. The command is `git clone --branch=rps_mode https://github.com/vhive-serverless/invitro`.
-- On `node0` create a directory where trace will be stored `cd invitro; mkdir data/traces/azure_500`
-- Copy the trace from this folder to `node0` using the following command `scp azure_500/*.csv user@node0:~/invitro/data/traces/azure_500`
-- Run locally `./scripts/start_resource_monitoring.sh user@node1 user@node2 user@node3`.
-- On `node0` run `screen` and inside the screen run `go run cmd/loader.go --config cmd/config_knative.json`. Function deployment will take 10-20 minutes, and then experiment for additional 30 minutes.
-- Gather experiment results. Make sure you do not overwrite data from the other experiment.
+- SSH into `node0` and on that node clone the load generator repo, then check out the `rps_mode` branch. The command is `git clone --branch=rps_mode https://github.com/vhive-serverless/invitro`.
+- On `node0` create a directory where the trace will be stored: `cd invitro; mkdir data/traces/azure_500`.
+- Copy the trace from the folder where this instruction file is located to the folder you just created on `node0`, using the following command: `scp azure_500/*.csv user@node0:~/invitro/data/traces/azure_500`.
+- On your local machine run `./scripts/start_resource_monitoring.sh user@node0 user@node1 user@node2`.
+- On `node0` run `screen` and inside the `screen` session run `go run cmd/loader.go --config cmd/config_knative.json`. Function deployment will take 10-20 minutes, and the experiment will then run for an additional 30 minutes.
+- Gather the experiment results. Make sure you do not overwrite data from the other experiment and that you place the results in the correct folders.
+- Copy the load generator output with `scp user@node0:~/invitro/data/out/experiment_duration_30.csv results_azure_500/`.
+- Copy the resource utilization data with `mkdir -p ./artifact_evaluation/azure_500/knative/results_azure_500/cpu_mem_usage && ./scripts/collect_resource_monitoring.sh ./artifact_evaluation/azure_500/knative/results_azure_500/cpu_mem_usage user@node0 user@node1 user@node2`.
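The result-collection commands are nearly identical for the Dirigent and Knative runs and are easy to mix up. Below is a minimal sketch, run from the local machine, that keeps the two apart; the `SYSTEM` variable is a hypothetical convenience, and it assumes the `results_azure_500/` folder referred to above is the one under `artifact_evaluation/azure_500/<system>/`.

```bash
#!/usr/bin/env bash
# Sketch of Azure-500 result collection, run from the local machine (hypothetical helper).
set -euo pipefail

SYSTEM=knative   # or: dirigent
DEST=./artifact_evaluation/azure_500/$SYSTEM/results_azure_500

# Load generator output (30-minute experiment).
mkdir -p "$DEST"
scp user@node0:invitro/data/out/experiment_duration_30.csv "$DEST/"

# CPU and memory utilization collected by the monitoring scripts.
mkdir -p "$DEST/cpu_mem_usage"
./scripts/collect_resource_monitoring.sh "$DEST/cpu_mem_usage" user@node0 user@node1 user@node2
```

Running it once per system keeps each run's data in its own folder, as the instructions require.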
artifact_evaluation/cold_start_sweep/dirigent/INSTRUCTIONS.md (17 changes: 8 additions & 9 deletions)
@@ -1,14 +1,13 @@
## Cold start sweep on Dirigent

Time required: 10 min to set up environment and 2-3 min per data point

-Description: This experiment triggers cold start in Maestro cluster. You should sweep the load until the cluster saturates, which will be visible on the latency plot. We suggest running experiments with 1, 10, 100, 500, 1000, 1250, 1500, ... RPS and observing the latency after conducting experiment for each data point. Low RPS (<10 RPS) rates should be run for 3-5 minutes, because of warmup, while all other loads can be run for just 1 minute. Always discard the results of the first experiment when starting a new cluster, as these measurements include image pull latency, which we should not include in the results.
+Description: This experiment triggers cold starts in a Maestro cluster. You should sweep the load until the cluster saturates, which will be visible on the latency plot. We suggest running experiments with 1, 10, 100, 250, 500, 750, 1000, ... RPS and observing the latency after conducting the experiment for each data point. Low RPS rates (<10 RPS) should be run for 3-5 minutes because of warmup, while any higher load can be run for just a minute. Always discard the results of the first experiment after starting a new cluster, as these measurements include image pull latency, which pollutes the results (visible as a high p99 at low RPS). The instructions for running experiments are the same for containerd and Firecracker, except for the deployment method (explained in `README.md`) and the `RpsImage` load generator field.

Instructions:
-- Start Dirigent cluster according to instructions located in the root folder of artifact evaluation instructions. You can reuse the existing cluster running Dirigent containerd.
-- On remote machine `node0` open `~/invitro/cmd/config_dirigent_rps.json`. Set `RpsColdStartRatioPercentage` to `100`, and sweep the load with `RpsTarget` while configuring `ExperimentDuration` according to instructions above. For higher RPS, it might be necessary to increase `RpsCooldownSeconds`, which controls the number of functions that are deployed in the cluster to achieve the requested RPS. Set `GRPCFunctionTimeoutSeconds` to `15`. For containerd experiments make sure `RpsImage` is set to `docker.io/cvetkovic/dirigent_empty_function:latest`, whereas for Firecracker experiments this field should be set to `empty`.
-- Start RPS experiment by running `cd ~/invitro; go run cmd/loader.go --config cmd/config_dirigent_rps.json`
-- Create folder storing results with `mkdir -p ./artifact_evaluation/cold_start_sweep/dirigent/results_containerd`
+- Start the Dirigent cluster according to the instructions in the root folder of the artifact evaluation instructions (`README.md`). You can reuse the existing cluster running Dirigent containerd.
+- On the remote machine `node0`, open `~/invitro/cmd/config_dirigent_rps.json`. Set `RpsColdStartRatioPercentage` to `100`, and sweep the load with `RpsTarget` while configuring `ExperimentDuration` according to the instructions above. For higher RPS (>1000), it might be necessary to increase `RpsCooldownSeconds`, which controls the number of functions deployed in the cluster to achieve the requested RPS. Set `GRPCFunctionTimeoutSeconds` to `15`. For containerd experiments make sure `RpsImage` is set to `docker.io/cvetkovic/dirigent_empty_function:latest`, whereas for Firecracker experiments this field should be set to `empty`.
+- Start the RPS experiment by running `cd ~/invitro; go run cmd/loader.go --config cmd/config_dirigent_rps.json`.
+- Create the folder storing the results with `mkdir -p ./artifact_evaluation/cold_start_sweep/dirigent/results_containerd` or `mkdir -p ./artifact_evaluation/cold_start_sweep/dirigent/results_firecracker`.
- Gather results located in `data/out/experiment_duration_X.csv` and copy them to your local machine in format `rps_X.csv` to the folder you created in the previous step.
-- Repeat for different RPS values until the cluster saturates, which you can see by plotting the data with the provided script
-
-Results expectation/interpretation:
-- Since we cannot provide access to a 100-node cluster over a 2-week artifact evaluation period, the throughput we show in Figure 7 is lower on smaller cluster, as worker nodes become the bottleneck. However, it is important to note that cold start throughput of Knative/K8s << Maestro - containerd < Maestro - Firecracker.
+- Repeat for different RPS values until the cluster saturates, which you can see by plotting the data with the provided script.
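The sweep itself is repetitive, so it can help to script one iteration. The sketch below runs on `node0`; it is not part of the artifact and makes several assumptions: `jq` is installed, the fields named above are top-level keys of `cmd/config_dirigent_rps.json`, and `ExperimentDuration` is expressed in minutes (matching the `experiment_duration_X.csv` output name). Set `RpsImage` and, for very high RPS, `RpsCooldownSeconds` beforehand as described in the bullet above.

```bash
#!/usr/bin/env bash
# Sketch of the cold-start sweep on node0 (hypothetical helper, not part of the artifact).
# Assumes jq is installed and the listed fields are top-level keys of the config file.
set -euo pipefail
cd ~/invitro

for RPS in 1 10 100 250 500 750 1000; do
  # Low RPS rates need a longer run because of warmup; higher loads run for about a minute.
  if [ "$RPS" -lt 10 ]; then DURATION=5; else DURATION=1; fi

  jq ".RpsTarget = $RPS
      | .ExperimentDuration = $DURATION
      | .RpsColdStartRatioPercentage = 100
      | .GRPCFunctionTimeoutSeconds = 15" \
      cmd/config_dirigent_rps.json > /tmp/cfg.json && mv /tmp/cfg.json cmd/config_dirigent_rps.json

  go run cmd/loader.go --config cmd/config_dirigent_rps.json

  # Keep one file per data point; rename it so it can later be copied to the results folder as rps_X.csv.
  cp "data/out/experiment_duration_${DURATION}.csv" "data/out/rps_${RPS}.csv"
done
```

Remember to discard the very first data point after (re)starting the cluster, since it includes image pull latency.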
artifact_evaluation/cold_start_sweep/knative/INSTRUCTIONS.md (10 changes: 6 additions & 4 deletions)