This guide describes how to set up an N-node vHive serverless cluster. See here to learn where to find the table of contents.
- Host platform requirements
- Setup a Serverless (Knative) Cluster
- Setup a Single-Node Cluster
- Deploying and Invoking Functions in vHive
- Two x64 servers in the same network.
- We have not tried vHive with Arm but it may not be hard to port because Firecracker supports Arm64 ISA.
- Hardware support for virtualization and KVM.
- Nested virtualization is supported provided that KVM is available.
- The root partition of the host filesystem should be mounted on an SSD. That is critical for snapshot-based cold-starts.
- We expect vHive to work on machines that use HDDs but there could be timeout-related issues with large Docker images (>1GB).
- Ubuntu/Debian with sudo access and the apt package manager on the host (tested on Ubuntu 18.04, v4.15).
- Other OS-es require changes in our setup scripts, but should work in principle.
- Passwordless SSH. Copy the SSH keys that you use to authenticate on GitHub to all the nodes and run
eval "$(ssh-agent -s)" && ssh-add
to allow SSH authentication in the background.
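The requirements above can be partly checked before running any setup script. The following is a minimal sketch, not part of the vHive repository: all function names are hypothetical, and it assumes a Linux host that exposes /proc/cpuinfo, /dev/kvm and /sys/block. It only reports what it finds and changes nothing on the machine.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers; none of these names come from the vHive repository.

# Succeeds if the CPU advertises hardware virtualization (Intel "vmx" or AMD "svm").
# The optional argument overrides the cpuinfo path, which is handy for testing.
cpu_has_virt() {
    grep -Eq '(^|[[:space:]])(vmx|svm)([[:space:]]|$)' "${1:-/proc/cpuinfo}"
}

# Succeeds if the KVM device node exists and the current user can read and write it.
kvm_usable() {
    local dev="${1:-/dev/kvm}"
    [ -r "$dev" ] && [ -w "$dev" ]
}

# Maps the content of /sys/block/<disk>/queue/rotational to a label.
disk_kind() {
    case "$1" in
        0) echo ssd ;;
        1) echo hdd ;;
        *) echo unknown ;;
    esac
}

# Prints a short report; it does not modify the system.
preflight() {
    local f disk
    cpu_has_virt && echo "virtualization: ok" || echo "virtualization: vmx/svm flag not found"
    kvm_usable && echo "kvm: ok" || echo "kvm: /dev/kvm missing or not accessible"
    command -v apt-get >/dev/null 2>&1 && echo "apt: ok" || echo "apt: apt-get not found"
    for f in /sys/block/*/queue/rotational; do
        [ -r "$f" ] || continue
        disk=${f#/sys/block/}
        disk=${disk%%/*}
        echo "disk $disk: $(disk_kind "$(cat "$f")")"
    done
}
```

Calling preflight on each node prints one line per check. The disk lines cover every block device, so you still need to confirm that the device backing the root partition is the one reported as ssd.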
We suggest renting nodes on CloudLab as their service is available to researchers world-wide.
You can use our CloudLab profile RPerf/vHive-cluster-env.
It is recommended to use a base Ubuntu 18.04 image for each node and connect the nodes in a LAN.
We tested the following instructions by setting up a 2-node cluster on CloudLab, using all of the following SSD-equipped machines: xl170 on Utah, rs440 on Mass, m400 on OneLab. xl170 nodes are normally less occupied than the other two, and users can consider other SSD-based machines too.
SSD-equipped nodes are highly recommended. The full list of CloudLab nodes can be found here.
On each node (both master and workers), execute the following instructions as a non-root user with sudo rights using bash:
- Clone the vHive repository:
git clone --depth=1 https://github.com/ease-lab/vhive.git
- Change your working directory to the root of the repository:
cd vhive
- Create a directory for vHive logs:
mkdir -p /tmp/vhive-logs
- Run the node setup script:
./scripts/cloudlab/setup_node.sh > >(tee -a /tmp/vhive-logs/setup_node.stdout) 2> >(tee -a /tmp/vhive-logs/setup_node.stderr >&2)
BEWARE: This script can print Command failed when creating the devmapper at the end. This can be safely ignored.
On each worker node, execute the following instructions as a non-root user with sudo rights using bash:
- Run the script that sets up kubelet:
./scripts/cluster/setup_worker_kubelet.sh > >(tee -a /tmp/vhive-logs/setup_worker_kubelet.stdout) 2> >(tee -a /tmp/vhive-logs/setup_worker_kubelet.stderr >&2)
- Start containerd in a background terminal named containerd:
sudo screen -dmS containerd bash -c "containerd > >(tee -a /tmp/vhive-logs/containerd.stdout) 2> >(tee -a /tmp/vhive-logs/containerd.stderr >&2)"
Note: screen is a terminal multiplexer similar to tmux but widely available by default.
Starting long-running daemons in the background using screen allows you to use a single terminal (an SSH session most likely) by keeping it unoccupied and ensures that daemons will not be terminated when you log out (voluntarily, or because of connection issues).
- To (re-)attach a background terminal:
sudo screen -rd <name>
- To detach (from an attached terminal):
Ctrl+A then D
- To kill a background terminal:
sudo screen -XS <name> quit
- To list all the sessions:
sudo screen -ls
- Start firecracker-containerd in a background terminal named firecracker:
sudo PATH=$PATH screen -dmS firecracker bash -c "/usr/local/bin/firecracker-containerd --config /etc/firecracker-containerd/config.toml > >(tee -a /tmp/vhive-logs/firecracker.stdout) 2> >(tee -a /tmp/vhive-logs/firecracker.stderr >&2)"
- Build the vHive host orchestrator:
source /etc/profile && go build
- Start vHive in a background terminal named vhive:
# EITHER
sudo screen -dmS vhive bash -c "./vhive > >(tee -a /tmp/vhive-logs/vhive.stdout) 2> >(tee -a /tmp/vhive-logs/vhive.stderr >&2)"
# OR
sudo screen -dmS vhive bash -c "./vhive -snapshots > >(tee -a /tmp/vhive-logs/vhive.stdout) 2> >(tee -a /tmp/vhive-logs/vhive.stderr >&2)"
# OR
sudo screen -dmS vhive bash -c "./vhive -snapshots -upf > >(tee -a /tmp/vhive-logs/vhive.stdout) 2> >(tee -a /tmp/vhive-logs/vhive.stderr >&2)"
Note: By default, the microVMs are booted on every cold start. The -snapshots flag enables snapshots after the 2nd invocation of each function. If both -snapshots and -upf are specified, the snapshots are accelerated with the Record-and-Prefetch (REAP) technique that we described in our ASPLOS'21 paper (extended abstract, full paper).
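Before moving on to the master node, it can help to confirm that the background sessions started above (containerd, firecracker, vhive) are still alive. The sketch below is a hypothetical helper, not something shipped with vHive or screen. It assumes that session lines in the output of screen -ls start with "<pid>.<name>", which is the usual format but may differ between screen versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the function name is not part of vHive or screen.

# Reads `screen -ls` output on stdin and succeeds if a session called "$1" is listed.
# Session lines are assumed to start with "<pid>.<name>", so the "<pid>." prefix is stripped.
screen_session_listed() {
    awk -v name="$1" '
        $1 ~ /^[0-9]+\./ {
            session = $1
            sub(/^[0-9]+\./, "", session)
            if (session == name) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on a worker node (the sessions were started with sudo, so list them with sudo too):
#   for s in containerd firecracker vhive; do
#       sudo screen -ls | screen_session_listed "$s" && echo "$s: running" || echo "$s: NOT running"
#   done
```

A session that is missing from the list usually means the daemon exited, in which case the matching files under /tmp/vhive-logs are the first place to look.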
On the master node, execute the following instructions as a non-root user with sudo rights using bash:
- Start containerd in a background terminal named containerd:
sudo screen -dmS containerd bash -c "containerd > >(tee -a /tmp/vhive-logs/containerd.stdout) 2> >(tee -a /tmp/vhive-logs/containerd.stderr >&2)"
- Run the script that creates the multinode cluster:
./scripts/cluster/create_multinode_cluster.sh > >(tee -a /tmp/vhive-logs/create_multinode_cluster.stdout) 2> >(tee -a /tmp/vhive-logs/create_multinode_cluster.stderr >&2)
BEWARE:
The script will ask you the following:
All nodes need to be joined in the cluster. Have you joined all nodes? (y/n)
Leave this hanging in the terminal as we will go back to this later.
However, in the same terminal you will see a command in the following format:
kubeadm join 128.110.154.221:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>
Please copy both lines of this command.
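Because the cluster creation script tees its standard output to /tmp/vhive-logs/create_multinode_cluster.stdout, the join command can also be recovered from that log instead of being copied by hand. The following is a sketch under that assumption: the function name is hypothetical, and it expects the command to appear in the two-line, backslash-continued format shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the vHive scripts.

# Reads text on stdin, finds the first line starting with "kubeadm join",
# follows trailing-backslash continuations, and prints the command on one line.
extract_join_command() {
    awk '
        /^[ \t]*kubeadm join / { capture = 1 }
        capture {
            line = $0
            cont = sub(/[ \t]*\\[ \t]*$/, "", line)   # 1 if the line ended with a backslash
            sub(/^[ \t]+/, "", line)                  # drop indentation
            cmd = (cmd == "" ? line : cmd " " line)
            if (!cont) { print cmd; exit }
        }
    '
}

# Usage on the master node:
#   extract_join_command < /tmp/vhive-logs/create_multinode_cluster.stdout
```

The printed line can then be prefixed with sudo and run on each worker node. The token it contains is a credential, so avoid pasting it into shared channels.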
On each worker node, execute the following instructions as a non-root user with sudo rights using bash:
- Add the current worker to the Kubernetes cluster, by executing the command you have copied in step (3.2) using sudo:
sudo kubeadm join IP:PORT --token <token> --discovery-token-ca-cert-hash sha256:<hash> > >(tee -a /tmp/vhive-logs/kubeadm_join.stdout) 2> >(tee -a /tmp/vhive-logs/kubeadm_join.stderr >&2)
Note: On success, you should see the following message:
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
On the master node, execute the following instructions as a non-root user with sudo rights using bash:
- As all worker nodes have been joined, answer with y to the prompt we have left hanging in the terminal.
- As the cluster is setting up now, wait until all pods show as Running or Completed:
watch kubectl get pods --all-namespaces
Congrats, your Knative cluster is ready!
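For unattended setups, the "wait until all pods show as Running or Completed" step can be scripted instead of watched. The helper below is a hypothetical sketch: it assumes the column layout that kubectl get pods --all-namespaces --no-headers normally prints, where STATUS is the fourth column.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of the vHive scripts.

# Reads `kubectl get pods --all-namespaces --no-headers` output on stdin.
# Succeeds only if at least one pod is listed and every pod's STATUS
# (assumed to be column 4) is Running or Completed.
all_pods_ready() {
    awk '
        NF { seen = 1 }
        NF && $4 != "Running" && $4 != "Completed" { bad = 1 }
        END { exit ((seen && !bad) ? 0 : 1) }
    '
}

# Usage on the master node:
#   until kubectl get pods --all-namespaces --no-headers | all_pods_ready; do
#       sleep 5
#   done
#   echo "all pods are Running or Completed"
```

An empty pod list is treated as "not ready", so the loop does not exit early while the cluster is still coming up.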
In essence, you will execute the same commands for master and worker setups but on a single node.
A 5-second delay has been added between the commands to ensure that components have enough time to initialize.
Execute the following as a non-root user with sudo rights using bash:
- Run the node setup script:
./scripts/cloudlab/setup_node.sh;
- Start containerd in a background terminal named containerd:
sudo screen -dmS containerd containerd; sleep 5;
Note: Regarding screen and starting daemons in background terminals, see the note in step 2 of subsection II.2 Setup Worker Nodes.
- Start firecracker-containerd in a background terminal named firecracker:
sudo PATH=$PATH screen -dmS firecracker /usr/local/bin/firecracker-containerd --config /etc/firecracker-containerd/config.toml; sleep 5;
- Build the vHive host orchestrator:
source /etc/profile && go build;
- Start vHive in a background terminal named vhive:
sudo screen -dmS vhive ./vhive; sleep 5;
- Run the single node cluster setup script:
./scripts/cluster/create_one_node_cluster.sh
./scripts/github_runner/clean_cri_runner.sh
This script stops the existing cluster if any, cleans up and then starts a fresh single-node cluster.
export GITHUB_VHIVE_ARGS="[-dbg] [-snapshots] [-upf]" # all flags are optional: -dbg enables debug logs; -snapshots enables snapshot-based cold starts; -snapshots with -upf enables REAP snapshots
scripts/cloudlab/start_onenode_vhive_cluster.sh
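The bracketed flags in GITHUB_VHIVE_ARGS are optional and combine as described in the worker-node note: no flag boots microVMs, -snapshots enables snapshots, and -snapshots with -upf enables REAP. A small helper can make that choice explicit. This is only a sketch: the function and the mode names boot, snapshots and reap are hypothetical labels, while the flags themselves are the ones listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the mode names are labels invented for this sketch.

# Prints the flag string for a cold-start mode: boot (default), snapshots, or reap.
# Passing "debug" as the second argument prepends -dbg.
vhive_args() {
    local mode="${1:-boot}" debug="${2:-}" args=""
    [ "$debug" = "debug" ] && args="-dbg"
    case "$mode" in
        boot) ;;
        snapshots) args="$args -snapshots" ;;
        reap) args="$args -snapshots -upf" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    # Trim the leading space left behind when -dbg is absent.
    echo "${args# }"
}

# Usage:
#   export GITHUB_VHIVE_ARGS="$(vhive_args reap debug)"
#   scripts/cloudlab/start_onenode_vhive_cluster.sh
```

Rejecting unknown modes keeps a typo from silently falling back to plain boot, which would otherwise be easy to miss in latency results.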
This section covers only synchronous (i.e., Knative Serving) functions. Please refer to Adding Benchmarks to vHive/Knative and Stock Knative for benchmarking the asynchronous (i.e., Knative Eventing) case and for more details about both.
On the master node, execute the following instructions using bash:
- Optionally, configure the types and the number of functions to deploy in examples/deployer/functions.json.
- Run the deployer client:
source /etc/profile && go run examples/deployer/client.go
BEWARE: Deployer cannot be used for Knative eventing (i.e., asynchronous) workflows. You need to deploy them manually instead.
Note: There are runtime arguments that you can specify if necessary.
The script writes the deployed functions' endpoints to a file (endpoints.json by default).
On any node, execute the following instructions using bash:
- Run the invoker client:
go run examples/invoker/client.go
Note: There are runtime arguments (e.g., RPS or requests-per-second target, experiment duration) that you can specify if necessary.
After invoking the functions from the input file (endpoints.json by default), the script writes the measured latencies to an output file (rps<RPS>_lat.csv by default, where <RPS> is the observed requests-per-sec value) for further analysis.
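As a starting point for that analysis, the latency file can be summarized directly in the shell. The sketch below rests on an assumption that is not stated in this guide: that the file holds one numeric latency per line and no header row. Check a few lines of your own output first. The function name is hypothetical, and the summary is unit-agnostic because the guide does not state the unit.

```shell
#!/usr/bin/env bash
# Hypothetical helper; assumes one numeric latency per line and no header row.

# Reads latencies on stdin and prints count, mean, nearest-rank p99, and max.
latency_summary() {
    sort -n | awk '
        NF { v[++n] = $1; sum += $1 }
        END {
            if (n == 0) { print "count=0"; exit }
            idx = int((n * 99 + 99) / 100)   # ceil(0.99 * n), the nearest-rank p99 index
            printf "count=%d mean=%.1f p99=%s max=%s\n", n, sum / n, v[idx], v[n]
        }
    '
}

# Usage (replace the file name with the one the invoker actually wrote):
#   latency_summary < rps<RPS>_lat.csv
```

Sorting first lets the percentile be read off by index. For small samples the nearest-rank p99 is simply the largest value.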
On the master node, execute the following instructions using bash:
- Delete all deployed functions:
kn service delete --all