Deploying SR-IOV in a Kubernetes cluster

This post describes how to deploy SR-IOV in Kubernetes and how the SR-IOV Network Device Plugin works. We run the experiment on a Kubernetes v1.13.1 cluster with Ubuntu 18.04 LTS.

Kubernetes supports many Container Network Interface (CNI) plugins now, and we can build services with simple configurations. In this post, we are going to use Multus, Flannel, and SR-IOV to build a pod with multiple network interfaces for our services.

Before we start

If you are new to CNI and SR-IOV, why we need these components, and what Virtual Functions (VFs) and Physical Functions (PFs) are, you can start from these articles:

In simple words, SR-IOV technology lets us have multiple virtualized interfaces mapped to one physical network card, and we can mount these virtualized interfaces (virtual functions) into a container's network namespace. The container can then transmit traffic to the physical network interface directly, which speeds up transmission significantly.
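Before going further, you can check whether any NIC on the host exposes SR-IOV at all. A minimal sysfs sketch (it prints nothing on hosts without SR-IOV-capable devices; the output wording is mine, not from any tool):

```shell
# Scan sysfs for interfaces that advertise SR-IOV support.
sriov_found=0
for f in /sys/class/net/*/device/sriov_totalvfs; do
  [ -e "$f" ] || continue   # the glob stays literal when nothing matches
  iface=$(basename "$(dirname "$(dirname "$f")")")
  echo "$iface supports $(cat "$f") VFs"
  sriov_found=1
done
```

If an interface shows up here, its driver and firmware support SR-IOV; the number reported is the upper bound you can write into `sriov_numvfs` later.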

Prerequisites

Your network devices may not support SR-IOV; please visit the vendor's website to check SR-IOV support, for example:

Install SR-IOV to server

First, you need to install an SR-IOV-capable driver. We use the Intel i40e driver to enable SR-IOV here; select the driver version that supports your hardware. In my environment, I used an Intel Corporation I350 Gigabit Network Connection for this experiment.

Install i40e driver

sudo apt install -y make gcc libelf-dev
I40E_VER=2.4.10
wget https://downloadmirror.intel.com/28306/eng/i40e-${I40E_VER}.tar.gz && \
tar xvzf i40e-${I40E_VER}.tar.gz && cd i40e-${I40E_VER}/src && sudo make install && cd -

Update GRUB Settings

sudo sed -i '/GRUB_CMDLINE_LINUX_DEFAULT/c\GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on"' /etc/default/grub
sudo sed -i '/GRUB_CMDLINE_LINUX/c\GRUB_CMDLINE_LINUX="intel_iommu=on"' /etc/default/grub
sudo update-grub
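After rebooting, you can verify that the kernel actually picked up the flag. A small sanity check (the message text is my own, not from any tool):

```shell
# The kernel command line in /proc/cmdline reflects what GRUB passed at boot.
cmdline=$(cat /proc/cmdline)
case "$cmdline" in
  *intel_iommu=on*) echo "intel_iommu=on is active" ;;
  *)                echo "intel_iommu=on not found; check GRUB config and reboot" ;;
esac
```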

Setup vfio-pci module auto-load on boot

echo 'vfio-pci' | sudo tee /etc/modules-load.d/vfio-pci.conf
wget -qO- https://fast.dpdk.org/rel/dpdk-17.11.2.tar.xz | sudo tar -xJC /opt
sudo mv /opt/dpdk-* /opt/dpdk

Create SR-IOV Script for systemctl

We write the following script into /opt/scripts/sriov.sh, and we need this file to execute when the server boots.

sudo mkdir -p /sriov-cni /opt/scripts
sudo su
cat << "EOF" > /opt/scripts/sriov.sh
#!/bin/bash
# Copied from infra/sriov.sh
# Usage: ./sriov.sh ens785f0

# Reset the VF count to zero, then create as many VFs as the device supports.
# The script runs as root under systemd, so sudo is not needed inside it.
NUM_VFS=$(cat /sys/class/net/$1/device/sriov_totalvfs)
echo 0 > /sys/class/net/$1/device/sriov_numvfs
echo $NUM_VFS > /sys/class/net/$1/device/sriov_numvfs
ip link set $1 up
# Disable spoof checking and force each VF link up.
for ((i = 0 ; i < ${NUM_VFS} ; i++ )); do ip link set $1 vf $i spoofchk off; done
for ((i = 0 ; i < ${NUM_VFS} ; i++ )); do ip link set dev $1 vf $i state enable; done
EOF
exit
# Make the script executable
sudo chmod 744 /opt/scripts/sriov.sh

After we have the script, we can write a sriov.service unit to define a service, and control it with sudo systemctl enable sriov.

sudo su
# Systemd unit to run the above script
cat << "EOF" > /etc/systemd/system/sriov.service
[Unit]
Description=Create VFs for ens802f0

[Service]
Type=oneshot
ExecStart=/opt/scripts/sriov.sh ens802f0

[Install]
WantedBy=default.target
EOF
exit

# Enable the SRIOV systemd unit
sudo systemctl enable sriov

Now we are good to go; reboot the system to make the changes take effect.

Concept of building SR-IOV supported Kubernetes

Repositories to use

We need SR-IOV Container Network Interface to build the sriov binary, and SR-IOV Network Device Plugin to build the sriovdp binary. I will explain the task of each binary later.

SR-IOV CNI

SR-IOV CNI has a few simple jobs:

  • VF network plumbing
  • VF allocation to POD network namespace
  • VF deallocation from POD network namespace

SR-IOV CNI uses the physical function name to select virtual functions, and allocates a virtual function as the pod's interface. Look at the example from SR-IOV CNI's GitHub page:

{
  "name": "mynet",
  "type": "sriov",
  "master": "enp1s0f1",
  "ipam": {
    "type": "host-local",
    "subnet": "10.55.206.0/26",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ],
    "gateway": "10.55.206.1"
  }
}
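When Multus is configured to read CRD-backed network definitions, a sriov config like this is typically wrapped in a NetworkAttachmentDefinition object. The following is only a sketch: the name sriov-net is my own, and the exact schema and annotation support depend on your Multus version:

```yaml
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: sriov-net                # illustrative name, not from the original post
  annotations:
    k8s.v1.cni.cncf.io/resourceName: intel.com/sriov
spec:
  config: '{
    "type": "sriov",
    "name": "mynet",
    "ipam": {
      "type": "host-local",
      "subnet": "10.55.206.0/26",
      "gateway": "10.55.206.1"
    }
  }'
```

Pods then reference this definition by name in an annotation, and Multus hands the embedded config to SR-IOV CNI at pod setup time.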

SR-IOV Network Device Plugin

When we want to allocate VFs to a container but don't have enough VFs left, SR-IOV CNI can't make the scheduler aware that the virtual functions are exhausted, so the SR-IOV network device plugin helps to:

  1. Discover SR-IOV network interfaces on the node.
  2. Monitor the health of virtual functions.
  3. Enforce resource allocation limits in the Kubernetes cluster.
  4. Apply pod-specific network configuration to the allocated VFs.

The SR-IOV network device plugin uses /etc/pcidp/config.json to identify which PCI addresses to manage, and runs a Go program to allocate VFs to the container.
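A sketch of what /etc/pcidp/config.json can look like, following the early resourceList schema used around the time of this experiment (the schema has changed in later plugin versions, and the PCI root device address here is only an example):

```json
{
  "resourceList": [
    {
      "resourceName": "sriov",
      "rootDevices": ["0000:04:00.0"],
      "sriovMode": true,
      "deviceType": "netdevice"
    }
  ]
}
```

The resourceName becomes the suffix of the extended resource the plugin registers (here intel.com/sriov), and rootDevices lists the physical functions whose VFs the plugin should advertise.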

Illustration of SR-IOV and SR-IOV Network Device Plugin

I drew the following picture to illustrate how kubelet collaborates with Multus, SR-IOV CNI, and the SR-IOV network device plugin. This diagram assumes the administrator has configured all components, including the Network CRD (Custom Resource Definition), the Multus configuration, and the SR-IOV network definition.

So the story is: when we start the SR-IOV network device plugin as a DaemonSet on each node, it registers intel.com/sriov as a resource in Kubernetes, and Kubernetes can check the SR-IOV resources on the nodes via the plugin. When we create a service, depending on the deployment's resource requests, Kubernetes decides which node to place the deployment on based on the resources available on each node.

After the node that will provision the pod is decided, Kubernetes sends a network setup request to Multus. Multus then tells SR-IOV CNI the pod ID and network configuration. SR-IOV CNI sets up the networking on a VF and allocates the VF into the container as an interface.
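From the user side, the whole flow is triggered by a pod spec that requests the intel.com/sriov resource. A sketch, assuming a NetworkAttachmentDefinition named sriov-net exists (both the pod name and network name here are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: sriov-pod                          # illustrative name
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-net # assumed network definition name
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "infinity"]
    resources:
      requests:
        intel.com/sriov: "1"               # scheduler counts VFs via the device plugin
      limits:
        intel.com/sriov: "1"
```

The resource request is what makes the scheduler VF-aware: a node with zero free intel.com/sriov devices is simply not considered for placement.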

SR-IOV works with Kubernetes

Deploying the cluster

With these concepts, you know the difference between SR-IOV CNI and the SR-IOV Network Device Plugin. Make sure you have the following requirements satisfied.

  1. SR-IOV enabled on the network interface.

    $ ip link show dev ens802f0
    4: ens802f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:1e:67:d2:ee:ea brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 00:00:00:00:00:00, spoof checking off, link-state auto, trust off, query_rss off
    vf 1 MAC 16:17:47:f9:43:9a, spoof checking off, link-state auto, trust off, query_rss off
    vf 2 MAC fe:96:60:5f:d3:50, spoof checking off, link-state auto, trust off, query_rss off
    vf 3 MAC 36:91:06:87:0d:c6, spoof checking off, link-state auto, trust off, query_rss off
    ... snip
  2. SR-IOV CNI binary located in the CNI binary folder

    $ ls /opt/cni/bin
    bridge centralip cnishim dhcp flannel host-local ipvlan loopback macvlan multus portmap ptp sample sriov tuning vlan
  3. SR-IOV Network Device Plugin running

    $ kubectl -n kube-system logs sriov-device-plugin-5qfxc
    I0107 23:19:30.469685 13614 server.go:132] ListAndWatch(sriov): send updated devices &ListAndWatchResponse{Devices:[&Device{ID:0000:04:10.0,Health:Healthy,} &Device{ID:0000:04:10.2,Health:Healthy,} &Device{ID:0000:04:10.4,Health:Healthy,} &Device{ID:0000:04:10.6,Health:Healthy,} ... snip],}

Check that the resource exists and is discovered by Kubernetes

$ kubectl get node node2 -o json | jq '.status.allocatable'
{
  "cpu": "40",
  "ephemeral-storage": "452695013856",
  "hugepages-1Gi": "32Gi",
  "intel.com/sriov": "63",
  "memory": "32210384Ki",
  "pods": "110"
}

Conclusion

This article describes how SR-IOV CNI and the SR-IOV Network Device Plugin work with Kubernetes. It doesn't focus much on implementation details, only giving a brief introduction to the installation steps. SR-IOV is helpful when we want to accelerate network traffic going through the container. Enjoy.