M-CORD Installation

This document describes the steps to build M-CORD with SR-IOV enabled on fresh machines, using multiple servers (nodes) to form a Kubernetes cluster.

Table of Contents

[TOC]

Versions of all components

  1. OS: Ubuntu bionic, Ubuntu 18.04.1 LTS
  2. Go: 1.11.2 on linux/amd64
  3. Kubernetes: v1.13.1, eec55b9ba98609a46fee712359c7b5b365bdd920
  4. Docker: 18.06.1-ce, 4957679
  5. SR-IOV Network Device plugin: master, fffe18effa17b51252514cfd14acaebad34ebad1
  6. Container Network Interface
    1. Multus: master, f157f424b5e6f806e83288335396212bd8d21ff2
    2. Flannel: >0.10.0, bc79dd1505b0c8681ece4de4c0d86c5cd2643275

      Flannel has a bug on Kubernetes >= 1.12, so use the patched commit instead

    3. SR-IOV CNI: dev/k8s-deviceid-model, 4c979552b89833e8f3bf39cf6081ff8fb78a0f9c

Setup for increasing performance

Hugepages

  1. (optional) Install the hugepages administration tool

    sudo apt install hugepages
  2. Configure hugepages and IOMMU; make sure the server's BIOS has VT-d enabled.

    sudo sed -i '/GRUB_CMDLINE_LINUX_DEFAULT/c\GRUB_CMDLINE_LINUX_DEFAULT="intel_iommu=on default_hugepagesz=1G hugepagesz=1G hugepages=32"' /etc/default/grub
    sudo sed -i '/GRUB_CMDLINE_LINUX/c\GRUB_CMDLINE_LINUX="intel_iommu=on default_hugepagesz=1G hugepagesz=1G hugepages=32"' /etc/default/grub
    sudo update-grub
  3. Mount hugepages on bootup.

    echo 'nodev /dev/hugepages hugetlbfs pagesize=1GB 0 0' | sudo tee -a /etc/fstab
  4. Reboot and use hugeadm to verify (or you can postpone the reboot until the SR-IOV setup is completed).

    $ hugeadm --pool-list
          Size  Minimum  Current  Maximum  Default
    1073741824       32       32       32        *
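
If hugeadm is not available, the same information can be read from the kernel directly; a quick check using the standard procfs/sysfs paths for 1G pages:

    # total/free hugepage counters
    grep Huge /proc/meminfo
    # number of 1G pages configured
    cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages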

SR-IOV

  1. Install the latest i40e driver and firmware (the firmware update is not described here).

    sudo apt install -y make gcc libelf-dev
    I40E_VER=2.4.10
    wget https://downloadmirror.intel.com/28306/eng/i40e-${I40E_VER}.tar.gz && \
    tar xvzf i40e-${I40E_VER}.tar.gz && cd i40e-${I40E_VER}/src && sudo make install && cd -
  2. Set up the vfio-pci module to load on boot, install DPDK, and create the SR-IOV script

    echo 'vfio-pci' | sudo tee /etc/modules-load.d/vfio-pci.conf
    wget -qO- https://fast.dpdk.org/rel/dpdk-17.11.2.tar.xz | sudo tar -xJC /opt
    sudo mv /opt/dpdk-* /opt/dpdk

    sudo mkdir -p /sriov-cni /opt/scripts
    sudo su
    cat << "EOF" > /opt/scripts/sriov.sh
    #!/bin/bash
    # Copied from infra/sriov.sh
    # Usage: ./sriov.sh ens785f0

    NUM_VFS=$(cat /sys/class/net/$1/device/sriov_totalvfs)
    echo 0 | sudo tee /sys/class/net/$1/device/sriov_numvfs
    echo $NUM_VFS | sudo tee /sys/class/net/$1/device/sriov_numvfs
    sudo ip link set $1 up
    for ((i = 0 ; i < ${NUM_VFS} ; i++ )); do ip link set $1 vf $i spoofchk off; done
    for ((i = 0 ; i < ${NUM_VFS} ; i++ )); do ip link set dev $1 vf $i state enable; done
    EOF
    exit
    # Script perms
    sudo chmod 744 /opt/scripts/sriov.sh
  3. Set up SR-IOV on specific interfaces (using ens802f0 as an example)

    sudo su
    # Systemd unit to run the above script
    cat << "EOF" > /etc/systemd/system/sriov.service
    [Unit]
    Description=Create VFs for ens802f0

    [Service]
    Type=oneshot
    ExecStart=/opt/scripts/sriov.sh ens802f0

    [Install]
    WantedBy=default.target
    EOF
    exit

    # Enable the SRIOV systemd unit
    sudo systemctl enable sriov
  4. Reboot
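
After the reboot, a quick sanity check (using the example interface ens802f0 from step 3):

    # vfio-pci should be loaded and the sriov unit should have run
    lsmod | grep vfio_pci
    systemctl status sriov --no-pager
    # number of VFs actually created on the interface
    cat /sys/class/net/ens802f0/device/sriov_numvfs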

Setup Kubernetes Cluster

  1. Requirements (on all nodes)

    • the regular user can sudo without a password
    • make sure all nodes can communicate with each other
    • (optional) the management interface boots up with DHCP
    • Install DockerCE

      curl -fsSL https://get.docker.com/ | VERSION=18.06 sh
      sudo usermod -aG docker $(whoami)
    • Turn off swap (a Kubernetes requirement)

      sudo swapoff -a && sudo sysctl -w vm.swappiness=0
      sudo sed -i 's/^\(.*swap.*\)$/#\1/g' /etc/fstab
    • Add the Kubernetes package repository to the apt sources list

      curl -s "https://packages.cloud.google.com/apt/doc/apt-key.gpg" | sudo apt-key add -
      echo "deb http://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
    • Install the Kubernetes prerequisites

      sudo apt update
      sudo apt install -y apt-transport-https kubelet=1.13.* kubeadm=1.13.* kubectl
    • Install Kubernetes-related Tools

      sudo snap install jq
      sudo snap install helm --classic
  2. Install the Kubernetes cluster

    • (master) Use kubeadm to initialize the master

      sudo kubeadm init --pod-network-cidr 10.244.0.0/16 --kubernetes-version stable-1.13
    • (master) Copy credentials

      rm -rf $HOME/.kube && mkdir -p $HOME/.kube
      sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
      sudo chown $(id -u):$(id -g) $HOME/.kube/config
    • (slave) Run the join command from kubeadm's output on the other nodes to join the cluster (your token and hash will differ from the example below)

      kubeadm join 10.90.0.215:6443 --token 5oq9rl.uideijil7jjv42x0 --discovery-token-ca-cert-hash sha256:c82ae1391bb3dc6e903f290614c5ead5cf9dd3426c797c984fdf469e4f44f066

* If you want to retrieve the `join` command again, run:

    
kubeadm token create --print-join-command
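
Once every node has joined, you can list them from the master; they will stay NotReady until a CNI is installed in the next section:

    kubectl get nodes -o wide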

CNI Installation

  1. Flannel (nodes will become Ready after applying it)

Flannel v0.10.0 should be used as suggested, but Flannel has a bug on Kubernetes >= 1.12, so we use a pinned commit as a temporary workaround.

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/bc79dd1505b0c8681ece4de4c0d86c5cd2643275/Documentation/kube-flannel.yml

Original:

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/v0.10.0/Documentation/kube-flannel.yml
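
After applying either manifest, the flannel DaemonSet pods should come up and the nodes should turn Ready (this assumes the upstream manifest's app=flannel pod label):

    kubectl -n kube-system get pods -l app=flannel -o wide
    kubectl get nodes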

Automatic Installation (default)

We provide a pre-built Container Network Interface plugin in the ngick8stesting/aio-cni:k8s-1.13 container; it is deployed by 06-sriov-device-plugin.yaml, and the build procedure can be found in build/Dockerfile.
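
As a quick sanity check after the DaemonSet runs, the CNI binaries should appear on each SR-IOV node; this assumes the default CNI binary directory /opt/cni/bin, the same one the manual steps below use:

    ls /opt/cni/bin | grep -E 'multus|sriov|centralip'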

Manual Installation

  1. Go Installation
  • golang installation for compiling the CNIs

    GOVER=1.11.2
    cd /tmp && wget https://dl.google.com/go/go$GOVER.linux-amd64.tar.gz
    sudo tar -C /usr/local -xzf go$GOVER.linux-amd64.tar.gz
    echo 'export PATH=$PATH:/usr/local/go/bin' >> $HOME/.bashrc
    source $HOME/.bashrc
  2. Multus

    cd /tmp
    git clone https://github.com/intel/multus-cni.git && cd multus-cni
    ./build

    # And copy to the other nodes (the nodes that run SR-IOV)
    cp bin/multus /opt/cni/bin
  3. SR-IOV

    • SR-IOV CNI Installation

      cd /tmp
      git clone https://github.com/intel/sriov-cni.git && cd sriov-cni
      git checkout dev/k8s-deviceid-model
      ./build

      # And copy to the other nodes (the nodes that run SR-IOV)
      cp build/sriov /opt/cni/bin
  • SR-IOV Network Device Plugin Installation

    
cd /tmp
git clone https://github.com/intel/sriov-network-device-plugin sriov-ndp && cd sriov-ndp
make

# Optional step: the image can be built from source or fetched from Docker Hub
make image
  4. CentralIP

    git clone https://github.com/John-Lin/ovs-cni.git /tmp/ovs-cni
    cd /tmp/ovs-cni
    ./build.sh

    # And copy to the other nodes (the nodes that run SR-IOV)
    cp bin/centralip /opt/cni/bin
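
The cp commands above install the binaries locally; to push them to the other nodes, a loop like the following can be used (a sketch only; node2 and node3 are placeholder hostnames for the SR-IOV worker nodes):

    # adjust hostnames and paths to match your cluster
    for node in node2 node3; do
      scp /opt/cni/bin/multus /opt/cni/bin/sriov /opt/cni/bin/centralip ${node}:/tmp/
      ssh ${node} 'sudo mv /tmp/multus /tmp/sriov /tmp/centralip /opt/cni/bin/'
    done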

CNI deployment

Several configuration files are needed to make the CNIs work; here is the list:

  1. 01-standalone-etcd.yaml
    • Creates a standalone etcd service as a DaemonSet, deployed on all nodes except the master; this etcd instance allocates IP addresses for s1u-net and sgi-net.
  2. 02-cni-service-account.yaml
    • Creates two service accounts, multus and sriov-dp, for CNI operation.
  3. 03-network-crd.yaml
    • Deploys Network as a custom resource definition in the Kubernetes cluster.
  4. 04-network-definition.yaml
    • Defines two networks: s1u-net for the eNB-to-SPGW-U connection, and sgi-net for the mobile network egress interface.
  5. 05-sriov-device-plugin-configmap.yaml
    • Config used by the SR-IOV pod to configure the nodes; explained later.
  6. 06-sriov-device-plugin.yaml
    • DaemonSet of the SR-IOV pod, which defines the following operations:
      1. init-sriov-dp: Copies the credentials that allow Multus to control the node's network.
      2. systemd-restart: Mounts the node's /var/run/dbus and /run/systemd at the same paths in the container so the container can restart the node's services; containerd, crio, and kubelet are restarted to activate the CNIs.
      3. sriov-device-plugin: Loads the config from the ConfigMap and uses it to allocate VFs to pods.
  7. 07-tiller.yaml
    • Creates a tiller service account with system privileges, so the helm client can install Helm charts with this account.

Apply them all with the following command:

kubectl apply -f ./
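
To confirm the manifests were applied, the objects they create should now be visible (the exact namespaces depend on the yaml files, so treat this only as a sanity check):

    kubectl get crd
    kubectl get daemonsets --all-namespaces
    kubectl get serviceaccounts --all-namespaces | grep -E 'multus|sriov-dp|tiller'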

Before Deploying

  • Node1: master node; we don't use it to provision workloads

    kubectl get node node1 -o json | jq '.status.allocatable'
    {
      "cpu": "40",
      "ephemeral-storage": "452695013856",
      "hugepages-1Gi": "32Gi",
      "memory": "32210384Ki",
      "pods": "110"
    }
  • Node2: Hugepage 32Gi & SR-IOV 63 Virtual Functions

    kubectl get node node2 -o json | jq '.status.allocatable'
    {
      "cpu": "40",
      "ephemeral-storage": "452695013856",
      "hugepages-1Gi": "32Gi",
      "intel.com/sriov": "63",
      "memory": "32210384Ki",
      "pods": "110"
    }
  • Node3: Hugepage 32Gi & SR-IOV 63 Virtual Functions

    kubectl get node node3 -o json | jq '.status.allocatable'
    {
      "cpu": "40",
      "ephemeral-storage": "452695013856",
      "hugepages-1Gi": "32Gi",
      "intel.com/sriov": "63",
      "memory": "32210384Ki",
      "pods": "110"
    }
  • Install the Helm Tiller pod

    helm init --service-account tiller
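
The three per-node checks above can also be condensed into a single jq query, and helm install will only work once the Tiller pod is Running (the app=helm,name=tiller selector matches the deployment created by helm init):

    # summarize SR-IOV and hugepage resources of every node in one shot
    kubectl get nodes -o json | jq '.items[] | {name: .metadata.name, sriov: .status.allocatable["intel.com/sriov"], hugepages: .status.allocatable["hugepages-1Gi"]}'
    # Tiller must be Running before "helm install" will work
    kubectl -n kube-system get pods -l app=helm,name=tiller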

Deploy M-CORD

We are all set to deploy M-CORD with Helm; use the M-CORD charts to deploy the mobile network services to the cluster.

helm install -n epc mcord-vepc-helm

If you see the following output, you have installed M-CORD successfully.

$ kubectl -n epc get pods
NAME          READY   STATUS    RESTARTS   AGE
cassandra-0   1/1     Running   0          20h
hss-0         1/1     Running   0          20h
mme-0         1/1     Running   0          20h
ngic-0        2/2     Running   0          20h
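
The Helm release itself can be inspected as well:

    helm list
    helm status epc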

Useful Trick

Because the installation needs to be repeated on each node, you can use tmux's synchronize-panes feature to type a command once and execute it on every node.

  • Split panes

    <C-b> %: vertical split
    <C-b> ": horizontal split
  • Synchronize mode on

    <C-b> :set synchronize-panes on
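
The same layout can also be scripted so each pane SSHes into one node before synchronization is turned on; a minimal sketch (node1, node2 and node3 are placeholder hostnames):

    tmux new-session -d -s mcord 'ssh node1'
    tmux split-window -h -t mcord 'ssh node2'
    tmux split-window -v -t mcord 'ssh node3'
    tmux set-window-option -t mcord synchronize-panes on
    tmux attach -t mcord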

Troubleshooting

Check whether Multus works

We can use the following two YAML files to create a custom network and a test pod; if you can see two interfaces listed in the pod, Multus works.

# bridge.yaml
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: mynet
spec:
  config: '{
    "name": "mynet",
    "type": "bridge",
    "ipam": {
      "type": "host-local",
      "subnet": "2.2.2.0/24"
    }
  }'
# testpod.yaml
---
apiVersion: v1
kind: Pod
metadata:
  name: testpod1
  annotations:
    k8s.v1.cni.cncf.io/networks: '[
      { "name": "mynet", "interfaceRequest": "mynet" }
    ]'
spec:
  nodeSelector:
    kubernetes.io/hostname: node2
  containers:
  - name: test1
    image: busybox
    command: [ "top" ]
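
Apply both files and inspect the pod's interfaces; seeing two interfaces (the cluster default plus the bridge attachment) means Multus is delegating correctly:

    kubectl apply -f bridge.yaml -f testpod.yaml
    # once the pod is Running, list its interfaces
    kubectl exec testpod1 -- ip addr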

Check that VFs are present on the nodes

The VFs (Virtual Functions) may not be enabled or bound to the interface; check with the lspci command to make sure the Virtual Functions have been created.

$ lspci | grep -i 'Virtual Function'
04:10.0 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
04:10.2 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
04:10.4 Ethernet controller: Intel Corporation 82599 Ethernet Controller Virtual Function (rev 01)
...

ip link is another command that can list the VFs.

$ ip link show dev ens802f0
4: ens802f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
link/ether 00:1e:67:d2:ce:52 brd ff:ff:ff:ff:ff:ff
vf 0 MAC 46:20:3a:e8:63:2d, spoof checking off, link-state auto, trust off, query_rss off
vf 1 MAC 9e:7e:78:8c:ed:fe, spoof checking off, link-state auto, trust off, query_rss off
vf 2 MAC ea:5c:8b:8b:b0:57, spoof checking off, link-state auto, trust off, query_rss off

If VFs are not present, make sure the SR-IOV service is enabled with sudo systemctl enable sriov and start it with sudo systemctl start sriov (or reboot).
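
You can also re-run the VF creation script from the setup section by hand and read back the VF count (using the example interface ens802f0):

    sudo /opt/scripts/sriov.sh ens802f0
    cat /sys/class/net/ens802f0/device/sriov_numvfs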

SR-IOV resources do not show up in the node's resources

The config.json for the SR-IOV device plugin pod is as follows:

{
  "resourceList":
  [
    {
      "resourceName": "sriov",
      "rootDevices": ["04:00.0"],
      "sriovMode": true,
      "deviceType": "netdevice"
    }
  ]
}

We use netdevice so that the OS binds the VFs to the kernel network driver automatically; if vfio is used here instead, it is assumed that the administrator has bound the VFs to the vfio driver manually.

"rootDevices": ["04:00.0"] refers to the device with SR-IOV enabled; you can find your PCI address with the following command:

$ sudo lshw -c network -businfo | grep ens802f0
pci@0000:04:00.0  ens802f0  network  82599ES 10-Gigabit SFI/SFP+ Network Connection
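
After correcting config.json and restarting the sriov-device-plugin pod on the affected node, the resource should reappear in the node's allocatable list:

    kubectl get node node2 -o json | jq '.status.allocatable'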