KVM NUMA affinity and hyperthreading

This topic explains how to experiment with NUMA pinning and Hyper-Threading Technology for Pexip Infinity Conferencing Node VMs, in order to achieve up to 50% additional capacity.

If you are taking advantage of hyperthreading to deploy two vCPUs per physical core (i.e. one per logical thread), you must first enable NUMA affinity; if you don't, the Conferencing Node VM will end up spanning multiple NUMA nodes, resulting in a loss of performance.

Affinity does NOT guarantee or reserve resources; it simply forces a VM to use only the socket you define. Mixing Pexip Conferencing Node VMs that are configured with NUMA affinity with other VMs on the same server is therefore not recommended.

NUMA affinity is not practical in all data center use cases, as it ties a given VM to a specific CPU socket, but it is very useful for high-density Pexip deployments with dedicated capacity.

This information is aimed at administrators with a strong understanding of KVM, who have very good control of their VM environment, and who understand the consequences of conducting these changes.

Please ensure you have read and implemented our recommendations in Achieving high density deployments with NUMA before you continue.

Prerequisites

NUMA affinity for Pexip Conferencing Node VMs should only be used if the following conditions apply:

  • Live migration is NOT used. (Live migration could result in two Conferencing Nodes both pinned to the same socket, meaning both would contend for one processor while the other processor sits idle.)
  • You fully understand what you are doing, and you are willing to revert to the standard settings, if requested by Pexip support, to investigate any potential issues that may result.

Checking NUMA support

You can confirm that your system supports NUMA by running the following command:

# cat /sys/devices/system/node/node*/cpulist

If it returns a single line, e.g.
0-15
then NUMA is not supported.

If it returns multiple lines, e.g.
0-19,40-59
20-39,60-79

this indicates that NUMA is supported and that, in this case, there are 2 NUMA nodes.
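
As an additional cross-check, if the numactl package is installed (an assumption; it is not present by default on all distributions), the following command lists each node's CPUs along with the memory attached to that node:

# numactl --hardware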

Overview of process

The examples in this topic configure 2 Conferencing Nodes on 2 NUMA domains:

  • Conferencing Node 1 on NUMA domain 0
  • Conferencing Node 2 on NUMA domain 1

In KVM we need to configure 3 elements per VM:

  • The memory NUMA domain
  • The CPU set available to the vCPU threads
  • The CPU set available to the QEMU emulator
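
In the domain XML these correspond to, respectively, a <numatune> element, the cpuset attribute on the <vcpu> element, and an <emulatorpin> element inside <cputune>; the configuration steps below show all three.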

Increasing vCPUs

You must increase the number of vCPUs assigned to your Conferencing Nodes to make use of the hyperthreaded cores. (Hyperthreading must always be enabled; it is generally enabled by default.)
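
On reasonably recent Linux kernels you can quickly confirm that hyperthreading (SMT) is active before proceeding; a value of 1 means SMT is on (the lscpu output below shows the same information via Thread(s) per core):

# cat /sys/devices/system/cpu/smt/active
1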

Count logical processors

First you must check how many logical processors each CPU has by running the lscpu command.

Example results:

# lscpu
Architecture:                x86_64
  CPU op-mode(s):            32-bit, 64-bit
  Address sizes:             46 bits physical, 48 bits virtual
  Byte Order:                Little Endian
CPU(s):                      80
  On-line CPU(s) list:       0-79
Vendor ID:                   GenuineIntel
  Model name:                Intel(R) Xeon(R) Gold 6248 CPU @ 2.50GHz
    CPU family:              6
    Model:                   85
    Thread(s) per core:      2
    Core(s) per socket:      20
    Socket(s):               2
    Stepping:                7
    CPU(s) scaling MHz:      27%
    CPU max MHz:             3900.0000
    CPU min MHz:             1000.0000

This example has 2 sockets with 20 physical cores each and 2 threads per core (i.e. hyperthreading is enabled). This gives us a total of 2 × 20 × 2 = 80 logical processors.

Therefore each Conferencing Node will be assigned 40 vCPUs (all of the logical processors on one socket).
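
To pull out just the fields needed for this calculation, you can filter the lscpu output:

# lscpu | grep -E 'Thread|Core|Socket'
    Thread(s) per core:      2
    Core(s) per socket:      20
    Socket(s):               2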

Assign vCPU and RAM

You must increase the number of vCPUs and RAM assigned to each VM.

  1. Power off the VMs.
  2. Increase the number of vCPUs by running the following two commands for each VM:

    # virsh setvcpus <vm name> <number of cores> --config --maximum

    # virsh setvcpus <vm name> <number of cores> --config

    In our example we would assign 40 vCPUs to each VM.

  3. Increase the amount of memory by running the following two commands for each VM:

    # virsh setmaxmem <vm name> <amount of RAM>GiB --config

    # virsh setmem <vm name> <amount of RAM>GiB --config

    You should assign 1 GiB of RAM per vCPU, so in our example we would assign 40 GiB to each VM, as shown in the combined example below.
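
Putting this together for our example (using the VM names pexip-conf1 and pexip-conf2 from the domain XML shown later), the commands for the first node would be:

# virsh setvcpus pexip-conf1 40 --config --maximum
# virsh setvcpus pexip-conf1 40 --config
# virsh setmaxmem pexip-conf1 40GiB --config
# virsh setmem pexip-conf1 40GiB --config

Repeat with pexip-conf2 for the second node.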

Setting NUMA affinity

The following steps set up NUMA on your VMs.

Check NUMA layout

First check the NUMA layout of your system by running the following command:

# cat /sys/devices/system/node/node*/cpulist

Example output:

0-19,40-59
20-39,60-79

This example reports that CPUs 0-19 and 40-59 are on NUMA node 0 (i.e. the first physical CPU socket), and that CPUs 20-39 and 60-79 are on NUMA node 1 (i.e. the second socket).
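
Most lscpu builds report the same mapping directly; on the example host the relevant lines would look something like this:

# lscpu | grep NUMA
NUMA node(s):          2
NUMA node0 CPU(s):     0-19,40-59
NUMA node1 CPU(s):     20-39,60-79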

Configuring the domain

You need to modify the domain XML of the Conferencing Nodes to specify the appropriate CPU and NUMA settings.

  1. Edit the domain XML of the first Conferencing Node by running virsh edit <vm_name>.

    In our example, it looks like this:

    <domain type='kvm'>
      <name>pexip-conf1</name>
      <uuid>53a14645-f9ef-4424-a544-32a54fca600e</uuid>
      <memory unit='KiB'>41943040</memory>
      <currentMemory unit='KiB'>41943040</currentMemory>
      <vcpu placement='static'>40</vcpu>
      <os>
        <type arch='x86_64' machine='pc-q35-7.2'>hvm</type>
      </os>
      …
    </domain>
    
  2. Make the following changes, where the cpuset matches the first line output by the previous cpulist command (the additions are the cpuset attribute on the <vcpu> element and the new <cputune> and <numatune> elements):

    <domain type='kvm'>
      <name>pexip-conf1</name>
      <uuid>53a14645-f9ef-4424-a544-32a54fca600e</uuid>
      <memory unit='KiB'>41943040</memory>
      <currentMemory unit='KiB'>41943040</currentMemory>
      <vcpu placement='static' cpuset='0-19,40-59'>40</vcpu>
      <cputune>
        <emulatorpin cpuset='0-19,40-59'/>
      </cputune>
      <numatune>
        <memory mode='strict' nodeset='0'/>
      </numatune>
      <os>
        <type arch='x86_64' machine='pc-q35-7.2'>hvm</type>
      </os>
      …
    </domain>
    
  3. Repeat the process to edit the domain XML of the next Conferencing Node with virsh edit <vm_name>.

    In this case:

    • The cpuset should match the second line output by the cpulist command.
    • The value of nodeset (within <numatune>) is now 1 as this is the second NUMA node.
    <domain type='kvm'>
      <name>pexip-conf2</name>
      <uuid>29ce119a-cc42-4c32-8e33-04c6f0ace7c7</uuid>
      <memory unit='KiB'>41943040</memory>
      <currentMemory unit='KiB'>41943040</currentMemory>
      <vcpu placement='static' cpuset='20-39,60-79'>40</vcpu>
      <cputune>
        <emulatorpin cpuset='20-39,60-79'/>
      </cputune>
      <numatune>
        <memory mode='strict' nodeset='1'/>
      </numatune>
      <os>
        <type arch='x86_64' machine='pc-q35-7.2'>hvm</type>
      </os>
      …
    </domain>
    
  4. Repeat the process, following the same pattern, for any further Conferencing Nodes.
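
Before booting, you can optionally confirm that the pinning settings were saved by dumping each domain's XML and filtering for the values you set, for example:

# virsh dumpxml pexip-conf1 | grep -E 'cpuset|nodeset'

This should echo back the <vcpu>, <emulatorpin> and <numatune> lines with the cpuset and nodeset values you configured.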

Boot the VMs

You can now boot all of the VMs using the following command for each VM:

# virsh start <vm_name>
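
If you have several Conferencing Nodes, a simple shell loop saves some typing (using the example VM names):

# for vm in pexip-conf1 pexip-conf2; do virsh start $vm; done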

Verifying your NUMA configuration

To verify your NUMA configuration you should run the vcpuinfo command:

# virsh vcpuinfo <vm_name>

Example output (for vCPU0):

VCPU:           0
CPU:            22
State:          running
CPU time:       40.7s
CPU Affinity:   --------------------yyyyyyyyyyyyyyyyyyyy--------------------yyyyyyyyyyyyyyyyyyyy

…

You get this set of 5 lines for every vCPU.

This example shows the 2nd Conferencing Node (on NUMA node 1). For the 1st Conferencing Node (on NUMA node 0), the pattern of dashes and y characters will be reversed.
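
You can also confirm the memory binding with the virsh numatune command, which prints the current settings when run without further arguments. For the 2nd Conferencing Node you would expect something like:

# virsh numatune pexip-conf2
numa_mode      : strict
numa_nodeset   : 1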

Viewing updated capacity

To view the updated capacity of the Conferencing Nodes, log in to the Pexip Management Node, select Status > Conferencing Nodes and then select one of the nodes you have just updated. The Maximum capacity - HD connections field should now show more total capacity than before, although proportionally less per GHz, as hyperthreading does not double the throughput of each physical core.

Checking for warnings

You should check for warnings by searching the administrator log (History & Logs > Administrator log) for "sampling".

A successful run of the above example should return something like:

2015-04-05T18:25:40.390+00:00 softlayer-lon02-cnf02 2015-04-05 18:25:40,389 Level="INFO" Name="administrator.system" Message="Performance sampling finished" Detail="HD=31 SD=60 Audio=240"

An unsuccessful run, where the Conferencing Node has been split over multiple NUMA nodes, would return the following warning in addition to the result of the performance sampling:

2015-04-06T17:42:17.084+00:00 softlayer-lon02-cnf02 2015-04-06 17:42:17,083 Level="WARNING" Name="administrator.system" Message="Multiple numa nodes detected during sampling" Detail="We strongly recommend that a Pexip Infinity Conferencing Node is deployed on a single NUMA node"

2015-04-06T17:42:17.087+00:00 softlayer-lon02-cnf02 2015-04-06 17:42:17,086 Level="INFO" Name="administrator.system" Message="Performance sampling finished" Detail="HD=21 SD=42 Audio=168"