Optimizing NFV for mobile infrastructure (Reader Forum)

All of the major mobile operators have begun using network functions virtualization (NFV), with the goal of building fully virtualized network infrastructures by 2020, but traditional server-based networking models limit performance and increase costs. NFV applications are I/O-intensive, and because network processing on a server can consume half of the available CPU cores, mobile operators can end up overspending on servers just to deliver the I/O performance they need. By offloading server-based network processing to SmartNICs, however, mobile operators can slash the number of servers needed for NFV infrastructure (NFVI) and accelerate NFVI processing.

Why mobile operators want NFV

Mobile operators want NFV because it gives them more agility and reduces costs. It moves them away from purpose-built hardware toward commodity off-the-shelf (COTS) servers, which are far less expensive. And by implementing key network functions in software under software-defined networking (SDN) control, mobile operators gain agility.

Operating agility has become increasingly important for mobile operators because customer demand for services is growing rapidly and substantially. Operators need to be able to scale services up and down as demand fluctuates, and to deploy services quickly in hotspots, such as a major sporting event. Agility also lets operators launch new services quickly in the race to beat competitors to market.

NFV functions (VNFs)
NFV functions (known as virtual network functions, or VNFs) are available to handle all of the key functions of mobile core and edge operations. Examples include virtualized evolved packet core (vEPC) components such as the packet gateway (P-GW), the serving gateway (S-GW), and the mobility management entity (MME). All of these components make up the evolved packet core, and each can be instantiated as a VNF. When a VNF runs on a server, it can rely on virtual switching in the hypervisor for rich networking services such as overlay network tunneling, security policy enforcement, and fine-grained statistics and metering.

NFV challenges
VNFs are typically very I/O-intensive: they must process large volumes of packets in both directions. NFV infrastructure relies on a virtual switching layer, and that layer is typically a bottleneck because it is implemented in software on the server, as part of the hypervisor. The hypervisor sits directly in the data path of packets being delivered from the network to the server's memory.

When a virtual machine runs a VNF, packets travel from the network to the virtual switching layer (such as Open vSwitch (OVS), Contrail vRouter, or a VMware virtual switch) and only then to the virtual machine. This path creates three problems.

VNF starvation – Servers are not very good at processing packets at high data rates, so the virtual switch (vSwitch) in the hypervisor becomes a bottleneck and the VNFs are starved for packets. For example, a packet gateway VNF might be capable of processing 10 million packets per second (pps). But if every packet has to take a hop through the vSwitch in the hypervisor, the VNF can deliver only the performance that the vSwitch is capable of. In most cases, the vSwitch can process only a fraction of the VNF's capacity, perhaps 2 million pps.

Decreased server utilization – Server CPUs are also not very efficient at processing packets, so handling I/O takes a lot of server CPU cores. To support high data rates, a network manager typically has to allocate 8-12 physical CPU cores out of a total of 24 (in a dual-socket server with two 12-core CPUs). That is up to 50% of the total available CPU resources spent just on virtual switching in the hypervisor. In other words, if the data center's servers devote 50% of their cores to processing packets, it has lost 50% of its available application processing power.

Latency – The extra steps needed to process I/O in a virtual switch add latency. Low latency is very important, especially in mobile networks, and every packet that must take the extra trip through the hypervisor for virtual switching incurs additional delay. Eliminating those extra steps reduces latency.
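The first two problems come down to simple arithmetic, sketched below in Python using only the figures given above (a 10 million pps VNF behind a 2 million pps vSwitch, and 8-12 of 24 cores spent on switching). The function names are hypothetical and the model is deliberately crude: it treats the vSwitch and the VNF as two stages in series and ignores queuing, batching, and traffic mix.

```python
"""Back-of-the-envelope model of VNF starvation and core consumption (illustrative only)."""


def delivered_pps(vnf_capacity_pps: float, vswitch_capacity_pps: float) -> float:
    # Every packet crosses the vSwitch before reaching the VNF, so the chain
    # can run no faster than its slowest stage.
    return min(vnf_capacity_pps, vswitch_capacity_pps)


def vswitch_core_share(vswitch_cores: int, sockets: int = 2, cores_per_socket: int = 12) -> float:
    # Fraction of the server's physical cores consumed by virtual switching.
    total_cores = sockets * cores_per_socket
    if not 0 <= vswitch_cores <= total_cores:
        raise ValueError("vswitch_cores must be between 0 and the server's core count")
    return vswitch_cores / total_cores


if __name__ == "__main__":
    vnf, vswitch = 10_000_000, 2_000_000  # figures from the starvation example
    rate = delivered_pps(vnf, vswitch)
    print(f"VNF receives {rate:,.0f} pps, {rate / vnf:.0%} of its capacity")
    for cores in (8, 12):  # range from the utilization example
        share = vswitch_core_share(cores)
        print(f"{cores} of 24 cores switching: {share:.0%} used, {1 - share:.0%} left for VNFs")
```

Run as a script, it reports that the VNF receives 2,000,000 pps (20% of its capacity), and that 8 or 12 switching cores consume 33% or 50% of the server, respectively.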

To make matters worse, the amount of CPU resources consumed by networking functions has been steadily increasing over time. This is due to three main factors:

1) The complexity of server-based networking functions has been increasing. A prime example of this is more complex tunneling protocols such as VXLAN and GRE that are needed to support overlay networks.
2) Network port speeds continue to increase: from 1G to 10G, and now 25G, 40G, 50G, and even 100G server ports are not uncommon. Packet-per-second rates rise accordingly, dramatically increasing the server workload.
3) Server chip technologies are hitting the limits of Moore's Law, so available server CPU cycles are not growing fast enough to compensate for factors 1 and 2.
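To put factor 2 in perspective, the sketch below computes the worst-case packet rate a link can carry when every frame is the 64-byte Ethernet minimum, counting the 8-byte preamble/start-of-frame delimiter and the 12-byte inter-frame gap that each frame also occupies on the wire. This is an upper bound under that stated assumption; real traffic with larger average packets produces lower packet rates.

```python
# Ethernet per-frame overhead on the wire, in bytes.
PREAMBLE_AND_SFD = 8
INTERFRAME_GAP = 12


def line_rate_pps(link_gbps: float, frame_bytes: int = 64) -> float:
    """Maximum frames per second on a link when every frame is frame_bytes long."""
    bits_on_wire = (frame_bytes + PREAMBLE_AND_SFD + INTERFRAME_GAP) * 8
    return link_gbps * 1e9 / bits_on_wire


if __name__ == "__main__":
    for gbps in (1, 10, 25, 40, 50, 100):
        pps = line_rate_pps(gbps)
        print(f"{gbps:>3}G: {pps / 1e6:7.2f} Mpps, {1e9 / pps:6.2f} ns per packet")
```

At 10G this works out to about 14.88 million pps, or roughly 67 ns per packet; at 100G it is about 148.8 million pps, leaving under 7 ns per packet for all switching work.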

Addressing virtual switching challenges

The best way to address these problems is to offload the virtual switching from the server CPUs. SmartNICs make this possible. A SmartNIC is a network interface card that includes a programmable network processor that can be used to handle virtual switching and related applications.

In a SmartNIC architecture, you take the virtual switching out of the hypervisor on the server and move it down onto the SmartNIC. As a packet traverses the SmartNIC, it is switched there and goes directly across the PCI Express (PCIe) bus into server memory, where it can be processed. The SmartNIC thus eliminates the extra hop through the hypervisor, addressing the latency problem.
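The article does not describe the SmartNIC's internals, so the following is only a simplified model under stated assumptions. It treats the offloaded switching state as an exact-match flow table on the NIC, and it assumes a pattern common in vSwitch hardware-offload designs: the first packet of a flow misses, the host software switch decides where it goes and installs a rule, and later packets are switched entirely on the NIC. All names (`NicFlowTable`, `switch_packet`, the queue labels) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

# Hypothetical flow key: (src_ip, dst_ip, ip_protocol, src_port, dst_port)
FlowKey = Tuple[str, str, int, int, int]


@dataclass
class NicFlowTable:
    """Simplified exact-match table standing in for the SmartNIC's switching state."""
    rules: Dict[FlowKey, str] = field(default_factory=dict)  # flow -> VM receive queue


def switch_packet(table: NicFlowTable, key: FlowKey,
                  host_slow_path: Callable[[FlowKey], str]) -> Tuple[str, str]:
    """Return (where the packet was switched, destination VM queue)."""
    queue = table.rules.get(key)
    if queue is not None:
        # Hit: switched on the NIC and delivered over PCIe into VM memory,
        # with no hop through the hypervisor and no host CPU cycles spent.
        return "nic", queue
    # Miss (assumed behavior): the host software switch makes the forwarding
    # decision once, then installs the rule so later packets stay on the NIC.
    queue = host_slow_path(key)
    table.rules[key] = queue
    return "host", queue


if __name__ == "__main__":
    table = NicFlowTable()
    gtpu_flow: FlowKey = ("10.0.0.1", "10.0.0.2", 17, 2152, 2152)  # UDP, GTP-U port
    for _ in range(3):
        print(switch_packet(table, gtpu_flow, lambda k: "vm-pgw-rxq0"))
```

Run as a script, the first packet is switched by the host and the next two by the NIC, which is where the saving in hypervisor hops and CPU cycles comes from in this model.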

Further, because you're processing packets on the SmartNIC, you are not using server CPU cores for that work. You can thus reclaim the cores you were using to handle switching, as many as 12 in the example above. Figure 1 shows a comparison of server utilization for virtual switching, with and without SmartNICs.

Figure 1: SmartNICs significantly improve server utilization by offloading virtual switch data plane processing from server CPU cores to the SmartNIC. For I/O-intensive NFV applications, this can reclaim more than 50% of the overall server resources.
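Reclaimed cores translate directly into fewer servers for a given workload. The sketch below is a hypothetical sizing calculation, assuming the 24-core server from the earlier example, that all switching cores are fully reclaimed (real hosts usually keep a few cores for the OS and management), and that the VNF workload scales linearly with cores. The 240-core demand is an invented figure for illustration.

```python
import math


def servers_needed(app_cores_required: int, cores_per_server: int = 24,
                   vswitch_cores: int = 12) -> int:
    """Servers required when each server gives up vswitch_cores to switching."""
    app_cores_per_server = cores_per_server - vswitch_cores
    if app_cores_per_server <= 0:
        raise ValueError("no cores left for VNFs")
    return math.ceil(app_cores_required / app_cores_per_server)


if __name__ == "__main__":
    demand = 240  # hypothetical VNF workload, in application cores
    print("vSwitch on host CPUs:", servers_needed(demand))                   # 20
    print("vSwitch on SmartNIC: ", servers_needed(demand, vswitch_cores=0))  # 10
```

Under these assumptions, moving switching off the host halves the server count, which is the effect the opening paragraph describes as slashing NFVI server requirements.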

Finally, you remove the potential processing bottleneck, because a SmartNIC can process packets at much higher data rates than a server can. In fact, SmartNICs have been demonstrated to process packets 20 times faster than a server. Figure 2 compares packet processing performance on a server versus a SmartNIC.

Figure 2: SmartNICs outperform servers at packet processing.

As mobile operators scale their NFV deployments on COTS servers, server-related I/O bottlenecks will become a major problem for network performance, latency, and server utilization. SmartNICs address these problems. Mobile operators are testing SmartNICs and beginning early deployments now, and this activity will accelerate in the coming months.
