As operators embrace multi-cloud and cloud-native architectures, cloud-native function validation is proving to be an ongoing requirement rather than a one-time check
As operators have migrated to multi-cloud, the reliability of network functions has become a new source of worry. Many experience container failures, jitter, latency, and resource constraints as they settle into this new cloud-native base.
The primary cause of these failures is unfamiliarity with the new container-based technologies. The previous architecture featured monolithic applications, whereas the new cloud-native functions (CNFs) are built as microservices, which break each function into potentially thousands of individual software components. When these run on a cloud that is essentially optimized for performance and cost, but not reliability, operations become precarious.
Furthermore, the disaggregated and dynamic multi-vendor model provides freedom from vendor lock-in and lucrative cost savings on the one hand, but introduces new levels of complexity on the other.
“The fundamental difference is [operators] are still getting the cloud-native functions from their [previous] vendor, but they’re getting the cloud from a different vendor, and usually getting the hardware from even a different vendor. What that means is now operators themselves assume the responsibility of systems integration,” said Bill Clark, principal product manager, 5G Cloud-native Deployment Validation at Spirent.
This requires fundamentally different approaches — including testing and service assurance — as well as different expertise.
But few operators prepare for failures when moving onto a platform that is famous for performance and resiliency, an oversight that carries a high cost.
“In this new world, there’re things like pods and nodes and containers. None of those things ever existed before Kubernetes and cloud native… They have to be very conscious about what if something fails in the middle of the cloud,” cautioned Clark, adding that the earlier organizational structures often change and blur in this new environment.
Functional failures are quite common in cloud-native networks. So, when a CNF fails in this multi-vendor stack, identifying the source becomes a hunt-and-peck exercise.
CNF validation has emerged as a valuable mechanism for ensuring quick recovery and service continuity. Validating CNFs’ resiliency helps diagnose unexpected behavior and ensures proper functioning in production in the public cloud. What-if scenarios are simulated in a safe lab environment to test CNFs’ ability to recover from a failure and resume service. The tests offer visibility into key fault indicators that point to hidden risks, and verify redundancy, failover, and policy responses to ensure high availability (HA).
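A what-if test of this kind can be pictured as a small loop: inject a fault, then poll until the function reports healthy again and record how long that took. The sketch below is a minimal illustration under stated assumptions, not Spirent's method or any vendor's API. `FakeCnf`, `kill_pod`, and `healthy` are hypothetical stand-ins for a real fault injector and health probe.

```python
import time
from typing import Callable, Optional


def measure_recovery_time(
    inject_failure: Callable[[], None],
    is_healthy: Callable[[], bool],
    timeout_s: float = 30.0,
    poll_interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Inject a fault, then poll until the CNF reports healthy again.

    Returns the observed recovery time in seconds, or None on timeout.
    """
    inject_failure()
    start = clock()
    while clock() - start <= timeout_s:
        if is_healthy():
            return clock() - start
        sleep(poll_interval_s)
    return None


class FakeCnf:
    """Hypothetical lab stand-in: recovers after a fixed number of polls."""

    def __init__(self, polls_to_recover: int) -> None:
        self.polls_to_recover = polls_to_recover
        self.failed = False
        self.polls = 0

    def kill_pod(self) -> None:
        # Simulates a pod-level failure (assumed fault type).
        self.failed = True
        self.polls = 0

    def healthy(self) -> bool:
        if not self.failed:
            return True
        self.polls += 1
        if self.polls > self.polls_to_recover:
            self.failed = False
        return not self.failed
```

In a real lab, the two callables would wrap whatever fault-injection and health-check mechanisms the test environment provides. The injectable `clock` and `sleep` parameters exist only so the loop can be exercised without waiting in real time.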
But testing an application in pre-production can only do so much. One also needs to understand how cloud issues can impact the applications and services in the live environment. And for this, a second dimension needs to be considered: the timing of the validation.
Companies are increasingly adopting proactive and continuous testing, a concept that is best described as ‘Lab-to-Live’. Lab-to-Live replaces siloed testing activities with a more holistic set of testing methods integrated in the DevOps lifecycle, providing a higher degree of protection.
It involves continuous validation of CNFs against the full range of real failure scenarios, whether those are failures at the container, pod, or node level, latency and packet loss issues, or resource constraints. Starting at pre-deployment and continuing through the lifecycle of the application, continuous testing helps answer whether the cloud provider is meeting the CNF’s performance demands in line with service-level agreements (SLAs), and what needs to be done to close the gap if it is not.
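The SLA question can be made concrete as a comparison of measured results against agreed limits for each failure scenario. The following sketch assumes three illustrative metrics (latency, packet loss, recovery time). The class names, field names, and threshold values are hypothetical and do not come from any product or standard.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Sla:
    # Assumed SLA limits; real agreements define their own metrics.
    max_latency_ms: float
    max_packet_loss_pct: float
    max_recovery_s: float


@dataclass(frozen=True)
class ScenarioResult:
    scenario: str  # e.g. "pod-kill" or "node-drain" (illustrative labels)
    latency_ms: float
    packet_loss_pct: float
    recovery_s: float


def sla_violations(result: ScenarioResult, sla: Sla) -> List[str]:
    """List the SLA limits that one scenario's measurements exceed."""
    violations = []
    if result.latency_ms > sla.max_latency_ms:
        violations.append("latency")
    if result.packet_loss_pct > sla.max_packet_loss_pct:
        violations.append("packet_loss")
    if result.recovery_s > sla.max_recovery_s:
        violations.append("recovery_time")
    return violations


def evaluate(results: List[ScenarioResult], sla: Sla) -> Dict[str, List[str]]:
    """Map each failing scenario to the limits it breached."""
    report = {}
    for result in results:
        breached = sla_violations(result, sla)
        if breached:
            report[result.scenario] = breached
    return report
```

Run on every deployment and then on a schedule in production, a check like this turns "is the cloud meeting the SLA?" into a repeatable pass/fail report per scenario.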
Spirent offers a platform-agnostic solution that performs CNF validation out of the box. Spirent positions the solution as a more affordable alternative to open-source testing tools, which it says are far more resource-intensive. Clark describes Landslide as “something that hasn’t really existed before.” In practice, the solution allows CNF validation to move from a pre-deployment step to an ongoing exercise across the application lifecycle.
Although CNF validation has a nuanced vocabulary — different engineering teams use different terms — its significance cannot be overemphasized in the disaggregated and dynamic world of multi-cloud.
