It’s a time of transition for mobile network operators: from LTE to 5G Non-Standalone, then to 5G Standalone; from Voice over LTE to Voice over New Radio; the implementation of mobile edge computing (MEC) and dabbling in Open RAN; and a more complete realization of cloud-native network functions.
As they navigate those shifts in technology and architecture, visibility into how their newly upgraded networks are functioning matters more than ever. But it’s also far more complicated, and the migrations vary depending on the stage of technology development at which each operator decided to jump in (so early 5G SA implementations will differ from later ones).
Rick Fulwiler, chief solutions architect in the CTO office at Netscout, explains that “The 5G environment that starts addressing Standalone is migrating from the traditional network elements that are running on hardware versus virtual machines running in, say, OpenStack—i.e., what T-Mobile is turning up—then migrating to virtual functions running in a Kubernetes containers kind of environment with microservices. That’s a transition by everybody: It’s a transition by the network equipment manufacturers, it’s a transition to the operators themselves, to understanding what does it mean to put now these virtual functions inside a … cloud-based environment now running inside of my prem, and it’s also an interesting challenge to us as service assurance, to have visibility into that environment, too.”
Amid this, he said, carriers are facing the question of how to put everything together: whether to work directly with network equipment manufacturers (NEMs), which may still bring in their own hardware, to bring the new network to fruition, or to adopt a Kubernetes-as-a-service architecture in which NEMs drop in their network functions and it’s on the carrier to bring the network across the finish line. Operators are debating “how much they want to bite off in terms of making that transition into supporting a different architecture,” Fulwiler says.
“The next obvious transition is thinking about slicing,” he said. Slicing is clearly on the minds of some carriers and NEMs: Ericsson and Telefonica announced this week that they had successfully demonstrated end-to-end slicing, including bringing up a slice in less than an hour through automation. But, Fulwiler continues, “I think before we think about slicing, we probably need to think a little bit about the MEC.”
And carriers clearly are, as seen in Verizon’s partnerships with AWS and Microsoft Azure related to edge compute support in a 5G context; a report commissioned by AT&T earlier this year declared that “Public sector edge computing has arrived.” As carriers make those MEC partnerships, Fulwiler says, the first questions about applications that go into such zones are all about assurance: Can the operator ensure that it is providing the promised service levels, particularly around latency and application resiliency? Network testing and assurance companies have been preparing for this. Netscout’s visibility tools are already available in hyperscalers’ marketplaces to erase “visibility borders” inside various cloud environments, Fulwiler notes. And he says that Netscout has already gone one better in preparing for the type of SLAs that operators will have to assure as they turn up new 5G applications in those MEC zones.
“Latency is a huge, paramount thing to measure, especially now as we’re linking the core part of the network with the packet gateways back into the MEC,” he explains. “We want to make sure that we are providing a strict SLA related to that latency. So we developed the capability to look at real-time latency; rather than looking at it on cycles, we now are looking at it in a real-time basis, all the way from the hyperscaler environment back into any of the linkages to sub-hyperscalers, such as remote hyperscalers, or back into the core itself. We have an end-to-end, bi-directional view looking at real-time latency.”
As carriers start turning up latency-sensitive applications, “we have now that level of visibility to a really super-critical type of SLA that carriers need to make sure they ensure,” he continues. “What we’re seeing is as applications start coming up, latency may be creeping up over time. This is why real-time latency is so important.”
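To make the difference between cycle-based and real-time latency checks concrete, here is a minimal Python sketch. It is not Netscout’s implementation; the `LatencyMonitor` class, the SLA value, the window size and the 20% “creep” ratio are all hypothetical. Each sample is evaluated the moment it arrives, so a single SLA breach is caught immediately, and a rolling mean compared with an initial baseline flags the gradual creep Fulwiler describes.

```python
from collections import deque
from statistics import mean


class LatencyMonitor:
    """Hypothetical per-sample latency monitor (illustrative, not a vendor API).

    Every sample is checked as it arrives instead of once per polling cycle.
    """

    def __init__(self, sla_ms: float, window: int = 5, creep_ratio: float = 1.2):
        self.sla_ms = sla_ms            # assumed latency SLA, in milliseconds
        self.creep_ratio = creep_ratio  # recent mean / baseline mean that counts as creep
        self.baseline = None            # mean of the first full window of samples
        self.recent = deque(maxlen=window)

    def observe(self, latency_ms: float) -> list:
        """Record one latency sample and return the alerts it triggers."""
        alerts = []
        self.recent.append(latency_ms)
        # Hard SLA check on the individual sample.
        if latency_ms > self.sla_ms:
            alerts.append("sla_breach")
        # Trend check once a full window of samples is available.
        if len(self.recent) == self.recent.maxlen:
            current = mean(self.recent)
            if self.baseline is None:
                self.baseline = current
            elif current > self.baseline * self.creep_ratio:
                alerts.append("latency_creep")
        return alerts


monitor = LatencyMonitor(sla_ms=10, window=5)
for sample in [4, 4, 4, 4, 4, 4, 4, 9, 9, 12]:
    print(sample, monitor.observe(sample))
```

In this toy run the creep alert fires at the first 9 ms sample, well before any sample exceeds the 10 ms SLA, which is the kind of early warning that per-cycle averages can smooth over.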
Next up will be slicing, which Netscout thinks of as very similar to segmenting the network into multiple virtual private networks (VPNs). Providing visibility and KPIs from RAN to core on that basis is already possible, he says, even though slicing itself is new.
When it comes to slicing, he says, “the biggest struggle is, ‘Hey, I’ve got these [virtual] network functions now I need to make sure that they’re going to be running for some period of time and not causing problems, and now I need to run slicing on top of that’ and that starts to get very complicated.” He thinks that defined, static slices will be the first step for operators, followed eventually by a more dynamic slicing environment. “I think that’s probably going to be years out, for a couple different reasons,” he adds. “One of the major reasons is, obviously as you start looking at dynamic slicing, you may be putting more load on the network, which means you need to bring up more instances of network functions. That’s very complex coordination into those network functions and upper level orchestration systems. …
“I think most carriers probably won’t get there until probably a few more years [in the] future, just because of everything else that they have to make sure that’s still up and running today,” he says.
As Netscout looks at the future of service assurance amid the escalating complexity of already-complex networks, Fulwiler says, the company is turning to automation.
“Developing more KPIs and more screens and more things to look at really, at the end of the day, doesn’t really help the operator out,” he says. “It’s just more stuff to look at.” As Netscout works with customers, he said, its teams hear that the biggest obstacle to troubleshooting is often the long process of drawing in multiple experts on different parts of the network to go through the data that Netscout’s tools provide. In its Omnis Automation offering, he explains, Netscout is using AI/ML technology to apply telecom knowledge: essentially “teaching machines what to look for,” including which types of anomalies matter and where they usually lead. Call drops at the start of a call, for example, are a very different scenario from call drops in the middle or later in a call, he explains. At this stage of AI/ML, chaining together the common bits of knowledge that humans have accumulated about identifying problems may not yield an exact lever to pull or button to push to fix something instantly, he acknowledges, but it can probably make a big dent in how long it takes to figure out which lever or button you need.
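Fulwiler’s call-drop example can be expressed as a simple rule of the kind such a system might encode. The Python sketch below is purely illustrative and is not how Omnis Automation works: the function names, the 5-second and 120-second phase boundaries, the baseline shares and the 15-point tolerance are all hypothetical. It buckets drops by when they occurred and flags any phase whose share of drops rises well above its usual share, so that start-of-call drops are treated as a distinct symptom rather than lumped into a single drop count.

```python
from collections import Counter


def drop_phase(seconds_into_call: float, start_s: float = 5.0, late_s: float = 120.0) -> str:
    """Bucket a dropped call by when it dropped (boundaries are hypothetical)."""
    if seconds_into_call < start_s:
        return "start"
    if seconds_into_call < late_s:
        return "middle"
    return "late"


def phase_anomalies(drop_times, baseline_share, tolerance=0.15):
    """Return phases whose share of drops exceeds the baseline share by more than tolerance."""
    counts = Counter(drop_phase(t) for t in drop_times)
    total = sum(counts.values())
    flagged = []
    for phase in ("start", "middle", "late"):
        share = counts[phase] / total if total else 0.0
        if share - baseline_share.get(phase, 0.0) > tolerance:
            flagged.append(phase)
    return flagged


# Hypothetical "normal" distribution of drops across call phases.
baseline = {"start": 0.2, "middle": 0.5, "late": 0.3}
print(phase_anomalies([1, 2, 3, 4, 60, 200], baseline))
```

A real system would learn the baselines and boundaries from network data rather than hard-coding them; the point of the sketch is only that the phase of a drop, not just the drop count, carries diagnostic information.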
“We think this is going to save a tremendous amount of time and legwork for our carrier customers,” Fulwiler said. If it’s comparable to the solution’s utility in security, that could be quite a bit: Netscout is also leveraging Omnis’ automation capabilities for security response, and last month released numbers from a commissioned Forrester study that found that using the solution against DDoS attacks provided better security coverage and saved more than 2,000 hours of operational time over three years, while cutting the time to detect and respond to DDoS attacks by 144 hours over that period.