
Reader Forum: An alternative approach to real-time big data analytics (Pt. 1)

Editor’s Note: Welcome to our weekly Reader Forum section. In an attempt to broaden our interaction with our readers we have created this forum for those with something meaningful to say to the wireless industry. We want to keep this as open as possible, but we maintain some editorial control to keep it free of commercials or attacks. Please send along submissions for this section to our editors at: [email protected].

Real-time big data analytics has become a frequent buzzword in big data discussions. The term “real-time big data analytics” (RTBDA) was coined by Mike Barlow in a report titled “Real-time big data analytics: Emerging architecture.” The report identified RTBDA as an essential aspect and value proposition of big data analytics, specifically the ability to make decisions in real time based on the analysis of available information. Several Internet/over-the-top (OTT) companies, such as Amazon and Google, use this strategy.

These OTT players are a source of both inspiration and frustration for telecom carriers, which must come to terms with the growing volume of OTT traffic crossing their networks despite the minimal revenue it contributes.

In this two-part article, we will take a closer look at RTBDA, specifically in the context of telecom networks. Fortunately, the technologies required to implement such a strategy are available and in use; however, they are not yet used as effectively as they could be.

RTBDA 101

In its simplest form, big data analytics can be broken down into two parts that differentiate it from business intelligence or data warehousing and mining:
–Distributed, parallel processing.

–The ability to act in real time.

Big data analytics tackles several challenges, but above all it addresses the need to process large, disparate data sets that usually cannot be accommodated by a single database or server.

One way to tackle this problem is to use distributed, parallel processing, in which a large data set is distributed among multiple servers and each server processes its own segment of the data set in parallel with the others. Big data analytics can be applied to both structured and unstructured data, because it does not require the data to have a specific structure. Using Hadoop with MapReduce is an example of such an approach and can be credited as a driving force behind the current interest in big data.
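The split, process and combine pattern described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Hadoop's API: the function names, the `(subscriber, bytes)` record layout and the sample data are all hypothetical, and a thread pool stands in for the separate servers a real deployment would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def split(records, n_parts):
    """Partition the data set into n_parts roughly equal segments."""
    return [records[i::n_parts] for i in range(n_parts)]

def map_segment(segment):
    """'Map' step: one worker totals bytes per subscriber for its own segment."""
    partial = Counter()
    for subscriber, n_bytes in segment:
        partial[subscriber] += n_bytes
    return partial

def merge(a, b):
    """'Reduce' step: combine two partial results into one."""
    return a + b

def bytes_by_subscriber(records, n_workers=4):
    segments = split(records, n_workers)
    # Threads stand in for servers here (an assumption made for brevity).
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(map_segment, segments))
    return reduce(merge, partials, Counter())

# Hypothetical sample records: (subscriber, bytes transferred)
records = [("alice", 1200), ("bob", 300), ("alice", 800), ("carol", 50), ("bob", 700)]
totals = bytes_by_subscriber(records, n_workers=2)
```

No worker ever needs to see the whole data set, and the partial results are small enough to merge cheaply, which is what allows the approach to scale beyond a single server.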

Solutions for processing large amounts of data already exist; however, the big data perspective delivers a unique advantage, in that processing can be completed within a defined time frame. That time frame is now increasingly being associated with “real time.”

RTBDA is relatively new, but it addresses the need to act proactively or reactively in real time. It is inspired by the ability of Internet content and service providers to understand what is happening, analyze the situation and take action in real time.

What does ‘real time’ mean for telecom?

Defining “real time” depends on the environment you are working in and what your goals are. In some cases, seconds or microseconds are sufficient; in others, real time must be faster still.

This is an interesting question for telecoms. It exposes a potential problem with existing practices in telecom that needs to be addressed if carriers are to successfully tackle the challenges that OTT traffic poses. The current definition of “real time” in telecom may no longer be sufficient.

Traditionally, telecom networks were based on connection-oriented technology. Protocols and changes could only be applied centrally in a highly structured process. Also, the network did not alter very much from one minute – or even one hour – to the next.

In this situation, gathering information from the network at recurring intervals was sufficient. The protocols in use carried substantial management information, so a great deal of insight could be gained from a single protocol header. In this setting, “real time” could be defined in seconds or even minutes, which is why collecting call detail records (CDRs) every five to 15 minutes was enough to gain full insight.

This no longer applies in today’s environment. The transition to LTE has forced telecom carriers to move to packet networks based on Ethernet and IP, which function in a completely different way from connection-oriented technologies and protocols.

The primary principle of IP networks is that the network takes care of itself. The network determines the path that traffic takes and reroutes it in the event of congestion or other adverse conditions. This allows the network to react quickly to changes. The downside is that you cannot predict with certainty where traffic will flow. This is not made any easier by the fact that Ethernet and IP protocols are not designed to carry the same level of management information that connection-oriented protocols provide.

Packet networks are, by default, dynamic and “bursty.” They are intended to support numerous services consumed by many users sharing the same infrastructure. Averaged over a long period of time, network utilization can look quite low, but in reality traffic is transmitted in bursts that can consume the entire bandwidth. In these conditions, the IP network is expected to react and ensure that traffic is routed through the network in a balanced way. Ultimately, conditions in the network can change from one Ethernet frame or IP packet to the next.
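A small numerical sketch shows how a long-term average can hide such bursts. The trace below is invented purely for illustration (one second of traffic on a 10 Gbps link, split into 1 ms buckets, with 20 of those buckets filled at line rate); it is not measured data.

```python
def utilization(bits_per_interval, interval_s, link_bps):
    """Fraction of link capacity used in each measurement interval."""
    capacity_bits = link_bps * interval_s
    return [bits / capacity_bits for bits in bits_per_interval]

LINK_BPS = 10e9      # 10 Gbps link
INTERVAL_S = 0.001   # 1 ms measurement buckets

# Hypothetical trace: 1,000 buckets (one second), idle except for a 20 ms burst.
trace = [0.0] * 1000
for i in range(100, 120):
    trace[i] = LINK_BPS * INTERVAL_S  # bucket completely full

per_ms = utilization(trace, INTERVAL_S, LINK_BPS)
average = sum(per_ms) / len(per_ms)   # what a one-second sample would report
peak = max(per_ms)                    # what the link actually experienced

print(f"average utilization: {average:.0%}, peak utilization: {peak:.0%}")
```

Viewed at one-second granularity this link appears almost idle, yet for 20 milliseconds it had no spare capacity at all. The coarser the sampling interval, the more of this behavior is averaged away.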

The main problem with how telecom network management and data analytics are performed today is that both rely on CDRs, event detail records (EDRs) and IP detail records (IPDRs) to understand what is happening in real time.

However, this definition of “real time” is rooted in the paradigm of the past, when sampling every few minutes was enough. On a 10-gigabit-per-second Ethernet link, a new frame can arrive as often as every 67 nanoseconds. This helps us begin to understand what “real time” means in a packet network. It is not minutes or even seconds, but nanoseconds.
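The 67-nanosecond figure can be reproduced from standard Ethernet framing overhead. The breakdown below (a 64-byte minimum frame, plus 8 bytes of preamble and start-of-frame delimiter, plus a 12-byte interframe gap) is an assumption about how the number was derived; the constant and function names are illustrative.

```python
PREAMBLE_BYTES = 8         # preamble + start-of-frame delimiter
MIN_FRAME_BYTES = 64       # minimum Ethernet frame, including the FCS
INTERFRAME_GAP_BYTES = 12  # mandatory idle time, expressed in byte times

def frame_slot_ns(frame_bytes, link_bps):
    """Time on the wire occupied by one frame plus its overhead, in nanoseconds."""
    bits = (PREAMBLE_BYTES + frame_bytes + INTERFRAME_GAP_BYTES) * 8
    return bits / link_bps * 1e9

slot_ns = frame_slot_ns(MIN_FRAME_BYTES, 10e9)   # about 67.2 ns
frames_per_second = 1e9 / slot_ns                # about 14.88 million
frames_per_5_minutes = frames_per_second * 300   # about 4.46 billion

print(f"{slot_ns:.1f} ns per frame, {frames_per_second:,.0f} frames/s, "
      f"{frames_per_5_minutes:,.0f} frames between five-minute samples")
```

Under these assumptions, several billion frames can cross a single 10 Gbps link between two consecutive five-minute samples, and the state of the network can change with any one of them.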

Decision making in real time

Using CDRs, EDRs and IPDRs for big data analytics is a good idea, but it depends on what you are trying to accomplish. Big data analytics can be applied to two broad categories of decision-making:

–Real-time decision making.

–Enhanced planning and optimization of services and networks centered on trends and predictive analysis.

Using detail records for enhanced planning and optimization, alongside other structured and unstructured data sources, is crucial. These records hold a wealth of information, and useful trends and predictions can be derived from them. However, they provide an incomplete picture until they are complemented by real-time information from packet networks that can deliver precise details on what happened and when.

Naturally, detail records cannot be used for real-time decision-making. Because they are only collected every five to 15 minutes, they are not compatible with our understanding of what real time should be in packet networks. For true real-time decision-making, it is necessary to continuously collect, store and analyze network information. In order to fully understand what is happening, all the pertinent Ethernet frames and IP packets must be examined in real time.
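To make the contrast with periodic records concrete, here is a minimal sketch of continuous analysis: a rolling window that is updated on every packet and can trigger a decision as soon as a threshold is crossed. The class name, window size, threshold and synthetic packet stream are all hypothetical, and capture at nanosecond resolution would in practice require dedicated hardware rather than Python; the code only illustrates the logic.

```python
from collections import deque

class RollingRate:
    """Tracks the bytes seen in the last window_ns nanoseconds, packet by packet."""

    def __init__(self, window_ns):
        self.window_ns = window_ns
        self.packets = deque()   # (timestamp_ns, size_bytes), oldest first
        self.total_bytes = 0

    def add(self, timestamp_ns, size_bytes):
        """Record one packet and return the current rate in bits per second."""
        self.packets.append((timestamp_ns, size_bytes))
        self.total_bytes += size_bytes
        # Evict packets that have fallen out of the window.
        while self.packets and self.packets[0][0] <= timestamp_ns - self.window_ns:
            _, old_size = self.packets.popleft()
            self.total_bytes -= old_size
        return self.total_bytes * 8 / (self.window_ns / 1e9)

# Hypothetical stream: 1,500-byte packets arriving every 1,200 ns.
monitor = RollingRate(window_ns=1_000_000)   # 1 ms window
THRESHOLD_BPS = 8e9                          # assumed alert level
alerts = []
for i in range(1000):
    timestamp_ns = i * 1200
    rate_bps = monitor.add(timestamp_ns, 1500)
    if rate_bps > THRESHOLD_BPS:
        alerts.append(timestamp_ns)          # a real system would act here
```

In this synthetic stream the threshold is crossed less than a millisecond after the burst begins, whereas a record collected every five to 15 minutes would report the event only long after it was over.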

Capturing and storing network data in this way not only makes it possible to analyze and make decisions in real time, but also provides a source of detailed, reliable information on what happened in the network and when, which complements other big data analytics activities.

In part two of this article, we will explore how to apply RTBDA in the telecom network, including implementation, storage and decision making in a real-time environment.
