Telcobright SMS Platform is a highly scalable, distributed, carrier-grade SMS platform with no single point of failure. The platform can be used as a cost-effective Short Message Service Center (SMSC), an SMS gateway between two operators, or a bulk SMS platform through which users and applications can send large volumes of A2P SMS. Most SMS solutions on the market today have an outdated monolithic design that is difficult to change, integrate and scale.
We support the signaling protocols required for SMS communication, including GSM MAP, SMPP, REST over HTTP and others. We use an asynchronous, non-blocking API for sending SMS through the operators’ gateways. This yields high throughput, because messages in the queue are not held back until the response to the previous SMS is received.
In addition to Kafka, we use other battle-tested DevOps tools with which leading tech companies achieve carrier-grade reliability and unmatched visibility.
The platform allows an enterprise to manage its massive SMS sending requirements efficiently. At the sending end, it connects with mobile operators over SMPP/HTTP or SIGTRAN protocols to send SMS to mobile subscribers. At the receiving end, or client-facing side, the platform can be used in two modes.
End users can use the built-in web interface to manage bulk SMS sending tasks. They can send single or multiple SMS tasks through the web interface, or they can upload Excel files to create campaigns. Campaigns can be scheduled to run immediately or on a future date, with flexible options.
The platform offers API integration to connect with various automated applications over REST, SOAP or other protocols. Client applications can send large volumes of traffic at a very high rate, and the platform durably processes them at high speed. It’s challenging to achieve both speed and reliability at very high load; Kafka makes the job easier. We can achieve thousands of TPS per VM quite easily, but in practice we are likely to operate at the line rate offered by the MNOs.
In both modes, the platform interfaces with the clients’ and the suppliers’ gateways using any supported protocol. For example, it can receive an SMS over GSM MAP or SMPP and send it out to an egress route over HTTP/REST, and vice versa. The protocol conversion is seamless: users, admins and other processes interact with the system in the same manner. For instance, the same business logic for accounting and routing is executed regardless of the ingress or egress protocol.
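One way to picture this protocol independence is a protocol-neutral message model with small adapters on each side. The sketch below is only an illustration under that assumption: the names SmsMessage, IngressAdapter, EgressAdapter and ProtocolBridge are hypothetical and not taken from the platform, and the HTTP adapter merely formats a request line instead of sending anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Protocol-neutral view of an SMS (hypothetical model, not the platform's actual class). */
record SmsMessage(String sender, String recipient, String text, String ingressProtocol) {}

/** Converts a protocol-specific request into the neutral model. */
interface IngressAdapter {
    SmsMessage toNeutral(Map<String, String> rawFields);
}

/** Sends a neutral message out over one specific protocol. */
interface EgressAdapter {
    String protocol();
    String send(SmsMessage message);
}

final class ProtocolBridge {
    private final List<String> accountingLog = new ArrayList<>();

    /** Runs the same accounting step for every message, whatever the ingress or egress protocol. */
    String process(SmsMessage message, EgressAdapter egress) {
        accountingLog.add(message.sender() + "->" + message.recipient()
                + " via " + message.ingressProtocol() + "/" + egress.protocol());
        return egress.send(message);
    }

    List<String> accountingLog() {
        return List.copyOf(accountingLog);
    }

    /** Example ingress adapter reading SMPP-style field names. */
    static IngressAdapter smppIngress() {
        return raw -> new SmsMessage(raw.get("source_addr"), raw.get("destination_addr"),
                raw.get("short_message"), "SMPP");
    }

    /** Example egress adapter; it only formats a request line (assumed format) and sends nothing. */
    static EgressAdapter httpEgress() {
        return new EgressAdapter() {
            public String protocol() { return "HTTP"; }
            public String send(SmsMessage m) {
                return "POST /sms to=" + m.recipient() + " text=" + m.text();
            }
        };
    }
}
```

Adding another protocol in this sketch means adding one adapter; the accounting step inside process stays untouched.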
Telcobright SMS platform operates in 3 modes as described in the figure on the right.
Some of our important value propositions are summarized below:
Java is the main backend programming language, and leading open-source tools from the Java ecosystem are used. For example, Spring Boot, now the de facto standard for enterprise web application development, is the backbone of the application. We use Spring for writing web applications, exposing RESTful web services, communicating with the database and implementing the main business logic. Spring Security provides the JWT-based security mechanism throughout the application. The use of the Java Persistence API through Spring Data makes the application database agnostic, so any standard SQL database can be used. A summary of the programming languages and associated tools is given below.
Software design principles and enterprise design patterns have been followed, in adherence with the recommendations from the following well-known books:
Example of Recommendations followed from this book:
Repository and Service Patterns, Dependency Inversion etc.
The SOLID principles have been followed.
The proposed solution is compatible with the communication protocols used by the Mobile Network Operators (MNO) in Bangladesh. It will also support other operators that use the same protocols or close variants. The solution adheres to important recommendations from the guidelines listed below, promoted by the renowned Standardization bodies.
SMPP v3.4 Protocol Implementation guide for GSM / UMTS
Version 1.0
We use a modern, distributed, microservices-based architecture in which each service is deployed in a decoupled, fault-tolerant manner. The system is highly scalable: processing capacity can easily be extended by adding new nodes or VMs without changing any software logic. Top-class open-source tools are used as the building blocks of the platform. The same tools and design patterns are followed by leading enterprises around the world. For example, we use Apache Kafka as the heart of the messaging and data streaming platform; Kafka is used by 80% of Fortune 100 companies to solve problems associated with massive data flow.
Kafka is a real-time data streaming platform that makes it possible to build high-speed, high-throughput data pipelines spread over any number of distributed servers in any location. This enables us to build a genuinely distributed, fault-tolerant system that scales horizontally and is highly reliable.
A typical SMS platform works in a request-response manner: SMS messages are processed immediately in memory, and the system tries to route them through the destination gateways. This approach often fails because of a sudden surge in traffic, or a failure or capacity bottleneck on the supplier side. As a result, it’s very difficult to enforce an efficient retry policy or to gain enough visibility to build a durable system that can be used in a large enterprise or business. It’s not always predictable how such a system will behave or recover from a fault under heavy load.
In contrast, we place Apache Kafka as a message broker in the middle of everything. Regardless of how many SMS we receive, through which interface or at what speed, each task is persisted in the cluster immediately, much like writing entries to a database. Each SMS becomes a task created by a producer, such as an API gateway for SMS. Multiple “SMS Router” service instances act as consumers of the tasks persisted in the Kafka cluster.
Every single SMS task persists in the cluster and survives faults and emergencies, because the system state is saved reliably on disk. As the whole system works in an event-driven manner, the event flow or playback starts whenever the system resumes after a crash or fault. The system is always in a valid state, so when services resume, the system continues its tasks reliably from the point where it was operating successfully before the failure. The risk of data loss is minimal in this approach. High availability is also ensured through multiple servers in the cluster. The state of each processing task can be seen, tracked and controlled at any point in time, ensuring strong manageability and visibility.
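The resume-after-failure behaviour can be illustrated with a small in-memory model of a durable log and a committed offset. This is a sketch of the idea only, not the Kafka client API: TaskLog and SmsRouterWorker are hypothetical names, the log lives in memory rather than on disk, and the "crash" is simulated by stopping after a fixed number of tasks.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * In-memory stand-in for one partition of a durable log (illustrative only;
 * a real deployment would rely on the Kafka brokers and client library).
 */
final class TaskLog {
    private final List<String> entries = new ArrayList<>();
    private int committedOffset = 0; // offset of the next task to process after a restart

    /** Producer side: append a task and return its offset. */
    synchronized int append(String smsTask) {
        entries.add(smsTask);
        return entries.size() - 1;
    }

    /** Consumer side: read every task that has not been committed yet. */
    synchronized List<String> readUncommitted() {
        return new ArrayList<>(entries.subList(committedOffset, entries.size()));
    }

    /** Marks all tasks before nextOffset as fully processed. */
    synchronized void commit(int nextOffset) {
        if (nextOffset < committedOffset || nextOffset > entries.size()) {
            throw new IllegalArgumentException("offset out of range: " + nextOffset);
        }
        committedOffset = nextOffset;
    }

    synchronized int committedOffset() {
        return committedOffset;
    }
}

/** A router instance that processes tasks and commits its progress after each one. */
final class SmsRouterWorker {
    private final TaskLog log;
    final List<String> routed = new ArrayList<>();

    SmsRouterWorker(TaskLog log) {
        this.log = log;
    }

    /** Processes at most maxTasks pending tasks, then stops as if the process had crashed. */
    void run(int maxTasks) {
        int offset = log.committedOffset();
        for (String task : log.readUncommitted()) {
            if (maxTasks-- <= 0) {
                return; // simulated crash: the remaining tasks stay in the log
            }
            routed.add(task);
            log.commit(++offset);
        }
    }
}
```

A second worker created on the same log picks up at the committed offset, so nothing appended before the simulated crash is lost. Committing after processing gives at-least-once behaviour in this model: a task whose commit did not happen would be processed again on restart.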
We recommend 3 servers to create a carrier-grade fault tolerant SMS processing cluster. It’s a common best practice applied by the leading tech companies in the world to keep 3 copies of their mission critical data. Based on the traffic portfolio, we provision the required number of VM and containers in a private cloud.
However, the system can run equally well in a fully virtualized environment. In the absence of a dedicated server environment, VM and container allocation is performed through workshops with our clients.
As a general guideline, without specific knowledge of the traffic profile, the following server configurations should be enough to meet the computing and storage requirements for processing 5 million SMS per day.
The whole platform is divided into 4 modules as illustrated below. Each module contains multiple small, distributed microservices.
Provides near real-time reporting and analytics through a rich reporting engine and portal. Reports slowing down as data volume grows is a very common problem in telco systems. It often makes it hard to identify faults and their causes in the network, causing severe inconvenience and often loss of service. Telcobright keeps its reporting engine fast regardless of data volume through optimization techniques, namely:
Partitioning the CDR table- CDR tables are partitioned by date in the database. Queries use partition pruning to check for records in only a limited number of partitions, in other words, only in certain locations on the storage disks.
Summary Data Generation- Telcobright generates daily and hourly summary data for all SMS. The most frequently used reports are fetched from the summarized data instead of searching through millions of CDR records on the storage disks. This significantly increases performance and returns report results almost instantly, even over a period of a few months.
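The summary-data idea can be sketched as a simple aggregation of CDRs into one row per hour and route. The record shapes below (Cdr, HourRoute) and the choice of grouping key are assumptions made for illustration; the platform's actual summary tables are not described in this document.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Minimal CDR with only the fields needed for the summary (hypothetical shape). */
record Cdr(LocalDateTime submittedAt, String route) {}

/** Key of one summary row: one route within one hour. */
record HourRoute(LocalDateTime hour, String route) implements Comparable<HourRoute> {
    public int compareTo(HourRoute other) {
        int byHour = hour.compareTo(other.hour);
        return byHour != 0 ? byHour : route.compareTo(other.route);
    }
}

final class CdrSummarizer {
    /** Counts SMS per (hour, route); reports can read these rows instead of raw CDRs. */
    static Map<HourRoute, Long> hourlyCounts(List<Cdr> cdrs) {
        return cdrs.stream().collect(Collectors.groupingBy(
                c -> new HourRoute(c.submittedAt().truncatedTo(ChronoUnit.HOURS), c.route()),
                TreeMap::new,
                Collectors.counting()));
    }
}
```

A report covering several months then reads at most 24 rows per route per day from the summary, rather than scanning every individual CDR.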
Example screenshots of a report with various filter criteria are given below.
The SMS sending mechanism applies an asynchronous, non-blocking architecture throughout, which means that at no point in any service does an SMS have to wait until the acknowledgement of the previous message has arrived. This allows high throughput and low latency. The mechanism is especially effective when sending messages out through the MNO gateways. Very often, MNOs cannot send the acknowledgement (delivery status) instantly; they may supply it later, depending on various characteristics of the mobile network. Our platform does not wait for the delivery reports. Instead, it sends the messages out to the gateways and listens for the acknowledgement of each SMS.
Whenever the acknowledgement arrives, it gets queued in the Kafka cluster for further processing. Kafka sends the message to the right consumer e.g. an accounting process for closure. The task gets marked complete by the main SMS process and a CDR is generated.
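A minimal sketch of this send-now, acknowledge-later pattern is shown below, using only the JDK. It assumes each SMS carries a unique message ID that the delivery report echoes back; AsyncSmsSender and its gateway callback are hypothetical names, and the Kafka queueing step described above is left out so the example stays self-contained.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

final class AsyncSmsSender {
    // Messages that were sent but whose delivery report has not arrived yet.
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    // Hypothetical transport: accepts a message ID and returns at once.
    private final Consumer<String> gateway;

    AsyncSmsSender(Consumer<String> gateway) {
        this.gateway = gateway;
    }

    /** Hands the message to the gateway and returns immediately with a future for its status. */
    CompletableFuture<String> send(String messageId) {
        CompletableFuture<String> status = new CompletableFuture<>();
        pending.put(messageId, status);
        gateway.accept(messageId);
        return status;
    }

    /** Called whenever a delivery report arrives, in any order and at any later time. */
    boolean onDeliveryReport(String messageId, String deliveryStatus) {
        CompletableFuture<String> status = pending.remove(messageId);
        return status != null && status.complete(deliveryStatus);
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Because send never waits on a future, the second message leaves before the first one is acknowledged, and a report for an unknown ID is simply rejected. Follow-up work such as accounting closure can be attached to the returned future, for example with thenAccept.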
When sending SMS over SMPP, our platform works as an External Short Messaging Entity (ESME) that interfaces with the Message Centers (MC) of the mobile operators. We support the standard and most common mode of operation for sending SMS toward mobile operators, which is described below and shown in the accompanying picture.
The SMPP protocol is a set of operations, each one taking the form of a request and response Protocol Data Unit (PDU) containing an SMPP command. For example, if an ESME wishes to submit a short message, it may send a submit_sm PDU to the MC. The MC responds with a submit_sm_resp PDU, indicating the success or failure of the request. Likewise, if an MC wishes to deliver a message to an ESME, it may send a deliver_sm PDU to an ESME, which in turn responds with a deliver_sm_resp PDU as a means of acknowledging the delivery.
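To make the request/response pairing concrete, the sketch below encodes the 16-octet header that precedes every SMPP v3.4 PDU: command_length, command_id, command_status and sequence_number, each a 4-octet big-endian integer. The command IDs are the values defined in the SMPP v3.4 specification; the class name SmppHeader and its helper methods are illustrative, and the PDU body (addresses, message text) is omitted.

```java
import java.nio.ByteBuffer;

final class SmppHeader {
    static final int HEADER_LENGTH = 16;         // four 4-octet integers
    static final int SUBMIT_SM = 0x00000004;     // ESME -> MC request
    static final int DELIVER_SM = 0x00000005;    // MC -> ESME request
    static final int RESPONSE_BIT = 0x80000000;  // set in the command_id of every *_resp PDU

    /** Encodes the header of a PDU whose body is bodyLength octets long. */
    static byte[] encodeHeader(int commandId, int commandStatus, int sequenceNumber, int bodyLength) {
        return ByteBuffer.allocate(HEADER_LENGTH)  // ByteBuffer is big-endian by default
                .putInt(HEADER_LENGTH + bodyLength) // command_length counts the header itself
                .putInt(commandId)
                .putInt(commandStatus)
                .putInt(sequenceNumber)
                .array();
    }

    /** The response to a request uses the same command ID with the top bit set. */
    static int responseIdFor(int requestCommandId) {
        return requestCommandId | RESPONSE_BIT;
    }

    static boolean isResponse(int commandId) {
        return (commandId & RESPONSE_BIT) != 0;
    }

    static int commandId(byte[] pdu) {
        return ByteBuffer.wrap(pdu).getInt(4);
    }

    static int sequenceNumber(byte[] pdu) {
        return ByteBuffer.wrap(pdu).getInt(12);
    }
}
```

A response carries the same sequence_number as its request, which is what lets an ESME match a submit_sm_resp to the submit_sm it sent earlier, even when many requests are outstanding.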
In general, we use REST APIs for connecting to clients and suppliers over HTTP or HTTPS. As REST does not dictate any standard request/reply construct, we offer a standard set of APIs to our clients and follow the API documentation given by the providers or suppliers. We have experience in connecting to the MNOs in Bangladesh using their custom APIs.
The core engine of the solution is an SS7 Signal Transfer Point (STP) software from a renowned vendor, which is the main enabler of communication with the wireless networks for exchanging SMS. In terms of reliability, the solution itself is carrier-grade, but the most notable fact is that it is not a “black box” solution of the kind usually sold in the telco market. The software offers a Software Development Kit (SDK) and hundreds of APIs to customize almost everything at all levels of communication.
The Dialogic DSI G5V is a programmable STP, or signaling platform, which is highly extensible and allows rich telecom appliances to be built rapidly. In simple terms, it’s difficult and expensive to add a feature to a black-box telco product, but the DSI is designed for exactly this. It provides an easy interface for manipulating protocol messages belonging to any layer of the SS7 signaling stack, which gives great flexibility in customization. As a result, adding new services and resolving compatibility issues in a multi-vendor environment can be done easily and without expensive change requests to vendors.
Link: https://www.dialogic.com/signaling-and-ss7-components/download/dsi-interface-protocol-stacks
As a system integrator and software company, we follow the latest trends and best practices in the software development world. We will use the microservices architecture, which is now the most popular architecture for building highly scalable and fault-tolerant mission-critical IT systems. Companies like Netflix, eBay, Amazon, Twitter and PayPal have migrated their core systems to microservices. More information on the architecture is given in later sections of this document, but a quick overview can be found at https://microservices.io/.
As shown in figure-1: there are 3 core modules in the system, each having multiple components or sub-modules.
The following table illustrates an overview of the features and functionality of the submodules under the 3 main modules.
All operators use the GSM MAP protocol at the application layer, on top of the SS7 stack. The signaling or process flow is depicted clearly in the SMS Hubbing Architecture Guidelines on page 35, section 7.1 Current bi-lateral Client Operator to Client Operator Architecture. The figure below, together with the process description, is copied from the guideline.
The scenario described in the RFP is mentioned on page 37, section 7.2 SS7 Based Hubbing, which is also copied below.
Among all the vast features of the DSI G5V STP, we discuss a few important functionalities that will be required by the project.
The V-HUB should be implemented in a manner such that it remains as transparent as possible while carrying SMS to reduce complexity and ensure maximum performance. However, as per the standard guidelines and for generating accounting records – signaling address manipulation may be required at multiple layers.
The RFP has mentioned SMS screening, blocking, or firewalling. The scope could be very broad, and only a programmable STP capable of inspecting and manipulating SMS traffic at the application layer can meet the requirements. The offered DSI G5V programmable STP is perfectly capable of doing this job. Also, the microservices architecture suits the scalability requirements.
SMS Hubbing will be realized through multiple transactions at the TCAP layer of the signaling stack. The uppermost protocol, MAP, will use TCAP to reliably perform the tasks needed for sending SMS, such as looking up the location of the recipient or finally forwarding the SMS for delivery. The number of transactions is specified as a key criterion for dimensioning the system. It is discussed in detail later in this section.
Call Detail Record (CDR) generation is required for accounting or charging purposes. Performance statistics or counters, on the other hand, reflect the Key Performance Indicators (KPIs) of the system. The methods for generating these records are similar: both can be generated by a designated Java process that listens to appropriate events from the SS7 core module via a microservices component called “Messaging Queue (MQ)”. The process is described further later in this chapter.
Signal Transfer
GSM MAP, along with TCAP, will be used at the V-HUB to connect with the ANS and exchange SMS traffic. These protocols run on top of other SS7 protocols, which together make the communication possible. SS7 was originally a protocol for the TDM network, and the signaling traffic used to be carried over low-speed signaling links. But as IP networks evolved and became more popular, application-level protocols came to be transported over high-speed IP networks.
The RFP asks to implement SIGTRAN for transporting SMS traffic (MAP+TCAP) over IP network. The very high-level topology will look like this:
Accordingly, the protocol stack to be implemented is illustrated in the figure below. Further discussion of the protocol stack and the SIGTRAN implementation strategy is covered later in section “5.3. SIGTRAN Specifications and Implementation Proposal”.
This is the STP functionality of the HUB and is equivalent to a router or a proxy server in an IP network. The V-HUB will screen or discard illegal packets (called Message Signal Units, or MSUs, in SS7), for example, packets from an unknown source address or for a service that is not allowed. Allowed packets from source operators (ANS and PSTN) will be forwarded to the next-hop ANS.
The easiest way for the HUB to route packets is to look at the SCCP Called Party GT and forward to the destination point code that has been provisioned for the destination GT in the database. Nevertheless, the source and destination point code information at the M3UA layer will be replaced by the STP with its own.
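The GT-based lookup can be sketched as a table from Global Title prefixes to destination point codes. The example assumes longest-prefix matching on the Called Party GT digits, which is one common way to provision such tables but is not stated in this document; GtRouter, the prefixes and the point code values are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class GtRouter {
    // Provisioned routes: GT digit prefix -> destination point code (illustrative values only).
    private final Map<String, Integer> pointCodeByGtPrefix = new HashMap<>();

    void provision(String gtPrefix, int destinationPointCode) {
        pointCodeByGtPrefix.put(gtPrefix, destinationPointCode);
    }

    /** Longest-prefix match: try the full GT first, then drop one trailing digit at a time. */
    Optional<Integer> route(String calledPartyGt) {
        for (int len = calledPartyGt.length(); len > 0; len--) {
            Integer pointCode = pointCodeByGtPrefix.get(calledPartyGt.substring(0, len));
            if (pointCode != null) {
                return Optional.of(pointCode);
            }
        }
        return Optional.empty(); // no route provisioned: the caller may discard the packet
    }
}
```

With a broad prefix and a more specific one both provisioned, the more specific entry wins, and a GT that matches nothing yields an empty result that the screening step can treat as a discard.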
Address Manipulation by acting as VHLR, VMSC
In addition, the STP is capable of manipulating address information at any layer of signaling if deemed necessary for the interconnection. It can implement the examples and guidelines from the GSMA SMS Hubbing guidelines in section 6.1.26 SS7 Transparency – Address Manipulation and associated chapters. This is how the STP can work as a VHLR or VMSC, because it’s simulating the behavior of wireless network equipment.
An appropriate and optimized address manipulation ruleset will be implemented in the V-HUB based on the decisions made during the implementation phase of the project. The STP software is quite flexible in this regard and can implement the various scenarios described in the GSMA SMS Hubbing guidelines. The picture below is taken from section “6.1.26.3 Detailed diagram of the message flow”. It illustrates some address manipulation techniques (in red font) at the intermediary hubs, which can be implemented in the V-HUB.
SIGTRAN Specifications and Implementation Proposal