"Network error: There's a problem connecting to the application."
Other than the dreaded "blue screen of death," a network error warning is quite possibly a hard-working employee's most aggravating computer message, especially because it always seems to happen at the absolute worst moment, like some kind of cruel joke. Maybe it's also because in our fast-paced, always-available world, every moment feels urgent. And if we're going to make ourselves available at all times, we expect no less from our apps, whether for business or leisure.
For cloud-based business communications and collaboration solutions, continuous availability matters even more. Because communication is at the heart of any successful organisation, communications solutions need to withstand a multitude of obstacles. These include natural disasters, seasonal surges (such as the first day of school or holiday shopping), unexpected surges (such as the one we've experienced with COVID-19), and company-specific events (such as hosting a large all-hands session online). In addition to these variables, Unified Communications as a Service (UCaaS) and Contact Centre as a Service (CCaaS) providers also need to remain available across the many devices (laptop, mobile, or tablet) and connectivity options (Wi-Fi, 3G/4G/5G, or a switch from one to the other) that customers might use to connect.
What does "Five 9s" (also known as "Five Nines") mean?
The availability of a cloud solution is usually expressed as the percentage of time in a given year that the solution is up and running (known as uptime). Most enterprise communications solution providers offer Service Level Agreements (SLAs) that commit to a minimum percentage of uptime in a given period (or, conversely, a maximum amount of downtime).
In the figure below, you can see how availability percentages equate to downtime over the course of days, weeks, months, and years. In a perfect world, a cloud solution would be available 100% of the time. Unfortunately, we don't live in a perfect world, but the good news is that when it comes to uptime, we're not far off. For example, some companies offer 99.999% availability (also known as "Five 9s"), which translates to about 5.26 minutes of downtime per year. Of course, not every company can guarantee that level of uptime, and lower guarantees can translate into significant downtime. For example, 95% availability, which sounds like a high number, actually allows up to 18 days of downtime a year.
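The conversion behind these figures is simple arithmetic: allowed downtime = length of the period × (1 − availability). The Python sketch below is illustrative only; it assumes a 365-day year (a 365.25-day year shifts the results only slightly, and both round 99.999% to about 5.26 minutes).

```python
# Convert an availability percentage into the maximum downtime it permits.
# Assumption: a 365-day year.

MINUTES_PER_DAY = 24 * 60
MINUTES_PER_YEAR = 365 * MINUTES_PER_DAY


def allowed_downtime_minutes(availability_pct: float,
                             period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Return the downtime, in minutes, that an availability target allows over a period."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return period_minutes * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    for pct in (95.0, 99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(pct)
        days = minutes / MINUTES_PER_DAY
        print(f"{pct:>7}% uptime -> {minutes:10.2f} min/year ({days:.2f} days)")
```

Passing a different `period_minutes` (for example, 30 × 1,440 for a month) gives the per-month figures an SLA might quote.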
Increased downtime for cloud communications apps can have devastating consequences, particularly in certain industries. For example:
- Healthcare: Patients can't reach doctors for critical information
- Education: Teachers can't teach remotely
- Public sector: Citizens can't reach critical government services
- Sales: Sales teams don't have access to the tools they need to close deals
- Support: Customer requests go unanswered and customer satisfaction suffers
How cloud providers ensure high availability
There are some critical elements that all highly available Software as a Service (SaaS) companies need to get right, starting with building a scalable, redundant, and secure infrastructure. Here are a few of the hallmarks of highly available solutions:
- It's critical to host cloud solutions in top-tier data centres with geographic redundancy, meaning that if one data centre suffers an outage, another data centre in a different location is already set up to take over the load automatically and without issues.
- Providers must also ensure this kind of capability within each data centre by using similar architectures that feature multiple layers of redundancy in case problems arise.
- Maintaining high levels of uptime requires providers to build advanced system monitoring capabilities that let them detect emerging issues before they cause outages, and quickly resolve and remediate them when they do occur.
- Providers of highly available solutions have strong internal controls and policies in place to minimise risk and ensure uptime.
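Geographic redundancy pays off because of a standard reliability formula: a redundant group is down only when every member is down at once, so its availability is 1 − Π(1 − aᵢ). A chain in which every component must be up behaves the opposite way, with availability Π aᵢ. The Python sketch below assumes members fail independently, which real data centres only approximate (shared software bugs or network dependencies weaken it). The example numbers are illustrative, not any provider's measurements.

```python
from math import prod


def parallel_availability(availabilities: list[float]) -> float:
    """Availability of a redundant group that stays up while at least one member is up.

    Assumes members fail independently of one another.
    """
    if not availabilities:
        raise ValueError("need at least one member")
    if any(not 0.0 <= a <= 1.0 for a in availabilities):
        raise ValueError("availabilities must be fractions between 0 and 1")
    # The group is down only if every member is down at the same time.
    return 1.0 - prod(1.0 - a for a in availabilities)


def serial_availability(availabilities: list[float]) -> float:
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


if __name__ == "__main__":
    sites = [0.999, 0.999]  # two hypothetical sites, each 99.9% available
    print(f"in parallel: {parallel_availability(sites):.6%}")
    print(f"in series:   {serial_availability(sites):.6%}")
```

Under these assumptions, two 99.9% sites in parallel reach roughly 99.9999%, while the same two components in series drop below either one. That contrast is why redundancy is layered at every level rather than only at the data centre.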
How RingCentral builds Five 9s availability
RingCentral's cloud architecture is built on what's known as a multi-cloud, multi-network, point-of-delivery (PoD) design. In other words, it uses a modular approach that allows it to intelligently scale and manage increases in usage across messaging, video meetings, and phone solutions, while also providing resiliency and redundancy. The multi-tenant network is designed with built-in 2x capacity, which means customers can double their usage overnight without an issue. Systems are also designed with concurrent usage in mind, so the service stays available even when usage fluctuates on the customer's end.
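To make the idea of built-in 2x capacity concrete, here is a small sizing sketch. It assumes, purely for illustration, that each PoD's capacity can be expressed as a number of concurrent sessions. The function names and numbers are hypothetical, not RingCentral's actual sizing model.

```python
import math


def pods_needed(peak_concurrent_sessions: int, sessions_per_pod: int,
                headroom: float = 2.0) -> int:
    """Hypothetical helper: PoDs to provision so that `headroom` x the observed peak fits.

    headroom=2.0 models "built-in 2x capacity": provisioned capacity is at
    least double the highest concurrent load seen so far.
    """
    if sessions_per_pod <= 0:
        raise ValueError("sessions_per_pod must be positive")
    if headroom < 1.0:
        raise ValueError("headroom below 1.0 cannot cover the current peak")
    return math.ceil(peak_concurrent_sessions * headroom / sessions_per_pod)


def fits(load: int, pods: int, sessions_per_pod: int) -> bool:
    """True if a given concurrent load fits within the provisioned PoDs."""
    return load <= pods * sessions_per_pod


if __name__ == "__main__":
    peak, per_pod = 45_000, 10_000  # illustrative numbers only
    pods = pods_needed(peak, per_pod)
    print(f"provision {pods} PoDs; overnight doubling fits: {fits(2 * peak, pods, per_pod)}")
```

A modular design makes this kind of calculation workable: capacity grows in whole PoD units, so adding headroom means adding PoDs rather than re-engineering a monolithic system.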
RingCentral maintains "geo-redundant data centres," which means they're similarly configured across multiple regions to ensure that service continues despite possible outages. In the event of a data centre failure, RingCentral's automated systems (built with an active-active design), in conjunction with an always-on, world-class network operations centre (NOC), ensure a rapid transition to backup systems as needed to maintain uninterrupted service. Simply put, should an issue arise in any one data centre, another data centre automatically assumes the load with no downtime.
RingCentral employs three layers of network and service redundancy to ensure that customers' phone systems remain up and running:
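In an active-active design, every data centre serves live traffic at the same time, so a failure only shifts load to sites that are already running. The toy router below sketches that idea in Python, assuming each region has an ordered list of preferred data centres. The site names, region names, and health flags are hypothetical stand-ins for the automated monitoring and NOC processes described above.

```python
from dataclasses import dataclass


@dataclass
class DataCentre:
    name: str
    healthy: bool = True


@dataclass
class ActiveActiveRouter:
    """Toy active-active router: every healthy data centre takes live traffic.

    Each region has an ordered preference list. A request goes to the first
    healthy data centre on that list, so losing a site simply moves its share
    of traffic to sites that are already serving.
    """
    sites: dict[str, DataCentre]
    preferences: dict[str, list[str]]

    def route(self, region: str) -> str:
        for name in self.preferences[region]:
            if self.sites[name].healthy:
                return name
        raise RuntimeError(f"no healthy data centre available for region {region!r}")

    def mark_down(self, name: str) -> None:
        # In practice, health would come from automated checks, not a manual call.
        self.sites[name].healthy = False


if __name__ == "__main__":
    router = ActiveActiveRouter(
        sites={"dc-west": DataCentre("dc-west"), "dc-east": DataCentre("dc-east")},
        preferences={"west": ["dc-west", "dc-east"], "east": ["dc-east", "dc-west"]},
    )
    print("west ->", router.route("west"))
    router.mark_down("dc-west")
    print("west after failure ->", router.route("west"))
```

Because both sites are already serving traffic before any failure, failover is a routing decision rather than a cold start of standby systems.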
- Our data centres provide the first layer of redundancy. Data between bi-coastal locations is synchronised continuously, with latency of less than one minute. Each component has a redundant power supply, which supports seamless operation and 99.999% availability even during geographic outages or natural disasters. In fact, RingCentral has met its 99.999% uptime SLA for eight consecutive quarters for our flagship product, RingCentral Office. The data centres share hosted facility space with some of the world's largest Internet companies and financial institutions, and they're in close physical proximity to the world's top 20 Internet exchange points.
- Our architecture is vendor-agnostic and commodity-based, meaning every component is replaceable and the system is fault-tolerant, which provides a second layer of redundancy.
- Our third layer of redundancy uses both load balancing and failover technology to keep our systems continuously up and running. For example, the primary and secondary server groups each contain multiple servers that back each other up.
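The third layer combines two familiar techniques: load balancing spreads requests across servers, and failover moves traffic to backups when servers fail. The Python sketch below shows one simple way to combine them. It uses round-robin over the healthy servers in a primary group and falls back to a secondary group only when no primary is healthy. The server names and the manual `mark_down`/`mark_up` calls are hypothetical placeholders for real health checks.

```python
from itertools import count


class FailoverPool:
    """Round-robin across healthy primary servers, falling back to secondaries."""

    def __init__(self, primary: list[str], secondary: list[str]) -> None:
        self.primary = list(primary)
        self.secondary = list(secondary)
        self.down: set[str] = set()
        self._counter = count()  # shared round-robin counter

    def mark_down(self, server: str) -> None:
        self.down.add(server)

    def mark_up(self, server: str) -> None:
        self.down.discard(server)

    def pick(self) -> str:
        # Prefer the primary group; use the secondary group only if no primary is healthy.
        for group in (self.primary, self.secondary):
            healthy = [s for s in group if s not in self.down]
            if healthy:
                return healthy[next(self._counter) % len(healthy)]
        raise RuntimeError("no healthy servers in either group")


if __name__ == "__main__":
    pool = FailoverPool(primary=["p1", "p2"], secondary=["s1", "s2"])
    print([pool.pick() for _ in range(4)])  # load balanced across p1 and p2
    pool.mark_down("p1")
    pool.mark_down("p2")
    print(pool.pick())  # served by a secondary server
```

Production load balancers add weighting, connection draining, and active health probes. The core decision of "spread the load, but skip what's broken" is the same.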
Beyond Five 9s: A commitment to relentless innovation
In addition to the architecture of RingCentral systems, we also continue to make significant investments in research and development for our applications. We have concentrated on several areas in particular to continuously improve our availability:
Agile development: With decades of stable, mature operational procedures behind it, our proven architecture enables agile development while supporting our growing global customer base and partners.
Application Lifecycle Management: Our investments here help minimise errors, disruptions, and the risk of failure. Our engineering, cloud operations, and support teams work in concert with customers to deploy new innovations while minimising potential impacts. Our PoD deployment architecture, combined with our rigorous testing, QA, and staging processes, ensures that changes are synchronised while updates are isolated as they're rolled into production. This tightly controlled synchronisation of updates means that changes don't inadvertently create delays, outages, or downtime. We also work closely with customers to consider critical situational factors (e.g., surges in usage on the first day of virtual school) and evaluate the most appropriate times for change. It's critical to ensure that any changes have been made and tested well before these major events.
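The controlled, PoD-by-PoD rollout described above resembles a common pattern: deploy to one isolated unit at a time, check its health, and stop and roll back at the first sign of trouble. The Python sketch below illustrates that pattern. It also honours a change-freeze calendar around known surge events. The `deploy`, `healthy`, and `rollback` callbacks and the freeze dates are hypothetical; this is not a description of RingCentral's actual tooling.

```python
from datetime import date
from typing import Callable


def staged_rollout(
    pods: list[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    today: date,
    freeze_dates: set[date],
) -> list[str]:
    """Roll a change out one PoD at a time, stopping at the first unhealthy PoD.

    Returns the PoDs left on the new version. If any PoD fails its health
    check, every PoD updated so far (including the failing one) is rolled
    back in reverse order and an empty list is returned.
    """
    if today in freeze_dates:
        # Change freeze around a known surge, e.g. the first day of virtual school.
        return []
    updated: list[str] = []
    for pod in pods:
        deploy(pod)
        updated.append(pod)
        if not healthy(pod):
            for done in reversed(updated):
                rollback(done)
            return []
    return updated


if __name__ == "__main__":
    result = staged_rollout(
        ["pod-a", "pod-b", "pod-c"],
        deploy=lambda p: print("deploy", p),
        healthy=lambda p: p != "pod-b",  # simulate a failed health check on pod-b
        rollback=lambda p: print("rollback", p),
        today=date(2020, 9, 1),
        freeze_dates=set(),
    )
    print("PoDs on new version:", result)
```

Because each PoD is isolated, a bad change is caught after affecting only a small slice of users. The freeze check enforces the principle of finishing changes well before major events.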
Sophisticated machine learning (ML) and artificial intelligence (AI) automation: When it comes to insights, collecting data is the easy part. RingCentral has built the supporting technology infrastructure and combined it with decades of industry expertise in messaging, video, and phone to create meaningful and actionable insights. Our ML and AI layers are built on a single data lake that aggregates all operational, usage, and simulated testing data to identify events, correlate them, respond, and remediate. RingCentral's sophisticated architecture is the key to enabling a data-driven approach to product development, engineering, operations, and support. RingCentral monitors and manages every aspect of the service from top to bottom, from edge to core, to ensure the highest quality, reliability, and security. This architecture has also enabled RingCentral to provide customers with detailed quality-of-service analytics and insights in a single pane of glass across messaging, video meetings, and phone.
Team building and a culture of trust: RingCentral teams prepare for every scenario through rigorous testing that builds shared tribal knowledge. Everybody brings a different opinion and skill set. These exercises build trust in each other's capabilities, so teams can rely on one another in every situation.
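The article doesn't describe the models behind the ML and AI layers. To illustrate the basic step of identifying an event in operational data, here is a deliberately simple statistical stand-in: a trailing-window z-score detector. It is not RingCentral's method. The metric (call-setup latency in milliseconds), window size, and threshold are assumptions chosen for the example.

```python
from statistics import mean, pstdev


def flag_anomalies(samples: list[float], window: int = 5,
                   threshold: float = 3.0) -> list[int]:
    """Return indices whose value deviates from the trailing window by more
    than `threshold` population standard deviations."""
    if window < 1:
        raise ValueError("window must be at least 1")
    flagged: list[int] = []
    for i in range(window, len(samples)):
        history = samples[i - window:i]
        mu = mean(history)
        sigma = pstdev(history)
        if sigma == 0:
            # Perfectly flat history: any change at all counts as an event.
            if samples[i] != mu:
                flagged.append(i)
            continue
        if abs(samples[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical call-setup latencies (ms) with one spike.
    latencies = [100, 102, 98, 101, 99, 100, 250, 101]
    print("anomalous sample indices:", flag_anomalies(latencies))
```

A production system would correlate many such signals across messaging, video, and phone before deciding how to respond. The single-metric detector only shows what "identify events" means at the smallest scale.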
Questions you should ask your service provider
As we discussed earlier, providers' SLAs vary, with differing levels of commitment to uptime. When evaluating cloud communication and collaboration solutions, be sure to get detailed responses to the following questions about uptime:
- How is the service provider ensuring data redundancy?
- How is the infrastructure prepared for events and surges your business might experience?
- Does the provider conduct in-depth and frequent disruptive testing (simulating failures in real-world situations) and disaster recovery tests? Are the test results and findings shared with customers?
- What are the provider's business continuity plans? Go beyond whether the provider has a business continuity plan to determine, for example, how often they test and revise it.
- Ask for supporting third-party test reports and accreditations, wherever applicable.
Originally published 01 Sep, 2020, updated 23 Sep, 2020