What is VoIP, What VoIP is Not, and How VoIP Works: A VoIP Technology Primer

Table of Contents

I. VoIP Overview
II. What is VoIP?
III. VoIP communications vs. PSTN communications
IV. What VoIP is not
V. How VoIP works
VI. VoIP’s part in RingCentral’s cloud phone system
VII. How RingCentral provides secure VoIP service
VIII. Is your business ready for VoIP and RingCentral?
IX. Switching your business to VoIP

I. VoIP Overview

Some businesses are hesitant to move from a traditional private branch exchange (PBX) with landlines to voice over internet protocol (VoIP) because they lack a clear understanding of the technology.

It’s not that the technology is hard to grasp. There is just a lot of confusing information about VoIP on the internet. For example, the term VoIP is used as a catch-all phrase for various communication technologies, including the cloud, IP PBX, SIP, and other digital communication technologies. These terms may be related to VoIP, but they are different concepts altogether.

This VoIP technology primer attempts to bring clarity to the cloud-based technologies that offer digital collaboration.

We hope this resource can help organizations understand what VoIP is, what it is not, and how it works.

II. What is VoIP?

VoIP stands for voice over internet protocol. As the name implies, the technology allows users to transmit voice communications over IP networks in real time. In general, VoIP is a group of telephony protocols that works through packet-switching technology where voice travels to its destination via individual network packets across various networks, including the internet. VoIP enables computer networks to function like traditional phone lines across the public switched telephone network (PSTN). The PSTN uses circuit-switched phone technology to transmit voice.

VoIP has only become a reliable business communications option in the last two decades, but the technology has been around for much longer.

Development of VoIP started around February 1995 when VocalTec Ltd., a company in Israel, came up with the idea to allow one user to call another through their computers with a microphone and speakers. This technology was initially limited. It only worked if the caller and the receiver had the same software setup.

The technology improved in the years that followed. These improvements gave rise to products like PC-to-phone and phone-to-phone solutions. VoIP was soon running across IP switches and routers. This made VoIP a standard feature on modern-day networking equipment.

VoIP has become a vital component of modern business communications. Today’s communications solutions, including RingCentral’s, build cloud voice services on VoIP and can interconnect with legacy phone systems through gateway connections to the PSTN.

III. VoIP communications vs. PSTN communications

To better understand the differences between VoIP and PSTN, here’s a point-by-point comparison. Each PSTN entry is followed by its VoIP counterpart.

Public Switched Telephone Network (PSTN) vs. Voice over Internet Protocol (VoIP)

PSTN: Dedicated phone lines

When placing a phone call using PSTN, you are connecting over a circuit-switched path. The PSTN sets up a dedicated channel (or circuit) exclusively for the call. This connection remains for the duration of the phone call, and your voice follows the same dedicated path throughout.

VoIP: Internet connection

VoIP uses packet switching: the speaker’s voice is sampled and broken up into many small packets, which are then transmitted over the internet. The packet payloads (your voice) are reassembled by the receiver’s phone and played through the handset.

PSTN: Requires 64 Kbps per call

A single PSTN connection has a dedicated bandwidth of 64 Kbps. A T1 circuit provisioned with 23 voice channels can, therefore, handle 23 concurrent calls. Because of this limitation, the network cannot accept additional calls until someone hangs up to release network resources.

VoIP: Requires about 87 Kbps per call over Ethernet

With the high-quality G.711 codec or the high-definition G.722 codec, a VoIP call uses about 87 Kbps in each direction, just under 100 Kbps. The 87 Kbps includes the protocol stack header bytes.

PSTN: No free calls, costly international calls

PSTN calls use the telephone carrier’s network infrastructure, so every call made over the PSTN is charged based on the use of this resource. Long distance and international calls are even more expensive since they are charged on a per-minute basis or through a bundled minute subscription.

VoIP: Free VoIP-to-VoIP calls, nominal subscription fees

You’re already paying a fixed fee for your internet connection (perhaps on a monthly basis), so VoIP-to-VoIP calls typically incur no additional charges. Long distance calls, if not entirely free, have very low per-minute rates and are often included in a regular monthly price.

Business features

PSTN: Available at an extra cost

Call management features such as call waiting, call forwarding, caller ID, and call transfers are usually only available at an additional cost.

VoIP: Standard and included

Most VoIP providers offer call management capabilities like call waiting, call forwarding, caller ID, and call transfers as standard or included features in their subscription plans.

PSTN: Requires purchase of new hardware

Opening a new office, for instance, requires purchasing more hardware and connecting it to the original system, which can be costly and complicated. Upgrading your existing equipment is also expensive and time-consuming.

VoIP: Requires only bandwidth and software upgrades

Lines can easily be added as your needs grow. If one user moves to another state, the number and the phone can go with that person as well. If you need more capacity, contact your internet service provider (ISP) to buy more bandwidth.

PSTN: Requires additional physical lines

Adding new extensions typically requires paying for a dedicated line for each phone and paying a professional to punch down the wires, which makes it expensive.

VoIP: Standard and included

If you need to add extensions to your phone system, all you have to do is make the necessary configurations on your admin control panel and assign each extension to a user.

Business continuity

PSTN: Remains active during outages

The PSTN service remains active even during power outages because the hardwired landline phones receive their electrical power from the PSTN’s central office (CO). Note: The base station of a cordless phone requires a local electrical supply and will be unusable during a power outage.

VoIP: No internet means no service

If power is out or the internet connection is down, your VoIP service may not work, especially without a backup power solution. The VoIP service is unavailable whenever internet connectivity is unavailable.

Emergency calling

PSTN: Location is traceable

Emergency service responders can instantly trace your location via the phone making the 911 call since the phone is associated with a known geographic location through its connection to the PSTN.

VoIP: Location information may be untraceable or limited

Your E911 emergency calls cannot always be traced to a specific geographic location unless you, the subscriber, provide this information accurately to your VoIP provider.

IV. What VoIP is not

There are many misconceptions surrounding VoIP. This is especially true since the web uses the same term to refer to related but entirely different concepts.

Here are some common terms that are used interchangeably with VoIP but are not the same thing:

  • Cloud PBX—A private branch exchange (PBX) is the private phone system used by organizations to manage and route calls. A cloud PBX offers similar functionality, but it is hosted in the cloud instead of on-premises at the customer’s business. Cloud-based PBX systems interoperate with traditional phone lines like PSTN through gateways.

  • SIP—Session initiation protocol (SIP) signals call initiation, changes, and terminations. Whereas SIP signals the phone calls, real-time transport protocol (RTP) carries the voice or media of the calls.

  • IP phones—The term has been used to describe both the technology (VoIP) and the hardware for it. For our purposes, IP phone will refer to the hardware or software device used to take and make VoIP calls. IP phones include hard phones and softphones.

  • WebRTC—VoIP and WebRTC allow users to communicate from anywhere via the internet. WebRTC enables VoIP communication through a browser. WebRTC offers real-time communications (RTC) through simple application programming interfaces (API).

V. How VoIP works

A. Basic VoIP concepts

Before we get into how VoIP works, let us familiarize ourselves with some important concepts.

The primary technology components include:

IP networking—Voice and data are encapsulated within internet protocol (IP) packets. Source and destination IP addresses determine sender and receiver. IP packets are routed between different networks as they are passed from router to router across both private and public networks.


Encapsulation—When transmitting voice across a network, the voice is captured and segmented into small samples which are carried as payloads.

  • The voice payload is a small portion of the audio stream. This sample is encapsulated by various network headers. The voice sample is the largest part of the message.
  • A real-time transport protocol (RTP) header is attached to the front of the voice payload. Remember RTP carries media and SIP carries signaling.
  • A transmission control protocol (TCP) header or a user datagram protocol (UDP) header—UDP carries the RTP header and then the media payload. TCP carries the SIP header for signaling.
  • An internet protocol (IP) header holds the IP version, the source IP address, the destination IP address, packet size, and more.
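The layering described above can be sketched in Python. This is only an illustration under stated assumptions: the header sizes assume IPv4 without options and RTP without extensions, and the function name, the SSRC value, and the 20-millisecond payload size are placeholders chosen for the example. The fixed 12-byte RTP header layout follows RFC 3550.

```python
import struct

# Header sizes in bytes (assumptions: IPv4 without options, RTP without extensions).
IP_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12


def build_rtp_packet(payload: bytes, seq: int, timestamp: int,
                     ssrc: int = 0x12345678, payload_type: int = 0) -> bytes:
    """Prefix a voice payload with a fixed 12-byte RTP header (RFC 3550 layout)."""
    version = 2
    first_byte = version << 6           # no padding, no extension, no CSRC entries
    second_byte = payload_type & 0x7F   # marker bit cleared; type 0 is G.711 mu-law
    header = struct.pack("!BBHII", first_byte, second_byte,
                         seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc)
    return header + payload


# 20 ms of G.711 audio: 8,000 samples per second * 0.020 s * 1 byte per sample.
voice_payload = bytes(160)
rtp_packet = build_rtp_packet(voice_payload, seq=1, timestamp=160)

# UDP and IP each wrap the RTP packet in their own header.
ip_packet_size = IP_HEADER + UDP_HEADER + len(rtp_packet)
print(len(rtp_packet), ip_packet_size)
```

With these assumptions the 160-byte voice sample is by far the largest part of the 200-byte IP packet, which matches the description above.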

Internet protocol (IP) address—The network interface card (NIC) of every device connected to an IP network is given an IP address. IP version 4 addresses are written as four decimal numbers from zero to 255, separated by dots. For example, is a commonly used private IPv4 address. IP addresses identify both the source and destination, the sender and the receiver, or the caller and the called party in a VoIP connection.


Port address—Port addresses range from zero to 65535. Just as IP addresses reference a specific network device, port addresses reference a listening service or program on a given device. An SIP server will listen for incoming requests on port address 5060 by default.


Transmission control protocol (TCP) vs user datagram protocol (UDP)

The most common types of VoIP protocol delivery stacks are TCP over IP or UDP over IP.

  • Transmission control protocol (TCP) includes built-in handshaking before transmission and retransmission in case of error or message loss. Data sent using TCP is checked for errors, and any packet that is lost is retransmitted. For these reasons, session initiation protocol (SIP) signaling is commonly sent within a TCP message.

  • User datagram protocol (UDP) is sometimes referred to as the “send and pray” method. UDP is a much simpler delivery method that carries messages that do not need guaranteed delivery, like voice or video. When voice or video messages are sent, the receiving party has no use for messages that arrive late. For these reasons, real-time transport protocol (RTP) media is sent within UDP messages.

TCP can introduce delays when packets are lost or corrupted, because the receiver must wait for the retransmission. UDP prioritizes speed over error protection and retransmission.

Session border controllers (SBC) act like firewalls to manage traffic by only allowing authorized subscribers to pass through. SBCs ensure high quality of service for voice calls. RingCentral uses SBCs between internet service providers (ISP) and RingCentral and between common carriers and RingCentral.


Real-time transport protocol (RTP)

RTP carries audio and video media streams with minimal delay. The RTP header contains information about the media being streamed, including:

  • Media content type
  • Whether or not there have been ‘talk spurts’
  • Sender identification
  • Synchronization, time stamping, and sequencing data
  • Loss detection
  • Segmentation and reassembly rules
  • Security (encryption)
  • No retransmission capabilities (lost or delayed RTP messages are not resent)

The RTP header carries information about media streams and holds the payload format for the media type. RTP provides timestamps for synchronization and sequence numbers for ordering and loss detection. However, RTP itself offers no quality of service (QoS) guarantees.
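As a rough sketch of how a receiver might read these header fields, the Python below decodes the fixed 12-byte RTP header defined in RFC 3550 and counts gaps in the sequence numbers. The function names are invented for this example, and the loss counter makes a simplifying assumption that packets arrive in order and without duplicates.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed 12-byte RTP header (RFC 3550 layout)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the fixed RTP header")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "marker": bool(b1 & 0x80),     # often set on the first packet of a talk spurt
        "payload_type": b1 & 0x7F,     # identifies the media format (codec)
        "sequence": seq,               # used for ordering and loss detection
        "timestamp": timestamp,        # used for synchronization and playout
        "ssrc": ssrc,                  # identifies the sender of the stream
    }


def count_lost(sequence_numbers: list[int]) -> int:
    """Count missing packets from sequence-number gaps, allowing 16-bit wraparound.

    Simplification: assumes in-order arrival with no duplicates.
    """
    lost = 0
    for prev, cur in zip(sequence_numbers, sequence_numbers[1:]):
        gap = (cur - prev) & 0xFFFF
        lost += gap - 1
    return lost
```

Because lost packets are never resent, a receiver can only detect the gap and conceal it; it cannot recover the missing audio.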

The RTP stream also has a companion protocol called RTCP or real-time transport control protocol, which is initiated and travels alongside the RTP stream. It does not carry media but generates RTP stream quality statistics.

Real-time transport control protocol (RTCP)

RTCP provides feedback on the following:

  • Quality of the media flow
  • Number of lost packets
  • Number of packets sent
  • Number of bytes sent
  • Jitter (difference in packet interarrival time) statistics
  • Round-trip delays (the time it takes for a signal or message to get from the sender to the receiver and back)

RTCP provides valuable information when troubleshooting VoIP calls.
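To make the jitter statistic concrete, here is a small Python sketch of the running interarrival jitter estimate described in RFC 3550 (each new packet moves the estimate one sixteenth of the way toward the latest deviation). The function name and the use of milliseconds are assumptions for illustration; a real RTCP implementation works in RTP timestamp units.

```python
def interarrival_jitter(send_times: list[float], recv_times: list[float]) -> float:
    """Running jitter estimate J += (|D| - J) / 16, in the same unit as the inputs.

    D is the change in transit time between two consecutive packets.
    """
    jitter = 0.0
    for i in range(1, len(send_times)):
        send_gap = send_times[i] - send_times[i - 1]
        recv_gap = recv_times[i] - recv_times[i - 1]
        transit_delta = recv_gap - send_gap
        jitter += (abs(transit_delta) - jitter) / 16
    return jitter


# Packets sent every 20 ms; the second one arrives 16 ms later than expected.
print(interarrival_jitter([0, 20], [50, 86]))
```

Packets that arrive with perfectly even spacing leave the estimate at zero, no matter how long the one-way delay is.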

Each RTP stream is unidirectional. For both the caller and the callee to talk, you need a duplex stream. An RTP stream must be initiated in both directions.

VoIP Codecs

Codecs (short for coder-decoders) convert analog audio signals like your voice into digital samples. The received digital audio signal is converted back to an analog signal for the human ear upon arrival at the receiver’s phone.

Codec selection will determine the bandwidth required and the sound quality of the conversation.

Here are some of the most common codecs:

  • G.711—Its formal name is pulse-code modulation (PCM). G.711 samples your voice 8,000 times per second and uses around 87 Kbps of bandwidth on an IP network. G.711 provides high-quality voice end-to-end. The G.711 codec is used both on PSTN landlines and in VoIP packets carried by RTP.
  • G.722—G.722 samples your voice 16,000 times per second and uses around 87 Kbps of bandwidth on an IP network. G.722 offers better voice quality and clarity than G.711 and is classified as HD voice.
  • Opus—Opus offers near-CD quality for voice with a bandwidth requirement between 60 and 80 Kbps. RingCentral uses the Opus codec on its softphones.
  • G.729—G.729 samples your voice 8,000 times per second, compresses it to 8 Kbps, and uses around 31 Kbps of bandwidth on an IP network. G.729 is not recommended for VoIP since it does not degrade well across IP networks.
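The bandwidth figures above can be reproduced with a short calculation. The Python sketch below assumes one packet every 20 milliseconds and per-packet overhead of 12 bytes (RTP), 8 bytes (UDP), 20 bytes (IPv4), and 18 bytes (untagged Ethernet header plus frame check sequence). These values are assumptions that happen to match the article's numbers; other packet intervals or link types give different results.

```python
# Per-packet overhead in bytes (assumptions: IPv4, no RTP extensions,
# untagged Ethernet counted as a 14-byte header plus a 4-byte frame check sequence).
RTP, UDP, IP, ETHERNET = 12, 8, 20, 18


def call_bandwidth_kbps(codec_bitrate_kbps: float, packet_ms: int = 20) -> float:
    """One-direction bandwidth of a call on Ethernet, headers included."""
    packets_per_second = 1000 / packet_ms
    payload_bytes = codec_bitrate_kbps * 1000 / 8 / packets_per_second
    frame_bytes = payload_bytes + RTP + UDP + IP + ETHERNET
    return frame_bytes * 8 * packets_per_second / 1000


for name, bitrate in [("G.711", 64), ("G.722", 64), ("G.729", 8)]:
    print(name, round(call_bandwidth_kbps(bitrate), 1))
```

The calculation shows why a low-bitrate codec saves less than its bitrate suggests: the 58 bytes of headers are sent with every packet regardless of how small the voice payload is.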

Session Initiation Protocol (SIP)

SIP signals VoIP calls and is responsible for helping VoIP emulate telephone-like attributes. As said above, a lot of people mistakenly refer to VoIP as SIP as if they are the same. But as you can now see, VoIP is a group of protocols, and SIP is just one component working in the background to help VoIP calls work.

SIP is an open standard signaling protocol that can establish, manage, and terminate real-time communications over IP networks. SIP can be encapsulated or carried by either TCP or UDP.

SIP follows the client/server model, where the server refers to the VoIP provider (like RingCentral), and the client is the requesting phone.

The main elements of SIP are:

  • SIP servers run specific functions. An SIP server is not necessarily an individual physical computer. An SIP server handles requests (methods) made by clients and sends back responses to those client requests.
  • SIP clients make requests to SIP servers. SIP clients can be your hard phone, softphone installed on your desktop or laptop, or mobile VoIP app installed on your smartphone or tablet.
  • Session border controllers (SBC) authenticate client endpoints before allowing them access to the SIP servers.
  • An SIP gateway connects the internet to the PSTN. Gateways allow SIP devices to make calls to landlines and landlines to make calls to SIP subscribers.

Remember, SIP is not carrying voice. SIP signals devices to create RTP sessions. RTP carries the voice or media payload.

SIP offers the following functions:

  • User registration—binds address of record (AOR) and contact addresses. This connects the user to their uniform resource identifier (URI). AOR is a URI that can be mapped with another URI where a user may be available.
  • User location—endpoints or clients (IP phone, softphone, mobile VoIP app) notify proxy servers of their current location. This user location determines how to contact endpoints and which endpoints can participate in the call.
  • User availability—determines each endpoint’s willingness to communicate.
  • User capabilities—negotiates media and media parameters, such as the supported audio codecs.
  • Session setup—establishes the session and determines which endpoint or client should ring.
  • Session management—used for call transfer, call termination, and parameter modification mid-session or mid-call.

The SIP standard also offers multiple ways to give an SIP endpoint an address. RingCentral uses the user’s phone number or email and then their password to authenticate.

SIP uses a uniform resource identifier (URI) to find endpoints.

  • This URI can be sip colon username at domain. e.g., sip:bob@example.com
  • It can also be sip colon telephone number at domain. e.g., sip:3034997111@example.com
  • It can be the username at the IP address as well. e.g., sip:bob@
  • It can be the phone number at the IP address too. e.g., sip:3034997111@
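As an illustration of the user-at-host structure these forms share, the Python sketch below splits a simple SIP URI into its parts. The pattern is deliberately simplified and the function name is invented for this example; real SIP URIs (RFC 3261) can also carry passwords, parameters, and headers, which are ignored here.

```python
import re

# Simplified pattern: sip:user@host[:port]. Passwords, URI parameters, headers,
# and IPv6 hosts are outside the scope of this sketch.
SIP_URI = re.compile(r"sip:(?P<user>[^@:;?]+)@(?P<host>[^@:;?]+)(?::(?P<port>\d+))?")


def parse_sip_uri(uri: str) -> dict:
    """Split a simple SIP URI into user, host, and port."""
    match = SIP_URI.fullmatch(uri)
    if match is None:
        raise ValueError(f"not a simple SIP URI: {uri!r}")
    port = match.group("port")
    return {
        "user": match.group("user"),          # a username or a phone number
        "host": match.group("host"),          # a domain name or an IP address
        "port": int(port) if port else 5060,  # default SIP port
    }


print(parse_sip_uri("sip:bob@example.com"))
```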

E.164 specification

The E.164 specification defines how global phone numbers are formatted.

This global telephone number standard is specified by the International Telecommunication Union Telecommunication Standardization Sector (ITU-T).

An example global phone number is +13034997111.
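A quick format check for such numbers might look like the following Python sketch. It assumes the commonly cited E.164 shape: a leading plus sign followed by at most 15 digits, the first of which is not zero. It checks the format only and says nothing about whether the number is actually assigned.

```python
import re

# A leading "+", then 2 to 15 digits, where the first digit is not 0.
E164 = re.compile(r"\+[1-9]\d{1,14}")


def is_e164(number: str) -> bool:
    """Return True if the string is shaped like an E.164 number."""
    return E164.fullmatch(number) is not None


print(is_e164("+13034997111"))
```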

SIP uses requests (methods) and responses to establish a VoIP connection.

The five most common SIP methods or requests are:

  • REGISTER—endpoints register to a registrar server, which tracks current endpoint locations.
  • INVITE—starts an SIP call.
  • ACK—acknowledges a request.
  • CANCEL—drops the call before a connection is completed.
  • BYE—hangs up the call.

There are currently 14 different SIP requests/methods, including the ones mentioned above.

The body of an SIP message also carries a session description protocol (SDP) payload.

SDP carries the following information:

  • The name of the SIP session and its purpose.
  • Total time the session is active.
  • The media used during the session (like the agreed upon codec).
  • The network information needed to pass media, like IP addresses, port addresses, media formats, codecs, and more.
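To show how the signaling and the session description fit together, here is a hedged Python sketch that assembles a minimal INVITE with an SDP body. The addresses use reserved documentation ranges, and the branch, tag, and Call-ID values are fixed placeholders; a real client generates unique random values and includes more headers than shown here.

```python
def build_sdp(user: str, media_ip: str, rtp_port: int) -> str:
    """Describe one audio stream offering G.711 (PCMU) and G.722."""
    lines = [
        "v=0",                                # SDP version
        f"o={user} 0 0 IN IP4 {media_ip}",    # session originator
        "s=VoIP call",                        # session name
        f"c=IN IP4 {media_ip}",               # IP address that will receive media
        "t=0 0",                              # active time (0 0 means unbounded)
        f"m=audio {rtp_port} RTP/AVP 0 9",    # media type, RTP port, offered formats
        "a=rtpmap:0 PCMU/8000",
        "a=rtpmap:9 G722/8000",
    ]
    return "\r\n".join(lines) + "\r\n"


def build_invite(caller: str, callee: str, client_host: str, sdp: str) -> str:
    """Assemble a minimal INVITE request; header values are placeholders."""
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        f"Via: SIP/2.0/UDP {client_host}:5060;branch=z9hG4bK-example",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag=example-tag",
        f"To: <sip:{callee}>",
        "Call-ID: example-call-id",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller.split('@')[0]}@{client_host}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(sdp.encode('utf-8'))}",
    ]
    # A blank line separates the SIP headers from the SDP body.
    return "\r\n".join(headers) + "\r\n\r\n" + sdp


sdp_body = build_sdp("bob", "192.0.2.10", 49170)
print(build_invite("bob@example.com", "alice@example.com", "192.0.2.10", sdp_body))
```

The called party answers with its own SDP, and the two descriptions together tell each side which codec to use and where to send its RTP stream.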

B. How a VoIP call works

Using UDP or TCP as a traffic delivery method, there are two ways to make VoIP call connections: registered and unregistered.

Unregistered—An SIP channel is opened directly between devices when a call is made.


A very basic SIP call flow goes like this:

  1. Calling party makes a call—an INVITE method is sent from the client.
  2. Called party phone is ringing and therefore sends a 180 Ringing, provisional response. A connection is established, but the phone is not answered yet.
  3. Called party answers the phone and the connection has been made. The called party sends a 200 OK, a final success response.
  4. Calling party sends an acknowledgment—ACK.
  5. Calling party and called party talk—media flows between endpoints being carried by RTP.
  6. Called party or calling party ends call—one endpoint in the session sends a BYE request.
  7. Call ended—the far endpoint responds with a 200 OK success final response.
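The seven steps above can be modeled as a small state machine. The Python sketch below is a teaching aid rather than a real SIP stack: the state names are invented for this example, and only the happy path described in the list is allowed.

```python
# Each entry maps (current state, message seen) to the next state.
TRANSITIONS = {
    ("idle", "INVITE"): "inviting",
    ("inviting", "180 Ringing"): "ringing",
    ("ringing", "200 OK"): "answered",
    ("answered", "ACK"): "talking",     # RTP media flows while in this state
    ("talking", "BYE"): "ending",
    ("ending", "200 OK"): "idle",
}


def run_call(messages: list[str]) -> str:
    """Walk the call through its messages and return the final state."""
    state = "idle"
    for message in messages:
        key = (state, message)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {message!r} in state {state!r}")
        state = TRANSITIONS[key]
    return state


print(run_call(["INVITE", "180 Ringing", "200 OK", "ACK"]))
```

Feeding the full sequence, ending with BYE and 200 OK, returns the call to the idle state, while an out-of-order message such as a BYE before any INVITE is rejected.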

Registered—There is an SIP channel constantly being maintained between the SIP client and an SIP server. Most users, especially businesses, use registered SIP connections.

Remember that servers are defined as functions, not as individual boxes.

Some of the most commonly used SIP servers are:

  • Proxy server—A proxy server forwards both SIP methods and SIP responses and offers routing functions by routing call requests to the user. Proxies offer security functions through user authentication and authorization. Proxies also provide features for users like offering calls to endpoints and implementing call routing features for carriers. SIP proxies act as servers for incoming endpoint requests and act as clients for creating requests on behalf of an SIP client. There are two proxy modes:

    • Stateless—is used in heavy load scenarios with lots of concurrent SIP calls. Stateless proxies receive a request, route the call, then forget the call.
    • Stateful—offers additional services and “remembers” the entire call transaction.
  • Registrar server—A registrar server tracks current endpoint locations and endpoint registration messages. The registrar stores information on an endpoint’s current location. A registrar SIP server accepts REGISTER method requests and forwards these REGISTER requests to a location server. It also binds an SIP URI to a domain name system or DNS address of record (AOR). This binding is for a specific time, and when that time expires the user with that URI can no longer be contacted.
  • Redirect server—A redirector acts as a user agent by locating other endpoints. This redirect server returns SIP addresses or URIs and sends invitations to other SIP domains. It may generate 3XX responses if the user is not where they were expected to be. Redirect servers help an SIP network to scale in size by taking some of the load off the SIP proxy servers.
  • Presence or location—A presence server holds information about a user’s willingness and ability to communicate. SIP calls this source of presence information “presentity”. Presentity allows interested parties to be notified of “buddy” status and changes to that status.
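The registrar's time-limited binding can be pictured with a toy in-memory table. The Python sketch below is an assumption-laden illustration: the class name, the one-hour default lifetime, and the explicit `now` argument (used so the example is deterministic) are all choices made for this example, not a description of any particular server.

```python
class Registrar:
    """Toy registrar: maps an address of record (AOR) to a contact with an expiry."""

    def __init__(self):
        self._bindings = {}  # AOR -> (contact URI, expiry time in seconds)

    def register(self, aor: str, contact: str, now: float, expires: float = 3600.0) -> None:
        """Store or refresh a binding, as a REGISTER request would."""
        self._bindings[aor] = (contact, now + expires)

    def lookup(self, aor: str, now: float):
        """Return the current contact, or None if unregistered or expired."""
        binding = self._bindings.get(aor)
        if binding is None:
            return None
        contact, expiry = binding
        if now >= expiry:
            del self._bindings[aor]
            return None
        return contact


registrar = Registrar()
registrar.register("sip:bob@example.com", "sip:bob@192.0.2.10", now=0.0)
print(registrar.lookup("sip:bob@example.com", now=100.0))
```

An endpoint that wants to stay reachable has to send REGISTER again before its binding runs out.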

ENUM (E.164 number to URI mapping) uses DNS to translate telephone numbers into uniform resource identifiers (URI) to facilitate internet communications. The SIP methods PUBLISH, SUBSCRIBE, and NOTIFY are used to track presentity.
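The first step of an ENUM lookup is to turn the phone number into a DNS name. The Python sketch below shows that conversion only: the digits are reversed, separated by dots, and placed under the e164.arpa domain. The function name is invented for this example, and the actual DNS query that would return the URI is not performed.

```python
def enum_domain(e164_number: str) -> str:
    """Map an E.164 number to its ENUM domain name (digits reversed, dot-separated)."""
    digits = [ch for ch in e164_number if ch.isdigit()]
    if not digits:
        raise ValueError("no digits in number")
    return ".".join(reversed(digits)) + ".e164.arpa"


print(enum_domain("+13034997111"))
```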

Using SIP signaling, a call flow can go like this (please refer to the graph for visuals of call flow):

  1. Calling party wants to make a call to the called party. Both the called party and the calling party must preregister with the SIP registrar server.
  2. Endpoints registered to registrar server—200 OK response
  3. The calling party can then send an INVITE method to the SIP proxy server.
  4. The proxy asks the registrar server for the called party’s location.
  5. INVITEs are sent by the proxy from the calling party to the called party.
  6. Connection has been made to called party—called party sends 200 OK response to SIP proxy server.
  7. SIP proxy server then sends 200 OK response back to calling party.
  8. Calling party sends acknowledgement back to SIP proxy server—ACK.
  9. SIP proxy server sends an acknowledgment (ACK) to called party.
  10. Called party and calling party are talking—audio carried by RTP is then passed between the calling party and the called party.
  11. Either the called party or the calling party wants to end the call. In the diagram, it is the called party that ends the call by sending a BYE method to close the connection.
  12. Call ended—the calling party confirms the BYE with a 200 OK response.
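The proxy's part in steps 3 through 9 can be sketched as a lookup followed by message forwarding. In the Python below, the location table is a plain dictionary standing in for the registrar, the function names are invented for this example, and the 404 Not Found reply for an unregistered party is a standard SIP response that the steps above do not cover.

```python
def route_invite(callee_aor: str, locations: dict) -> tuple:
    """Proxy decision for an INVITE: forward to the registered contact or reject."""
    contact = locations.get(callee_aor)
    if contact is None:
        return ("404 Not Found", None)
    return ("forward INVITE", contact)


def signaling_trace(caller: str, callee_aor: str, locations: dict) -> list:
    """List the (sender, receiver, message) hops up to the point where media can start."""
    action, contact = route_invite(callee_aor, locations)
    if contact is None:
        return [(caller, "proxy", "INVITE"), ("proxy", caller, action)]
    return [
        (caller, "proxy", "INVITE"),
        ("proxy", contact, "INVITE"),
        (contact, "proxy", "200 OK"),
        ("proxy", caller, "200 OK"),
        (caller, "proxy", "ACK"),
        ("proxy", contact, "ACK"),
    ]


locations = {"sip:alice@example.com": "sip:alice@192.0.2.20"}
for hop in signaling_trace("sip:bob@192.0.2.10", "sip:alice@example.com", locations):
    print(hop)
```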

C. How VoIP connects to traditional lines

The discussion above focused on two parties connected via IP networks or through the internet. You may be wondering how a VoIP-to-PSTN call works instead.

As discussed in the SIP section, one of the elements of SIP is a gateway. VoIP connects to a gateway to convert traffic to the appropriate form for either IP or PSTN.

VoIP gateway operations

  • When calls are made from a PSTN network to an IP network, the gateway will convert the multiplexed voice sample into a packetized voice sample. It will then follow the same VoIP call process discussed above.

  • If the call is made from a VoIP network to the PSTN network, the packets passing through the gateway will be converted from the packet payloads into multiplexed voice samples.

Whereas internet packets are digital end to end, the PSTN is digital only as far as its switches. The local subscriber’s plain old telephone service (POTS) landlines are analog connections to those digital PSTN switches.

VI. VoIP’s part in RingCentral’s cloud phone system

All phone calls made by a RingCentral subscriber must pass from the user’s network, across the internet, and into the RingCentral system before they connect back out to another RingCentral user or pass through a PSTN gateway to reach a landline.

RingCentral uses a group of servers called a POD to control call functions like call routing, auto-attendant, and other call management features.

By leveraging SIP servers and combining the internet with the PSTN, RingCentral can offer individuals and businesses a feature-rich communication management solution that not only lets you make and receive calls but manages how you communicate as well.

VoIP and cloud PBX are the core technologies that comprise RingCentral’s unified communications solution. In turn, RingCentral is able to offer businesses a communications system that solves most, if not all, of their communication needs.

VII. How RingCentral provides secure VoIP service

Internally, RingCentral employs a comprehensive, multilayered security strategy. It covers administration, applications, network, and infrastructure, along with the methodologies and policies implemented to protect the overall system and its users.

Some of the security standards that RingCentral employs to secure its cloud phone system include:

  • Data center with monthly audits that are standards compliant with secure Statement on Standards for Attestation Engagements (SSAE) 16 Service Organization Controls (SOC 2 and SOC 3)
  • Support of multi-factor authentication (MFA) and Single Sign-on (SSO) user access
  • Built-in service layer fraud protection and continuous monitoring of anomalies
  • Advanced account management and administration from anywhere, at any time
  • Robust network security protection

To protect the transmission of voice packets over the internet, RingCentral employs strong high-level encryption methods including:

  • Encrypted end-to-end voice transmission
  • “To support the security of customer data on endpoints, mobile and desktop applications are offered that support encryption of customer data-at-rest.”—RingCentral Security Overview

RingCentral meets the regulatory compliance requirements and standards for the U.S. Health Insurance Portability and Accountability Act (HIPAA), EU General Data Protection Regulation (GDPR), and Payment Card Industry Data Security Standards (PCI DSS).

Compliance with these standards helps ensure that communications made through VoIP are protected from interception at any point in their transit from endpoint to endpoint.

VIII. Is your business ready for VoIP and RingCentral?

While little extra hardware is needed, VoIP and the RingCentral cloud phone system, like any technology you adopt for your business, have technical requirements. Meeting these requirements will help you achieve high call quality and performance.

In the most basic sense, your business needs the following:

1. A reliable internet connection—Ensure you have adequate, good-quality bandwidth to support VoIP communications. Consider the number of concurrent calls being made by your business. Adding the voice and data bandwidth requirements together gives the location’s minimum bandwidth requirement.

Test your internet connection capacity here: https://www.ringcentral.com/support/capacity.html

2. A stable network—VoIP uses the local area network (LAN) to connect the SIP devices to the internet. If your company LAN is heavily used and bandwidth is at a premium, VoIP calls may suffer if your router is not configured correctly.

Most businesses create a network dedicated to VoIP by configuring a voice virtual local area network (VLAN) and a data VLAN on their Ethernet switches.

You can simulate VoIP calls to get an idea of your network quality of service here: https://www.ringcentral.com/support/qos.html


3. QoS routers—Businesses that employ VoIP opt for routers that offer traffic prioritization and port triggering or, better, access control list (ACL) allow and deny rules for RingCentral IP address and port ranges.

Find routers tested by RingCentral here: https://www.ringcentral.com/support/qos-router.html

These three are the most essential components of a successful VoIP deployment. Coordinate with RingCentral to review the readiness of your business for the service.
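For the first requirement, a back-of-the-envelope estimate simply adds voice and data needs together. The Python sketch below assumes roughly 87 Kbps per concurrent call in each direction (the G.711 and G.722 figure used earlier in this primer); the function name and the sample numbers are illustrative only, and a real assessment should use the capacity test linked above.

```python
def minimum_bandwidth_kbps(concurrent_calls: int, data_kbps: float,
                           per_call_kbps: float = 87.0) -> float:
    """Voice plus data bandwidth needed in each direction, in Kbps."""
    if concurrent_calls < 0 or data_kbps < 0:
        raise ValueError("inputs must be non-negative")
    return concurrent_calls * per_call_kbps + data_kbps


# Example: 10 concurrent calls alongside 5,000 Kbps of other data traffic.
print(minimum_bandwidth_kbps(10, 5000))
```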

IX. Switching your business to VoIP

Now that you have a better understanding of what VoIP is and how it works, consider how it can be applied to your business. If cloud-based VoIP sounds like a communication solution your business can use, please consider a RingCentral solution.

RingCentral is an award-winning, industry-leading provider of cloud-based communications solutions.

For more information on how you can take advantage of VoIP and cloud communications, reach out to RingCentral at https://www.ringcentral.com/

About the author


Mark Dacanay is a Digital Marketing Professional who has been working with a B2B company offering cloud-based phone systems for more than 5 years. He is obsessed with anything about the cloud – the technology, not the fluffy stuff in the sky.