
Building a Video App: The Complete Buyer’s Checklist

Your detailed checklist for determining whether to build or buy a video application.

While video was once an alternative communication option, over the last few years, it’s become expected by consumers. We are in the middle of a digital communication transformation that has only been accelerated by the COVID-19 pandemic. With 96% of employers in our recent State of Customer Engagement report acknowledging an acceleration, and 78% of respondents reporting that they use video communication with their customers, it won’t be long before every organization uses video in some way.

Video communications are quickly becoming the norm and we can clearly see why. Video increases empathy, and face-to-face communication creates stronger connections. Connection is often the catalyst to turn regular customers into happy, paying customers.

Build a memorable video experience

What will it take for your organization to build a memorable video experience? Arriving at your answer will require deep consideration and understanding of the application you seek to build:

  • What functionality will your video app need? 

  • Do you want to build an embedded, customized experience? 

  • Do you need access to developer resources and tooling?

  • Do you want to build an omnichannel engagement solution?

  • Will you need to build your own solution? Or can you leverage a SaaS/API offering?


To help with this seemingly overwhelming task, we’ve created a checklist of everything you need to consider when productizing video in your app.

Choosing an API/SaaS Provider

Does the provider have the APIs and SDKs to match the capabilities and needs of your app?

Not all APIs or SaaS offerings can provide the integration and customization that your application may require. It’s important to define your application’s non-negotiable requirements; from there, you’ll be better positioned to evaluate which APIs and SDKs can fit your build specifications.

Twilio has backend SDKs (C#, Java, Node.js, PHP, Python, Ruby) as well as client SDKs for multiple platforms (web, iOS, Android). It also provides quickstart guides that demonstrate how to build a video application with its Programmable Video offering, including a sample UI implementation.

What types of video interactions does the platform offer?

Twilio’s Guide for the Impatient contains a grid of the interaction types available and the recommended settings for each across desktop browsers, mobile browsers, and mobile SDKs (native apps).

What are the provider’s capabilities and constraints regarding recordings?

Be aware of where your SaaS provider will store the recordings. Typically, the provider itself hosts the video and allows programmatic access to it. Twilio, however, also offers the ability to store recordings directly in AWS S3.

What are the provider’s capabilities for encryption of video streams and stored recordings?

With Twilio, all communication streams traverse encrypted channels. Recordings are encrypted at rest but may need to be decrypted in memory for composition or for transcoding the video for quality purposes.

Twilio also allows for configuring encryption keys so that content is always stored encrypted. Encrypted content can only be decrypted by the holder of encryption keys: you. For security, and especially if your video chat is to be integrated into a telehealth offering, this strategy for encryption may be appropriate for the needs of your application.

How easy is it to integrate additional features like whiteboards, chat, or phone-based dial-in?

Users have been exposed to new and innovative tools that work in conjunction with video chat. Whiteboards for group collaboration, screen sharing, and text chat are just some of the add-ons that users are accustomed to seeing in their video chat applications. Some providers offer integrations for these collaboration add-ons. For example, Twilio’s DataTrack API and Conversations API, or the pre-designed layout APIs from Jitsi, can help you move forward with these add-ons.



App Requirements

What platform will your project target—web, mobile, or both?

The end user platform refers to how your customers will use your application. Will it be through the web (desktop computer and browser), mobile (native application or mobile browser-based), or both?

The choice you make here impacts different aspects of your project and timeline. Whichever end-user platform(s) you choose, make sure your development platform supports your application needs with SDKs that work with the languages used by your engineering teams.

How will your users register and authenticate?

With any type of web application, video chat included, you’ll need to work through the basics of user account registration, login, and authentication. Your implementation of authentication and authorization is closely tied to your system’s security.

Most video platforms handle authentication similarly, requiring a server-side component that generates a token, which is then consumed by a client when connecting to a session. The difference between platforms lies in how server-side token generation works with secrets. Twilio, for example, uses an API Key Secret. In addition, Twilio provides subaccounts and different key levels for more granular authorization and permissions control.
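As a rough illustration of the server-side token pattern described above, the sketch below signs a short-lived token with a shared secret and verifies it again. It is a generic HS256 example built only on the Python standard library, not Twilio’s actual token format. The function names, the `room` claim, and the sample key ID and secret are all assumptions made for illustration; in practice you would use your provider’s own helper library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, the encoding JWTs use."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _sign(signing_input: str, secret: str) -> str:
    """HMAC-SHA256 signature over the header.payload string."""
    digest = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"),
                      hashlib.sha256).digest()
    return _b64url(digest)


def make_room_token(key_id: str, secret: str, identity: str, room: str,
                    ttl_seconds: int = 3600, now: Optional[int] = None) -> str:
    """Server side: issue a token naming one user and one room (hypothetical claims)."""
    issued = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"iss": key_id, "sub": identity, "room": room,
               "iat": issued, "exp": issued + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    return f"{signing_input}.{_sign(signing_input, secret)}"


def verify_room_token(token: str, secret: str, now: Optional[int] = None) -> dict:
    """Check the signature and expiry, then return the payload claims."""
    signing_input, _, signature = token.rpartition(".")
    if not hmac.compare_digest(signature, _sign(signing_input, secret)):
        raise ValueError("bad signature")
    payload_b64 = signing_input.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    payload = json.loads(base64.urlsafe_b64decode(padded))
    current = int(time.time()) if now is None else now
    if payload["exp"] <= current:
        raise ValueError("token expired")
    return payload
```

The secret never leaves the server; the client only receives the signed token, which is why a short expiry is a sensible default.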

What is your video use case?

Your business needs will dictate the different kinds of video interaction that your application will offer. In some cases (for example, a customer service interaction), a 1:1 video chat is usually sufficient. In other cases (for example, a focus group session), group chat is necessary. If your application facilitates group video chat, will it be a single presenter speaking to an audience, or will there be roundtable discussions and group interaction?

Do you have encryption requirements for video chats and recordings? Is HIPAA a concern?

Consider whether your application needs to support end-to-end encryption. Keep in mind that you cannot offer full end-to-end encryption and record on the server simultaneously, since the media streams must be decrypted in server memory before they can be saved to disk.

Your organization may deal with protected health information (PHI) or be subject to HIPAA regulations. In that case, Twilio will sign a Business Associate Addendum (BAA) to ensure that you can build a HIPAA-compliant application. To learn more about how to build a HIPAA-compliant workflow using Twilio’s offerings, please refer to Architecting for HIPAA on Twilio.

How will you ensure that all users—regardless of device and network bandwidth—have an incredible experience?

Developing high-quality video applications is critical to creating memorable experiences. Twilio provides several APIs to help you deliver that experience.


Twilio’s Network Bandwidth Profile API abstracts away the heavy lifting required to use resources more efficiently, giving higher priority to those video tracks that are more important. You can specify render dimensions, maximum bitrate, and maximum number of video tracks, thereby giving users an optimal video experience.

In addition, Twilio’s Dominant Speaker Detection API, Track Priority API, and Network Quality API all contribute to providing the perfect balance between optimal resource usage and optimal user experience. Plus, Twilio’s Video Insights gives you access to usage and quality metrics across all your rooms and participants, so you can observe your application, discover trends, detect issues, and troubleshoot.
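To make the idea of prioritizing tracks under a bandwidth cap more concrete, here is a minimal sketch of a greedy selection: rank tracks by priority, then admit them until either the bitrate budget or the track limit is reached. This is not Twilio’s algorithm or API. The `Track` type, the priority labels, and `select_tracks` are hypothetical names used only to illustrate the trade-off.

```python
from dataclasses import dataclass
from typing import List

# Lower number means the track is admitted earlier (hypothetical labels).
PRIORITY_ORDER = {"high": 0, "standard": 1, "low": 2}


@dataclass(frozen=True)
class Track:
    name: str
    priority: str       # one of the PRIORITY_ORDER keys
    bitrate_kbps: int   # bandwidth the track needs to render well


def select_tracks(tracks: List[Track], max_bitrate_kbps: int,
                  max_tracks: int) -> List[str]:
    """Pick track names by priority without exceeding either limit."""
    # sorted() is stable, so tracks with equal priority keep their input order.
    ranked = sorted(tracks, key=lambda t: PRIORITY_ORDER[t.priority])
    chosen: List[str] = []
    remaining = max_bitrate_kbps
    for track in ranked:
        if len(chosen) == max_tracks:
            break
        if track.bitrate_kbps <= remaining:
            chosen.append(track.name)
            remaining -= track.bitrate_kbps
    return chosen
```

With a screen share marked `high` and two cameras at lower priorities, the screen share is always admitted first, and a camera is dropped only when the budget or the track limit runs out.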

Will you rely on the platform’s noise and echo cancellation features?

New noise cancellation technologies are emerging, driving us further away from the once underwhelming video experience of many apps. While operating systems, browsers, and video chat platforms themselves all include different layers of noise/echo cancellation, there are ways to utilize machine learning to build an unparalleled aural experience. Twilio offers AI-based Noise Cancellation for all Video Group Rooms customers so you can provide best-in-class audio experiences directly in your video application.




What specifications do you have for video chat layout?

Regarding layout, you may build a custom layout based on the needs and comfort levels of your users.


Your video use case and communication style will inform your design. For example, the layout and usage of screen real estate vary between 1:1 video, group chats, and sessions with a presenter and screen share. A layout and implementation that focuses on the active speaker also contributes toward a memorable video experience for your users.


Your users’ devices also impact video layout choices. Connecting from a web browser on a desktop computer may dictate a fixed layout, whereas connecting from a web browser on a tablet may need to support screen rotations and different layouts. When users connect from mobile devices with smaller screens, your options for how to use screen real estate become even more limited.


How will you track customer satisfaction metrics, like CSAT and NPS?

Only by tracking customer satisfaction scores (CSAT) can you measure whether a customer feels their needs have been fulfilled. CSAT surveys can be administered via SMS or email or by asking for ratings after interactions with customer support.


An NPS survey is different; rather than asking a customer to evaluate their current feelings toward the company or product, it is focused on the future—would they recommend you to their friends or colleagues?

Tracking both of these metrics—and how they apply to your video experience—will help your organization surface the challenges customers face with the product. After identifying these challenges, you can establish a roadmap for how to address challenges and delight customers.
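For teams that want to compute these two scores from raw survey responses, the sketch below uses the common conventions: CSAT as the percentage of responses rated 4 or 5 on a 5-point scale, and NPS as the percentage of promoters (9 or 10 on a 0 to 10 scale) minus the percentage of detractors (0 to 6). The thresholds and function names are assumptions; adjust them to match your own survey design.

```python
from typing import Sequence


def csat_percent(ratings: Sequence[int], satisfied_threshold: int = 4) -> float:
    """Share of responses at or above the 'satisfied' threshold, as a percentage."""
    if not ratings:
        raise ValueError("no ratings supplied")
    satisfied = sum(1 for r in ratings if r >= satisfied_threshold)
    return 100.0 * satisfied / len(ratings)


def nps(scores: Sequence[int]) -> float:
    """Net Promoter Score: percent promoters (9-10) minus percent detractors (0-6)."""
    if not scores:
        raise ValueError("no scores supplied")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


# Example: five post-call CSAT ratings and ten "would you recommend us?" answers.
print(csat_percent([5, 4, 3, 2, 5]))           # 60.0
print(nps([10, 9, 8, 7, 6, 0, 10, 9, 3, 10]))  # 20.0
```

Segmenting both scores by session type (for example, 1:1 versus group video) makes it easier to tie a drop in satisfaction to a specific part of the video experience.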

Day Two Concerns

What telemetry will your system expose, and how easy will it be to access and integrate this telemetry with an APM provider?

Your system’s metrics, logs, and traces provide a wealth of data that must be used if you are to gain insight into usage patterns, issues, and opportunities for improvement. However, the sheer amount of data available can be overwhelming unless you have tools to help you capture, filter through, and understand that data.

How can events be correlated across internal systems and the video chat platform?

When an event (for example, a user’s sign on to a video chat session) has a ripple effect from the client application to the server and across a network of distributed resources, it’s important to associate all of those ripples with the originating event.
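One common way to tie those ripples together is a correlation ID: generate an identifier when the originating event occurs, pass it along with every downstream call, and include it in every log record. The sketch below shows the idea with an in-memory list standing in for a log pipeline. The function names, field names, and event names are hypothetical.

```python
import json
import uuid
from typing import Dict, List


def new_correlation_id() -> str:
    """Create an ID once, at the originating event (for example, a join click)."""
    return uuid.uuid4().hex


def log_event(sink: List[str], source: str, event: str,
              correlation_id: str, **fields) -> None:
    """Append one structured log line that carries the correlation ID."""
    record = {"source": source, "event": event,
              "correlation_id": correlation_id, **fields}
    sink.append(json.dumps(record, sort_keys=True))


def events_for(sink: List[str], correlation_id: str) -> List[Dict]:
    """Reassemble every record that belongs to one originating event."""
    return [record for record in map(json.loads, sink)
            if record["correlation_id"] == correlation_id]


# Example: one sign-on touches the client, the token server, and the video platform.
logs: List[str] = []
cid = new_correlation_id()
log_event(logs, "client", "join_clicked", cid)
log_event(logs, "token-server", "token_issued", cid, room="support-17")
log_event(logs, "video-platform", "participant_connected", cid, room="support-17")
print([r["source"] for r in events_for(logs, cid)])
```

If the video platform lets you attach your own identifier to a room or participant, storing the correlation ID there is one way to join platform events with your internal logs.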

What dashboards need to be built to provide visibility into usage and resource consumption?

With your metrics, logs, and traces all in a centralized location, your next task is to build visualizations to help your team make sense of the data. Dashboards help you see cross-referenced resource consumption and usage patterns. Whether you need to see the number of active users, the different types of video sessions currently running, the platforms and devices used, or even core resource consumption, dashboards will be your go-to tool for finding signals amidst the noise.


Luckily, Twilio’s Video Insights provides analytics and aggregations for observing your application, discovering trends, and troubleshooting rooms and participants. Video Insights is free for Group, Peer-to-Peer, and WebRTC Go Room developers and is available for any application built with Twilio Video’s JavaScript, iOS, and/or Android SDKs.

Particularly for video chat applications, you can take advantage of webhooks to collect data on key user events like room creation, session joining, or recording failures.
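As a small sketch of what collecting webhook data can look like, the code below parses form-encoded callback bodies and tallies them by event type, which is the kind of count a dashboard panel would plot. The field name `StatusCallbackEvent` and the event names in the example are assumptions for illustration; check your provider’s webhook reference for the exact payload.

```python
from collections import Counter
from typing import Dict, Iterable
from urllib.parse import parse_qs


def parse_callback(body: str) -> Dict[str, str]:
    """Turn a form-encoded webhook body into a flat dict (first value per key)."""
    return {key: values[0] for key, values in parse_qs(body).items()}


def tally_events(bodies: Iterable[str],
                 event_field: str = "StatusCallbackEvent") -> Counter:
    """Count webhook deliveries by event type; missing types count as 'unknown'."""
    return Counter(parse_callback(body).get(event_field, "unknown")
                   for body in bodies)


# Example: three deliveries for one room.
received = [
    "StatusCallbackEvent=room-created&RoomName=demo",
    "StatusCallbackEvent=participant-connected&RoomName=demo",
    "StatusCallbackEvent=recording-failed&RoomName=demo",
]
print(tally_events(received))
```

In a real deployment the bodies would arrive at an HTTP endpoint, and you would validate each request’s signature before trusting its contents.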

“Video Insights has been a much-welcomed troubleshooting tool and ultimately, has equipped us with the data needed to more efficiently help our users improve their telehealth experience. In particular, Video Insights has become instrumental in our investigation of quality issues. By utilizing the threshold data that Twilio provides to identify a stable and healthy connection, we're able to quickly identify the source of the issue and assist the customer in taking precise adjustments to improve their connection and overall experience with the platform.” 

Brennan Fahselt, Technical Product Specialist Lead

How will the team track storage usage? When will old recordings be put into cold storage or deleted?

Disk storage is unlimited—as long as your budget is unlimited. Of course, the rest of us need to be judicious about our recording storage strategy!


Earlier in our checklist, we covered some of the basic numbers regarding necessary storage capacity. Some host-based providers offer a storage limit per host. Purchasing additional storage or incurring overage fees can quickly exhaust a budget. Twilio, on the other hand, charges a low flat rate per GB stored per day.


Your team would be wise to implement a plan for time-based auto-deletion or cold storage archiving. Doing so will help manage storage needs and costs.
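A time-based retention plan can be as simple as classifying each recording by age. The sketch below is one way to express that rule; the 30-day and 365-day thresholds and the function names are assumptions, and the actual archive or delete step would call your storage provider’s API.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def retention_action(created_at: datetime, now: datetime,
                     archive_after_days: int = 30,
                     delete_after_days: int = 365) -> str:
    """Return 'keep', 'archive', or 'delete' based on the recording's age."""
    age = now - created_at
    if age >= timedelta(days=delete_after_days):
        return "delete"
    if age >= timedelta(days=archive_after_days):
        return "archive"
    return "keep"


def plan_retention(recordings: Dict[str, datetime], now: datetime) -> Dict[str, str]:
    """Map each recording ID to its action; `recordings` maps ID to creation time."""
    return {rec_id: retention_action(created, now)
            for rec_id, created in recordings.items()}


# Example run against a fixed "now" so the result is reproducible.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
recordings = {
    "rec-new": now - timedelta(days=3),
    "rec-old": now - timedelta(days=45),
    "rec-ancient": now - timedelta(days=400),
}
print(plan_retention(recordings, now))
# {'rec-new': 'keep', 'rec-old': 'archive', 'rec-ancient': 'delete'}
```

Running a job like this on a schedule keeps the policy explicit and auditable, which matters if regulations dictate how long recordings must be retained.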


Common Use Cases

Do you need to offer group room video sessions?

Businesses that leverage group room technology often need to facilitate meetings or conferences. Group learning environments, such as educational institutions (like LearnCube), fitness programs (like Barry’s), and training centers, also use group rooms rather than 1:1 sessions.


The dynamic of a group room video session brings unique concerns which your platform will need to address. For example, an active speaker indicator and a presenter’s ability to mute all participants or disable participant cameras are must-haves.


Participants in a group room expect chat functionality that runs in parallel to video. You’ll need to think through messaging (to everyone versus to specific individuals) and other aspects such as rich content or emoji support.



In addition, you will need to consider the layout for supporting a room with dozens (or hundreds) of participants. Should there be pagination, customizable views based on user selection, or limits?


With multiple session participants added into the mix, noise and echo will be an issue. To mitigate unwanted noise, the Twilio Video SDK supports Acoustic Echo Cancellation (AEC) and Noise Suppression (NS).


Lastly, large groups often have the need for breakout rooms to accommodate smaller groups. It will be important to work through both the design of the user experience flow and the technical aspects of one group room session spawning several others.
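On the technical side, the first step in spawning breakout rooms is deciding who goes where. The sketch below shows a simple round-robin assignment that keeps room sizes within one participant of each other. The room naming scheme and function name are hypothetical; creating the rooms and moving participants would go through your video platform’s API.

```python
from typing import Dict, List, Sequence


def assign_breakout_rooms(participants: Sequence[str],
                          room_count: int) -> Dict[str, List[str]]:
    """Distribute participants across `room_count` rooms in round-robin order."""
    if room_count < 1:
        raise ValueError("room_count must be at least 1")
    rooms: Dict[str, List[str]] = {
        f"breakout-{index + 1}": [] for index in range(room_count)
    }
    room_names = list(rooms)
    for position, participant in enumerate(participants):
        rooms[room_names[position % room_count]].append(participant)
    return rooms


# Example: five attendees split across two breakout rooms.
print(assign_breakout_rooms(["ana", "ben", "chi", "dev", "eli"], 2))
# {'breakout-1': ['ana', 'chi', 'eli'], 'breakout-2': ['ben', 'dev']}
```

A production version would likely also let the host override assignments and would record which main room each breakout belongs to, so participants can be returned when the breakout ends.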

"Twilio has provided us with scalable and fully customizable video conferencing APIs that serve as the building blocks for our branded customer facing experience. We have full autonomy to build and customize on the front-end without needing to build the underlying technology."
— Jenna Hauca, Vice President of Digital and Head of Barry's X

Or, do you just need 1:1 video sessions?

In contrast, business contexts such as telehealth, document e-signing, or customer service make use of 1:1 video sessions rather than group rooms. Applications facilitating social interactions also typically leverage 1:1 video. While 1:1 sessions may seem simpler, there are still several considerations to keep in mind.


Many business use cases will additionally require features like screen sharing, whiteboarding, or other real-time collaboration niceties. File sharing (being able to send attachments to one another from within the video room) will also be considered a must-have in many contexts.


Other feature considerations include virtual backgrounds, integrated augmented reality (AR) technology, and recording or transcription services.

Example of a virtual background on a video app

Regardless of the business context for your video platform, whether you need to focus on supporting group rooms or 1:1 sessions, the Twilio Video platform provides a wide array of tools for easy integration of the features needed to match your use case.


Final Words

COVID-19 has accelerated digital communication strategy by six years. For your apps to remain relevant, integrating video is vital. Keeping the above items in mind while you plan for your app will help your team build the correct solution and create the best video communication experience for your users.


We hope that this checklist makes it easier to plan your video integration. As you move forward with an API/SaaS offering to bolster your platform, understanding the functionality required to build a fantastic experience is key to creating a delightful video chat app. Having a complete understanding of the scope of the problem is fundamental to success.

Ready to see how Twilio can fit your build specifications? Check out our interactive Twilio Video demo to see first-hand how to build, experience, and monitor your video application using Twilio.
