Developing a Low-latency Audio Streaming Application in C
Developing a low‑latency audio streaming application in C is a challenging but rewarding project that combines a deep understanding of audio processing, network programming, and real‑time system design. This article expands upon the fundamental components and provides a detailed, production‑focused guide to building such an application from scratch. Whether you are building a live performance tool, a remote instrument, or a real‑time communication system, the techniques described here can help you achieve sub‑100 millisecond end‑to‑end latency.
Understanding Low‑Latency Audio Streaming
Low‑latency audio streaming aims to transmit audio data from a source (microphone, line input, or software instrument) to one or more listeners with minimal delay. For interactive applications, any delay above 20–30 milliseconds becomes noticeable and disrupts the natural feel of real‑time exchange. Achieving such low latency requires careful end‑to‑end optimization: the audio capture buffer must be small, the encoding must be lightweight, the network transport must avoid retransmission overhead, and the playback buffer must be tightly managed. Each stage adds its own delay, and the sum of these delays determines the total end‑to‑end latency.
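To make the "sum of stages" concrete, the sketch below adds up a latency budget. The stage list and every number in it (128‑frame capture and playback periods at 48 kHz, a 20 ms codec frame, 1 ms of LAN transit, a two‑packet jitter buffer) are illustrative assumptions, not measurements, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* One stage of the pipeline and the delay it contributes, in milliseconds. */
struct stage {
    const char *name;
    double delay_ms;
};

/* Duration of `frames` sample frames at `rate` Hz, in milliseconds. */
static double frames_to_ms(unsigned frames, unsigned rate)
{
    return 1000.0 * (double)frames / (double)rate;
}

/* End-to-end latency is the sum of the per-stage delays. */
static double total_latency_ms(const struct stage *stages, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += stages[i].delay_ms;
    return total;
}

/* Print a hypothetical budget; every value below is an assumption. */
static void print_example_budget(void)
{
    const struct stage budget[] = {
        { "capture period (128 frames)",  frames_to_ms(128, 48000) },
        { "codec frame (960 frames)",     frames_to_ms(960, 48000) },
        { "network transit (LAN)",        1.0 },
        { "jitter buffer (2 x 20 ms)",    40.0 },
        { "playback period (128 frames)", frames_to_ms(128, 48000) },
    };
    size_t n = sizeof(budget) / sizeof(budget[0]);
    for (size_t i = 0; i < n; i++)
        printf("%-30s %6.2f ms\n", budget[i].name, budget[i].delay_ms);
    printf("%-30s %6.2f ms\n", "total", total_latency_ms(budget, n));
}
```

With these assumed figures the total is roughly 66 ms, dominated by the jitter buffer and the 20 ms codec frame; codec lookahead and scheduling delay are not included.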
In C, you have full control over memory allocation, thread scheduling, and I/O buffers – an advantage that interpreted or garbage‑collected languages cannot match. This control comes at the cost of manual management, but the payoff is a deterministic, low‑jitter audio pipeline.
Core Components of the Application
Every low‑latency audio streaming application can be broken into five essential layers:
- Audio Capture – reading raw PCM samples from a sound device using an API such as ALSA (Linux) or PortAudio (cross‑platform).
- Audio Encoding – optionally compressing the raw samples to conserve bandwidth. For local‑area networks, uncompressed PCM is often fast enough; for the internet, a low‑delay codec like Opus is preferred.
- Network Transport – transmitting audio packets over UDP sockets. UDP avoids the retransmission delays inherent in TCP, but requires the application to handle packet loss.
- Audio Decoding – decompressing received packets back into PCM samples, ready for playback.
- Audio Playback – sending decoded samples to the output device with minimal buffering.
Each layer interacts with its neighbours via lock‑free queues or ring buffers to avoid blocking the audio thread. The rest of this article examines each layer in depth, with code examples and optimisation strategies.
Audio Capture with C
Choosing an Audio API
On Linux, the Advanced Linux Sound Architecture (ALSA) provides direct, low‑level access to audio hardware. It offers the most direct control, but its API is complex and requires careful management of hardware parameters (period size, buffer size, sample format). PortAudio is a cross‑platform wrapper over native host APIs such as ALSA (Linux), CoreAudio (macOS), and WASAPI or ASIO (Windows). For prototyping, PortAudio is easier, but for maximum control and minimal latency, a direct ALSA implementation is often used in production systems.
Below is an example of ALSA capture using a small period size (128 frames) to keep latency low. The code sets the hardware parameters explicitly and uses a blocking read loop:
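#include <alsa/asoundlib.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    snd_pcm_t *capture_handle;
    snd_pcm_hw_params_t *hw_params;
    snd_pcm_uframes_t period_size = 128;
    snd_pcm_uframes_t buffer_size = 2 * 128; // two periods: small, but still double-buffered
    unsigned int rate = 48000;
    int dir = 0;
    int err;

    // "default" may route through plugins (dsnoop, resampling); "hw:0,0" bypasses them
    if ((err = snd_pcm_open(&capture_handle, "default", SND_PCM_STREAM_CAPTURE, 0)) < 0) {
        fprintf(stderr, "Cannot open device: %s\n", snd_strerror(err));
        return 1;
    }
    snd_pcm_hw_params_alloca(&hw_params);
    snd_pcm_hw_params_any(capture_handle, hw_params);
    snd_pcm_hw_params_set_access(capture_handle, hw_params, SND_PCM_ACCESS_RW_INTERLEAVED);
    snd_pcm_hw_params_set_format(capture_handle, hw_params, SND_PCM_FORMAT_S16_LE);
    snd_pcm_hw_params_set_channels(capture_handle, hw_params, 2);
    snd_pcm_hw_params_set_rate_near(capture_handle, hw_params, &rate, &dir);
    snd_pcm_hw_params_set_period_size_near(capture_handle, hw_params, &period_size, &dir);
    snd_pcm_hw_params_set_buffer_size_near(capture_handle, hw_params, &buffer_size);
    if ((err = snd_pcm_hw_params(capture_handle, hw_params)) < 0) {
        fprintf(stderr, "Cannot set parameters: %s\n", snd_strerror(err));
        snd_pcm_close(capture_handle);
        return 1;
    }
    // The *_near calls may adjust the request; read back the period actually granted
    snd_pcm_hw_params_get_period_size(hw_params, &period_size, &dir);

    // Allocate only once the final period is known: 2 bytes (16-bit) x 2 channels per frame
    char *buffer = malloc(period_size * 2 * 2);
    if (!buffer) {
        snd_pcm_close(capture_handle);
        return 1;
    }

    // Start capture (snd_pcm_readi would also start a prepared stream on its own)
    snd_pcm_start(capture_handle);
    while (1) {
        snd_pcm_sframes_t n = snd_pcm_readi(capture_handle, buffer, period_size);
        if (n < 0) {
            // Recover from overruns (-EPIPE) and suspends; give up on anything else
            if (snd_pcm_recover(capture_handle, (int)n, 0) < 0) {
                fprintf(stderr, "Read error: %s\n", snd_strerror((int)n));
                break;
            }
            continue;
        }
        // Process n frames in buffer (encode, send)
    }
    snd_pcm_close(capture_handle);
    free(buffer);
    return 0;
}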
If your application must run on multiple platforms, PortAudio is recommended. Its callback model invokes a user‑supplied function from a high‑priority audio thread managed by the host API and hands it audio data directly. Set framesPerBuffer to a small value (e.g., 128 or 256) and use the paInt16 sample format if the rest of your pipeline works with 16‑bit samples.
Buffer Size and Latency
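The period size (or frames per buffer) directly affects latency. At a sample rate of 48 kHz and a period of 128 frames, each period takes 128 / 48000 = 2.67 ms to fill. Using 64 frames halves this to 1.33 ms, but may cause xruns (overruns on capture, underruns on playback) on slower hardware. The device buffer normally holds two or more periods, so the buffering it can add is the period time multiplied by the number of periods. Start with 128 frames and tune downward if your system can handle it.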
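Tuning downward can be checked on paper before trying it on hardware. The helper below is a hypothetical sketch: it walks power‑of‑two period sizes from large to small and returns the largest one whose whole buffer (periods × period size) stays within a target latency.

```c
/* Latency in milliseconds of a buffer holding `periods` periods of `period_frames` frames. */
static double buffer_latency_ms(unsigned period_frames, unsigned periods, unsigned rate)
{
    return 1000.0 * (double)period_frames * (double)periods / (double)rate;
}

/* Largest power-of-two period between min_frames and max_frames whose total buffer
   latency is within target_ms; returns 0 if even min_frames is too large. */
static unsigned largest_period_within(double target_ms, unsigned periods, unsigned rate,
                                      unsigned min_frames, unsigned max_frames)
{
    for (unsigned f = max_frames; f >= min_frames && f > 0; f /= 2)
        if (buffer_latency_ms(f, periods, rate) <= target_ms)
            return f;
    return 0;
}
```

For example, with two periods at 48 kHz and a 6 ms target, largest_period_within(6.0, 2, 48000, 32, 1024) returns 128 (2 × 128 frames ≈ 5.33 ms). Whether the hardware then runs without xruns still has to be verified on the device.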
Audio Encoding
For local network streaming, raw PCM is often sufficient. For internet streaming, compression becomes necessary. The Opus codec is specifically designed for interactive audio, offering very low delay (as low as 5 ms) and excellent quality. Its C API is well documented and easy to integrate.
Below is a snippet that encodes a 20‑ms frame of mono PCM into an Opus packet:
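#include <opus/opus.h>
#include <stdio.h>
#define FRAME_SIZE 960   // 20 ms at 48 kHz
#define MAX_PACKET 4000  // generous upper bound for one encoded packet

int main(void) {
    int error;
    // OPUS_APPLICATION_RESTRICTED_LOWDELAY trims codec delay further by disabling the speech modes
    OpusEncoder *enc = opus_encoder_create(48000, 1, OPUS_APPLICATION_AUDIO, &error);
    if (error != OPUS_OK) {
        fprintf(stderr, "opus_encoder_create: %s\n", opus_strerror(error));
        return 1;
    }
    opus_encoder_ctl(enc, OPUS_SET_BITRATE(64000));
    opus_encoder_ctl(enc, OPUS_SET_COMPLEXITY(5)); // 0-10; lower complexity = lower CPU
    opus_int16 pcm[FRAME_SIZE] = {0};  // one 20 ms mono frame from the capture stage
    unsigned char opus_data[MAX_PACKET];
    opus_int32 bytes = opus_encode(enc, pcm, FRAME_SIZE, opus_data, sizeof(opus_data));
    if (bytes < 0)
        fprintf(stderr, "opus_encode: %s\n", opus_strerror(bytes));
    // otherwise bytes is the length of the compressed payload: send opus_data over the network

    opus_encoder_destroy(enc);
    return 0;
}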
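On the receiving end, decode with opus_decode(). When a packet is lost, calling opus_decode() with a NULL data pointer and a length of 0 asks the decoder to synthesise concealment audio for the missing frame. Choose a frame size that aligns with your network packet interval (e.g., 20 ms packets are a good compromise between latency and overhead).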
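Not every packet interval is a legal Opus frame: RFC 6716 defines frame durations of 2.5, 5, 10, 20, 40 and 60 ms at 8, 12, 16, 24 or 48 kHz (libopus 1.2 and later also accept 80, 100 and 120 ms, which this sketch deliberately rejects). The hypothetical helper below converts an interval, given in tenths of a millisecond to stay in integer arithmetic, into the per‑channel frame size to pass to opus_encode().

```c
#include <stddef.h>

/* Per-channel frame size in samples for an interval in tenths of a millisecond,
   or -1 if the sample rate or duration is not one RFC 6716 allows. */
static int opus_frame_samples(int rate, int tenths_ms)
{
    static const int rates[] = { 8000, 12000, 16000, 24000, 48000 };
    static const int durations[] = { 25, 50, 100, 200, 400, 600 };
    int rate_ok = 0, dur_ok = 0;
    for (size_t i = 0; i < sizeof(rates) / sizeof(rates[0]); i++)
        if (rates[i] == rate)
            rate_ok = 1;
    for (size_t i = 0; i < sizeof(durations) / sizeof(durations[0]); i++)
        if (durations[i] == tenths_ms)
            dur_ok = 1;
    if (!rate_ok || !dur_ok)
        return -1;
    return rate * tenths_ms / 10000;
}
```

The interval also sets the packet rate: 20 ms frames mean 50 packets per second, and with 28 bytes of IPv4 and UDP headers per datagram that is 11.2 kbit/s of header overhead, which doubles at 10 ms.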
Network Transmission
Why UDP
Transmission Control Protocol (TCP) guarantees in‑order delivery by retransmitting lost packets; a single loss holds back all later data until the retransmission arrives, which can stall the stream for tens to hundreds of milliseconds. That is a death sentence for real‑time audio. User Datagram Protocol (UDP) fires packets and forgets: if a packet is lost, the audio suffers a minor glitch instead of a stall. The application must implement packet loss concealment (e.g., repeating the last good frame or interpolating) so that brief losses are not noticeable.
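A minimal form of the "repeat the last good frame" strategy is sketched below. It is an assumption‑laden illustration rather than a recommended concealment algorithm: 16‑bit mono frames of at most 960 samples, and a gain that halves on every consecutive repeat so that a longer outage fades to silence instead of buzzing. When Opus is in use, its decoder's own concealment (opus_decode() with a NULL packet) is worth trying first.

```c
#include <stdint.h>
#include <string.h>

/* Concealment state: a copy of the last frame that arrived intact. */
struct plc_state {
    int16_t last[960];  /* assumed maximum frame: 20 ms of mono audio at 48 kHz */
    size_t  frames;     /* samples held in `last` */
    int     repeats;    /* consecutive frames concealed so far */
};

/* Call for every good frame: remember it and reset the repeat counter. */
static void plc_good_frame(struct plc_state *s, const int16_t *pcm, size_t n)
{
    size_t cap = sizeof(s->last) / sizeof(s->last[0]);
    if (n > cap)
        n = cap;
    memcpy(s->last, pcm, n * sizeof(int16_t));
    s->frames = n;
    s->repeats = 0;
}

/* Call for a missing frame: replay the last good frame at 1/2, 1/4, 1/8, ... gain. */
static void plc_conceal(struct plc_state *s, int16_t *out, size_t n)
{
    int shift = ++s->repeats;
    for (size_t i = 0; i < n; i++) {
        int16_t src = (i < s->frames) ? s->last[i] : 0;
        out[i] = (shift >= 15) ? 0 : (int16_t)(src / (1 << shift));
    }
}
```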
Basic UDP Sender and Receiver
The following code sets up a UDP socket and sends a buffer repeatedly. Adjust for your own audio packet size.
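#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

int main(void) {
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0) { perror("socket"); return 1; }

    struct sockaddr_in dest;
    memset(&dest, 0, sizeof(dest));
    dest.sin_family = AF_INET;
    dest.sin_port = htons(12345);
    inet_pton(AF_INET, "192.168.1.100", &dest.sin_addr);

    char packet[1024];      // your encoded audio
    size_t packet_len = 0;  // set to the number of valid bytes in packet
    // fill packet and packet_len...

    const struct timespec interval = { 0, 20 * 1000 * 1000 }; // 20 ms per packet
    while (1) {
        if (sendto(sock, packet, packet_len, 0, (struct sockaddr *)&dest, sizeof(dest)) < 0)
            perror("sendto");
        nanosleep(&interval, NULL); // relative sleep: simple, but drifts over time
    }
    close(sock);
    return 0;
}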
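Pacing with a relative sleep (for example nanosleep() for 20 ms per iteration) drifts, because the time spent in sendto() and the scheduler's wake‑up delay accumulate on every pass. One common alternative, sketched here with hypothetical helper names, sleeps until an absolute deadline on CLOCK_MONOTONIC and advances that deadline by exactly one interval each time.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Advance a timespec by `ns` nanoseconds, keeping tv_nsec in [0, 1e9). */
static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/* Move the deadline one interval later and sleep until it is reached.
   Initialise *deadline once with clock_gettime(CLOCK_MONOTONIC, deadline). */
static void pace_next(struct timespec *deadline, long interval_ns)
{
    timespec_add_ns(deadline, interval_ns);
    /* Absolute sleeps can simply be restarted after a signal (EINTR). */
    while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, deadline, NULL) == EINTR)
        ;
}
```

In the sender, call clock_gettime(CLOCK_MONOTONIC, &next) once before the loop and pace_next(&next, 20000000L) after each sendto(). In a live pipeline the blocking capture read already paces the loop, so explicit pacing mainly matters for file‑based sources and tests.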
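A receiver would use bind() and recvfrom() in a loop. Consider setting socket buffer sizes explicitly (setsockopt() with SO_RCVBUF and SO_SNDBUF) so that bursts of packets are absorbed instead of dropped, and use non‑blocking I/O combined with poll() or, on Linux, epoll (epoll_wait()) so the receive thread can wait with a timeout instead of blocking indefinitely.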
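A receiver along those lines might look like the following sketch. The helper names are hypothetical, and the receive buffer size is only a request: Linux doubles the value for bookkeeping and caps it at net.core.rmem_max.

```c
#include <arpa/inet.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a non-blocking UDP socket bound to `port` on all interfaces (0 = any free port).
   `rcvbuf` is the requested kernel receive buffer in bytes (best effort). */
static int open_receiver(unsigned short port, int rcvbuf)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    if (sock < 0)
        return -1;
    setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &rcvbuf, sizeof(rcvbuf));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        fcntl(sock, F_SETFL, fcntl(sock, F_GETFL, 0) | O_NONBLOCK) < 0) {
        close(sock);
        return -1;
    }
    return sock;
}

/* Wait up to timeout_ms for one datagram. Returns its length,
   0 on timeout (or an empty datagram), -1 on error. */
static ssize_t receive_packet(int sock, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = sock, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready <= 0)
        return ready;
    return recvfrom(sock, buf, len, 0, NULL, NULL);
}
```

The timeout lets the receive loop notice that a packet is overdue and hand control to concealment instead of waiting forever.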
Jitter Buffer
Network jitter, the variation in packet arrival times, must be smoothed on the receiver side. A jitter buffer stores a few packets before playing them, converting variable delays into a constant playback delay. Typical implementations use a ring buffer with a playback pointer that lags the write pointer by a fixed number of packets (e.g., 2–4 packets, or roughly 40–80 ms with 20 ms packets). The trade‑off: more buffering increases latency; less buffering increases dropouts. For very low latency, you may accept occasional loss and rely on concealment instead of a jitter buffer.
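The sketch below is one way to build such a buffer. It assumes each packet carries a 16‑bit sequence number (the article does not define a packet header), stores packets in slots indexed by that number, starts playback only after `depth` packets have arrived, reports gaps so the caller can conceal them, and falls back to pre‑buffering when it runs dry. All names and sizes are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JB_SLOTS     8     /* power of two (divides 65536), larger than the target depth */
#define JB_MAX_BYTES 1500  /* assumed maximum payload per packet */

struct jb_slot {
    int     used;
    size_t  len;
    uint8_t data[JB_MAX_BYTES];
};

struct jitter_buffer {
    struct jb_slot slot[JB_SLOTS];
    uint16_t next_seq;  /* sequence number playback needs next */
    int      started;   /* 0 while pre-buffering */
    int      depth;     /* packets to queue before playback starts */
    int      queued;    /* packets currently stored */
};

static void jb_init(struct jitter_buffer *jb, int depth)
{
    memset(jb, 0, sizeof(*jb));
    jb->depth = depth;
}

/* Store an arriving packet. Returns 1 if kept, 0 if dropped
   (oversized, duplicate, already played, or too far ahead). */
static int jb_put(struct jitter_buffer *jb, uint16_t seq, const void *data, size_t len)
{
    if (len > JB_MAX_BYTES)
        return 0;
    if (!jb->started && jb->queued == 0)
        jb->next_seq = seq;                           /* (re)anchor the timeline */
    uint16_t ahead = (uint16_t)(seq - jb->next_seq);  /* modulo 2^16: handles wraparound */
    if (ahead >= JB_SLOTS)
        return 0;
    struct jb_slot *s = &jb->slot[seq % JB_SLOTS];
    if (s->used)
        return 0;                                     /* duplicate */
    s->used = 1;
    s->len = len;
    memcpy(s->data, data, len);
    if (++jb->queued >= jb->depth)
        jb->started = 1;
    return 1;
}

/* Called once per packet interval by the playback side. Returns the payload
   length, 0 if this packet is missing (conceal it), or -1 while pre-buffering. */
static long jb_get(struct jitter_buffer *jb, void *out)
{
    if (jb->started && jb->queued == 0)
        jb->started = 0;                              /* drained: re-buffer */
    if (!jb->started)
        return -1;
    struct jb_slot *s = &jb->slot[jb->next_seq % JB_SLOTS];
    long len = 0;
    if (s->used) {
        memcpy(out, s->data, s->len);
        len = (long)s->len;
        s->used = 0;
        jb->queued--;
    }
    jb->next_seq++;
    return len;
}
```

Here `depth` is the latency/robustness knob the paragraph describes: with 20 ms packets, a depth of 2 adds about 40 ms.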
Audio Decoding and Playback
On the receiving end, you reverse the capture pipeline. Use the same ALSA or PortAudio settings, but for playback. Configure the playback device with the same sample rate, format, and channel count as the sender. The process is:
- Receive UDP packet.
- Decode (if encoded).
- Copy PCM samples into a lock‑free ring buffer (shared with the audio playback thread).
- The audio callback (or blocking write) reads from the ring buffer and plays.
A lock‑free ring buffer is essential to avoid mutex contention on the audio thread. Implement a single‑producer, single‑consumer (SPSC) ring buffer: with exactly one writer (the network thread) and one reader (the audio thread), atomic head and tail indices are enough to synchronise the two sides. Many lightweight implementations exist.
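A minimal SPSC ring buffer using C11 <stdatomic.h> might look like this sketch. The capacity and names are assumptions; head and tail are free‑running counters, so unsigned wraparound keeps `head - tail` correct as long as the capacity is a power of two. The release store on one side pairs with the acquire load on the other, so samples are visible before the index that publishes them.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RB_CAPACITY 4096  /* samples; must be a power of two */

/* head is written only by the producer, tail only by the consumer. */
struct spsc_ring {
    int16_t data[RB_CAPACITY];
    _Atomic size_t head;  /* total samples ever written */
    _Atomic size_t tail;  /* total samples ever read */
};

/* Producer (network thread): copy in up to n samples; returns how many fit. */
static size_t rb_write(struct spsc_ring *rb, const int16_t *src, size_t n)
{
    size_t head = atomic_load_explicit(&rb->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&rb->tail, memory_order_acquire);
    size_t space = RB_CAPACITY - (head - tail);
    if (n > space)
        n = space;
    for (size_t i = 0; i < n; i++)
        rb->data[(head + i) & (RB_CAPACITY - 1)] = src[i];
    atomic_store_explicit(&rb->head, head + n, memory_order_release);
    return n;
}

/* Consumer (audio thread): copy out up to n samples; returns how many were available. */
static size_t rb_read(struct spsc_ring *rb, int16_t *dst, size_t n)
{
    size_t tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&rb->head, memory_order_acquire);
    size_t avail = head - tail;
    if (n > avail)
        n = avail;
    for (size_t i = 0; i < n; i++)
        dst[i] = rb->data[(tail + i) & (RB_CAPACITY - 1)];
    atomic_store_explicit(&rb->tail, tail + n, memory_order_release);
    return n;
}
```

Neither function blocks or allocates, so rb_read() is safe to call from the playback callback; if it returns fewer samples than requested, the callback can zero the remainder and play silence.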
Optimisation Techniques
Real‑Time Scheduling
Ensure your audio processing thread runs at a real‑time priority so that it is not preempted by ordinary processes. On Linux, use pthread_setschedparam() with SCHED_FIFO and a high priority (e.g., 80). The process needs the CAP_SYS_NICE capability or an RLIMIT_RTPRIO limit at least as high as the requested priority (commonly granted through /etc/security/limits.conf). Without this, scheduling jitter can cause buffer underruns even on fast hardware.
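A small wrapper for that call is sketched below (the function name is hypothetical). It clamps the request to the maximum SCHED_FIFO priority and returns pthread_setschedparam()'s error code, so the caller can log EPERM and keep running at normal priority.

```c
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Switch `thread` to SCHED_FIFO at `priority` (1-99 on Linux).
   Returns 0, or an errno value such as EPERM when the process lacks
   CAP_SYS_NICE and RLIMIT_RTPRIO does not cover the requested priority. */
static int make_thread_realtime(pthread_t thread, int priority)
{
    struct sched_param param;
    memset(&param, 0, sizeof(param));
    int max = sched_get_priority_max(SCHED_FIFO);
    if (priority > max)
        priority = max;
    param.sched_priority = priority;
    return pthread_setschedparam(thread, SCHED_FIFO, &param);
}
```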
CPU Affinity
Pin the audio thread to a dedicated CPU core to avoid cache thrashing and context switches. Use pthread_setaffinity_np() on Linux. Isolate one core for audio by configuring your kernel boot parameters (e.g., isolcpus=1) if possible.
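Pinning can be done with a few lines around pthread_setaffinity_np(), a GNU extension that needs _GNU_SOURCE. The helper name is hypothetical; with isolcpus=1 on the kernel command line you would call it as pin_thread_to_cpu(audio_thread, 1) after creating the thread.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Restrict `thread` to a single CPU core. Returns 0 or an errno value,
   e.g. EINVAL if the core is offline or outside the process's allowed cpuset. */
static int pin_thread_to_cpu(pthread_t thread, int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(thread, sizeof(set), &set);
}
```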
Memory Allocation
Avoid heap allocations (malloc/free) in the hot audio path. Allocate all buffers once during initialisation. Use stack allocation for small buffers or pre‑allocated pools. Lock memory into RAM with mlockall(MCL_CURRENT | MCL_FUTURE) so the audio path never stalls on a page fault caused by swapping.
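A startup helper might combine locking with pre‑faulting, as sketched below. The 64 KiB stack figure is an assumption about how much stack the audio code uses, and the function name is hypothetical. mlockall() requires CAP_IPC_LOCK or a large enough RLIMIT_MEMLOCK, so the caller should treat failure as a warning rather than a fatal error.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#define PREFAULT_STACK_BYTES (64 * 1024)  /* assumed upper bound on stack use */

/* Lock all current and future pages into RAM, then touch a block of the calling
   thread's stack so those pages are resident before the audio loop starts.
   Returns 0, or -1 with errno set (typically EPERM or ENOMEM). */
static int lock_memory(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return -1;
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        page = 4096;
    volatile unsigned char stack[PREFAULT_STACK_BYTES];
    for (size_t i = 0; i < sizeof(stack); i += (size_t)page)
        stack[i] = 0;  /* volatile writes are not optimised away */
    return 0;
}
```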
Zero‑Copy and Batch Processing
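When sending UDP packets, minimise data copying. sendto() always copies the payload into the kernel, so the goal is to avoid extra copies in user space: design your audio buffer so that the capture buffer can be passed directly to sendto() without an intermediate memcpy. Similarly, on the receiver, place incoming packets directly into the ring buffer. When packets arrive in bursts, Linux's recvmmsg() and sendmmsg() can move several datagrams per system call.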
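When each datagram needs a small header in front of the audio, sendmsg() with an iovec array gathers the header and the payload from separate buffers, so neither is copied into a staging packet first. The 4‑byte header below (sequence number and payload length) is a hypothetical format, not one the article defines.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical header: 16-bit sequence number and 16-bit payload length, big-endian. */
struct audio_hdr {
    uint16_t seq;
    uint16_t len;
};

/* Send header + payload as one datagram without copying them together. */
static ssize_t send_audio(int sock, const struct sockaddr_in *dest, uint16_t seq,
                          const void *payload, uint16_t len)
{
    struct audio_hdr hdr = { htons(seq), htons(len) };
    struct iovec iov[2] = {
        { .iov_base = &hdr, .iov_len = sizeof(hdr) },
        { .iov_base = (void *)payload, .iov_len = len },
    };
    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_name = (void *)dest;
    msg.msg_namelen = sizeof(*dest);
    msg.msg_iov = iov;
    msg.msg_iovlen = 2;
    return sendmsg(sock, &msg, 0);
}
```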
Network Optimisations
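Nagle's algorithm applies only to TCP: if you use a TCP connection for control messages, disable it with the TCP_NODELAY socket option; the UDP audio path is unaffected. Increase socket buffer sizes so that bursts of packets are not dropped. On a wired LAN, consider marking audio packets with a DSCP value (Quality of Service) so that switches and routers configured for QoS can prioritise them.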
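DSCP marking is a single setsockopt() on the sending socket. The sketch below assumes IPv4 and uses Expedited Forwarding (DSCP 46, defined in RFC 3246), the class commonly used for voice; the helper name is hypothetical. The mark only helps where the network is configured to honour it, and many home routers and internet links ignore or rewrite it.

```c
#include <netinet/in.h>
#include <sys/socket.h>

#define DSCP_EF 46  /* Expedited Forwarding */

/* Mark outgoing IPv4 packets on `sock` with a DSCP value. Returns 0, or -1 with errno set.
   DSCP occupies the upper six bits of the TOS byte; the low two (ECN) bits stay zero. */
static int set_dscp(int sock, int dscp)
{
    int tos = dscp << 2;
    return setsockopt(sock, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}
```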
Profiling and Tuning
Use perf on Linux to identify hotspots. Monitor CPU cache misses and branch mispredictions. Measure actual end‑to‑end latency with a loopback test (microphone → application → speakers → oscilloscope or a second capture channel). Tune buffer sizes, period settings, and codec parameters iteratively.
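For the loopback measurement, one simple approach is to send a click, record it on one input channel directly and on the other channel after the full chain, and compare where each recording first crosses a threshold. The sketch below uses that threshold‑onset method (simpler and more noise‑sensitive than cross‑correlation); the threshold value and function names are assumptions. Recording both channels on the same device keeps them on one sample clock.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Index of the first sample whose magnitude reaches `threshold`, or -1 if none. */
static long first_onset(const int16_t *x, size_t n, int threshold)
{
    for (size_t i = 0; i < n; i++)
        if (abs(x[i]) >= threshold)
            return (long)i;
    return -1;
}

/* Latency in ms between the click in `ref` (as sent) and in `ret` (after the chain).
   Returns a negative value if either onset is missing or `ret` precedes `ref`. */
static double loopback_latency_ms(const int16_t *ref, const int16_t *ret, size_t n,
                                  int threshold, unsigned rate)
{
    long a = first_onset(ref, n, threshold);
    long b = first_onset(ret, n, threshold);
    if (a < 0 || b < a)
        return -1.0;
    return 1000.0 * (double)(b - a) / (double)rate;
}
```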
Example: Complete Low‑Latency Pipeline
Bringing all pieces together, a minimal but functional streaming program might look like this (pseudocode):
// Thread 1: audio capture + encode
- Open ALSA capture with a 128-frame period
- Create Opus encoder
- loop:
    - read PCM from ALSA, accumulating periods until one 20 ms Opus frame (960 samples at 48 kHz) is ready
    - encode to Opus
    - sendto UDP (destination IP:port)
// Thread 2: network receive + decode + playback
- bind UDP socket
- Open ALSA playback with a 128-frame period; pre-fill ring buffer with silence
- loop:
    - recvfrom UDP packet
    - decode Opus to PCM
    - write PCM to SPSC ring buffer
- (playback callback pulls from ring buffer)
// main: start threads, set priorities, join
To achieve the lowest latency, avoid blocking calls inside the audio callback. The playback callback must never wait for a packet; instead, it reads the ring buffer. If the ring buffer is empty, play silence or repeat the last buffer.
Testing and Deployment
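Before deploying, test on target hardware. Measure latency under load: run CPU‑intensive background tasks and check for underruns. Use a tool like rtirq to raise the priority of the sound card's interrupt thread (this requires threaded IRQs, as on PREEMPT_RT kernels or with the threadirqs boot parameter). For internet streaming, implement forward error correction (FEC) and packet duplication for redundancy. The Opus codec supports in‑band FEC, enabled with OPUS_SET_INBAND_FEC(1); the encoder only spends bits on it when told to expect loss via OPUS_SET_PACKET_LOSS_PERC(), and the receiver recovers a lost frame by decoding the following packet with opus_decode()'s decode_fec flag set. The redundancy costs some bitrate, so measure its effect on quality at your target bitrate.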
Conclusion
Building a low‑latency audio streaming application in C demands meticulous attention to every link in the chain: capture, encoding, networking, and playback. By choosing the right audio API, using UDP with a carefully tuned jitter buffer, and applying real‑time scheduling and memory optimisation, you can stay well within the sub‑100 ms target, and on a local network with uncompressed PCM and small periods, approach round‑trip latencies of around 10 ms on capable hardware. The examples and strategies outlined here provide a solid foundation for production‑grade projects. Continue to experiment with different buffer sizes and codec settings, and always measure: latency is not a feature you can assume; you must prove it.
For further reading, consult the Opus documentation and the ALSA project page. A deep dive into UDP networking in C can be found in Beej’s Guide to Network Programming.