Implementing a Thread Pool in C for Efficient Multithreading

Efficient multithreading in C often demands a mechanism to manage threads without the overhead of creating and destroying them on every task. A thread pool offers exactly that: a set of pre-initialized worker threads that wait for tasks to execute, reusing threads across many jobs. This pattern is fundamental to high‑performance servers, real‑time systems, and parallel processing libraries. By implementing a thread pool, you gain control over resource consumption, reduce latency jitter, and can scale applications predictably across cores.

What Is a Thread Pool?

A thread pool is a collection of threads that remain alive and idle until work is submitted. When a task arrives, it is placed into a shared queue. One of the idle threads picks up the task, executes it, and then returns to the pool to await the next job. This eliminates the per‑task cost of creating and destroying a thread, which involves system calls and stack allocation and can become a bottleneck under high concurrency.

In contrast to spawning a new thread for each unit of work, a thread pool offers:

Reduced overhead – threads are created once and reused.
Controlled concurrency – you cap the number of simultaneous threads, preventing resource exhaustion.
Improved cache locality – a long‑lived worker that keeps running on the same core can find CPU caches still warm for recurring tasks.
Centralized error handling – the pool is a single place to log task failures and to replace workers that exit unexpectedly.

When to Use a Thread Pool

Thread pools shine in any application that handles many short‑lived tasks. Prime examples include:

Web servers – handling HTTP requests from thousands of concurrent clients.
Parallel image/video processing – splitting a frame into independent blocks.
Event‑driven simulation – dispatching events to handlers.
Game engines – performing AI updates, physics, and rendering in parallel.
Database connection pools – the same pooling idea applied to connections: reuse them rather than opening a new one per query.

Avoid thread pools for long‑running or blocking tasks that monopolize a thread; such workloads can stall the pool. For those cases, consider dedicated threads or asynchronous I/O.

Key Components of a Thread Pool

Every thread pool implementation in C revolves around a few core components:

Thread Pool Structure – holds metadata: array of thread IDs, synchronization objects, the task queue, and control flags (e.g., shutdown signal).
Task Queue – a first‑in, first‑out (FIFO) container that stores pending tasks. Typically implemented as a linked list or a dynamic array.
Worker Threads – threads that run the worker function, each continuously checking the queue for tasks.
Synchronization Primitives – a mutex protects the shared queue and state; a condition variable allows workers to sleep when the queue is empty and wake up when a new task arrives.

Implementing a Thread Pool in C

The following sections walk through a complete implementation using POSIX threads (pthreads). The example uses a simple task structure and a fixed number of workers.

Data Structures

Define a task as a function pointer and a generic void* argument:

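#include <pthread.h>
#include <stdlib.h>

typedef struct {
    void (*function)(void *arg);  // work to run on a worker thread
    void *arg;                    // opaque argument passed to function
} Task;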

The thread pool itself holds:

typedef struct {
    Task *queue;            // array of tasks (ring buffer)
    int queue_size;         // capacity
    int front, back;        // indices for the ring buffer
    int count;              // number of tasks currently in queue
    pthread_t *threads;     // array of worker thread IDs
    int num_threads;        // number of worker threads
    pthread_mutex_t lock;   // protects queue & state
    pthread_cond_t notify;  // signals workers about new work or shutdown
    int shutdown;           // flag: 1 if pool is shutting down
} ThreadPool;

A ring buffer avoids dynamic allocation during high‑frequency enqueues and is straightforward to implement.
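
To make the index arithmetic concrete, here is a minimal sketch of the ring‑buffer logic in isolation, with no locking and with plain ints instead of tasks. The names IntRing, ring_push, ring_pop, and RING_CAPACITY are illustrative assumptions and not part of the pool.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_CAPACITY 4   /* illustrative capacity */

/* Hypothetical stand-alone ring buffer of ints (no locking). */
typedef struct {
    int items[RING_CAPACITY];
    int front;   /* index of the oldest element */
    int back;    /* index where the next element is written */
    int count;   /* number of stored elements */
} IntRing;

/* Returns false when the buffer is full. */
static bool ring_push(IntRing *r, int value) {
    if (r->count == RING_CAPACITY) return false;
    r->items[r->back] = value;
    r->back = (r->back + 1) % RING_CAPACITY;  /* wrap to 0 after the last slot */
    r->count++;
    return true;
}

/* Returns false when the buffer is empty. */
static bool ring_pop(IntRing *r, int *out) {
    if (r->count == 0) return false;
    *out = r->items[r->front];
    r->front = (r->front + 1) % RING_CAPACITY;
    r->count--;
    return true;
}
```

Starting from a zero‑initialized IntRing, four pushes fill the buffer and leave back at 0 again; a pop followed by a push then reuses slot 0 without moving any other element.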

Initialization

Allocate the queue and threads, then initialize the mutex and condition variable:

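void *worker(void *arg);                   // defined below
int thread_pool_shutdown(ThreadPool *pool); // defined below

// Returns 0 on success, -1 on invalid arguments or allocation/thread failure.
int thread_pool_init(ThreadPool *pool, int num_threads, int queue_size) {
    if (num_threads <= 0 || queue_size <= 0) return -1;
    pool->num_threads = 0;                 // counts threads actually started
    pool->queue_size = queue_size;
    pool->queue = malloc(sizeof(Task) * queue_size);
    pool->threads = malloc(sizeof(pthread_t) * num_threads);
    if (pool->queue == NULL || pool->threads == NULL) {
        free(pool->queue);
        free(pool->threads);
        return -1;
    }
    pool->front = pool->back = pool->count = 0;
    pool->shutdown = 0;

    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->notify, NULL);

    for (int i = 0; i < num_threads; i++) {
        if (pthread_create(&pool->threads[i], NULL, worker, pool) != 0) {
            thread_pool_shutdown(pool);    // joins the threads started so far
            return -1;
        }
        pool->num_threads++;
    }
    return 0;
}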

Task Submission

The enqueue function adds a task to the ring buffer, wakes up a waiting worker, and returns 0 on success. It returns -1 if the queue is full or the pool is already shutting down; in that case the caller keeps ownership of arg.

int thread_pool_submit(ThreadPool *pool, void (*func)(void*), void *arg) {
    pthread_mutex_lock(&pool->lock);
    if (pool->shutdown || pool->count == pool->queue_size) {
        pthread_mutex_unlock(&pool->lock);
        return -1;  // shutting down or queue full
    }
    pool->queue[pool->back] = (Task){ .function = func, .arg = arg };
    pool->back = (pool->back + 1) % pool->queue_size;
    pool->count++;
    pthread_cond_signal(&pool->notify);
    pthread_mutex_unlock(&pool->lock);
    return 0;
}

Blocking variants can wait on a different condition variable until space becomes available, or the caller can simply retry after a brief backoff.
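
The blocking variant could look roughly like the following sketch, shown on a stand‑alone queue of ints rather than on the pool itself. BoundedQueue, the bq_* functions, and the two condition variables not_empty and not_full are hypothetical names introduced for illustration; return values of the pthread calls are mostly ignored to keep the sketch short.

```c
#include <assert.h>
#include <pthread.h>

#define BQ_CAPACITY 2  /* deliberately tiny so the producer really blocks */

/* Hypothetical bounded queue of ints; names are illustrative. */
typedef struct {
    int items[BQ_CAPACITY];
    int front, back, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty;  /* signaled after a put */
    pthread_cond_t not_full;   /* signaled after a get */
} BoundedQueue;

static void bq_init(BoundedQueue *q) {
    q->front = q->back = q->count = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Blocks while the queue is full instead of returning -1. */
static void bq_put(BoundedQueue *q, int value) {
    pthread_mutex_lock(&q->lock);
    while (q->count == BQ_CAPACITY)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[q->back] = value;
    q->back = (q->back + 1) % BQ_CAPACITY;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Blocks while the queue is empty. */
static int bq_get(BoundedQueue *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    int value = q->items[q->front];
    q->front = (q->front + 1) % BQ_CAPACITY;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return value;
}

/* Producer used by the demo: puts 1..100 into the queue. */
static void *bq_producer(void *arg) {
    BoundedQueue *q = arg;
    for (int i = 1; i <= 100; i++) bq_put(q, i);
    return NULL;
}

/* Consumes the 100 values on the calling thread and returns their sum. */
static int bq_demo(void) {
    BoundedQueue q;
    pthread_t tid;
    int sum = 0;
    bq_init(&q);
    if (pthread_create(&tid, NULL, bq_producer, &q) != 0) return -1;
    for (int i = 0; i < 100; i++) sum += bq_get(&q);
    pthread_join(tid, NULL);
    pthread_mutex_destroy(&q.lock);
    pthread_cond_destroy(&q.not_empty);
    pthread_cond_destroy(&q.not_full);
    return sum;
}
```

Using two condition variables keeps producers and consumers from waking each other needlessly. A real pool would also need the waiting submitter to observe the shutdown flag, which this sketch leaves out.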

Worker Thread Function

Each worker loops until the pool signals shutdown and the queue has drained. It waits on the condition variable when the queue is empty.

void *worker(void *arg) {
    ThreadPool *pool = (ThreadPool*)arg;
    while (1) {
        pthread_mutex_lock(&pool->lock);
        while (pool->count == 0 && !pool->shutdown) {
            pthread_cond_wait(&pool->notify, &pool->lock);
        }
        if (pool->shutdown && pool->count == 0) {
            pthread_mutex_unlock(&pool->lock);
            break;
        }
        // fetch task
        Task task = pool->queue[pool->front];
        pool->front = (pool->front + 1) % pool->queue_size;
        pool->count--;
        pthread_mutex_unlock(&pool->lock);

        task.function(task.arg);  // execute task
    }
    return NULL;
}

Graceful Shutdown

To stop the pool, set the shutdown flag, wake all workers, then join each thread. After all threads finish, clean up resources.

int thread_pool_shutdown(ThreadPool *pool) {
    pthread_mutex_lock(&pool->lock);
    pool->shutdown = 1;
    pthread_cond_broadcast(&pool->notify);  // wake all workers
    pthread_mutex_unlock(&pool->lock);

    for (int i = 0; i < pool->num_threads; i++) {
        pthread_join(pool->threads[i], NULL);
    }
    free(pool->threads);
    free(pool->queue);
    pthread_mutex_destroy(&pool->lock);
    pthread_cond_destroy(&pool->notify);
    return 0;
}

Complete Usage Example

Here is a minimal driver that submits ten tasks to a four‑thread pool:

#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>

// (Assume the above definitions for Task, ThreadPool, worker, etc.)

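void print_task(void *arg) {
    int *num = arg;
    // pthread_t is opaque; the cast to unsigned long works on Linux/glibc
    // but is not portable.
    printf("Task %d executed by thread %lu\n", *num,
           (unsigned long)pthread_self());
    free(arg);
}

int main(void) {
    ThreadPool pool;
    if (thread_pool_init(&pool, 4, 20) != 0) {
        fprintf(stderr, "failed to create thread pool\n");
        return 1;
    }

    for (int i = 0; i < 10; i++) {
        int *val = malloc(sizeof *val);
        if (val == NULL) break;
        *val = i;
        if (thread_pool_submit(&pool, print_task, val) != 0) {
            free(val);  // the pool did not take ownership
        }
    }

    thread_pool_shutdown(&pool);  // drains queued tasks, then joins workers
    return 0;
}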

Synchronization Details

The mutex lock ensures that only one thread modifies the queue, front, back, count, or shutdown at a time. The condition variable notify allows workers to block efficiently when there are no tasks; they do not busy‑wait and waste CPU.

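In the worker loop, the pattern while (pool->count == 0 && !pool->shutdown) { pthread_cond_wait(...); } is mandatory because condition variables can suffer from spurious wakeups, and because another worker may take the task before this one gets the lock back. pthread_cond_wait atomically releases the mutex while the thread sleeps and reacquires it before returning, so the predicate is always rechecked with the lock held.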

When the pool shuts down, pthread_cond_broadcast wakes every waiting worker. Each worker checks the shutdown flag after waking and exits if pool->shutdown is set and the queue is empty.

Handling Task Completion and Return Values

The simple pool above executes tasks but does not capture results. For a more versatile pool, you can extend tasks to return values via a future or promise pattern. A common approach is to store an optional void *result and a completion flag inside the task structure, along with a condition variable that the submitting thread can wait on.
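
A minimal sketch of that idea follows. The Future type, the future_* functions, and SquareJob are hypothetical names; a plain thread stands in for a pool worker so the example stays self‑contained, but square_task has the same void (*)(void *) shape that thread_pool_submit expects.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical future: one result slot plus a completion flag. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t done_cond;
    int done;
    void *result;
} Future;

static void future_init(Future *f) {
    pthread_mutex_init(&f->lock, NULL);
    pthread_cond_init(&f->done_cond, NULL);
    f->done = 0;
    f->result = NULL;
}

/* Called by the task when it finishes. */
static void future_set(Future *f, void *result) {
    pthread_mutex_lock(&f->lock);
    f->result = result;
    f->done = 1;
    pthread_cond_broadcast(&f->done_cond);
    pthread_mutex_unlock(&f->lock);
}

/* Called by the submitter; blocks until the task has called future_set. */
static void *future_get(Future *f) {
    pthread_mutex_lock(&f->lock);
    while (!f->done)
        pthread_cond_wait(&f->done_cond, &f->lock);
    void *result = f->result;
    pthread_mutex_unlock(&f->lock);
    return result;
}

static void future_destroy(Future *f) {
    pthread_mutex_destroy(&f->lock);
    pthread_cond_destroy(&f->done_cond);
}

/* Example job: input, output, and the future travel together. */
typedef struct {
    int input;
    int output;
    Future future;
} SquareJob;

/* Task body in the pool's void (*)(void *) shape. */
static void square_task(void *arg) {
    SquareJob *job = arg;
    job->output = job->input * job->input;
    future_set(&job->future, &job->output);
}

/* Thread entry point standing in for a pool worker. */
static void *run_square(void *arg) {
    square_task(arg);
    return NULL;
}

static int future_demo(int n) {
    SquareJob job = { .input = n };
    pthread_t tid;
    future_init(&job.future);
    if (pthread_create(&tid, NULL, run_square, &job) != 0) return -1;
    int value = *(int *)future_get(&job.future);
    pthread_join(tid, NULL);       /* make sure the task is done with the future */
    future_destroy(&job.future);
    return value;
}
```

With a real pool there is no thread to join, so the future must stay alive until the task has fully returned from future_set, for example by heap‑allocating the job and freeing it only after future_get.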

Alternatively, tasks can push results into a separate output queue or invoke a callback upon completion. For example, a web server might gather response structures into a queue and hand them to an output thread.

Advanced Topics

Dynamic Resizing

Some workloads require adjusting the number of worker threads at runtime. You can implement a “controller” thread that monitors queue depth, CPU utilization, and the number of idle threads. When the queue grows too large, it spawns more workers; when many workers are idle, it signals some to exit.
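
The controller's decision step might be sketched as a pure function over a snapshot of pool state. PoolStats, resize_decision, and the thresholds are illustrative assumptions rather than tuned values, and CPU utilization is left out to keep the example small.

```c
#include <assert.h>

/* Hypothetical snapshot a controller thread would take under the pool lock. */
typedef struct {
    int queued;       /* tasks waiting in the queue */
    int idle;         /* workers currently blocked waiting for work */
    int workers;      /* total live workers */
    int min_workers;  /* never shrink below this */
    int max_workers;  /* never grow beyond this */
} PoolStats;

/* Returns +1 to spawn a worker, -1 to retire one, 0 to leave the pool alone.
 * The thresholds are illustrative assumptions, not tuned values. */
static int resize_decision(const PoolStats *s) {
    /* Grow: nobody is idle and the backlog exceeds two tasks per worker. */
    if (s->idle == 0 && s->queued > 2 * s->workers &&
        s->workers < s->max_workers)
        return +1;
    /* Shrink: nothing is queued and more than half the workers are idle. */
    if (s->queued == 0 && s->idle > s->workers / 2 &&
        s->workers > s->min_workers)
        return -1;
    return 0;
}
```

Keeping the policy separate from the mechanism makes it easy to test; the controller thread would call it periodically and then either create a thread or flag one worker to exit.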

Work Stealing (for Fork‑Join Parallelism)

Instead of a single global queue, each worker maintains its own deque of tasks. The owner pushes and pops work at one end; when a worker runs out of work, it steals a task from the opposite end of another worker’s deque, which keeps thieves and owner mostly out of each other’s way. This technique improves load balancing for recursive, divide‑and‑conquer algorithms. It is the core of task schedulers such as Intel TBB and Cilk.
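
The two access patterns can be sketched with a small mutex‑protected deque. WorkDeque and the deque_* functions are hypothetical names, task IDs stand in for real Task structs, and production schedulers typically use a lock‑free deque instead of a mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define DEQUE_CAPACITY 64  /* illustrative fixed capacity */

/* Hypothetical per-worker deque; ints stand in for tasks. */
typedef struct {
    int tasks[DEQUE_CAPACITY];
    int top;      /* index of the oldest task; thieves take from here */
    int bottom;   /* one past the newest task; the owner works here */
    pthread_mutex_t lock;
} WorkDeque;

static void deque_init(WorkDeque *d) {
    d->top = d->bottom = 0;
    pthread_mutex_init(&d->lock, NULL);
}

/* Owner: add the newest task at the bottom. */
static bool deque_push(WorkDeque *d, int task) {
    bool ok = false;
    pthread_mutex_lock(&d->lock);
    if (d->top == d->bottom) d->top = d->bottom = 0;  /* empty: reuse slots */
    if (d->bottom < DEQUE_CAPACITY) {
        d->tasks[d->bottom++] = task;
        ok = true;
    }
    pthread_mutex_unlock(&d->lock);
    return ok;
}

/* Owner: take the newest task (LIFO), which tends to be cache-hot. */
static bool deque_pop(WorkDeque *d, int *task) {
    bool ok = false;
    pthread_mutex_lock(&d->lock);
    if (d->bottom > d->top) {
        *task = d->tasks[--d->bottom];
        ok = true;
    }
    pthread_mutex_unlock(&d->lock);
    return ok;
}

/* Thief: take the oldest task (FIFO) from the opposite end. */
static bool deque_steal(WorkDeque *d, int *task) {
    bool ok = false;
    pthread_mutex_lock(&d->lock);
    if (d->bottom > d->top) {
        *task = d->tasks[d->top++];
        ok = true;
    }
    pthread_mutex_unlock(&d->lock);
    return ok;
}
```

After the owner pushes tasks 1, 2, and 3, a thief calling deque_steal receives task 1 while the owner's deque_pop returns task 3. In divide‑and‑conquer code the oldest task is usually the largest remaining piece of work, which is why thieves take from that end.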

Thread‑Local Storage

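Workers often benefit from thread‑local data: caches, memory pools, or random number generators. Use __thread (a GCC/Clang extension), the C11 _Thread_local specifier, or pthread_key_create to allocate per‑thread resources without contention. pthread_key_create is the option that supports a destructor, which runs when a thread exits.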
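
As a small illustration, the sketch below gives every thread its own counter using C11 _Thread_local. The names tasks_done, count_tasks, and tls_demo are hypothetical, and the two threads stand in for pool workers.

```c
#include <assert.h>
#include <pthread.h>

/* Each thread gets its own copy, initialized to 0 when the thread starts. */
static _Thread_local int tasks_done = 0;

/* Increments the calling thread's counter *arg times, then writes the
 * thread's final counter value back through arg. */
static void *count_tasks(void *arg) {
    int *n = arg;
    int target = *n;
    for (int i = 0; i < target; i++)
        tasks_done++;          /* no lock needed: the variable is per-thread */
    *n = tasks_done;
    return NULL;
}

/* Returns 1 if every thread saw an independent counter. */
static int tls_demo(void) {
    pthread_t a, b;
    int na = 3, nb = 5;
    tasks_done = 100;          /* the calling thread's private copy */
    if (pthread_create(&a, NULL, count_tasks, &na) != 0) return 0;
    if (pthread_create(&b, NULL, count_tasks, &nb) != 0) {
        pthread_join(a, NULL);
        return 0;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    /* A shared counter would have started at 100 in the new threads. */
    return na == 3 && nb == 5 && tasks_done == 100;
}
```

Because each worker starts from its own zero‑initialized copy, the counters never need a mutex; aggregate them only when you report statistics.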

Performance Considerations

Queue size – a bounded queue prevents unbounded memory growth but can cause back‑pressure. Choose a capacity that matches the expected peak burst of tasks.
Mutex contention – with many threads contending for the global lock, the queue becomes a bottleneck. Consider lock‑free queues (e.g., using atomic operations) for extremely high‑throughput scenarios. However, lock‑free programming is error‑prone; test thoroughly.
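Number of threads – for CPU‑bound workloads, a good starting point is one worker per core. For I/O‑bound tasks, you can exceed the core count because threads spend much of their time blocked on I/O. Measure and adjust.
Cache effects – avoid false sharing by padding data that different threads update independently (such as per‑worker counters) to separate cache lines. Fields like count and front are only touched while holding the pool mutex, so padding them apart gains little.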
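
For the false‑sharing point, a minimal sketch of padded per‑worker counters follows. WorkerStats, worker_stats, and the 64‑byte CACHE_LINE value are assumptions; 64 bytes is common on x86‑64, but check your target.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumption: typical x86-64 cache line size */
#define MAX_WORKERS 4   /* illustrative */

/* Hypothetical per-worker statistics. Aligning the member to the cache line
 * size forces both the alignment and the size of the struct up to 64 bytes,
 * so neighbouring array elements never share a line. */
typedef struct {
    alignas(CACHE_LINE) unsigned long tasks_run;
} WorkerStats;

/* One slot per worker; worker i only ever writes worker_stats[i]. */
static WorkerStats worker_stats[MAX_WORKERS];
```

Without the alignas, four unsigned long counters would fit in a single cache line, and every increment by one worker would invalidate that line in the other cores' caches.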

Common Pitfalls

Deadlock from holding the lock during task execution – never execute a task while holding the pool mutex; the task might try to submit a new task or call a pool function, causing a deadlock. The example code releases the lock before calling task.function.
Busy waiting – polling for tasks burns CPU. Always use a condition variable or other blocking primitive.
Memory leaks – tasks that allocate memory must free it, or the pool must provide a destructor callback. The shutdown routine above lets workers drain the queue before they exit; a variant that stops immediately must release the arguments of any tasks still queued.
Overload – if tasks are added faster than workers can process them, an unbounded queue grows without limit and a bounded one starts rejecting submissions. Use flow‑control mechanisms such as blocking on submit when the queue is full.
Double free or use‑after‑free – ensure that arguments passed to tasks are either copied or managed with proper ownership.

Testing and Debugging

Debugging concurrent code in C can be challenging. Tools and practices that help:

Valgrind’s Helgrind or DRD – detect data races and improper lock usage.
AddressSanitizer (-fsanitize=address) – catch memory errors.
ThreadSanitizer (-fsanitize=thread) – detects data races at run time; it cannot be combined with AddressSanitizer in the same build.
Assertions – use assert to verify invariants like queue size consistency or that the mutex is held when accessing shared state.
Stress testing – run the pool with many more tasks than threads, varying task sizes, and random pause times to expose latent bugs.
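
For the assertions item, one possible invariant check on the ring‑buffer indices is sketched below. QueueState is a hypothetical mirror of the pool's index fields, used so the example compiles on its own; in the pool you would run the same test on ThreadPool while holding the lock.

```c
#include <assert.h>

/* Hypothetical copy of the pool's ring-buffer bookkeeping fields. */
typedef struct {
    int queue_size;
    int front;
    int back;
    int count;
} QueueState;

/* Returns nonzero when the indices are mutually consistent:
 * everything is in range and back sits exactly count slots after front. */
static int queue_invariants_hold(const QueueState *q) {
    return q->queue_size > 0
        && q->count >= 0 && q->count <= q->queue_size
        && q->front >= 0 && q->front < q->queue_size
        && q->back  >= 0 && q->back  < q->queue_size
        && (q->front + q->count) % q->queue_size == q->back;
}
```

Calling assert(queue_invariants_hold(...)) right after each enqueue and dequeue costs little in debug builds and compiles away entirely when NDEBUG is defined.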

Real‑World Usage and Further Reading

Thread pools are the backbone of countless production systems. On Linux, the pthread functions used here are provided by glibc. The POSIX Threads manual pages (pthread_create, pthread_cond_wait) and the Linux pthreads manual are essential reading. For deeper study of lock‑free designs, see Jeff Preshing’s introduction to lock‑free programming.

Book recommendations include Programming with POSIX Threads by David R. Butenhof and Concurrency in C# (though C#‑centric, many of the concepts transfer). The Wikipedia article on thread pools provides a high‑level overview of patterns across languages.

Conclusion

Implementing a thread pool in C is a practical exercise in concurrency that yields immediate benefits: lower overhead, controlled parallelism, and a clean separation of task scheduling from task execution. Starting with a simple bounded queue and worker threads, you can evolve the design to handle dynamic resizing, work stealing, or lock‑free operations. The key is to master the synchronization primitives – mutexes and condition variables – and to avoid common pitfalls like deadlocks and busy waiting. With a solid thread pool, your C programs can harness multicore hardware efficiently and remain responsive under heavy load.