Optimizing IIR Filter Designs for Reduced Computational Complexity in Wearable Devices

Wearable devices such as smartwatches, fitness trackers, and medical monitoring patches rely on continuous sensor data acquisition and real-time processing. From removing motion artifacts from electrocardiogram (ECG) signals to cleaning accelerometer data for step counting, digital filtering plays a central role. Among the many filter types, Infinite Impulse Response (IIR) filters are a workhorse in these applications due to their ability to achieve sharp frequency responses with relatively few coefficients. However, the recursive nature of IIR filters introduces computational overhead that can strain battery life and real-time performance. This article presents a comprehensive set of strategies to reduce the computational complexity of IIR filter designs, enabling efficient deployment in resource-constrained wearable systems.

Understanding IIR Filters in Wearables

IIR filters are digital filters defined by a transfer function that includes both feedforward and feedback coefficients. Mathematically, an IIR filter can be represented as:

y[n] = b0·x[n] + b1·x[n-1] + ... + bM·x[n-M] - a1·y[n-1] - ... - aN·y[n-N]

Because the output depends on previous outputs (feedback), IIR filters can achieve a given frequency response with a lower order than an equivalent Finite Impulse Response (FIR) filter. For wearable devices, this means fewer multiply-accumulate (MAC) operations per sample, reduced memory for coefficient storage, and lower latency. Classic IIR filter topologies include Butterworth, Chebyshev, and elliptic designs, each offering different trade-offs between ripple, roll-off, and phase linearity.
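
The difference equation above can be written out directly. The following is a minimal Python sketch for illustration only (the function name iir_filter is hypothetical, and a[0] is assumed to be normalized to 1); production firmware would use a fixed-point biquad routine instead.

```python
def iir_filter(b, a, x):
    """Apply y[n] = sum(b[k]*x[n-k]) - sum(a[k]*y[n-k]) for k >= 1.

    b: feedforward coefficients b0..bM
    a: feedback coefficients, with a[0] assumed to be 1
    x: list of input samples
    """
    y = []
    for n in range(len(x)):
        acc = 0.0
        # Feedforward part: weighted sum of current and past inputs.
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]
        # Feedback part: weighted sum of past outputs.
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y


# First-order lowpass y[n] = 0.5*x[n] + 0.5*y[n-1], driven by a unit impulse.
impulse = [1.0, 0.0, 0.0, 0.0]
response = iir_filter([0.5], [1.0, -0.5], impulse)
```

The impulse response never reaches exactly zero, it only halves at each step. This is the "infinite" part of the name, and it comes from a single feedback coefficient.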

Despite their efficiency, IIR filters present challenges in wearables: they can become unstable due to coefficient quantization, suffer from limit-cycle oscillations, and require careful scaling to avoid overflow in fixed-point arithmetic. Moreover, the recursive calculations prevent easy parallelization, making each sample a serial dependency. In battery-powered devices with low-power microcontrollers or custom ASICs, every saved arithmetic operation directly translates into longer battery life or lower silicon area.

Key Strategies for Reducing IIR Computational Complexity

1. Filter Order Reduction

The most direct way to lower computational cost is to reduce the filter order. Fewer coefficients mean fewer MAC operations per sample. However, simply lowering the order without considering the specification often degrades performance. Fortunately, several techniques can help achieve the required response with a minimal order:

  • Pole-zero placement optimization: By carefully positioning poles and zeros in the z-plane, designers can meet passband and stopband constraints with a lower-order filter than a standard Butterworth or Chebyshev design. For example, using an elliptic filter with a small allowable stopband ripple often yields the lowest order for a given transition width.
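  • Matched-z transform: When converting analog prototypes, the bilinear transform is standard. The matched-z transform instead maps each analog pole and zero directly into the z-plane. It keeps the same order, and it can be a convenient alternative for simple prototypes, provided the resulting digital response is checked against the specification.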
  • Iterative order reduction: Start with a high-order design, then apply model reduction techniques like balanced truncation or Hankel norm approximation to produce a stable, reduced-order IIR filter that closely matches the original frequency response. This is particularly useful when the original filter is derived from measurement data.

In practice, a third-order IIR filter can often replace a fifth-order design with negligible performance loss for biosignal filtering tasks like EMG or ECG baseline wander removal.
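
To make the order trade-off concrete, the sketch below estimates the minimum lowpass order for Butterworth, Chebyshev, and elliptic designs using the standard textbook order formulas with bilinear pre-warping. The specification values (1 dB passband ripple, 40 Hz passband edge, 50 Hz stopband edge, 20 dB attenuation, 250 Hz sampling) are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def agm(a, b, iterations=30):
    """Arithmetic-geometric mean, used to evaluate the elliptic integral."""
    for _ in range(iterations):
        a, b = (a + b) / 2.0, math.sqrt(a * b)
    return a


def ellipk(k):
    """Complete elliptic integral of the first kind for modulus 0 <= k < 1."""
    return math.pi / (2.0 * agm(1.0, math.sqrt(1.0 - k * k)))


def lowpass_orders(f_pass, f_stop, fs, ripple_db, atten_db):
    """Return minimum (Butterworth, Chebyshev, elliptic) lowpass orders."""
    # Pre-warp the band edges for the bilinear transform.
    wp = math.tan(math.pi * f_pass / fs)
    ws = math.tan(math.pi * f_stop / fs)
    k = wp / ws  # selectivity
    k1 = math.sqrt((10 ** (ripple_db / 10) - 1) /
                   (10 ** (atten_db / 10) - 1))  # discrimination
    butter = math.ceil(math.log(1 / k1) / math.log(1 / k))
    cheby = math.ceil(math.acosh(1 / k1) / math.acosh(1 / k))
    k_c = math.sqrt(1 - k * k)    # complementary moduli
    k1_c = math.sqrt(1 - k1 * k1)
    ellip = math.ceil(ellipk(k) * ellipk(k1_c) / (ellipk(k_c) * ellipk(k1)))
    return butter, cheby, ellip


orders = lowpass_orders(40.0, 50.0, 250.0, ripple_db=1.0, atten_db=20.0)
```

Under these assumed numbers the narrow 40 to 50 Hz transition band is what drives the Butterworth order up, while the elliptic design meets the same edge specification with a far lower order. A bandpass design obtained by the usual lowpass-to-bandpass transformation has twice the prototype order.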

2. Coefficient Quantization and Word-Length Reduction

Wearable microcontrollers typically operate with 16-bit or 24-bit fixed-point arithmetic. Floating-point arithmetic is power-hungry on such cores, particularly those without a hardware FPU, and is usually unnecessary. Reducing the coefficient word length from 32-bit floating-point to, say, 16-bit fixed-point halves coefficient memory and speeds up arithmetic (especially if the processor has dedicated integer multiply-accumulate instructions).

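  • Choose the right quantization scheme: Use the Q-format (e.g., Q15 for 16-bit signed fractional) to represent coefficients. Q15 covers only the range [-1, 1), so biquad feedback coefficients with magnitude up to 2 need a format with an integer bit (such as Q14) or explicit scaling. Rounding to the nearest representable value can move poles enough to cause instability; considering pole sensitivity helps allocate more bits to sensitive coefficients.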
  • Optimize quantization using statistical methods: Apply simulated annealing or genetic algorithms to find a set of quantized coefficients that still meet the filter specification. This can allow a reduction of 2–4 bits compared with naive rounding.
  • Use powers-of-two coefficients: If coefficients can be approximated by sums of powers of two, the multiplication can be replaced by a few shifts and adds, drastically reducing computational complexity. This is particularly effective for fixed-coefficient IIR filters used in wearable audio or heart-rate monitoring.

For example, a second-order IIR notch filter whose coefficients are each quantized to a single power of two can be implemented with roughly four shifts and four additions per sample and no multiplications, a good fit for a low-power ARM Cortex-M0 core.
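
The shift-and-add idea can be sketched as follows. This is an illustrative greedy approximation, not a full canonical-signed-digit optimizer, and the function names are hypothetical. Each coefficient is approximated by a few signed powers of two, and the multiplication is then carried out on integer samples with shifts only.

```python
import math


def signed_power_of_two_terms(coeff, max_terms=3, min_exp=-15):
    """Greedily approximate coeff as a sum of signed powers of two.

    Returns a list of (sign, exponent) pairs, at most max_terms long.
    """
    terms = []
    residual = coeff
    for _ in range(max_terms):
        if residual == 0.0:
            break
        # Pick the power of two closest to the residual on a log scale.
        exp = round(math.log2(abs(residual)))
        if exp < min_exp:
            break
        sign = 1 if residual > 0 else -1
        terms.append((sign, exp))
        residual -= sign * 2.0 ** exp
    return terms


def shift_add_multiply(x, terms):
    """Multiply integer sample x by the approximated coefficient.

    Right shifts truncate toward minus infinity, as on most integer ALUs.
    """
    acc = 0
    for sign, exp in terms:
        shifted = x << exp if exp >= 0 else x >> -exp
        acc += sign * shifted
    return acc


terms = signed_power_of_two_terms(0.625)      # 2^-1 + 2^-3
product = shift_add_multiply(1024, terms)     # 512 + 128
```

Limiting max_terms bounds the number of adders per coefficient. The approximation error that results should be checked against the filter specification before the coefficients are committed to hardware.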

3. Cascade vs. Direct Form Structures

The structure in which an IIR filter is implemented greatly affects computational load, stability, and sensitivity to quantization. The most common forms are Direct Form I and II, and their transposed variants. For high-order designs, a cascade of second-order sections (SOS) is generally preferred:

  • Second-order sections (biquads): Each section implements a second-order IIR filter. The cascade structure reduces coefficient sensitivity, making it easier to maintain stability with short word lengths. Computation per sample scales linearly with order: a sixth-order filter requires three biquads, each needing 5 MACs, total 15 MACs per sample.
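  • Parallel form: Decompose the transfer function into a sum of first- and second-order subsections via partial-fraction expansion. The MAC count is similar to that of the cascade, but the sections are independent of one another, so they can be evaluated concurrently and round-off noise from one section is not shaped by the next. The expansion becomes awkward when the filter has repeated poles.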
  • Lattice/ladder structure: While more computation-heavy in general, lattice structures offer excellent stability properties and are less sensitive to quantization. They can be attractive when the order is low (e.g., adaptive filters).

For wearables, the cascade biquad implementation is often the optimal balance of computational cost, stability, and memory usage. Manufacturers like Analog Devices and STMicroelectronics provide reference implementations optimized for their low-power MCUs.
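
A cascade of biquads can be sketched in a few lines. The version below uses the transposed Direct Form II update with two state variables and five multiply-accumulates per section, and it runs in floating point for clarity. The coefficient layout (b0, b1, b2, a1, a2) and the function name are assumptions for illustration and do not correspond to any vendor's API.

```python
def sos_filter(sections, x):
    """Filter samples x through a cascade of second-order sections.

    sections: list of (b0, b1, b2, a1, a2) tuples, with a0 assumed to be 1
    """
    state = [[0.0, 0.0] for _ in sections]
    out = []
    for sample in x:
        v = sample
        for (b0, b1, b2, a1, a2), s in zip(sections, state):
            # Transposed Direct Form II: 5 MACs and 2 state words per section.
            y = b0 * v + s[0]
            s[0] = b1 * v - a1 * y + s[1]
            s[1] = b2 * v - a2 * y
            v = y  # the output of this section feeds the next one
        out.append(v)
    return out


# Two identical first-order lowpass sections expressed as degenerate biquads.
section = (0.5, 0.0, 0.0, -0.5, 0.0)
single = sos_filter([section], [1.0, 0.0, 0.0])
double = sos_filter([section, section], [1.0, 0.0, 0.0])
```

Because each section keeps its own state, a section can be added or removed without touching the others, which is also what makes per-section scaling and coefficient quantization manageable.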

4. Frequency-Domain Implementation Using FFT

When the filtering task involves long duration signals or multiple channels, time-domain IIR filtering can be replaced by an overlap-add or overlap-save method using the Fast Fourier Transform (FFT). While this appears counterintuitive (IIR is already efficient in time domain), for very high-order IIR filters (order > 50) or when processing multiple channels simultaneously, FFT-based convolution can become more efficient.

  • Partitioned convolution: For real-time applications, the impulse response of the IIR filter (which is infinite) must be truncated. A practical approach is to approximate the IIR filter with a long FIR filter (if the impulse response decays quickly) and then use FFT-based convolution. This is common in wearable audio noise cancellation.
  • Subband filtering: Some wearables use filterbanks (e.g., for spectral analysis). In such cases, it may be beneficial to implement the entire bank in the frequency domain, combining FFT and inverse FFT per frame.

Frequency-domain methods are most suited for wearables with ample RAM and a hardware FFT accelerator (common in many modern Bluetooth Audio SoCs).
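
Whether truncation is practical depends on how quickly the impulse response decays. As a rough guide, the envelope of the response decays like r^n, where r is the largest pole radius, so the number of taps needed to reach a given tolerance can be estimated as below. This sketch ignores the gain factor in front of the envelope, and the function name and the 60 dB tolerance are assumptions.

```python
import math


def truncation_length(pole_radius, tolerance=1e-3):
    """Smallest n such that pole_radius**n <= tolerance."""
    if not 0.0 < pole_radius < 1.0:
        raise ValueError("pole radius must lie strictly between 0 and 1")
    return math.ceil(math.log(tolerance) / math.log(pole_radius))


taps_sharp = truncation_length(0.98)   # poles close to the unit circle
taps_gentle = truncation_length(0.5)   # heavily damped poles
```

Poles near the unit circle, which are exactly what gives an IIR filter its sharp response, lead to hundreds of FIR taps. This is why the FFT route pays off only for long responses or many channels.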

5. Approximate Filtering and Reduced-Precision Arithmetic

Not all wearable applications require the full precision of a textbook IIR filter. Approximate computing techniques can trade a small amount of accuracy for significant computational savings:

  • Truncated multipliers: Use inexact arithmetic circuits that compute the multiplication result with fewer bits. A 10-bit multiplier instead of 16-bit can cut power by 30–40% in custom hardware.
  • Stochastic computing: For very low-power designs, values can be encoded as random bitstreams in which the probability of a 1 represents the number. Multiplication then reduces to a single AND gate, but long bitstreams are needed for useful precision, and the approach often requires re-timing and error correction.
  • Downsampling and upsampling: If the sampling rate is higher than the signal bandwidth requires, decimate the signal before filtering. Running the IIR filter at the lower rate reduces the MACs executed per second, and the anti-aliasing decimation filter can itself be a low-cost design.
  • Adaptive order modification: Use a variable-order filter that starts with a high order during calibration and reduces order (by turning off biquad sections) when the signal is stationary. This dynamic complexity scaling can be controlled by a simple activity detector.

Example: A wrist-worn accelerometer-based fall detection system might use a second-order IIR highpass filter instead of a fourth-order when the user is stationary, reducing power consumption by 20%.
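
The saving from decimation is simple arithmetic, sketched below with hypothetical numbers: a three-biquad filter at 1 kHz, compared with a one-biquad anti-aliasing filter at 1 kHz followed by the same three biquads at 250 Hz. The function name and all rates are assumptions for illustration.

```python
def macs_per_second(sections, rate_hz, macs_per_section=5):
    """MAC operations per second for a biquad cascade at a given sample rate."""
    return sections * macs_per_section * rate_hz


# Everything at the full sensor rate.
full_rate = macs_per_second(3, 1000)

# Anti-aliasing biquad at the full rate, main filter after decimation by 4.
decimated = macs_per_second(1, 1000) + macs_per_second(3, 250)

saving = 1 - decimated / full_rate
```

The decimation filter is not free, so the net gain depends on how cheaply aliasing can be controlled for the signal at hand.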

6. Hardware-Software Co-Design for Ultra-Low Power

Many wearable devices embed a dedicated digital signal processor (DSP) or a microcontroller with a hardware MAC unit. Optimizing IIR filters for the specific instruction set and memory architecture is critical:

  • Circular buffer addressing: Use the hardware circular buffer for state variable storage in biquad structures. Most DSPs support this without additional address calculation overhead.
  • SIMD (Single Instruction, Multiple Data): ARM Cortex-M4 and Cortex-M7 cores include DSP-extension instructions that perform two 16-bit multiply-accumulates in one instruction. If the coefficients and samples are quantized to 16 bits, the five MACs of a biquad can be issued in about three such instructions instead of five.
  • Clock gating and sleep modes: Implement the filter in a task that runs only when sensor data is available. Use low-power sleep modes of the MCU between samples. Some microcontrollers also provide hardware accelerators (e.g., a digital filter coprocessor) that can run independently of the CPU.
  • Zero-overhead loops and unrolling: On DSPs with hardware loop support, use zero-overhead loops; on cores without it, unroll the loops of fixed-order filters to eliminate branch penalties. A third-order IIR filter implemented as inline code can run noticeably faster than a loop-based implementation (figures of up to 30% are cited), although the gain depends on the core and compiler.
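
Before porting a filter to 16-bit MAC instructions, it helps to emulate the integer arithmetic on a host machine and check rounding and overflow. The sketch below models a Direct Form I biquad with Q15 samples, Q14 coefficients, a wide accumulator, and output saturation. The formats and function names are assumptions for illustration, not a specific DSP library's convention.

```python
COEFF_FRAC_BITS = 14  # Q14 coefficients can represent values in [-2, 2)


def to_q(value, frac_bits=COEFF_FRAC_BITS):
    """Convert a real coefficient to a fixed-point integer."""
    return int(round(value * (1 << frac_bits)))


def saturate16(value):
    """Clamp to the signed 16-bit range, as a saturating DSP store would."""
    return max(-32768, min(32767, value))


def biquad_fixed(coeffs_q, samples_q15):
    """Direct Form I biquad on Q15 samples with Q14 coefficients.

    coeffs_q: (b0, b1, b2, a1, a2) as integers from to_q()
    """
    b0, b1, b2, a1, a2 = coeffs_q
    x1 = x2 = y1 = y2 = 0
    out = []
    for x in samples_q15:
        # Q15 * Q14 products summed in a wide accumulator (Q29).
        acc = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        # Round, shift back to Q15, and saturate.
        y = saturate16((acc + (1 << (COEFF_FRAC_BITS - 1))) >> COEFF_FRAC_BITS)
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


half_gain = biquad_fixed((to_q(0.5), 0, 0, 0, 0), [16384])   # 0.5 * 0.5
clipped = biquad_fixed((to_q(1.999), 0, 0, 0, 0), [32767])   # overflows
```

Running recorded sensor data through such a model and comparing it with a floating-point reference gives an early view of the quantization noise that the embedded implementation will exhibit.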

Practical Case Study: ECG Signal Filtering in a Smartwatch

To illustrate these strategies, consider a wearable that acquires a single-lead ECG at 250 Hz. The raw signal contains baseline wander (0.5 Hz and below), power-line interference (50/60 Hz), and high-frequency muscle noise. The design goals are to reject these artifacts with a passband from 0.5 to 40 Hz and a stopband attenuation of 20 dB at 50 Hz.

A conventional approach uses a fourth-order Butterworth bandpass IIR filter (cascade of two biquads). In floating-point on a low-power Cortex-M4, this requires about 10 MAC operations per sample. Total power consumption for filtering alone is around 0.5 mW. By switching to a third-order elliptic design (order reduction) that still meets the 20 dB stopband requirement, the MAC count drops to about 8 per sample (one biquad at 5 MACs plus a first-order section at 3 MACs). Further, quantizing coefficients to Q15 fixed-point reduces memory and allows the use of 16-bit SIMD MACs: the cascade can now be computed in 4 instructions per section, totaling 8 instructions per sample. Power consumption falls to 0.3 mW.

Additionally, by using a power-of-two coefficient for the 50 Hz notch filter (implemented as a separate biquad), the notch filter requires only 3 shift-add operations instead of 3 MACs. The overall filter chain consumes just 0.2 mW—a 60% reduction from the original design—while maintaining signal quality metrics (SNR within 1 dB of the original).
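
Replacing a notch coefficient with powers of two moves the notch slightly, and the shift is easy to quantify. The sketch below assumes a notch with zeros on the unit circle, so that the numerator is 1 + b1·z^-1 + z^-2 with b1 = -2·cos(2π·f0/fs), and it approximates b1 by -(2^-1 + 2^-3). Both the approximation and the function names are assumptions for illustration.

```python
import math


def notch_b1(f0, fs):
    """Middle numerator coefficient of a unit-circle notch at f0."""
    return -2.0 * math.cos(2.0 * math.pi * f0 / fs)


def notch_frequency(b1, fs):
    """Notch frequency implied by a given middle coefficient."""
    return math.acos(-b1 / 2.0) * fs / (2.0 * math.pi)


exact_b1 = notch_b1(50.0, 250.0)           # about -0.618
approx_b1 = -(2 ** -1 + 2 ** -3)           # -0.625, two shifts and one add
shifted_f0 = notch_frequency(approx_b1, 250.0)
```

With these values the notch moves by roughly 0.15 Hz. Whether that is acceptable depends on the notch bandwidth and on how far the mains frequency itself drifts.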

This optimization allowed the wearable to run the ECG acquisition and filtering continuously for 24 hours on a 150 mAh battery, compared with 16 hours with the original implementation.

Challenges and Trade-offs

Every optimization brings potential pitfalls. Order reduction may increase passband ripple or degrade group delay, which can be problematic for applications requiring phase alignment (e.g., multi-channel EEG). Coefficient quantization can move poles that lie close to the unit circle onto or outside it, causing instability. To mitigate this, always perform pole-zero analysis with the quantized coefficients. Approximate filtering introduces noise; in medical-grade wearables, regulatory standards (e.g., IEC 60601) may demand specific noise floors.
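
For a single biquad with denominator 1 + a1·z^-1 + a2·z^-2, the pole check reduces to the stability triangle: |a2| < 1 and |a1| < 1 + a2. The sketch below applies it to a hypothetical pole pair at radius 0.99 and 0.5 Hz (sampled at 250 Hz), quantized to 14 and to 8 fractional bits. The numbers and function names are assumptions chosen to show the failure mode.

```python
import math


def quantize(value, frac_bits):
    """Round value to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def biquad_is_stable(a1, a2):
    """Stability triangle test for 1 + a1*z^-1 + a2*z^-2."""
    return abs(a2) < 1.0 and abs(a1) < 1.0 + a2


# Pole pair close to z = 1, typical of a baseline-wander highpass filter.
radius = 0.99
theta = 2.0 * math.pi * 0.5 / 250.0
a1 = -2.0 * radius * math.cos(theta)
a2 = radius * radius

stable_exact = biquad_is_stable(a1, a2)
stable_14 = biquad_is_stable(quantize(a1, 14), quantize(a2, 14))
stable_8 = biquad_is_stable(quantize(a1, 8), quantize(a2, 8))
```

With 8 fractional bits the rounded coefficients become -507/256 and 251/256, which places one pole exactly at z = 1, so the section is no longer stable even though each coefficient moved by less than 0.001.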

Developers should also consider power overhead from memory access. Storing coefficients in flash vs. SRAM affects dynamic power; using DMA transfers for coefficients can reduce CPU cycles. When implementing frequency-domain methods, the FFT itself consumes power—only use if the filter order is high enough to make the crossover point economical (typically order > 50).

Finally, testing with real-world sensor signals is essential. Quantization artifacts might be imperceptible in sine-wave tests but become noticeable in actual ECG or PPG waveforms. Always validate with representative data from the target device.

Conclusion

IIR filters remain a cornerstone of signal processing in wearable devices. Their recursive structure offers inherently low-order implementations, but designers must navigate constraints of limited word length, memory, and energy. By applying a combination of order reduction, careful coefficient quantization, cascade biquad structures, approximate computing, and low-power hardware utilization, developers can substantially reduce computational complexity (the case study above cut filter power by 60%) without sacrificing essential performance. As edge processing continues to move into wearables for privacy and latency reasons, efficient IIR filter design will be a key enabler for next-generation health monitors, smart accessories, and AR/VR headsets. The strategies outlined here provide a practical toolkit for any engineer working on embedded signal processing in battery-powered devices.