The ARM Cortex-M series processors have become the backbone of modern embedded systems, powering everything from simple microcontrollers to sophisticated IoT endpoints. At the heart of efficient and reliable firmware development lies a deep understanding of register access. Registers are the immediate workspace for the processor, and knowing how to read, write, and manipulate them directly impacts performance, power consumption, and correctness. This guide provides a comprehensive, practical exploration of register access techniques across Cortex-M variants (M0, M3, M4, M7, M33, and others), covering assembly, C, and memory-mapped I/O patterns.

ARM Cortex-M Register Architecture

To master register access, one must first understand the register file layout. The Cortex-M architecture defines several categories of registers, each with specific roles and access rules.

Core General-Purpose Registers (R0–R12)

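These 32-bit registers are the workhorses of data manipulation. Most 16-bit Thumb instructions can reach only the low registers (R0–R7); the high registers (R8–R12) are available to 32-bit Thumb-2 encodings and to a few 16-bit instructions such as MOV, ADD, and CMP. Understanding the procedure-call conventions (e.g., R0–R3 for argument passing and return values, R4–R11 for callee-saved variables, R12 as a scratch register) is essential for writing efficient assembly or analyzing compiler output.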

Stack Pointer (R13 / SP)

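The Cortex-M uses a banked stack pointer. Handler mode (exception handlers) always runs on the Main Stack Pointer (MSP). Thread mode runs on the MSP after reset and switches to the Process Stack Pointer (PSP) only when the SPSEL bit of the CONTROL register is set, which is how an RTOS typically separates task stacks from the kernel stack. Accessing a specific SP requires understanding the current mode and using the appropriate CMSIS function (e.g., __get_MSP(), __set_PSP()).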
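
The stack selection rule can be sketched as a small host-side model. The names active_stack_t and active_stack() are hypothetical and not part of CMSIS; on a target, the two inputs would come from __get_IPSR() and __get_CONTROL().

```c
#include <stdint.h>

/* Hypothetical names for illustration; not part of CMSIS. */
typedef enum { STACK_MSP, STACK_PSP } active_stack_t;

#define CONTROL_SPSEL_MASK (1u << 1) /* 1 = Thread mode uses PSP */
#define IPSR_EXC_MASK      0x1FFu    /* exception number, 0 in Thread mode */

/* Model of the hardware rule: Handler mode always runs on MSP;
 * Thread mode runs on PSP only when CONTROL.SPSEL is set. */
static active_stack_t active_stack(uint32_t ipsr, uint32_t control)
{
    int in_handler_mode = (ipsr & IPSR_EXC_MASK) != 0u;

    if (in_handler_mode) {
        return STACK_MSP;
    }
    return (control & CONTROL_SPSEL_MASK) ? STACK_PSP : STACK_MSP;
}
```

For example, active_stack(0, 2) models Thread mode with SPSEL set and yields STACK_PSP, while any non-zero exception number yields STACK_MSP.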

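Link Register (R14 / LR)

The LR holds the return address on a function call. On exception entry it is instead loaded with a special EXC_RETURN value that tells the core how to unwind: 0xFFFFFFF1 returns to Handler mode, 0xFFFFFFF9 returns to Thread mode using the MSP, and 0xFFFFFFFD returns to Thread mode using the PSP. Direct manipulation of LR in exception handlers is a common source of bugs.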
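
To make the EXC_RETURN encoding concrete, here is a minimal decoder sketch. The type and function names are hypothetical, and it assumes the ARMv6-M/ARMv7-M bit layout (bit 4 is meaningful only on cores with an FPU); ARMv8-M cores with the Security Extension define further bits that this sketch ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of an EXC_RETURN value (illustrative names). */
typedef struct {
    bool valid;          /* top byte is all ones, as EXC_RETURN values are */
    bool to_thread_mode; /* bit 3: 1 = return to Thread mode, 0 = Handler */
    bool uses_psp;       /* bit 2: 1 = unstack from PSP, 0 = from MSP */
    bool basic_frame;    /* bit 4: 1 = no floating-point state was stacked */
} exc_return_info_t;

static exc_return_info_t decode_exc_return(uint32_t lr)
{
    exc_return_info_t info;

    info.valid          = (lr & 0xFF000000u) == 0xFF000000u;
    info.to_thread_mode = (lr & (1u << 3)) != 0u;
    info.uses_psp       = (lr & (1u << 2)) != 0u;
    info.basic_frame    = (lr & (1u << 4)) != 0u;
    return info;
}
```

A fault handler can use the uses_psp result to decide whether the stacked exception frame should be read from the MSP or the PSP.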

Program Counter (R15 / PC)

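Cortex-M cores execute only Thumb instructions; there is no ARM state. Reading the PC returns the address of the current instruction plus 4, and PC-relative addresses are usually formed with ADR or a literal-pool LDR. Writing to the PC causes a branch; for interworking writes (BX, BLX, or a POP/LDR into PC), bit 0 of the target address must be 1 to remain in Thumb state.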

Program Status Register (xPSR)

The xPSR is a composite of three registers: Application PSR (APSR) (condition flags N, Z, C, V, Q), Interrupt PSR (IPSR) (current exception number), and Execution PSR (EPSR) (T-bit for Thumb state, IF-THEN state). The pieces are readable via CMSIS functions such as __get_APSR(), __get_IPSR(), and __get_xPSR(). IPSR is read-only, so there is no setter for it, and the EPSR bits read as zero through MRS.
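
The field layout can be illustrated with a small decoder. This is a sketch with hypothetical helper names; the input is assumed to be an xPSR value such as the one found in a stacked exception frame (where the T bit is visible) or the result of __get_xPSR().

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions of the xPSR fields. */
#define XPSR_N        (1u << 31) /* negative */
#define XPSR_Z        (1u << 30) /* zero */
#define XPSR_C        (1u << 29) /* carry */
#define XPSR_V        (1u << 28) /* overflow */
#define XPSR_Q        (1u << 27) /* saturation (not on ARMv6-M) */
#define XPSR_T        (1u << 24) /* Thumb state */
#define XPSR_EXC_MASK 0x1FFu     /* IPSR: current exception number */

/* Hypothetical helpers for illustration. */
static bool xpsr_flag_set(uint32_t xpsr, uint32_t flag)
{
    return (xpsr & flag) != 0u;
}

static uint32_t xpsr_exception_number(uint32_t xpsr)
{
    return xpsr & XPSR_EXC_MASK;
}
```

With the value 0x6100000F, for instance, Z, C, and T are set and the exception number is 15 (SysTick).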

System Control Registers

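Beyond the core registers, Cortex-M processors include a rich set of system registers for control and status: CONTROL (stack pointer selection, privilege level), PRIMASK (masks all exceptions with configurable priority, leaving only NMI and HardFault), FAULTMASK (masks everything except NMI), BASEPRI (priority-threshold masking), and SHPR1–SHPR3 (system handler priority registers). FAULTMASK and BASEPRI are not implemented on ARMv6-M cores such as the Cortex-M0/M0+. CONTROL, PRIMASK, FAULTMASK, and BASEPRI are special registers accessed with MRS and MSR or the matching CMSIS intrinsics; the SHPR registers are memory-mapped in the System Control Block and are accessed with ordinary loads and stores (SCB->SHP in CMSIS, or NVIC_SetPriority()).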

Register Access in Assembly Language

Assembly provides the most direct path to registers, but it is best reserved for performance-critical sections or low-level startup code. The MRS (move from special register) and MSR (move to special register) instructions are used for special registers. For general registers, the MOV, LDR, and STR instruction families apply.

Example: Reading the Main Stack Pointer into R0 (use PSP instead of MSP to read the process stack pointer):

MRS R0, MSP

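Example: Setting BASEPRI to 0x80, which masks every exception whose priority value is 0x80 or greater (equal or lower urgency); only exceptions with a numerically lower priority value can still preempt: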

MOV R0, #0x80
MSR BASEPRI, R0
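
The BASEPRI masking rule can be modeled on a host as follows. This is a sketch: PRIO_BITS is an assumed value (CMSIS device headers publish the real one as __NVIC_PRIO_BITS), and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the device implements 4 priority bits. The real value is
 * device-specific and published as __NVIC_PRIO_BITS in CMSIS headers. */
#define PRIO_BITS 4u

/* Convert a logical priority level (0 = most urgent) into the left-aligned
 * 8-bit encoding used by BASEPRI and the priority registers. */
static uint32_t encode_priority(uint32_t level)
{
    assert(level < (1u << PRIO_BITS));
    return (level << (8u - PRIO_BITS)) & 0xFFu;
}

/* Model of BASEPRI masking for configurable-priority exceptions:
 * BASEPRI == 0 disables masking; otherwise an exception is blocked when
 * its priority value is greater than or equal to BASEPRI. */
static bool basepri_blocks(uint32_t basepri, uint32_t exception_priority)
{
    return (basepri != 0u) && (exception_priority >= basepri);
}
```

Under the 4-bit assumption, level 8 encodes to 0x80, so BASEPRI = 0x80 blocks an interrupt at priority 0xC0 but not one at 0x40.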

Direct memory-mapped peripheral register access in assembly uses LDR and STR with absolute addresses or PC-relative literals.

Register Access in C Using CMSIS

The Cortex Microcontroller Software Interface Standard (CMSIS) abstracts most register access into portable, readable code. It provides:

  • Core register access functions: e.g., __get_PRIMASK(), __set_CONTROL(), __enable_irq(), __disable_irq().
  • Structures for peripheral registers: Each peripheral (GPIO, USART, Timer) is defined as a C struct at a fixed base address. Fields can be accessed with a clear naming convention, e.g., GPIOA->MODER.
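  • Bit-band regions (Cortex-M3/M4): Where the device implements this optional feature, each bit in the first 1 MB of the SRAM and peripheral regions has a word-sized alias that allows atomic single-bit writes. Alias addresses are usually computed with vendor- or user-defined macros, and the feature is absent from newer cores such as the Cortex-M7 and M33, so it is less common now.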
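
As an illustration of the bit-band mapping, the alias address for a peripheral register bit can be computed as shown below. The function name is hypothetical, and the sketch only covers the peripheral bit-band region (0x40000000 to 0x400FFFFF, aliased at 0x42000000).

```c
#include <assert.h>
#include <stdint.h>

#define PERIPH_BB_BASE  0x40000000u /* start of peripheral bit-band region */
#define PERIPH_BB_ALIAS 0x42000000u /* start of its alias region */
#define BB_REGION_SIZE  0x00100000u /* 1 MB */

/* Hypothetical helper: return the alias word address for one bit of a
 * peripheral register. Each bit maps to one 32-bit word in the alias. */
static uint32_t periph_bitband_alias(uint32_t reg_addr, uint32_t bit)
{
    assert(reg_addr >= PERIPH_BB_BASE);
    assert(reg_addr < PERIPH_BB_BASE + BB_REGION_SIZE);
    assert(bit < 32u);

    uint32_t byte_offset = reg_addr - PERIPH_BB_BASE;
    return PERIPH_BB_ALIAS + (byte_offset * 32u) + (bit * 4u);
}
```

On a device that implements bit-banding, writing 1 through a volatile uint32_t pointer to the returned address sets just that bit without a read-modify-write in software. For register address 0x40020014 and bit 5, the alias works out to 0x42400294.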

Example: Enabling a GPIO output using CMSIS structure:

// Enable clock for GPIOA
RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN;
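// Set PA0 as output (MODER bits [1:0] = 01): clear the field first,
// because OR-ing alone leaves bit 1 set if the pin was in mode 10 or 11
GPIOA->MODER = (GPIOA->MODER & ~(3U << 0)) | (1U << 0);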

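Using CMSIS eliminates magic numbers. The core register functions are portable across Cortex-M devices from different vendors; peripheral structures and bit names such as RCC_AHB1ENR_GPIOAEN come from the vendor's device header (STM32F4 in this example) and are portable only within that device family.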

Inline Assembly as an Escape Hatch

When CMSIS lacks a needed macro, or when extremely tight timing is required, inline assembly can be used. However, the compiler's optimizer must be handled carefully. The volatile qualifier is essential:

__asm volatile ("WFI"); // Wait for interrupt

For more complex sequences, GCC extended asm syntax allows specifying input/output operands and clobber lists. For example, reading the current exception number:

unsigned int exc_num;
__asm volatile ("MRS %0, IPSR" : "=r" (exc_num));

Memory-Mapped Peripheral Register Access

Peripheral registers are accessed through memory-mapped I/O (MMIO). Vendor peripherals normally live in the Peripheral region of the memory map (0x40000000–0x5FFFFFFF), while core peripherals such as the NVIC and SysTick sit on the Private Peripheral Bus at 0xE0000000. Correct usage hinges on the volatile qualifier to prevent the compiler from optimizing away repeated reads or writes. Additionally, alignment and access width matter: most registers are 32-bit aligned and must be accessed with 32-bit loads/stores, though some support halfword or byte accesses.

Example: Direct pointer access to a GPIO output data register (the address shown is GPIOA ODR on STM32F4; check the reference manual for your device):

#define GPIOA_ODR (*(volatile uint32_t *)0x40020014U)
GPIOA_ODR |= (1U << 5); // Set PA5 high

This approach is simple but error-prone: typos in addresses, missing volatile, or incorrect width can cause subtle bugs. A safer pattern is to define a struct and use a base address macro, as CMSIS does.
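
The struct-based pattern might look like the following sketch. The register layout is modeled on the STM32F4 GPIO block, and gpio_regs_t, MY_GPIOA, and gpio_set_pin() are hypothetical names; a RAM instance stands in for the hardware so the example can run on a host.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout modeled on the STM32F4 GPIO block (first seven registers only);
 * verify the offsets against the reference manual for your device. */
typedef struct {
    volatile uint32_t MODER;   /* 0x00 mode */
    volatile uint32_t OTYPER;  /* 0x04 output type */
    volatile uint32_t OSPEEDR; /* 0x08 output speed */
    volatile uint32_t PUPDR;   /* 0x0C pull-up/pull-down */
    volatile uint32_t IDR;     /* 0x10 input data */
    volatile uint32_t ODR;     /* 0x14 output data */
    volatile uint32_t BSRR;    /* 0x18 bit set/reset */
} gpio_regs_t;

/* Compile-time checks that the struct matches the documented offsets. */
_Static_assert(offsetof(gpio_regs_t, ODR) == 0x14, "ODR offset mismatch");
_Static_assert(offsetof(gpio_regs_t, BSRR) == 0x18, "BSRR offset mismatch");

/* On a target this would be:
 *   #define MY_GPIOA ((gpio_regs_t *)0x40020000u)
 * Here a RAM instance stands in for the peripheral. */
static gpio_regs_t fake_gpioa;
#define MY_GPIOA (&fake_gpioa)

/* Drive one pin high through the output data register. */
static void gpio_set_pin(gpio_regs_t *port, uint32_t pin)
{
    port->ODR |= (1u << pin);
}
```

The base address now appears in exactly one place, every member carries volatile and the right width, and the static assertions catch a misplaced field at compile time.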

Bit Manipulation and Read-Modify-Write

Most register fields must be changed without disturbing others. The classic pattern is read-modify-write, which is not atomic. For example:

uint32_t temp = GPIOA->MODER;
temp &= ~(3U << 0); // Clear bits [1:0]
temp |= (1U << 0); // Set to output
GPIOA->MODER = temp;

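In concurrent systems (interrupts, multi-core), this can cause corruption: if an interrupt handler modifies the same register between the read and the write, its change is overwritten. Techniques to mitigate include using bit-band (on supported cores), disabling interrupts during the sequence (__disable_irq() / __enable_irq(), which protects against local interrupts but not against another core or a DMA controller), or using hardware-supported atomic set/clear registers (e.g., BSRR for GPIO on STM32). For shared variables where atomicity is critical, the exclusive access instructions LDREX and STREX can be used in C via the CMSIS intrinsics __LDREXW() and __STREXW(); they are not available on ARMv6-M cores such as the Cortex-M0/M0+ and are typically applied to data in normal memory rather than to peripheral registers.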
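
The set/clear register approach can be sketched with a host-side model of the STM32 BSRR register. Both function names are hypothetical; the model assumes the documented STM32 behavior that the lower 16 bits set pins, the upper 16 bits reset pins, and set takes priority when both are requested for the same pin.

```c
#include <stdint.h>

/* Build a BSRR write value: bits [15:0] set pins, bits [31:16] reset pins. */
static uint32_t bsrr_value(uint16_t set_mask, uint16_t reset_mask)
{
    return (uint32_t)set_mask | ((uint32_t)reset_mask << 16);
}

/* Host-side model of what the hardware does to ODR when BSRR is written.
 * The reset is applied first so that set wins if both are requested. */
static uint16_t model_bsrr_write(uint16_t odr, uint32_t bsrr)
{
    uint16_t set_mask   = (uint16_t)(bsrr & 0xFFFFu);
    uint16_t reset_mask = (uint16_t)(bsrr >> 16);

    return (uint16_t)((odr & (uint16_t)~reset_mask) | set_mask);
}
```

Because the update is a single store and the hardware applies it, no software read-modify-write exists for an interrupt to split. On a target the write would simply be GPIOA->BSRR = bsrr_value(1U << 5, 1U << 0).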

Best Practices for Register Access

Following these guidelines ensures robust, maintainable, and efficient firmware:

  • Always use volatile: Any pointer or variable that maps to a hardware register must be declared volatile. Without it, the compiler may optimize away accesses, leading to non-functional hardware.
  • Prefer CMSIS or vendor HAL: These libraries are widely used, documented, and portable across a device family. Their register definitions already encode the correct addresses, access widths, and volatile qualifiers.
  • Encapsulate register access in functions or macros: For example, #define SET_BIT(reg, bit) ((reg) |= (1U << (bit))) improves readability and makes debugging easier.
  • Document register configurations: Use comments to explain why a particular value is written, especially for magic numbers. Better yet, use named constants.
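  • Be cautious with ordering: volatile keeps the compiler from removing register accesses or reordering them relative to each other, but it does not order them against non-volatile memory accesses, and on cores with write buffers a store may reach the peripheral after later instructions have executed. Use memory barriers (__DSB(), __DMB(), __ISB()) where ordering matters. This is especially important when enabling/disabling peripherals or changing system control registers.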
  • Atomic access for shared resources: When a register is modified by both main code and an interrupt handler, ensure atomicity using the techniques discussed above.
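
The encapsulation guideline can be sketched with small inline functions rather than macros, which adds type checking. All three function names are hypothetical, and none of them is atomic on its own; they only centralize the read-modify-write pattern.

```c
#include <assert.h>
#include <stdint.h>

/* Set every bit in mask. */
static inline void reg_set_bits(volatile uint32_t *reg, uint32_t mask)
{
    *reg |= mask;
}

/* Clear every bit in mask. */
static inline void reg_clear_bits(volatile uint32_t *reg, uint32_t mask)
{
    *reg &= ~mask;
}

/* Replace one multi-bit field. mask is the field mask already shifted
 * into position; value is the unshifted field value. */
static inline void reg_write_field(volatile uint32_t *reg, uint32_t mask,
                                   uint32_t shift, uint32_t value)
{
    assert(((value << shift) & ~mask) == 0u); /* value must fit the field */

    uint32_t tmp = *reg;            /* one read of the register */
    tmp &= ~mask;                   /* clear the old field */
    tmp |= (value << shift) & mask; /* insert the new value */
    *reg = tmp;                     /* one write back */
}
```

For instance, reg_write_field(&GPIOA->MODER, 3U << 10, 10, 1) would configure pin 5 as an output on an STM32-style GPIO while leaving the other pins untouched.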

Memory Protection Unit (MPU) Considerations

When the MPU is active, register access may fault if the memory region is not configured correctly. Peripheral memory must be marked as strongly-ordered or device memory type to avoid speculative loads or reordering. CMSIS MPU configuration functions (ARM_MPU_SetRegion()) simplify this.
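
To show what such a configuration encodes, the sketch below builds the region attribute value (RASR) for a device-memory region on the ARMv7-M MPU found in Cortex-M3/M4/M7. The macro and function names are hypothetical (CMSIS provides its own in mpu_armv7.h), and ARMv8-M cores such as the Cortex-M33 use a different register layout that this sketch does not cover.

```c
#include <assert.h>
#include <stdint.h>

/* ARMv7-M MPU_RASR fields (hypothetical macro names). */
#define RASR_ENABLE  (1u << 0)
#define RASR_SIZE(n) ((uint32_t)(n) << 1) /* region is 2^(n+1) bytes */
#define RASR_B       (1u << 16)           /* bufferable */
#define RASR_AP_FULL (3u << 24)           /* read/write, all privilege levels */
#define RASR_XN      (1u << 28)           /* execute never */

/* Convert a power-of-two region size in bytes to the SIZE field value. */
static uint32_t rasr_size_field(uint32_t size_bytes)
{
    /* Minimum region is 32 bytes; size must be a power of two. */
    assert(size_bytes >= 32u);
    assert((size_bytes & (size_bytes - 1u)) == 0u);

    uint32_t exponent = 0u;
    while ((1u << exponent) < size_bytes) {
        exponent++;
    }
    return exponent - 1u;
}

/* Device memory (TEX=0, C=0, B=1), full access, never executable. */
static uint32_t rasr_device_region(uint32_t size_bytes)
{
    return RASR_XN | RASR_AP_FULL | RASR_B |
           RASR_SIZE(rasr_size_field(size_bytes)) | RASR_ENABLE;
}
```

For the 512 MB peripheral region, rasr_device_region(0x20000000) evaluates to 0x13010039; the region base address (0x40000000) goes into the separate RBAR register.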

Common Pitfalls and Debugging Techniques

Even experienced developers can stumble. Here are typical issues and how to avoid them:

  • Missing volatile: The code works in debug mode but fails in release because the optimizer caches or removes the access. Solution: declare every hardware-mapped object volatile and inspect the generated code when in doubt. Compiler warnings (-Wall -Wextra) are worth enabling and treating as errors, but they generally do not flag a missing volatile.
  • Incorrect access width: A 16-bit register accessed as 32-bit also touches the adjacent register, and some peripherals fault or ignore accesses of the wrong size. Solution: Use the correct pointer type: volatile uint16_t * for 16-bit registers.
  • Side effects of read-modify-write: Applying read-modify-write to a register with write-only, toggle, or write-1-to-clear semantics can cause unexpected behavior; for example, writing back a status value just read clears every write-1-to-clear flag that was set. Solution: Consult the reference manual for each register’s behavior.
  • Concurrency: A read-modify-write interrupted by an ISR that modifies the same register leads to lost updates. Solution: disable interrupts locally or use hardware atomic features.
  • Incorrect register base address: A typo in the base address can corrupt other peripherals. Solution: Use CMSIS definitions provided by the silicon vendor.

Debugging Register Access

Use a debugger to inspect register values directly. With tools like SEGGER J-Link or OpenOCD, you can watch peripheral registers live. Memory view windows allow you to verify that writes actually changed the expected bits; keep in mind that a debugger read of a register with read side effects (such as a clear-on-read status flag) can itself change the state. For complex sequences, insert __BKPT(0) (breakpoint instruction) and single-step.

Conclusion

Register access is the fundamental interface between firmware and the hardware of an ARM Cortex-M processor. Mastery of this topic, from core general-purpose registers to memory-mapped peripherals, enables developers to write code that is both efficient and reliable. By leveraging CMSIS, following best practices for volatile and atomicity, and understanding the nuances of each register type, embedded engineers can avoid common pitfalls and accelerate development. As microcontroller complexity grows, the principles outlined in this guide will remain essential for low-level control and optimization.