Foundations: Representing Three-Dimensional Space

Before a single pixel is drawn, a 3D engine must have a mathematical language to describe objects in space. At the core are vectors (three-component tuples representing position, direction, or color) and matrices (4×4 arrays that encode transformations). In C, these are typically defined as simple structs:

  • typedef struct { float x, y, z; } vec3; for points and vectors.
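  • typedef struct { float m[16]; } mat4; for transformation matrices stored in column-major order.
  • typedef struct { float x, y, z, w; } vec4; for homogeneous coordinates (w = 1 for points, w = 0 for directions).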

In a rasterizer, every object is represented as a mesh of triangles. A triangle is defined by three vertices, each with a position and optionally a normal and a color. Storing these in contiguous arrays (e.g., vec3 corners[8] for the eight corners of a cube, plus an index array saying which corners form each triangle) improves cache efficiency, which becomes critical once you process thousands of faces per frame.

The Graphics Pipeline: From Vertices to Pixels

A graphics pipeline is a sequence of stages that transforms 3D scene data into a 2D image. In a basic software-rendered engine, you manually implement each stage. The main phases are:

  1. Vertex Processing – applying model, view, and projection transforms.
  2. Rasterization – converting transformed triangles into fragments (potential pixels).
  3. Shading – computing the color of each fragment based on lighting and material.
  4. Output Merging – blending fragments with the frame buffer, including depth testing.

Hardware-accelerated engines (using OpenGL, Vulkan, or DirectX) perform most of these steps on the GPU, but understanding the software path gives you deep insight into how the GPU works under the hood.

Vertex Processing and Transformations

Every vertex starts in model space (local coordinates relative to the object). To position it in the world, you apply a model matrix that encodes translation, rotation, and scaling. The view matrix then transforms world coordinates into camera-relative coordinates (view space). Finally, the projection matrix maps view space to clip space, a homogeneous coordinate system; the perspective divide turns clip coordinates into normalized device coordinates (NDC), and the viewport mapping turns NDC into screen coordinates.

In C, transformation functions look like:

  • mat4 mat4_identity(void) – returns an identity matrix.
  • mat4 mat4_translate(float x, float y, float z) – builds a translation matrix.
  • mat4 mat4_rotate_X(float angle_rad) – rotation around the X axis.
  • mat4 mat4_mul(mat4 a, mat4 b) – multiplies two matrices (the result applies b first, then a).
  • vec4 mat4_mul_vec4(const mat4* m, const vec4* v) – multiplies a 4×4 matrix by a 4-element vector (homogeneous coordinates).

The software equivalent of a vertex shader iterates over all vertices, multiplies each by the combined model-view-projection (MVP) matrix, and stores the result in a transformed vertex buffer.
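That loop might look like the sketch below. The function name transform_vertices is hypothetical; it promotes each vec3 position to homogeneous coordinates with w = 1 and leaves the perspective divide for later, because clipping has to see the raw w.

```c
#include <stddef.h>

typedef struct { float x, y, z; } vec3;
typedef struct { float x, y, z, w; } vec4;
typedef struct { float m[16]; } mat4;   /* column-major */

static vec4 mat4_mul_vec4(const mat4 *m, const vec4 *v) {
    const float *a = m->m;
    return (vec4){
        a[0] * v->x + a[4] * v->y + a[8]  * v->z + a[12] * v->w,
        a[1] * v->x + a[5] * v->y + a[9]  * v->z + a[13] * v->w,
        a[2] * v->x + a[6] * v->y + a[10] * v->z + a[14] * v->w,
        a[3] * v->x + a[7] * v->y + a[11] * v->z + a[15] * v->w };
}

/* Software "vertex shader": transform `count` positions into clip space.
   `in` and `out` must not overlap. */
static void transform_vertices(const mat4 *mvp, const vec3 *in, vec4 *out, size_t count) {
    for (size_t i = 0; i < count; i++) {
        vec4 p = { in[i].x, in[i].y, in[i].z, 1.0f };   /* positions are points: w = 1 */
        out[i] = mat4_mul_vec4(mvp, &p);
    }
}
```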

Projection: Perspective versus Orthographic

Projection controls how depth is represented on screen. Perspective projection makes distant objects appear smaller, creating realism. Its matrix is built from the vertical field of view, the aspect ratio, and the near and far clipping planes. The matrix copies the view-space depth into the w component, and after the multiplication the x, y, and z components are divided by w (the perspective divide); this division by depth is what shrinks distant objects. In C, you compute the projection matrix once (or whenever the window is resized) and reuse it every frame:

  • mat4 mat4_perspective(float fov_y, float aspect, float near, float far) – returns a standard perspective matrix.

Orthographic projection preserves parallel lines and keeps an object's size independent of its distance; it is simpler and is typically used for UI and CAD tools. Its matrix leaves w = 1, so no perspective divide is needed.
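For comparison, here is a sketch of an OpenGL-style orthographic matrix. The function name mat4_ortho is an assumption, and its parameter order is modeled on the classic glOrtho parameters (left, right, bottom, top, near, far). Its bottom row stays (0, 0, 0, 1), so clip w is always 1.

```c
typedef struct { float m[16]; } mat4;   /* column-major: (row r, col c) is m[c * 4 + r] */

/* Maps the box [l, r] x [b, t] x [-zn, -zf] in view space to NDC [-1, 1]^3. */
static mat4 mat4_ortho(float l, float r, float b, float t, float zn, float zf) {
    mat4 o = {{0}};
    o.m[0]  = 2.0f / (r - l);           /* x scale */
    o.m[5]  = 2.0f / (t - b);           /* y scale */
    o.m[10] = -2.0f / (zf - zn);        /* z scale (view space looks down -z) */
    o.m[12] = -(r + l) / (r - l);       /* x offset */
    o.m[13] = -(t + b) / (t - b);       /* y offset */
    o.m[14] = -(zf + zn) / (zf - zn);   /* z offset */
    o.m[15] = 1.0f;                     /* bottom row (0, 0, 0, 1): w stays 1 */
    return o;
}
```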

Rasterization: Filling Triangles

After projection, the perspective divide, and viewport mapping, each vertex has 2D screen coordinates (in pixels) and a depth value (z). Rasterization breaks each triangle into fragments covering integer pixel coordinates. The classic algorithm is scanline conversion:

  1. Sort the triangle’s three vertices by y-coordinate.
  2. Walk down the left and right edges, computing x boundaries for each scanline.
  3. For each pixel in the horizontal span, calculate the fragment’s depth by interpolating across the triangle.
  4. Perform a depth test: compare the fragment’s z with the value already in the depth buffer. If closer, update the depth buffer and compute the fragment’s color.

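Implementing the edge-walking algorithm efficiently in C requires tight inner loops. Many implementations use fixed-point arithmetic, which historically avoided slow floating-point units and still gives exact, consistent results on edges shared by neighboring triangles. Hobbyist engines often start with a brute-force approach instead: test every pixel in the triangle's bounding box against the triangle's three edge equations, then optimize by evaluating those equations incrementally and skipping empty regions.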

Shading and Lighting

A simple flat-shaded engine uses a single color per triangle, computed from the face normal and a single light source. The Lambertian model gives the diffuse intensity I = max(dot(normal, lightDir), 0), where both vectors have unit length and lightDir points from the surface toward the light. In C, you compute the normal as the normalized cross product of two triangle edges (after transforming the vertices to world space), then multiply the triangle's base color by the intensity.

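For smoother results, implement Gouraud shading (lighting evaluated at each vertex using per-vertex normals, with the resulting colors interpolated across the triangle) or Phong shading (normals interpolated across the triangle and lighting evaluated per pixel). Phong shading requires interpolating normals, plus world positions for point lights or specular highlights, and renormalizing the normal at every fragment; it is more expensive but renders highlights far more accurately.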

Building the Engine in C: Practical Steps

Let’s walk through constructing a minimal software 3D engine from scratch. The goal is to render a rotating cube with ambient and diffuse lighting onto a 640×480 pixel window.

Step 1 – Set Up a Pixel Buffer

Allocate a color buffer and a depth buffer: uint32_t* framebuffer = malloc(width * height * sizeof(uint32_t)); and float* depthbuffer = malloc(width * height * sizeof(float));. Each frame, clear the framebuffer to black and the depth buffer to 1.0 (the far plane in NDC).
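A sketch of that setup, wrapped in a small struct so the two buffers travel together. The framebuffer type and the fb_* functions are hypothetical; colors use the 0xAARRGGBB layout, which matches the ARGB8888 texture format used in Step 5.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int width, height;
    uint32_t *color;   /* one 0xAARRGGBB pixel per entry, row-major */
    float *depth;      /* NDC depth per pixel; 1.0f is the far plane */
} framebuffer;

static void fb_destroy(framebuffer *fb) {
    free(fb->color);
    free(fb->depth);
    fb->color = NULL;
    fb->depth = NULL;
}

/* Allocates both buffers; returns 0 on success, -1 if either allocation fails. */
static int fb_create(framebuffer *fb, int width, int height) {
    size_t n = (size_t)width * (size_t)height;
    fb->width = width;
    fb->height = height;
    fb->color = malloc(n * sizeof *fb->color);
    fb->depth = malloc(n * sizeof *fb->depth);
    if (fb->color == NULL || fb->depth == NULL) {
        fb_destroy(fb);
        return -1;
    }
    return 0;
}

/* Called at the start of every frame. */
static void fb_clear(framebuffer *fb, uint32_t clear_color) {
    size_t n = (size_t)fb->width * (size_t)fb->height;
    for (size_t i = 0; i < n; i++) {
        fb->color[i] = clear_color;
        fb->depth[i] = 1.0f;   /* nothing drawn yet: everything is "at the far plane" */
    }
}
```

Opaque black in this layout is 0xFF000000, so a frame would start with fb_clear(&fb, 0xFF000000u).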

Step 2 – Define Scene Data

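For a cube, six faces (12 triangles) require 12×3 = 36 vertices if every triangle stores its own corners. An index buffer avoids this duplication: store the 8 unique cube corners once, plus 36 indices naming the corners of each triangle. Because each face has its own color (and its own normal), attach the color to the face or triangle rather than to the shared corners; if you need per-vertex attributes that differ between faces, duplicate the corners into 24 vertices (4 per face).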
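A sketch of the indexed cube, assuming a unit cube centered at the origin and counter-clockwise winding when each face is viewed from outside. Colors are stored per face, and triangle t belongs to face t / 2. The array names and the color values are illustrative.

```c
#include <stdint.h>

typedef struct { float x, y, z; } vec3;

/* The 8 unique corners. */
static const vec3 cube_corners[8] = {
    {-0.5f, -0.5f, -0.5f}, { 0.5f, -0.5f, -0.5f}, { 0.5f,  0.5f, -0.5f}, {-0.5f,  0.5f, -0.5f},  /* z = -0.5 */
    {-0.5f, -0.5f,  0.5f}, { 0.5f, -0.5f,  0.5f}, { 0.5f,  0.5f,  0.5f}, {-0.5f,  0.5f,  0.5f},  /* z = +0.5 */
};

/* 12 triangles x 3 corner indices, counter-clockwise seen from outside. */
static const uint16_t cube_indices[36] = {
    4, 5, 6,  4, 6, 7,   /* front  (+z) */
    1, 0, 3,  1, 3, 2,   /* back   (-z) */
    5, 1, 2,  5, 2, 6,   /* right  (+x) */
    0, 4, 7,  0, 7, 3,   /* left   (-x) */
    7, 6, 2,  7, 2, 3,   /* top    (+y) */
    0, 1, 5,  0, 5, 4,   /* bottom (-y) */
};

/* One RGB color per face, in the same order as the index groups above. */
static const vec3 face_colors[6] = {
    {1.0f, 0.0f, 0.0f}, {0.0f, 1.0f, 0.0f}, {0.0f, 0.0f, 1.0f},
    {1.0f, 1.0f, 0.0f}, {1.0f, 0.0f, 1.0f}, {0.0f, 1.0f, 1.0f},
};
```

Consistent winding matters beyond tidiness: the back-face culling test relies on it to tell front faces from back faces.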

Step 3 – Transformation Pipeline

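Each frame, compute the model matrix from a time-dependent angle (e.g., a rotation about the Y axis by an angle proportional to elapsed time), the view matrix from a fixed camera looking at the origin, and the projection matrix. Combine them into a single MVP matrix: mat4 mvp = mat4_mul(projection, mat4_mul(view, model));. Transform every vertex by multiplying it with mvp. Then perform the perspective divide (divide x, y, z by w) to get normalized device coordinates (NDC) in the range −1 to +1. Vertices with w ≤ 0 lie at or behind the camera, so triangles containing them must be clipped against the near plane (or, in a first version, discarded) before the divide. Map to screen coordinates: screenX = (ndcX + 1) * width/2, screenY = (1 - ndcY) * height/2 (y is inverted because screen rows grow downward). Keep the NDC z for depth testing.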
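The divide and viewport mapping can be sketched as a single function. clip_to_screen is a hypothetical name, and rejecting vertices with w ≤ 0 outright is the simplification a first version might use instead of real near-plane clipping.

```c
#include <stdbool.h>

typedef struct { float x, y, z, w; } vec4;
typedef struct { float x, y, z; } vec3;

/* Perspective divide followed by the viewport mapping.
   Returns false when w <= 0 (vertex at or behind the camera plane). */
static bool clip_to_screen(vec4 clip, int width, int height, vec3 *out) {
    if (clip.w <= 0.0f) return false;   /* a full engine clips against the near plane instead */
    float ndc_x = clip.x / clip.w;
    float ndc_y = clip.y / clip.w;
    float ndc_z = clip.z / clip.w;
    out->x = (ndc_x + 1.0f) * 0.5f * (float)width;
    out->y = (1.0f - ndc_y) * 0.5f * (float)height;   /* NDC +y is up, screen rows go down */
    out->z = ndc_z;                                    /* kept for the depth test */
    return true;
}
```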

Step 4 – Rasterize and Shade

For each triangle, use the scanline method. While walking pixels, interpolate the depth along with the per-vertex attributes your shading model needs: the lit vertex color for Gouraud, or the vertex normal (plus the world-space position, if the lighting needs it) for Phong. Perform the depth test first; if it passes, apply the Lambertian diffuse equation plus an ambient term and write the final color to the framebuffer.

Step 5 – Display

Use SDL (Simple DirectMedia Layer) or a similar library to create a window and blit the framebuffer. For example, with SDL2: call SDL_CreateWindow, SDL_CreateRenderer, and SDL_CreateTexture (with SDL_PIXELFORMAT_ARGB8888 and SDL_TEXTUREACCESS_STREAMING) once, then call SDL_UpdateTexture, SDL_RenderCopy, and SDL_RenderPresent each frame.

This software path runs entirely on the CPU, so it handles far fewer triangles than a GPU: typically a few thousand per frame at most, depending on resolution and how much screen area each triangle covers. Once comfortable, you can replace the rasterization layer with OpenGL, offloading the heavy work to the GPU while keeping your transformation and scene logic in C.

Optimizing Your Engine

A naive software renderer is slow. Several optimizations are essential for real-time performance:

  • Back-face culling: Skip triangles that face away from the camera. Either compute the signed area of the projected triangle in 2D (its sign reveals the winding order), or check the dot product of the face normal with the vector from the camera to any of the triangle's vertices.
  • Frustum culling: Test the bounding box of an object against the six planes of the view frustum. If completely outside, skip the entire object.
  • Fixed-point arithmetic: Replace floating-point operations with integer math in tight rasterization loops. Many engine builders use 16.16 fixed-point for scanline interpolation.
  • Pre-transform cache: Avoid transforming the same vertex multiple times by transforming all vertices of a mesh once, then using indices.
  • Cache-friendly framebuffer writes: Write pixels in memory order (left-to-right, top-to-bottom) to maximize cache hits.
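Two of these techniques in sketch form. The back-face test assumes front faces are counter-clockwise in NDC (y up), which turns into a negative 2D cross product once the viewport mapping flips y to point down. The fixed-point helper steps an edge's x coordinate with integer additions in 16.16 format. All names are hypothetical, and the helper only handles non-negative coordinates because right-shifting a negative signed integer is implementation-defined in C.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { float x, y; } vec2;

/* Back-face test on screen-space vertices (pixels, y pointing down). */
static bool is_back_facing(vec2 a, vec2 b, vec2 c) {
    float twice_area = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
    return twice_area >= 0.0f;   /* zero-area triangles are skipped as well */
}

/* 16.16 fixed point: integer part in the high 16 bits, fraction in the low 16. */
typedef int32_t fixed16;
#define FIX_SHIFT 16

static fixed16 fix_from_float(float f) { return (fixed16)(f * (float)(1 << FIX_SHIFT)); }

/* Integer part; valid for non-negative values only. */
static int fix_to_int(fixed16 f) { return (int)(f >> FIX_SHIFT); }

/* Walk an edge down `rows` scanlines: x starts at x0 and moves by `slope`
   (dx/dy) per row, using only integer additions inside the loop. */
static void walk_edge(float x0, float slope, int rows, int *xs) {
    fixed16 x = fix_from_float(x0);
    fixed16 step = fix_from_float(slope);
    for (int i = 0; i < rows; i++) {
        xs[i] = fix_to_int(x);   /* truncated pixel column for this scanline */
        x += step;
    }
}
```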

Together, these techniques can take a software renderer from 3–5 fps to 60 fps for scenes of moderate complexity (e.g., a few hundred triangles), although the actual gain depends on resolution, scene content, and hardware.

Extending the Engine

Once a solid foundation exists, you can add features incrementally:

  • Texture mapping: Store 2D images and interpolate UV coordinates across triangles; for perspective-correct results, interpolate u/w, v/w, and 1/w, then divide per pixel. Implement bilinear filtering for smooth results.
  • Advanced shading: Add specular highlights (Blinn-Phong), ambient occlusion, or even shadow maps.
  • Scene management: Use an octree or BSP tree to quickly determine which objects are visible.
  • User interaction: Handle keyboard and mouse input to orbit the camera or pick objects.
  • Pipeline abstraction: Design a shader system that defines vertex and fragment functions as function pointers, so you can swap shading behavior at runtime without modifying the rasterizer.

Resources and Further Reading

To deepen your understanding, explore these resources:

  • Scratchapixel – In-depth explanations of ray tracing, rasterization, and math.
  • LearnOpenGL – Modern OpenGL tutorials; start with the “Hello Triangle” chapter.
  • Wikipedia: 3D Projection – The formal math behind perspective and orthographic matrices.

Building a 3D engine in C is a rite of passage for graphics programmers. It teaches you not only how to write efficient code but also how to think in three dimensions and compose performance-critical systems. Start with a single rotating cube, then add more triangles, textures, and finally hand it off to the GPU. The journey from raw C to a fully functional engine reveals the magic behind every pixel on screen.