The Challenge of Multi‑Format Data Import and Export

Modern applications frequently need to exchange data with external systems, users, or legacy tools. Supporting multiple file formats (CSV, JSON, XML, YAML, Excel) is no longer a luxury but a core requirement. A naïve approach that hard‑codes parsing logic for each format leads to tightly coupled, brittle code that is difficult to maintain and hard to extend without touching existing, tested functionality. The Factory Method design pattern offers a clean, scalable solution by decoupling the creation of format‑specific handlers from the use of those handlers. This article walks through an implementation in PHP, using the import/export needs of a headless CMS such as Directus as the motivating example, explains the pattern’s mechanics, and shows how it keeps your import/export tools easy to extend.

Understanding the Factory Method Pattern

The Factory Method pattern defines an interface for creating an object but lets subclasses decide which concrete class to instantiate. It falls under the creational design patterns family and is often preferred over direct instantiation because it promotes loose coupling and adheres to the Open/Closed Principle: classes are open for extension but closed for modification.

In the context of file handling, the “product” of the factory is a handler that knows how to read and write a specific format. The “creator” (the factory) centralises the decision of which handler to return. When a new format appears, you write a new handler class and extend the factory—zero changes to existing handlers or the client code that uses them.

Key Participants

  1. Product – The interface that all handlers implement (e.g., FileHandler).
  2. ConcreteProduct – Specific implementations like CsvHandler, JsonHandler, XmlHandler.
  3. Creator – The abstract class (or interface) that declares the factory method.
  4. ConcreteCreator – Overrides the factory method to produce concrete products.

Step‑by‑Step Implementation in PHP

We’ll build a robust import/export subsystem that can handle CSV, JSON, and XML out of the box, and can be extended to support YAML, Excel, or even custom binary formats without touching the core logic.

Step 1: Define the Product Interface

The interface declares two core operations: read (parsing an incoming file) and write (serialising data to a file). read returns a standardised array representation and write accepts that same representation, so the rest of the application doesn’t need to worry about format details.

<?php
interface FileHandler
{
    public function read(string $filePath): array;
    public function write(string $filePath, array $data): void;
}

Step 2: Build Concrete Handlers

Each format gets its own class. For brevity we show only key methods; real implementations would include error handling, encoding detection, and streaming for large files.

CSV Handler

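class CsvHandler implements FileHandler
{
    public function read(string $filePath): array
    {
        $handle = fopen($filePath, 'r');
        if ($handle === false) {
            throw new \RuntimeException("Cannot open file: $filePath");
        }
        $rows = [];
        // The first row supplies the column names; an empty file has none.
        $headers = fgetcsv($handle);
        while ($headers !== false && ($data = fgetcsv($handle)) !== false) {
            // Skip rows whose column count differs from the header,
            // because array_combine() rejects them.
            if (count($data) === count($headers)) {
                $rows[] = array_combine($headers, $data);
            }
        }
        fclose($handle);
        return $rows;
    }

    public function write(string $filePath, array $data): void
    {
        $handle = fopen($filePath, 'w');
        if ($handle === false) {
            throw new \RuntimeException("Cannot open file for writing: $filePath");
        }
        if (!empty($data)) {
            fputcsv($handle, array_keys($data[0]));
            foreach ($data as $row) {
                fputcsv($handle, $row);
            }
        }
        // Close the handle on every path, including empty input.
        fclose($handle);
    }
}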

JSON Handler

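class JsonHandler implements FileHandler
{
    public function read(string $filePath): array
    {
        $content = file_get_contents($filePath);
        if ($content === false) {
            throw new \RuntimeException("Cannot read file: $filePath");
        }
        $decoded = json_decode($content, true);
        // Invalid JSON decodes to null and a scalar document is not an array;
        // both cases fall back to an empty result.
        return is_array($decoded) ? $decoded : [];
    }

    public function write(string $filePath, array $data): void
    {
        file_put_contents($filePath, json_encode($data, JSON_PRETTY_PRINT));
    }
}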

XML Handler

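class XmlHandler implements FileHandler
{
    public function read(string $filePath): array
    {
        $xml = simplexml_load_file($filePath);
        if ($xml === false) {
            throw new \RuntimeException("Cannot parse XML file: $filePath");
        }
        // Round-trip through JSON to turn the element tree into nested arrays.
        return json_decode(json_encode($xml), true) ?? [];
    }

    public function write(string $filePath, array $data): void
    {
        $xml = new SimpleXMLElement('<root/>');
        $this->arrayToXml($data, $xml);
        $xml->asXML($filePath);
    }

    // Objects are passed by handle, so no reference parameter is needed.
    private function arrayToXml(array $data, SimpleXMLElement $xml): void
    {
        foreach ($data as $key => $value) {
            // Numeric keys are not valid XML element names, so list entries become <item>.
            $name = is_int($key) ? 'item' : $key;
            if (is_array($value)) {
                $this->arrayToXml($value, $xml->addChild($name));
            } else {
                // addChild() does not escape ampersands, hence the explicit escaping.
                $xml->addChild($name, htmlspecialchars((string) $value));
            }
        }
    }
}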

Step 3: Implement the Factory Method

The abstract creator class declares the factory method createHandler(). The concrete factory makes the decision based on a format string. Note that we use switch here—later we’ll discuss registry‑based alternatives for open‑ended extension.

abstract class FileHandlerFactory
{
    abstract public function createHandler(string $format): FileHandler;

    // Template method (optional) can add logging or validation
    public function getHandler(string $format): FileHandler
    {
        return $this->createHandler($format);
    }
}

class ConcreteFileHandlerFactory extends FileHandlerFactory
{
    public function createHandler(string $format): FileHandler
    {
        switch (strtolower($format)) {
            case 'csv':
                return new CsvHandler();
            case 'json':
                return new JsonHandler();
            case 'xml':
                return new XmlHandler();
            default:
                throw new \InvalidArgumentException("Unsupported format: $format");
        }
    }
}

Step 4: Client Usage

The client code never instantiates handlers directly. It asks the factory for a handler and then uses it.

$factory = new ConcreteFileHandlerFactory();
$handler = $factory->getHandler('csv');
$data = $handler->read('/path/to/input.csv');
$processedData = $data; // ... transform the rows here ...
$handler->write('/path/to/output.csv', $processedData);

If tomorrow a YAML requirement appears, you create YamlHandler (implementing FileHandler) and make the factory aware of it, either by adding a case 'yaml' or by overriding createHandler() in a subclass. The client code remains untouched.
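The subclassing route can be sketched as follows. This is a minimal illustration rather than the article's implementation: it uses tab‑separated values instead of YAML so that it needs no extension, TsvHandler and ExtendedFileHandlerFactory are hypothetical names, the TSV parsing is deliberately naive (no quoting or escaping), and the interface and base factory are compact stand‑ins so the block runs on its own.

```php
<?php
// Compact stand-ins so the sketch runs on its own.
interface FileHandler
{
    public function read(string $filePath): array;
    public function write(string $filePath, array $data): void;
}

class JsonHandler implements FileHandler
{
    public function read(string $filePath): array
    {
        $decoded = json_decode((string) file_get_contents($filePath), true);
        return is_array($decoded) ? $decoded : [];
    }

    public function write(string $filePath, array $data): void
    {
        file_put_contents($filePath, json_encode($data));
    }
}

class ConcreteFileHandlerFactory
{
    public function createHandler(string $format): FileHandler
    {
        switch (strtolower($format)) {
            case 'json':
                return new JsonHandler();
            default:
                throw new \InvalidArgumentException("Unsupported format: $format");
        }
    }
}

// Hypothetical new format: tab-separated values, parsed naively.
class TsvHandler implements FileHandler
{
    public function read(string $filePath): array
    {
        $rows = [];
        $lines = file($filePath, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) ?: [];
        $headers = $lines ? explode("\t", array_shift($lines)) : [];
        foreach ($lines as $line) {
            $fields = explode("\t", $line);
            // Ignore rows that do not line up with the header.
            if (count($fields) === count($headers)) {
                $rows[] = array_combine($headers, $fields);
            }
        }
        return $rows;
    }

    public function write(string $filePath, array $data): void
    {
        $lines = $data ? [implode("\t", array_keys($data[0]))] : [];
        foreach ($data as $row) {
            $lines[] = implode("\t", $row);
        }
        file_put_contents($filePath, implode("\n", $lines) . ($lines ? "\n" : ''));
    }
}

// Adds 'tsv' in a subclass and defers every other format to the parent,
// so the existing factory class is not edited.
class ExtendedFileHandlerFactory extends ConcreteFileHandlerFactory
{
    public function createHandler(string $format): FileHandler
    {
        if (strtolower($format) === 'tsv') {
            return new TsvHandler();
        }
        return parent::createHandler($format);
    }
}
```

Swapping in the extended factory at the point where the factory is constructed is the only change the application needs; callers still ask for a handler by format string.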

Why Factory Method Over Direct Instantiation or Simple Switch Statements?

You might be tempted to put a big if/elseif or switch directly in the client. That approach violates the Open/Closed Principle: every new format forces you to modify the client. Factory Method moves the creation logic into its own abstraction, making the system extendable without modification. This is especially valuable when the factory is used across many parts of a large codebase (e.g., Directus’ import/export service).

Comparison with Other Creational Patterns

  • Abstract Factory – Creates families of related objects. If you needed not just a file handler but also a validator and a formatter for each format, Abstract Factory might be a better fit. However, for a single product hierarchy, Factory Method is simpler and more focused.
  • Builder – Useful when constructing complex objects (e.g., a handler with many optional configuration settings). For file handlers that are relatively stateless, Factory Method is sufficient.
  • Prototype – Could clone existing handler instances to avoid construction overhead, but handlers are lightweight so the pattern adds little value here.
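
To make the contrast with Abstract Factory concrete, here is a small sketch in which one factory produces a matched reader and validator for a format. Every name in it (RecordReader, RecordValidator, FormatToolkitFactory, importRecords) is hypothetical and exists only for this comparison.

```php
<?php
// Two related products that must agree on the format they handle.
interface RecordReader
{
    public function parse(string $raw): array;
}

interface RecordValidator
{
    public function isValid(string $raw): bool;
}

// Abstract Factory: one factory yields a whole family of matching objects.
interface FormatToolkitFactory
{
    public function createReader(): RecordReader;
    public function createValidator(): RecordValidator;
}

class JsonRecordReader implements RecordReader
{
    public function parse(string $raw): array
    {
        $decoded = json_decode($raw, true);
        return is_array($decoded) ? $decoded : [];
    }
}

class JsonRecordValidator implements RecordValidator
{
    public function isValid(string $raw): bool
    {
        json_decode($raw);
        return json_last_error() === JSON_ERROR_NONE;
    }
}

class JsonToolkitFactory implements FormatToolkitFactory
{
    public function createReader(): RecordReader
    {
        return new JsonRecordReader();
    }

    public function createValidator(): RecordValidator
    {
        return new JsonRecordValidator();
    }
}

// The client only sees the abstract toolkit and never mixes formats.
function importRecords(FormatToolkitFactory $toolkit, string $raw): array
{
    return $toolkit->createValidator()->isValid($raw)
        ? $toolkit->createReader()->parse($raw)
        : [];
}
```

With a single product (the handler), this extra layer buys nothing, which is why the rest of the article stays with Factory Method.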

Handling Real‑World Concerns

Error Handling and Validation

Your factory should never return a handler it cannot create. The example above throws an InvalidArgumentException. In a production system, you might also check whether the format is allowed by some configuration or whether the file extension matches the format. Use the factory method to centralise such checks.
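
Such checks might look like the sketch below. resolveFormat() is a hypothetical helper, and the allow‑list is assumed to come from application configuration; a factory could call it at the top of getHandler() before delegating to createHandler().

```php
<?php
/**
 * Hypothetical guard that runs before handler creation.
 * Returns the normalised format key or throws with a clear message.
 */
function resolveFormat(string $format, string $filePath, array $allowedFormats): string
{
    $format = strtolower($format);

    // Reject formats that configuration does not permit.
    if (!in_array($format, $allowedFormats, true)) {
        throw new \InvalidArgumentException("Format not allowed: $format");
    }

    // Reject files whose extension disagrees with the requested format.
    $extension = strtolower(pathinfo($filePath, PATHINFO_EXTENSION));
    if ($extension !== $format) {
        throw new \InvalidArgumentException(
            "Extension '$extension' does not match format '$format'"
        );
    }

    return $format;
}
```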

Configuration‑Driven Factories

Instead of hard‑coding the switch, you can maintain a registry (e.g., an associative array) that maps format keys to handler class names. The registry can be populated from configuration files, allowing new formats to be added without touching any PHP code in the factory.

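class RegistryFileHandlerFactory extends FileHandlerFactory
{
    /** @var array<string, string> Format key => handler class name */
    private array $handlers = [];

    public function registerHandler(string $format, string $handlerClass): void
    {
        if (!is_subclass_of($handlerClass, FileHandler::class)) {
            throw new \InvalidArgumentException("$handlerClass must implement FileHandler");
        }
        // Normalise the key the same way createHandler() does.
        $this->handlers[strtolower($format)] = $handlerClass;
    }

    public function createHandler(string $format): FileHandler
    {
        $format = strtolower($format);
        if (!isset($this->handlers[$format])) {
            throw new \InvalidArgumentException("Unsupported format: $format");
        }
        $class = $this->handlers[$format];
        return new $class();
    }
}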

Clients or modules can register handlers at runtime:

$factory->registerHandler('yaml', YamlHandler::class);
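
Populating the registry from configuration could look like this. The $config array and its 'handlers' key are assumptions made for illustration (for example, the array returned by a PHP config file); the interface, handler, and factory are compact stand‑ins so the block runs on its own.

```php
<?php
// Compact stand-ins so the sketch runs on its own.
interface FileHandler
{
    public function read(string $filePath): array;
    public function write(string $filePath, array $data): void;
}

class JsonHandler implements FileHandler
{
    public function read(string $filePath): array
    {
        $decoded = json_decode((string) file_get_contents($filePath), true);
        return is_array($decoded) ? $decoded : [];
    }

    public function write(string $filePath, array $data): void
    {
        file_put_contents($filePath, json_encode($data));
    }
}

class RegistryFileHandlerFactory
{
    private array $handlers = [];

    public function registerHandler(string $format, string $handlerClass): void
    {
        $this->handlers[strtolower($format)] = $handlerClass;
    }

    public function createHandler(string $format): FileHandler
    {
        $class = $this->handlers[strtolower($format)] ?? null;
        if ($class === null) {
            throw new \InvalidArgumentException("Unsupported format: $format");
        }
        return new $class();
    }
}

// Hypothetical configuration mapping format keys to handler classes.
$config = [
    'handlers' => [
        'json' => JsonHandler::class,
    ],
];

// Bootstrap code registers whatever the configuration lists.
$factory = new RegistryFileHandlerFactory();
foreach ($config['handlers'] as $format => $handlerClass) {
    $factory->registerHandler($format, $handlerClass);
}
```

Adding a format then means shipping a handler class and one configuration entry; the factory class itself stays unchanged.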

Performance Considerations

Factory Method adds minimal overhead (a single method call and a lookup). If you need to create many handlers in a hot loop, consider caching the handler instances (flyweight pattern) or pre‑creating them in the factory. However, file handlers are typically used once per import/export operation, so the pattern’s flexibility far outweighs any micro‑performance cost.
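
If caching does turn out to be worthwhile, one possible shape is a small wrapper that memoises handlers per format. CachingHandlerFactory is a hypothetical name, and the sketch assumes handlers are stateless, since every caller receives the same instance.

```php
<?php
// Compact stand-ins so the sketch runs on its own.
interface FileHandler
{
    public function read(string $filePath): array;
    public function write(string $filePath, array $data): void;
}

class JsonHandler implements FileHandler
{
    public function read(string $filePath): array
    {
        $decoded = json_decode((string) file_get_contents($filePath), true);
        return is_array($decoded) ? $decoded : [];
    }

    public function write(string $filePath, array $data): void
    {
        file_put_contents($filePath, json_encode($data));
    }
}

// Hypothetical wrapper: delegates creation to any callable and keeps
// one shared instance per normalised format key.
class CachingHandlerFactory
{
    /** @var callable */
    private $create;

    /** @var array<string, FileHandler> */
    private array $cache = [];

    public function __construct(callable $create)
    {
        $this->create = $create;
    }

    public function getHandler(string $format): FileHandler
    {
        $format = strtolower($format);
        if (!isset($this->cache[$format])) {
            $this->cache[$format] = ($this->create)($format);
        }
        return $this->cache[$format];
    }
}
```

The callable can be any existing factory method, for example [$factory, 'createHandler'], so the cache stays independent of how handlers are actually built.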

Testing the Factory

Because the factory centralises creation, you can easily unit‑test it by mocking the handlers or by using the factory with a simplified registry. You can also test that the correct handler type is returned for each format:

public function testFactoryReturnsCsvHandler(): void
{
    $factory = new ConcreteFileHandlerFactory();
    $handler = $factory->getHandler('csv');
    $this->assertInstanceOf(CsvHandler::class, $handler);
}

Testing individual handlers is straightforward—they have a clear interface.
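
A round‑trip check is one simple way to exercise a handler. The helper below is a hypothetical sketch that works with any test framework; it uses a compact JSON handler as a stand‑in so the block runs on its own.

```php
<?php
// Compact stand-ins so the sketch runs on its own.
interface FileHandler
{
    public function read(string $filePath): array;
    public function write(string $filePath, array $data): void;
}

class JsonHandler implements FileHandler
{
    public function read(string $filePath): array
    {
        $decoded = json_decode((string) file_get_contents($filePath), true);
        return is_array($decoded) ? $decoded : [];
    }

    public function write(string $filePath, array $data): void
    {
        file_put_contents($filePath, json_encode($data));
    }
}

/**
 * Hypothetical test helper: writes $data with the handler, reads it back,
 * and always removes the temporary file.
 */
function roundTrip(FileHandler $handler, array $data): array
{
    $path = tempnam(sys_get_temp_dir(), 'fh_');
    try {
        $handler->write($path, $data);
        return $handler->read($path);
    } finally {
        unlink($path);
    }
}

$rows = [
    ['id' => 1, 'name' => 'Ada'],
    ['id' => 2, 'name' => 'Linus'],
];
$result = roundTrip(new JsonHandler(), $rows);
```

Inside a test case, asserting that $result equals $rows confirms that reading and writing are inverse operations for that handler. Formats that lose type information, such as CSV, would need the expected values expressed as strings.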

Integrating with Directus’ Import/Export Features

Directus is a headless CMS in which data import and export are core features. A platform of this kind benefits directly from the Factory Method pattern: new formats (e.g., Parquet for big data, Excel for business users) can be added without rewriting the import/export controllers or breaking existing integrations. Current Directus releases run on Node.js and their extensions are written in JavaScript or TypeScript, so the PHP in this section illustrates the pattern rather than Directus’ own API. The structure carries over unchanged: create a handler, register it with a factory, and hook into the platform’s data pipeline.

For example, a custom endpoint that imports user data from an Excel spreadsheet could use a factory to obtain the appropriate handler. The endpoint remains format‑agnostic and reusable:

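// Hypothetical names: DirectusFileHandlerFactory and the $request methods
// illustrate the pattern and are not part of any Directus API.
// $filePath is assumed to hold the temporary path of the uploaded file.
$factory = new DirectusFileHandlerFactory();
$handler = $factory->getHandler($request->getUploadedFile()->getClientOriginalExtension());
$data = $handler->read($filePath);
// Pass $data to the platform's item service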

Beyond Reading and Writing: Extending the Pattern

The Factory Method can also be applied to other aspects of file handling, such as:

  • Format detection – A factory that takes a file and returns the appropriate handler based on content sniffing (e.g., MIME type or magic bytes).
  • Validation strategies – Different formats may require different validation rules; a factory can produce a validator alongside the handler (returning a composite object).
  • Serialisation depth – For nested data, you might want handlers that flatten or preserve structure based on configuration. Factory Method can parameterise creation to handle both flat and nested cases.
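
The format detection idea can be sketched with a simple content sniffer. This is a heuristic under stated assumptions, not a robust detector: sniffFormat() and detectFormat() are hypothetical names, only the first non‑whitespace character is inspected, and anything that is not recognisably JSON or XML is treated as CSV.

```php
<?php
/**
 * Guesses a format key from the leading bytes of a document.
 */
function sniffFormat(string $head): string
{
    // Drop a UTF-8 byte order mark if one is present.
    if (substr($head, 0, 3) === "\xEF\xBB\xBF") {
        $head = substr($head, 3);
    }

    $first = substr(ltrim($head), 0, 1);
    if ($first === '{' || $first === '[') {
        return 'json';
    }
    if ($first === '<') {
        return 'xml';
    }

    // Fallback: nothing else matched, so assume delimited text.
    return 'csv';
}

/**
 * Reads only the first 512 bytes of the file and sniffs them.
 */
function detectFormat(string $filePath): string
{
    $head = file_get_contents($filePath, false, null, 0, 512);
    if ($head === false) {
        throw new \RuntimeException("Cannot read file: $filePath");
    }
    return sniffFormat($head);
}
```

The returned key can be passed straight to the factory, e.g. $factory->getHandler(detectFormat($path)), which keeps detection and creation as separate responsibilities.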

The pattern scales naturally. If your tool grows to support dozens of formats, consider combining Factory Method with the Strategy pattern: the factory returns a handler (the strategy for reading/writing), while a second abstraction decides which handler to use based on file metadata.

Common Pitfalls and How to Avoid Them

  1. Over‑abstracting – Don’t use Factory Method if you only ever support one format. The pattern adds value only when you anticipate change or need to decouple creation from use.
  2. Ignoring dependency injection – If handlers need dependencies (database connections, logging), avoid hard‑coding them inside the factory. Either pass them through the factory’s constructor or use a dependency injection container.
  3. Forgetting to handle format validation – Always validate the format string before attempting to create a handler. The factory should fail fast with a clear error message.
  4. Mixing creation logic with business logic – The factory’s sole job is to produce objects. Don’t add reading/writing logic inside the factory itself.
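
As an illustration of the dependency injection point, the sketch below passes a logging callable through the factory’s constructor so that handlers never construct their own collaborators. LoggingJsonHandler and InjectingFileHandlerFactory are hypothetical names, and a plain callable stands in for whatever logger interface a real project would use.

```php
<?php
interface FileHandler
{
    public function read(string $filePath): array;
    public function write(string $filePath, array $data): void;
}

// Hypothetical handler that depends on a logger supplied from outside.
class LoggingJsonHandler implements FileHandler
{
    /** @var callable */
    private $log;

    public function __construct(callable $log)
    {
        $this->log = $log;
    }

    public function read(string $filePath): array
    {
        ($this->log)("Reading $filePath");
        $decoded = json_decode((string) file_get_contents($filePath), true);
        return is_array($decoded) ? $decoded : [];
    }

    public function write(string $filePath, array $data): void
    {
        ($this->log)("Writing $filePath");
        file_put_contents($filePath, json_encode($data));
    }
}

// The factory receives the dependency once and hands it to each handler.
class InjectingFileHandlerFactory
{
    /** @var callable */
    private $log;

    public function __construct(callable $log)
    {
        $this->log = $log;
    }

    public function createHandler(string $format): FileHandler
    {
        if (strtolower($format) === 'json') {
            return new LoggingJsonHandler($this->log);
        }
        throw new \InvalidArgumentException("Unsupported format: $format");
    }
}
```

Because the logger arrives from outside, tests can pass a closure that records messages, and production code can pass the real logger without either the factory or the handlers changing.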

Conclusion

The Factory Method pattern is a proven, flexible solution for building data import/export tools that must support multiple file formats. By encapsulating handler creation behind an abstract interface, you gain the ability to extend your application with new formats without modifying existing, tested code. This approach aligns with the Open/Closed Principle and results in maintainable, testable, and scalable code.

Platforms built around data import and export, such as the headless CMS Directus, are natural candidates for this pattern. Whether you’re building a small utility or a large enterprise platform, applying the Factory Method to your file handling logic will save you headaches when the next “must‑support” format appears.

Start by defining a clear product interface, implement concrete handlers for the formats you need now, and let a factory method decide which one to instantiate. Your future self—and your team—will thank you.