Introduction

Modern data integration tools must connect to a wide variety of data sources: relational databases, REST APIs, file-based systems, streaming platforms, and more. Each source demands its own connection protocol, authentication method, and configuration handling. Without a solid architectural strategy, the connection layer quickly becomes a tangled web of conditional logic, duplicated code, and maintenance nightmares. The Factory Pattern offers a clean, object-oriented solution to this challenge. By centralising the creation of connection objects, it decouples the client code from the specific classes it needs to instantiate, making the system both flexible and extensible.

In this article, we’ll explore how the Factory Pattern can be used to manage multiple data source connections in data integration tools, with concrete examples drawn from Directus, a popular open‑source headless CMS and data platform that relies heavily on extensibility and custom data sources. You’ll learn the theoretical foundations, see practical implementations, and understand the trade‑offs involved.

Understanding the Factory Pattern

The Factory Pattern is a creational design pattern that moves object creation behind a dedicated method or class. In its classic Factory Method form, a superclass defines an interface for creating objects and subclasses decide which type of object is created. Either way, the instantiation logic is encapsulated, so the client code does not need to know the concrete class names or the intricacies of object construction. This promotes loose coupling and supports the Single Responsibility Principle and the Open/Closed Principle.

Variants of the Factory Pattern

  • Simple Factory – A static method or dedicated class that returns different objects based on input parameters. While not a full GoF pattern, it’s a common starting point.
  • Factory Method – Defines an interface for creating an object, but lets subclasses decide which class to instantiate. The base class contains the factory method with default behaviour, and subclasses override it to produce specific objects.
  • Abstract Factory – Provides an interface for creating families of related or dependent objects without specifying their concrete classes. Useful when a system needs to support multiple product families with consistent compatibility.

For most data integration scenarios, the simple factory or factory method is sufficient, but when you have to handle multiple source types that also belong to different families (e.g., SQL vs. NoSQL with different retry mechanisms), the abstract factory becomes valuable.
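
Since the examples later in this article cover the simple factory and the abstract factory, here is a minimal sketch of the Factory Method variant for comparison. It assumes PHP 8.0 or later, and every name in it (Connection, Importer, SqlImporter, HttpImporter) is hypothetical and chosen only for illustration. The factory method is declared abstract here rather than given a default behaviour, which is one common way to write it.

```php
<?php
declare(strict_types=1);

// Hypothetical product interface; the name and method are illustrative only.
interface Connection
{
    public function describe(): string;
}

final class SqlConnection implements Connection
{
    public function describe(): string
    {
        return 'sql';
    }
}

final class HttpConnection implements Connection
{
    public function describe(): string
    {
        return 'http';
    }
}

// The creator declares the factory method and uses its result
// without knowing which concrete class a subclass returns.
abstract class Importer
{
    abstract protected function createConnection(): Connection;

    public function run(): string
    {
        $connection = $this->createConnection();
        return 'importing via ' . $connection->describe();
    }
}

final class SqlImporter extends Importer
{
    protected function createConnection(): Connection
    {
        return new SqlConnection();
    }
}

final class HttpImporter extends Importer
{
    protected function createConnection(): Connection
    {
        return new HttpConnection();
    }
}

$sqlResult = (new SqlImporter())->run();   // "importing via sql"
$httpResult = (new HttpImporter())->run(); // "importing via http"
```

The shared run() logic lives once in the base class, and each subclass contributes only the decision about which connection to create.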

Challenges in Data Integration Tooling

Before diving into the pattern, let’s outline the specific difficulties that data integration tools face:

  • Heterogeneous protocols: MySQL, PostgreSQL, SFTP, HTTP, MQTT, etc. each require their own driver and connection logic.
  • Authentication diversity: Basic auth, OAuth2, API keys, JWT, certificate‑based, or no authentication for local files.
  • Configuration complexity: Users provide host, port, database name, schema, timeout, TLS settings – which differ per source.
  • Lifecycle management: Connections must be opened, kept alive, pooled, or closed – often in a high‑throughput environment.
  • Extensibility requirement: Adding a new source type should not force modifications to existing connection logic.

The Factory Pattern directly addresses the last point: it encapsulates the creation rule, so adding a new source means extending a factory (or adding a new factory method) without touching the client code that uses the connections.

Applying the Factory Pattern in Directus

Directus is a useful real‑world example because it provides a flexible extension system. Developers can write extensions, such as hooks and custom endpoints, that integrate with third‑party services or legacy databases. Inside such an extension, a factory gives you a single place that selects the correct driver for the requested source. By applying the Factory Pattern, you can write clean, testable code for your own connectors.

Example 1: Simple Factory for Database Connections

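Suppose you are building a Directus extension that needs to connect to MySQL, PostgreSQL, or a REST API depending on a configuration key. A simple factory could look like this in PHP 8 (PHP was the language of Directus’s core up to version 8; current releases run on Node.js, where the same structure carries over to JavaScript or TypeScript):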

class DataSourceFactory {
    public static function create(string $type, array $config): DataSourceInterface {
        return match ($type) {
            'mysql' => new MySQLConnection($config),
            'postgresql' => new PostgreSQLConnection($config),
            'rest_api' => new RestApiConnection($config),
            default => throw new \InvalidArgumentException("Unsupported source type: $type")
        };
    }
}

The client code – for example, a Directus endpoint that needs to fetch data – simply calls:

$source = DataSourceFactory::create($config['type'], $config);
$data = $source->query(/* source-specific query arguments */);

This removes all conditional logic from the consumer, centralises the mapping, and makes it trivial to add a new source: just add a new case and implement the interface.
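
To make that claim concrete, the following sketch fills in the pieces the snippet above leaves out so that it runs on its own. It assumes PHP 8.0 or later. The connection classes are stand-ins that only store their configuration and open no real connection, and the StubConnection base class and the name() method are hypothetical additions for illustration.

```php
<?php
declare(strict_types=1);

interface DataSourceInterface
{
    // Hypothetical method, used here only to show which class was created.
    public function name(): string;
}

// Stand-in base class: stores the config and opens no real connection.
abstract class StubConnection implements DataSourceInterface
{
    public function __construct(protected array $config)
    {
    }
}

final class MySQLConnection extends StubConnection
{
    public function name(): string
    {
        return 'mysql@' . ($this->config['host'] ?? 'localhost');
    }
}

final class RestApiConnection extends StubConnection
{
    public function name(): string
    {
        return 'rest@' . ($this->config['base_url'] ?? 'unknown');
    }
}

final class DataSourceFactory
{
    public static function create(string $type, array $config): DataSourceInterface
    {
        // The only place that maps a type identifier to a concrete class.
        return match ($type) {
            'mysql' => new MySQLConnection($config),
            'rest_api' => new RestApiConnection($config),
            default => throw new \InvalidArgumentException("Unsupported source type: $type"),
        };
    }
}

$source = DataSourceFactory::create('mysql', ['host' => 'db.internal']);
$label = $source->name(); // "mysql@db.internal"

// Unknown identifiers fail fast instead of returning a half-built object.
try {
    DataSourceFactory::create('ftp', []);
    $error = null;
} catch (\InvalidArgumentException $e) {
    $error = $e->getMessage(); // "Unsupported source type: ftp"
}
```

Supporting another source in this sketch means writing one class and adding one arm to the match expression; the calling code does not change.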

Example 2: Implementing an Abstract Factory for Mixed Sources

When your integration tool must handle families of related objects (e.g., a writer as well as a reader for the same source, or a connection manager along with a schema introspector), the abstract factory is more appropriate. Consider a scenario where each data source needs both a Connection and a SchemaExplorer:

interface DataSourceFactory {
    public function createConnection(): ConnectionInterface;
    public function createSchemaExplorer(): SchemaExplorerInterface;
}

class MySQLFactory implements DataSourceFactory {
    public function createConnection(): MySQLConnection { ... }
    public function createSchemaExplorer(): MySQLSchemaExplorer { ... }
}

class RestApiFactory implements DataSourceFactory {
    public function createConnection(): RestApiConnection { ... }
    public function createSchemaExplorer(): RestApiSchemaExplorer { ... }
}

The client receives a DataSourceFactory object (injected or obtained from a registry) and can use both the connection and the schema explorer without ever knowing which concrete classes were instantiated. This pattern is especially useful when the components of one family must be compatible with each other (e.g., the MySQL schema explorer expects a MySQL connection).
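
The interfaces above elide the method bodies. One way they might be filled in is sketched below, assuming PHP 8.0 or later. The dialect() and listCollections() methods, the describeSource() helper, and the returned strings are hypothetical placeholders rather than real driver behaviour.

```php
<?php
declare(strict_types=1);

interface ConnectionInterface
{
    public function dialect(): string; // hypothetical method
}

interface SchemaExplorerInterface
{
    public function listCollections(): array; // hypothetical method
}

final class MySQLConnection implements ConnectionInterface
{
    public function dialect(): string
    {
        return 'mysql';
    }
}

final class MySQLSchemaExplorer implements SchemaExplorerInterface
{
    // Requires a MySQL connection specifically: this is the family constraint.
    public function __construct(private MySQLConnection $connection)
    {
    }

    public function listCollections(): array
    {
        return [$this->connection->dialect() . ':tables'];
    }
}

final class RestApiConnection implements ConnectionInterface
{
    public function dialect(): string
    {
        return 'rest';
    }
}

final class RestApiSchemaExplorer implements SchemaExplorerInterface
{
    public function listCollections(): array
    {
        return ['rest:endpoints'];
    }
}

interface DataSourceFactory
{
    public function createConnection(): ConnectionInterface;
    public function createSchemaExplorer(): SchemaExplorerInterface;
}

final class MySQLFactory implements DataSourceFactory
{
    // Narrower return types are allowed (covariance, PHP 7.4+).
    public function createConnection(): MySQLConnection
    {
        return new MySQLConnection();
    }

    public function createSchemaExplorer(): MySQLSchemaExplorer
    {
        return new MySQLSchemaExplorer($this->createConnection());
    }
}

final class RestApiFactory implements DataSourceFactory
{
    public function createConnection(): RestApiConnection
    {
        return new RestApiConnection();
    }

    public function createSchemaExplorer(): RestApiSchemaExplorer
    {
        return new RestApiSchemaExplorer();
    }
}

// Client code depends only on the abstract factory and the two interfaces.
function describeSource(DataSourceFactory $factory): string
{
    $connection = $factory->createConnection();
    $explorer = $factory->createSchemaExplorer();
    return $connection->dialect() . ' => ' . implode(', ', $explorer->listCollections());
}

$mysqlSummary = describeSource(new MySQLFactory());  // "mysql => mysql:tables"
$restSummary = describeSource(new RestApiFactory()); // "rest => rest:endpoints"
```

Because MySQLSchemaExplorer type-hints MySQLConnection, pairing it with a connection from another family is rejected with a TypeError at construction rather than surfacing later as a confusing query failure.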

Benefits and Trade‑offs

Benefits

  • Extensibility: Adding a new data source type requires minimal code changes: a new factory, or a new case in the simple factory. The client code remains untouched.
  • Maintainability: The creation logic is isolated, making it easier to update connection parameters, logging, or error handling in one place.
  • Flexibility: You can swap implementations at runtime based on configuration, environment, or user preferences without altering the rest of the system.
  • Testability: Factories can be mocked or replaced in unit tests, enabling you to test client logic without depending on real database connections.
  • Consistency: All connection objects are created through the same interface, ensuring they follow the same lifecycle (e.g., all implement connect() and disconnect()).
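
The testability point is easiest to see when the factory is an injected object rather than a static call. The sketch below assumes PHP 8.0 or later and uses hand-written test doubles instead of a mocking library. DataSourceFactoryInterface, ProductExporter, InMemoryDataSource, and FakeFactory are all hypothetical names.

```php
<?php
declare(strict_types=1);

interface DataSource
{
    public function fetchAll(string $collection): array;
}

// Hypothetical instance-based factory contract, so it can be swapped in tests.
interface DataSourceFactoryInterface
{
    public function create(string $type, array $config): DataSource;
}

// Client logic under test: it never names a concrete data source class.
final class ProductExporter
{
    public function __construct(private DataSourceFactoryInterface $factory)
    {
    }

    public function countProducts(string $type, array $config): int
    {
        $source = $this->factory->create($type, $config);
        return count($source->fetchAll('products'));
    }
}

// Test double: serves rows from an array instead of a database.
final class InMemoryDataSource implements DataSource
{
    public function __construct(private array $rows)
    {
    }

    public function fetchAll(string $collection): array
    {
        return $this->rows[$collection] ?? [];
    }
}

// Test double: always returns the given source and records what was requested.
final class FakeFactory implements DataSourceFactoryInterface
{
    public array $requestedTypes = [];

    public function __construct(private DataSource $source)
    {
    }

    public function create(string $type, array $config): DataSource
    {
        $this->requestedTypes[] = $type;
        return $this->source;
    }
}

$fake = new FakeFactory(new InMemoryDataSource(['products' => [['id' => 1], ['id' => 2]]]));
$exporter = new ProductExporter($fake);
$count = $exporter->countProducts('mysql', []); // 2, without touching a real database
```

A static factory such as DataSourceFactory::create() is harder to replace this way, which is one reason to prefer an injected factory object once tests matter.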

Trade‑offs

  • Indirection: Adding a factory layer introduces extra classes, which may be overkill for very small projects with only two or three source types.
  • Complexity: When using abstract factories or factory methods with deep inheritance, the design can become harder to follow. Over‑engineering is a real risk.
  • Runtime overhead: In languages like PHP, each factory call incurs function call overhead, though it is negligible compared to the cost of actually opening a database connection.
  • Learning curve: Team members unfamiliar with design patterns may need time to adapt, and misuse (e.g., putting business logic inside the factory) can negate the benefits.

To decide whether the pattern is appropriate, consider the number of source types you need to support, the likelihood of future additions, and the complexity of the connection logic itself. For a Directus extension with three or four stable sources, a simple factory is usually sufficient; for a large‑scale integration platform with dozens of connectors, an abstract factory combined with dependency injection tends to hold up better.

Best Practices for Implementation

To get the most out of the Factory Pattern in data integration, follow these guidelines:

  • Define a clear interface for all connections. Every connection object should implement a common interface such as DataSourceInterface with methods like connect(), disconnect(), query(), and getSchema(). This ensures the factory can return any concrete class and the client code remains uniform.
  • Centralise configuration validation. The factory is the perfect place to validate that the required configuration parameters (e.g., host, port, credentials) are present before attempting to instantiate the connection object.
  • Use dependency injection for services. If your connection objects need external services (like a logger, a cache, or an HTTP client), inject them via the factory rather than creating them inside the factory. This keeps the factory focused on object creation.
  • Consider caching or pooling. The factory can be extended to return a connection from a pool if one already exists, or to reuse the same instance for the duration of a request. This is especially useful for REST API clients that can be reused.
  • Document the registry of source types. Maintain a clear mapping of source type identifiers to the factory method or class that handles them. In Directus, this could be a configuration file or a service provider that registers your factories.
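
Several of these guidelines can be combined in a single registry-style factory. The sketch below is one possible shape, assuming PHP 8.0 or later: source types are registered explicitly, required configuration keys are validated before construction, a logger is injected as a closure, and instances are reused for identical configuration. DataSourceRegistry, GenericSource, and the closure-based logger are hypothetical and are not part of any Directus API.

```php
<?php
declare(strict_types=1);

interface DataSourceInterface
{
    public function type(): string;
}

// Stand-in source that opens no real connection.
final class GenericSource implements DataSourceInterface
{
    public function __construct(private string $type, private array $config)
    {
    }

    public function type(): string
    {
        return $this->type;
    }
}

final class DataSourceRegistry
{
    /** @var array<string, array{keys: string[], build: callable}> */
    private array $entries = [];

    /** @var array<string, DataSourceInterface> */
    private array $instances = [];

    // The logger is injected rather than created inside the factory.
    public function __construct(private \Closure $log)
    {
    }

    // The documented mapping of type identifiers to builders lives here.
    public function register(string $type, array $requiredKeys, callable $builder): void
    {
        $this->entries[$type] = ['keys' => $requiredKeys, 'build' => $builder];
    }

    public function create(string $type, array $config): DataSourceInterface
    {
        if (!isset($this->entries[$type])) {
            throw new \InvalidArgumentException("Unsupported source type: $type");
        }

        // Validate configuration before any object is constructed.
        $missing = array_diff($this->entries[$type]['keys'], array_keys($config));
        if ($missing !== []) {
            throw new \InvalidArgumentException(
                "Missing config for $type: " . implode(', ', $missing)
            );
        }

        // Reuse an existing instance when type and config are identical.
        $cacheKey = $type . ':' . md5(serialize($config));
        if (!isset($this->instances[$cacheKey])) {
            ($this->log)("creating $type source");
            $this->instances[$cacheKey] = ($this->entries[$type]['build'])($config);
        }

        return $this->instances[$cacheKey];
    }
}

$messages = [];
$registry = new DataSourceRegistry(function (string $message) use (&$messages): void {
    $messages[] = $message;
});
$registry->register('mysql', ['host', 'port'], fn (array $c) => new GenericSource('mysql', $c));

$first = $registry->create('mysql', ['host' => 'db', 'port' => 3306]);
$second = $registry->create('mysql', ['host' => 'db', 'port' => 3306]); // same instance as $first

try {
    $registry->create('mysql', ['host' => 'db']);
    $validationError = null;
} catch (\InvalidArgumentException $e) {
    $validationError = $e->getMessage(); // "Missing config for mysql: port"
}
```

With a registry, adding a source type becomes a register() call instead of an edit to the factory class, which brings the design closer to the Open/Closed Principle than a match expression does. The instance cache here lasts only as long as the registry object; real connection pooling would need more than this.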

Comparison with Other Design Patterns

The Factory Pattern is not the only way to manage multiple connections. Below is a brief comparison with other approaches:

  • Builder Pattern: Used when a connection object requires many optional parameters and a step‑by‑step construction process, for example an HTTP client with custom headers, timeout, and proxy settings. The Builder is complementary: you can combine the two by having a factory return a pre‑configured builder for each source type.
  • Prototype Pattern: Cloning an existing connection object instead of creating a new one. This is rarely used for database connections because of statefulness (open connections can’t easily be cloned).
  • Service Locator: A global registry that returns pre‑configured objects. While simpler than a factory, it introduces hidden dependencies and makes testing harder. Directus’s internal service container is a form of service locator, but it is ideally used in combination with factory methods.
  • Strategy Pattern: Encapsulates interchangeable algorithms. You could use a strategy for query translation (e.g., MySQL vs. PostgreSQL SQL), while still relying on a factory to create the connection.

In many real‑world applications, these patterns are used together. For instance, a Factory Method can return a Builder object that constructs the final connection, or an Abstract Factory can be combined with a Strategy to support multiple transport layers.
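
As an illustration of the first combination, the sketch below has a factory return a pre-configured builder, which the caller can adjust further before building the final object. It assumes PHP 8.0 or later. HttpSource, HttpSourceBuilder, HttpSourceBuilderFactory, the configuration keys, and the example URL are hypothetical, and no HTTP request is made.

```php
<?php
declare(strict_types=1);

// Final product: a plain value object describing an HTTP-based source.
final class HttpSource
{
    public function __construct(
        public string $baseUrl,
        public array $headers,
        public int $timeoutSeconds
    ) {
    }
}

// Builder: collects optional settings step by step.
final class HttpSourceBuilder
{
    private array $headers = [];
    private int $timeoutSeconds = 30;

    public function __construct(private string $baseUrl)
    {
    }

    public function withHeader(string $name, string $value): self
    {
        $this->headers[$name] = $value;
        return $this;
    }

    public function withTimeout(int $seconds): self
    {
        $this->timeoutSeconds = $seconds;
        return $this;
    }

    public function build(): HttpSource
    {
        return new HttpSource($this->baseUrl, $this->headers, $this->timeoutSeconds);
    }
}

// Factory: picks sensible defaults per source type and hands back a builder.
final class HttpSourceBuilderFactory
{
    public static function forType(string $type, array $config): HttpSourceBuilder
    {
        $builder = new HttpSourceBuilder($config['base_url']);

        return match ($type) {
            'rest_api' => $builder->withHeader('Accept', 'application/json'),
            'salesforce' => $builder
                ->withHeader('Authorization', 'Bearer ' . $config['token'])
                ->withTimeout(60),
            default => throw new \InvalidArgumentException("Unsupported source type: $type"),
        };
    }
}

// The caller overrides one default and leaves the rest to the factory.
$httpSource = HttpSourceBuilderFactory::forType('rest_api', ['base_url' => 'https://api.example.test'])
    ->withTimeout(5)
    ->build();
```

The factory decides what a source type needs by default, and the builder keeps per-call adjustments out of the factory's signature.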

Real‑World Use Case: Directus Custom Data Source

Let’s walk through a complete example in a Directus extension. Imagine you need to build a custom data source that can connect to either a MongoDB instance or a Salesforce API. Your extension will expose a collection that proxies the data from the chosen source.

  1. Define the interface:
    interface DataSource {
        public function connect(): void;
        public function fetchAll(string $collection): array;
        public function disconnect(): void;
    }
  2. Implement concrete classes:
    class MongoDBDataSource implements DataSource {
        // Uses MongoDB driver…
    }
    
    class SalesforceDataSource implements DataSource {
        // Uses REST/SOAP API…
    }
  3. Create the factory:
    class DataSourceFactory {
        public static function create(string $type, array $config): DataSource {
            return match ($type) {
                'mongodb' => new MongoDBDataSource($config),
                'salesforce' => new SalesforceDataSource($config['salesforce']),
                default => throw new UnsupportedSourceException($type)
            };
        }
    }
  4. Use in the Directus hook:
    $source = DataSourceFactory::create($request->get('type'), $settings);
    $data = $source->fetchAll('products');
    // Emit data to Directus response…
    

When a new source type is needed, you simply implement the interface and add a new case to the factory. You never need to change the hook code that consumes the data source. This is the core benefit of the Factory Pattern: it protects the higher‑level logic from the details of instantiation.
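
The four steps can be assembled into something that runs without MongoDB or Salesforce by substituting an in-memory source. This is a sketch under that assumption, for PHP 8.0 or later. InMemoryDataSource, the 'memory' type, and the loadCollection() helper are hypothetical, and the exception class is defined here only because the steps above use it without showing it.

```php
<?php
declare(strict_types=1);

interface DataSource
{
    public function connect(): void;
    public function fetchAll(string $collection): array;
    public function disconnect(): void;
}

final class UnsupportedSourceException extends \InvalidArgumentException
{
    public function __construct(string $type)
    {
        parent::__construct("Unsupported source type: $type");
    }
}

// In-memory stand-in for a real driver-backed source.
final class InMemoryDataSource implements DataSource
{
    public bool $connected = false;

    public function __construct(private array $collections)
    {
    }

    public function connect(): void
    {
        $this->connected = true;
    }

    public function fetchAll(string $collection): array
    {
        if (!$this->connected) {
            throw new \LogicException('connect() must be called before fetchAll()');
        }
        return $this->collections[$collection] ?? [];
    }

    public function disconnect(): void
    {
        $this->connected = false;
    }
}

final class DataSourceFactory
{
    public static function create(string $type, array $config): DataSource
    {
        return match ($type) {
            'memory' => new InMemoryDataSource($config['collections'] ?? []),
            default => throw new UnsupportedSourceException($type),
        };
    }
}

// Consumer code: the same lifecycle applies to whatever the factory returns.
function loadCollection(string $type, array $settings, string $collection): array
{
    $source = DataSourceFactory::create($type, $settings);
    $source->connect();
    try {
        return $source->fetchAll($collection);
    } finally {
        // Runs even if fetchAll() throws, so connections are not leaked.
        $source->disconnect();
    }
}

$settings = ['collections' => ['products' => [['sku' => 'A1']]]];
$products = loadCollection('memory', $settings, 'products'); // [['sku' => 'A1']]

try {
    loadCollection('oracle', [], 'products');
    $unsupported = null;
} catch (UnsupportedSourceException $e) {
    $unsupported = $e->getMessage(); // "Unsupported source type: oracle"
}
```

The try/finally wrapper is an addition to the original steps. It shows that lifecycle handling can live in the consumer once every source shares the same interface.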

Conclusion

Managing multiple data source connections is a classic problem in data integration tooling, and the Factory Pattern provides a proven, scalable solution. By abstracting the creation of connection objects, you decouple the client code from the specifics of each data source, making your system easier to maintain and extend. Whether you use a simple factory, a factory method, or an abstract factory depends on the complexity and variability of your sources. In Directus, the pattern fits naturally into the extension architecture, allowing developers to add new connectors without disrupting existing functionality. Embrace the Factory Pattern to keep your integration layer clean, testable, and ready for the next data source that comes along.

Remember: the pattern is a tool, not a rule. Apply it where the complexity of object creation warrants the added indirection – and always favour simplicity over over‑engineering. With a solid factory design, your data integration tool can gracefully handle the ever‑growing ecosystem of data sources that modern applications demand.