How to Use Data Modeling to Facilitate Engineering Data Integration from Multiple Sources
From Sensors to CAD: Using Data Modeling to Unify Engineering Data in Directus
Modern engineering organizations operate in a data-rich but fragmented landscape. Sensor streams from IoT devices, parametric CAD models, enterprise resource planning (ERP) systems, and laboratory test databases each produce data in different formats, at different cadences, and with different semantic meanings. Integrating these sources into a single, queryable whole is the foundation for predictive maintenance, digital twins, and closed-loop design improvement. Data modeling provides the blueprint for that integration, and with a flexible platform like Directus, engineers can translate that blueprint into a working, API-driven data hub without heavy custom coding.
This guide explains how to apply data modeling specifically to engineering data integration, using Directus as the central data layer. We will cover the types of models you need, a step-by-step implementation workflow, and practical examples that move beyond theory into production-ready patterns.
What Is Data Modeling (and Why It Matters for Engineering Data)
Data modeling is the process of defining a schema that describes the structure, relationships, constraints, and semantics of the data your organization relies on. It answers questions like: How is a wind turbine sensor reading related to the turbine's serial number? Which attributes of a CAD assembly must be present before a purchase order can be generated? Without a model, integration becomes point-to-point spaghetti — one Python script for ERP, another for SCADA, and no single source of truth.
Three levels of abstraction are standard in engineering data modeling:
Conceptual Data Model
At this high level you identify the key business entities (e.g., “Asset”, “Measurement”, “Maintenance Log”, “Component”) and their core relationships — but you don’t detail attributes or keys. An engineering manager and a data architect can discuss whether a “Measurement” is linked to one “Asset” or to an “Asset” and a “Sensor” separately. This model is often drawn as an entity-relationship diagram (ERD) using simple boxes and lines.
Logical Data Model
Here you specify every attribute, data type, and relationship. For example, the logical model for “Measurement” would include a timestamp (DATETIME), a value (FLOAT), a unit (TEXT), and a foreign key to “Sensor”. Constraints such as “value cannot be negative” or “timestamp must be in UTC” are written in this layer. The logical model is independent of any particular database engine.
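As a rough illustration, the logical definition of “Measurement” can be written down as a small, engine-independent type. This is only a sketch: the class name, the `sensor_id` field name, and the sample values ("mm/s", "S-01") are assumptions made for the example, not part of any Directus API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Measurement:
    """Logical-model sketch of a Measurement (names are illustrative)."""

    timestamp: datetime  # DATETIME, must be timezone-aware and in UTC
    value: float         # FLOAT, must not be negative
    unit: str            # TEXT, for example "mm/s"
    sensor_id: str       # foreign key to Sensor (hypothetical key format)

    def __post_init__(self) -> None:
        # Constraint: timestamp must be in UTC.
        if self.timestamp.tzinfo is None or self.timestamp.utcoffset() != timedelta(0):
            raise ValueError("timestamp must be in UTC")
        # Constraint: value cannot be negative.
        if self.value < 0:
            raise ValueError("value cannot be negative")


ok = Measurement(datetime(2025, 3, 20, 14, 30, tzinfo=timezone.utc), 4.2, "mm/s", "S-01")
```

Writing the constraints next to the attributes keeps them visible when the model is later translated into a concrete database schema.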
Physical Data Model
Finally, the physical model maps the logical definitions to actual database objects: tables, columns, indexes, partitions. In Directus this translates to Collections (tables), Fields (columns), and Relationships (foreign keys). The physical model also considers performance — for example, adding a composite index on (sensor_id, timestamp) to speed up time‑series queries.
The power of Directus is that it collapses the gap between logical and physical modeling: you can define a logical model directly in the app’s Data Studio, and Directus automatically builds the physical database schema (PostgreSQL, MySQL, SQLite, etc.). This speeds up iterations during the integration design phase.
Benefits of Data Modeling in Engineering Integration
When you model before you integrate, you gain concrete advantages that eliminate the most common pain points in multi‑source engineering projects.
Semantic Consistency Across Disciplines
Mechanical engineers might call a part a “Bracket”, while procurement calls it “Inventory Item #447”. A logical model defines aliases, permissible values, and a canonical name so that every system speaks the same language. Directus supports field-level validation rules and dropdowns from related collections to enforce this consistency.
Data Quality at the Point of Entry
By modeling constraints — such as required fields, unique keys, or range checks — you stop bad data before it enters the integrated system. For example, a sensor telemetry endpoint can reject a reading without a valid equipment serial number before it is stored. Directus provides role-based permissions and field validation rules that can be shared across all incoming data pipelines.
Simplified Change Management
Engineering environments are not static. New sensor types are added, products are updated, and regulations shift. A well-modeled schema isolates changes to a limited area. Adding a new attribute (“ambient temperature”) to the “Measurement” collection does not break existing dashboards or APIs, as long as the model is versioned. Directus can export a schema snapshot and compute a diff against another instance, so you can keep snapshots in version control and review changes before applying them.
Automated Data Mapping and ETL
When you have a clear logical model, mapping source fields to target fields becomes a mechanical task that can often be automated with ETL tools or Directus Flows. For instance, a CSV from an ERP system can be mapped to the “Part” collection using field‑by‑field rules, and repeated mismatches (e.g., date format inconsistencies) are caught during transformation.
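A minimal sketch of such field-by-field mapping is shown below. The ERP column names (`ItemNo`, `Descr`, `Created`), the target field names, and the list of accepted date formats are all assumptions for illustration; a real export will differ.

```python
import csv
import io
from datetime import datetime

# Hypothetical ERP column names mapped to hypothetical "Part" field names.
FIELD_MAP = {"ItemNo": "part_number", "Descr": "name", "Created": "created_on"}
# Source date formats we assume the ERP may emit (day-first where ambiguous).
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def normalize_date(raw):
    """Return an ISO 8601 date string, trying each known source format in turn."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime((raw or "").strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")


def map_rows(csv_text):
    """Split ERP rows into mapped Part items and rejected rows with reasons."""
    items, rejects = [], []
    # Data rows start on line 2 because line 1 is the CSV header.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        try:
            item = {target: row[source] for source, target in FIELD_MAP.items()}
            item["created_on"] = normalize_date(item["created_on"])
            items.append(item)
        except (KeyError, ValueError) as exc:
            rejects.append((line_no, str(exc)))
    return items, rejects


sample = "ItemNo,Descr,Created\n447,Bracket,20/03/2025\n448,Flange,March 2025\n"
items, rejects = map_rows(sample)
```

Here the first row is mapped and its date normalized, while the second row is rejected with its line number and the reason, which is the kind of repeated mismatch the transformation step should surface.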
Step‑by‑Step: Build an Engineering Integration Model in Directus
Let’s walk through a concrete scenario: integrating real‑time vibration data from three wind turbine sensors with the turbine’s CAD model metadata and maintenance history. Each source has its own schema — the sensor API returns JSON like {"turbine_id": "T-07", "rms_velocity": 4.2, "timestamp": "2025-03-20T14:30:00Z"}, while the CAD system exports an XML file with nested component structures.
1. Identify and Document Data Sources
List every system that will feed or consume the integrated dataset. For our scenario:
- Sensor API – returns JSON payloads every 5 minutes for each turbine.
- PLM (Product Lifecycle Management) – exports XML BOM (bill of materials) and CAD geometry metadata.
- CMMS (Computerized Maintenance Management System) – provides work orders and repair logs as a SQL database.
Document the fields each source sends, the data types, and the update frequency. This becomes the input for your conceptual model.
2. Design a Conceptual Model
Define the core entities and their relationships without worrying about specific fields yet. For wind turbine integration:
- TurbineAsset – the physical turbine unit (serial number, location, model).
- Component – a sub‑part (blade, gearbox, generator) linked to a TurbineAsset.
- VibrationMeasurement – a time‑series reading from a sensor, linked to a Component.
- MaintenanceEvent – a repair or inspection, linked to a TurbineAsset and optionally to a Component.
Draw these boxes and lines on a whiteboard or in a tool like Lucidchart. Show that a TurbineAsset has many Components, and a Component can have many VibrationMeasurements. Share this diagram with domain experts — they will spot missing entities (e.g., “Sensor” itself as an asset).
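If it helps to keep the whiteboard diagram under version control, the same conceptual model can be captured as plain data. The sketch below is one possible encoding (the triple format and the helper name are assumptions); it also shows how a simple reachability check flags an entity, such as “Sensor”, that has been named but not yet connected to anything.

```python
from collections import defaultdict

ENTITIES = {"TurbineAsset", "Component", "VibrationMeasurement", "MaintenanceEvent"}

# (parent, child, cardinality) triples; no attributes or keys at this level.
RELATIONSHIPS = [
    ("TurbineAsset", "Component", "one-to-many"),
    ("Component", "VibrationMeasurement", "one-to-many"),
    ("TurbineAsset", "MaintenanceEvent", "one-to-many"),
    ("Component", "MaintenanceEvent", "one-to-many, optional"),
]


def unreachable_from(root, entities, relationships):
    """Return the entities that cannot be reached from root via the relationships."""
    children = defaultdict(set)
    for parent, child, _cardinality in relationships:
        children[parent].add(child)
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(children[node])
    return set(entities) - seen


# A domain expert proposes "Sensor" as an entity, but no relationship exists yet.
orphans = unreachable_from("TurbineAsset", ENTITIES | {"Sensor"}, RELATIONSHIPS)
```

In this example `orphans` contains only "Sensor", a reminder to decide how it relates to Component or VibrationMeasurement before moving on to the logical model.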
3. Create the Logical Model in Directus
Open the Directus Data Studio and create a collection for each entity. For VibrationMeasurement:
- timestamp (DateTime field, required)
- rms_velocity (Float field, required, with a validation rule: value > 0)
- component_id (Many‑to‑One relationship to the Component collection)
- source_sensor (Text field, but consider a many‑to‑one to a Sensor collection if you need to track sensor metadata)
For Component:
- name (String field)
- part_number (String field, unique)
- turbine_id (Many‑to‑One to TurbineAsset)
Directus automatically creates the many‑to‑one foreign key and generates a REST/GraphQL API endpoint for each collection. At this stage you are building the logical model directly on top of the underlying database (PostgreSQL, for example).
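The same collection can also be created through the API instead of the Data Studio. The sketch below only builds the request bodies; it does not send them. The payload shape (keys such as `collection`, `fields`, `meta`, `schema`, and the `_gt` validation rule) reflects my understanding of the Directus REST API for collections and relations and should be checked against the API reference for your version. The UUID key type for Component and the `field` helper are assumptions made for the example.

```python
import json


def field(name, type_, *, required=False, unique=False, validation=None):
    """Build one field definition in the shape Directus is assumed to accept."""
    return {
        "field": name,
        "type": type_,
        "meta": {"required": required, "validation": validation},
        "schema": {"is_nullable": not required, "is_unique": unique},
    }


# Body for POST /collections (assumed endpoint and shape).
vibration_measurement = {
    "collection": "VibrationMeasurement",
    "schema": {},  # assumed: a schema object requests a real database table
    "meta": {"note": "Time-series vibration readings per component"},
    "fields": [
        {"field": "id", "type": "uuid", "meta": {"special": ["uuid"]},
         "schema": {"is_primary_key": True}},
        field("timestamp", "timestamp", required=True),
        field("rms_velocity", "float", required=True,
              validation={"_and": [{"rms_velocity": {"_gt": 0}}]}),
        field("source_sensor", "string"),
        # Assumes Component uses a UUID primary key.
        field("component_id", "uuid"),
    ],
}

# Body for POST /relations (assumed), turning component_id into a many-to-one.
relation = {
    "collection": "VibrationMeasurement",
    "field": "component_id",
    "related_collection": "Component",
}

print(json.dumps(relation, indent=2))
```

Keeping these payloads in a script makes the logical model reproducible across development and production instances.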
4. Build the Physical Model (Performance Optimizations)
Now add indexes and field settings that affect query performance. In Directus, you choose the primary key type (auto-increment integer or UUID) when you create a collection. Custom indexes, such as composite indexes, are best added directly in the underlying database, for example through a SQL migration. For a time-series table like VibrationMeasurement:
- Add a composite index on (component_id, timestamp) — this speeds up the most common query: “fetch all readings for gearbox #3 in the last 24 hours.”
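- Consider partitioning the table by date if you expect millions of rows. Directus does not manage partitioning natively, but you can set it up in the underlying database; Directus then keeps querying the parent table while the database routes reads and writes to the right partition. Test this against your database engine first, since partitioned tables often constrain how primary keys can be defined.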
The physical model also includes data retention rules. You can use Directus Flows or a scheduled script to purge readings older than 90 days, or archive them to a cheaper storage tier while keeping the model intact.
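The sketch below illustrates both ideas, the composite index and a 90-day purge, using an in-memory SQLite database so it runs anywhere. The table and index names are assumptions, and a production PostgreSQL or MySQL setup would use its own DDL and scheduling; only the pattern carries over.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE vibration_measurement (
           id INTEGER PRIMARY KEY,
           component_id INTEGER NOT NULL,
           timestamp TEXT NOT NULL,  -- ISO 8601 in UTC, so text order is time order
           rms_velocity REAL NOT NULL CHECK (rms_velocity > 0)
       )"""
)
# Composite index matching the most common query: one component, a time range.
conn.execute(
    "CREATE INDEX idx_vm_component_ts ON vibration_measurement (component_id, timestamp)"
)

now = datetime(2025, 3, 20, 14, 30, tzinfo=timezone.utc)
rows = [(3, (now - timedelta(days=d)).isoformat(), 4.2) for d in (0, 1, 30, 120)]
conn.executemany(
    "INSERT INTO vibration_measurement (component_id, timestamp, rms_velocity) "
    "VALUES (?, ?, ?)",
    rows,
)

# Ask SQLite how it would run "readings for component 3 in the last 24 hours".
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM vibration_measurement "
    "WHERE component_id = ? AND timestamp >= ?",
    (3, (now - timedelta(hours=24)).isoformat()),
).fetchall()
plan_text = " ".join(row[-1] for row in plan)

# Retention rule: purge readings older than 90 days.
cutoff = (now - timedelta(days=90)).isoformat()
purged = conn.execute(
    "DELETE FROM vibration_measurement WHERE timestamp < ?", (cutoff,)
).rowcount
remaining = conn.execute("SELECT COUNT(*) FROM vibration_measurement").fetchone()[0]
```

The query plan text names the composite index, and the purge removes only the 120-day-old reading while leaving the table structure untouched.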
5. Integrate the Sources into Directus
There are several ways to load data from external systems into the Directus collections you’ve defined:
- Directus Flow — a no‑code automation that can call an external API, transform JSON, and write to collections. A Webhook trigger can listen for sensor POST requests and map the payload fields.
- Directus SDK or REST API: write a Node.js script with the Directus JavaScript SDK, or a Python script that authenticates against the REST API, and insert records. For the PLM XML import, a Python script can parse the XML and call POST /items/Component.
- ETL tool: connect a tool like n8n or Talend to Directus using its REST API. This is useful when you need complex transformations or error handling.
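- Direct Database Sync: if the CMMS runs on a SQL Server database, you may be able to avoid copying data by exposing the remote tables inside the database that Directus runs on, for example as PostgreSQL foreign tables through a foreign data wrapper. Treat this as an experiment: Directus expects every collection to have a primary key, and its handling of views and foreign tables varies by version and database, so verify it before relying on it for real-time integration.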
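For the script-based option, a PLM import might look roughly like the following. The XML layout (`bom`, `component`, `partNumber`) is invented for the example because real PLM exports vary, and the base URL and token are placeholders. The request is built but deliberately not sent.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical BOM export; a real PLM file will have a different structure.
BOM_XML = """<bom turbine="T-07">
  <component partNumber="GBX-100" name="Gearbox">
    <component partNumber="BRG-210" name="Main bearing"/>
  </component>
  <component partNumber="BLD-300" name="Blade"/>
</bom>"""


def flatten_bom(xml_text: str) -> list[dict]:
    """Flatten nested <component> elements into Component items, in document order."""
    root = ET.fromstring(xml_text)
    turbine = root.get("turbine")
    return [
        {
            "part_number": node.get("partNumber"),
            "name": node.get("name"),
            # Assumes the turbine code is (or is later resolved to) the TurbineAsset key.
            "turbine_id": turbine,
        }
        for node in root.iter("component")
    ]


def build_create_request(base_url: str, token: str, items: list[dict]) -> urllib.request.Request:
    """Prepare a POST /items/Component request carrying all items in one body."""
    return urllib.request.Request(
        url=f"{base_url}/items/Component",
        data=json.dumps(items).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


components = flatten_bom(BOM_XML)
request = build_create_request("https://directus.example.com", "YOUR_TOKEN", components)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

Flattening loses the parent-child nesting between components; if that hierarchy matters, a self-referencing parent field on Component would be one way to keep it.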
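During the integration phase, log every mapping failure together with the error response returned by the Directus API to understand why a record was rejected (missing required field, type mismatch, etc.). The Directus activity log shows which records were created or updated, which helps you confirm what did get through. This feedback loop will help you refine the model.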
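One way to collect such failures on the ingestion side is sketched below, using the sensor payload format from the scenario. The function names, the logger name, and the choice to pass `turbine_id` through unchanged are assumptions; resolving a reading to a `component_id` is left out.

```python
import logging
from datetime import datetime

logger = logging.getLogger("ingest")
REQUIRED = ("turbine_id", "rms_velocity", "timestamp")


def map_reading(payload: dict) -> dict:
    """Map one sensor payload to an item, raising on anything the model forbids."""
    missing = [key for key in REQUIRED if key not in payload]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    velocity = float(payload["rms_velocity"])  # raises on a type mismatch
    if velocity <= 0:
        raise ValueError("rms_velocity must be > 0")
    # Older Python versions reject a trailing "Z", so normalize it to an offset.
    ts = datetime.fromisoformat(str(payload["timestamp"]).replace("Z", "+00:00"))
    return {
        "timestamp": ts.isoformat(),
        "rms_velocity": velocity,
        "turbine_id": payload["turbine_id"],
    }


def ingest(payloads):
    """Return (accepted items, failures), logging each rejected record."""
    accepted, failures = [], []
    for index, payload in enumerate(payloads):
        try:
            accepted.append(map_reading(payload))
        except (ValueError, TypeError) as exc:
            failures.append({"index": index, "reason": str(exc), "payload": payload})
            logger.warning("rejected record %d: %s", index, exc)
    return accepted, failures


accepted, failures = ingest([
    {"turbine_id": "T-07", "rms_velocity": 4.2, "timestamp": "2025-03-20T14:30:00Z"},
    {"turbine_id": "T-07", "timestamp": "2025-03-20T14:35:00Z"},
    {"turbine_id": "T-07", "rms_velocity": "n/a", "timestamp": "2025-03-20T14:40:00Z"},
])
```

Keeping the original payload alongside the reason makes it easier to tell whether the source or the model needs to change.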
6. Validate and Evolve the Model
After the data flows, check that queries return correct results. For example, run Directus’s built‑in filter to find all “VibrationMeasurement” records where rms_velocity > 5.0 and join them with the Component and TurbineAsset collections. Do the results make engineering sense? If not, adjust the logical model — perhaps a “Measurement” should be linked to both “Component” and “Sensor” to distinguish data provenance.
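Against the REST API, that validation query could be expressed roughly as follows. The sketch only builds the URL. The `filter`, `fields`, `sort`, and `limit` parameters are used as I understand the Directus query syntax, and `serial_number` is an assumed field name on TurbineAsset.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(base_url: str, threshold: float) -> str:
    """Build a GET URL for readings above a threshold, with related fields expanded."""
    params = {
        # Filter rule: rms_velocity greater than the threshold.
        "filter": json.dumps({"rms_velocity": {"_gt": threshold}}),
        # Dot notation follows the many-to-one fields to Component and TurbineAsset.
        "fields": "timestamp,rms_velocity,component_id.name,"
                  "component_id.turbine_id.serial_number",
        "sort": "-timestamp",
        "limit": 100,
    }
    return f"{base_url}/items/VibrationMeasurement?{urlencode(params)}"


url = build_query_url("https://directus.example.com", 5.0)
# Round-trip the query string to confirm the filter survived URL encoding.
decoded = parse_qs(urlsplit(url).query)
```

Requesting related fields in the same call avoids a separate lookup per reading when checking whether the results make engineering sense.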
Over time, you will add new sources (e.g., oil analysis results) or deprecate old ones. In Directus you can add new fields to existing collections or create new collections without affecting existing APIs — just regenerate the SDK or document the changes in an OpenAPI spec.
Tools and Techniques for Engineering Data Modeling
While Directus is the execution environment, the data modeling process benefits from specialized tools. Use the combination that fits your team’s workflow.
Schema Design and Documentation
- dbdiagram.io — export your logical model as a DSL and then manually translate it to Directus collections. Good for version‑controlling the model in a Git repo.
- Lucidchart or Draw.io — create conceptual ERDs and share them with non‑technical stakeholders before starting the Directus build.
- Directus Data Studio itself can serve as a living documentation tool. Enable the “Display Template” feature to show linked records in a human‑readable format (e.g., “Turbine T-07 - Gearbox”).
ETL and Data Pipelines
- Directus Flows: built-in automation that can transform and load data without additional infrastructure. Supports webhook and schedule triggers and operations such as Condition, Transform Payload, Run Script, and Webhook / Request URL.
- Apache NiFi — a powerful flow‑based programming tool for handling complex integrations with retry logic and provenance tracking. Directus’s REST API makes NiFi an excellent orchestrator.
- Custom scripts (Python, Node.js) — highly flexible for tasks like parsing CAD STEP files or communicating with industrial protocols (OPC UA, MQTT). Use the Directus SDK to write records.
Data Governance and Metadata
Consider treating the Directus schema itself as a governed asset. Use the note property that Directus provides on each collection and field to store business definitions, the responsible owner, and the retention policy. For larger organizations, an external data catalog like Alation or DataHub can be used to index the Directus schema and track lineage.
Best Practices and Common Pitfalls
Through experience with engineering integration projects, several patterns repeatedly emerge. Adopting these will save significant rework.
Best Practices
- Start with a conceptual model, not with fields. Confirm with domain experts that the entities and relationships are correct before diving into attribute details.
- Use UUIDs as primary keys for collections that will be merged or moved. Auto‑increment integers are fragile when you later integrate a second turbine farm that already has its own ID sequence.
- Leverage Directus Revisions. Enable revisions on collections where the data history matters — for example, tracking changes to a turbine’s configuration over time. This is effectively an audit trail built into the model.
- Model time‑series data explicitly. Do not embed a JSON array of readings inside the Component collection. Create a separate measurement collection with a foreign key and a timestamp. This makes querying and indexing efficient.
- Design for read-heavy and write-heavy patterns separately. Engineering dashboards often query the last 24 hours of sensor data, while the ingestion process writes thousands of points per minute. For large volumes, consider bypassing the Directus application layer and inserting directly into the underlying table with well-optimized SQL. Directus reads the same table, but direct inserts skip its validation rules, permissions, and Flows, so the ingestion code has to enforce the model's constraints itself.
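The UUID recommendation is easiest to see with a small merge example. The two farm datasets and the `legacy_id` field below are invented for illustration.

```python
import uuid

# Two farms that each numbered their components with auto-increment integers.
farm_a = [{"id": 1, "name": "Gearbox"}, {"id": 2, "name": "Blade"}]
farm_b = [{"id": 1, "name": "Generator"}, {"id": 2, "name": "Gearbox"}]

# Integer keys overlap, so a naive merge would overwrite or reject rows.
collisions = {row["id"] for row in farm_a} & {row["id"] for row in farm_b}


def rekey(rows, farm):
    """Assign a fresh UUID to each row and keep the old key for traceability."""
    return [
        {**row, "id": str(uuid.uuid4()), "legacy_id": f"{farm}:{row['id']}"}
        for row in rows
    ]


merged = rekey(farm_a, "farm-a") + rekey(farm_b, "farm-b")
```

Starting with UUID keys avoids this re-keying step altogether, along with the follow-up work of rewriting every foreign key that pointed at the old integers.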
Common Pitfalls
- Over‑normalization. Splitting every possible attribute into a separate collection can make queries slow and complex. For example, storing “MeasurementUnit” as a separate collection with a single field “unit_name” is usually overkill — a text field with validation rules suffices.
- Ignoring schema evolution. When a new sensor model sends a “peak_acceleration” field that you haven’t modeled, the data may be rejected or lost. Design your ingestion pipeline to either accept unknown fields (and store them in a catch‑all JSON field) or trigger an alert when the schema changes.
- Failing to name fields consistently. Mixing camelCase (rmsVelocity) with snake_case (rms_velocity) across different collections leads to confusion. Define a naming convention at the start of the project.
- Skipping a staging area. Raw data from sensors often includes duplicates or mislabeled timestamps. Insert it first into a “staging” collection (without many constraints), run cleanup and dedup logic, then move the cleaned data into the production collections. Directus Flows can orchestrate this two-step pattern.
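Two of these pitfalls, unmodeled fields and the missing staging step, can be handled in the same cleanup pass. The sketch below assumes a catch-all JSON field named `extra` and treats the pair (turbine, timestamp) as the duplicate key; both are illustrative choices rather than Directus features.

```python
KNOWN_FIELDS = {"turbine_id", "rms_velocity", "timestamp"}


def split_unknown(payload: dict) -> dict:
    """Keep modeled fields and move everything else into a catch-all 'extra' field."""
    item = {key: value for key, value in payload.items() if key in KNOWN_FIELDS}
    extra = {key: value for key, value in payload.items() if key not in KNOWN_FIELDS}
    if extra:
        item["extra"] = extra  # hypothetical JSON field on the production collection
    return item


def promote(staged: list[dict]) -> list[dict]:
    """Deduplicate staged rows, keeping the first row seen for each key."""
    seen, clean = set(), []
    for row in staged:
        key = (row["turbine_id"], row["timestamp"])
        if key in seen:
            continue
        seen.add(key)
        clean.append(split_unknown(row))
    return clean


staged = [
    {"turbine_id": "T-07", "rms_velocity": 4.2, "timestamp": "2025-03-20T14:30:00Z"},
    {"turbine_id": "T-07", "rms_velocity": 4.2, "timestamp": "2025-03-20T14:30:00Z"},
    {"turbine_id": "T-07", "rms_velocity": 5.1, "timestamp": "2025-03-20T14:35:00Z",
     "peak_acceleration": 0.8},
]
production_rows = promote(staged)
```

The duplicate reading is dropped, and the unmodeled `peak_acceleration` value is preserved under `extra` until the schema is extended to cover it.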
Realizing the Integrated Engineering Data Platform
Data modeling is not a one‑time design exercise — it is an ongoing discipline that adapts as your engineering environment changes. By using Directus as the central data platform, you gain the ability to iterate on the model without downtime, expose the integrated data via consistent REST and GraphQL APIs, and empower your engineering teams to build dashboards, digital twins, and machine learning models on top of a trusted data foundation.
The example of wind turbine integration demonstrates the universal pattern: identify entities, define relationships, implement in Directus collections, connect sources, and validate. As you repeat this process for other engineering domains (automotive, aerospace, industrial automation), the model becomes a reusable asset that can substantially shorten later integration work.
Start by documenting the ten most important entities in your current integration project. Map them in a conceptual model, then create those collections in Directus. The API will be ready in minutes, and your data will finally speak the same language.