FMEA for Chemical Process Startup and Shutdown Procedures
Introduction: Why Startup and Shutdown Demand Specialized Risk Analysis
Chemical process startups and shutdowns are among the highest-risk phases in plant operations. Unlike steady-state operation, these transient conditions involve rapid changes in temperature, pressure, composition, and equipment status. A single misstep — an unopened valve, an incorrect purge sequence, or a sensor reading delayed by thermal lag — can cascade into a catastrophic event. The US Chemical Safety Board has documented numerous incidents where startup or shutdown procedures were either incomplete or improperly executed, leading to fires, explosions, and toxic releases. Failure Mode and Effects Analysis (FMEA) provides a structured, team-based method to identify, evaluate, and mitigate these risks before they cause harm. When applied to procedural steps, FMEA transforms vague "be careful" instructions into targeted, data-driven safeguards.
Regulatory frameworks such as the OSHA Process Safety Management (PSM) standard (29 CFR 1910.119) explicitly require employers to maintain safe operating limits and written operating procedures for startups and shutdowns. FMEA not only supports compliance but also improves operational discipline by revealing hidden dependencies and single points of failure. By performing FMEA on these critical procedures, facilities can reduce unplanned downtime, protect personnel, and avoid environmental releases.
What Is FMEA and How Does It Apply to Procedures?
FMEA is a bottom-up, inductive risk assessment tool originally developed by the U.S. military and later adopted by industries from aerospace to automotive manufacturing. In the chemical sector, it is most commonly used for equipment design and process hazard analysis (PHA). However, its application to procedural steps is equally powerful. A procedure-focused FMEA treats each manual or automated action as a potential failure mode. The analysis asks: "If this step is performed incorrectly or omitted, what happens?" The answer leads to a structured evaluation of severity, occurrence, and detection — the three components of the Risk Priority Number (RPN).
Unlike a traditional Hazard and Operability Study (HAZOP), which typically examines deviations from design intent, a procedural FMEA zeroes in on human actions, sequence errors, and timing issues. This makes it ideal for startup and shutdown workflows, where operator decisions and adherence to written steps are paramount.
Core Elements of a Startup/Shutdown FMEA
- Team composition: The analysis should include operators, process engineers, maintenance personnel, and safety specialists. Operators bring practical knowledge of where procedures tend to deviate; engineers understand design margins and interlocks.
- Procedure decomposition: Break the overall procedure into discrete, logically grouped steps. A startup may be divided into pre-start checks, inert gas purging, feed introduction, heating, and pressurization. Each step becomes a row in the FMEA worksheet.
- Failure mode identification: For each step, list every conceivable way the step could fail. Examples: valve left closed, valve opened too fast, temperature held too long, sequence skipped.
- Effect analysis: Determine the local and system-wide consequences of each failure. Even a minor upset — such as a momentary flow surge — can damage catalysts or trip interlocks.
- Cause analysis: Identify root causes. These could be equipment malfunction (stuck valve), human error (misread label), or poor procedure design (confusing numbering).
- Current controls: Document existing safeguards: safety interlocks, alarms, double-check lists, supervisor reviews, automatic shutdown systems.
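- Risk ranking: Assign Severity (1–10), Occurrence (1–10), and Detection (1–10) ratings. On the Detection scale, a higher rating means the failure is less likely to be caught before it causes harm. Multiply the three ratings to get the RPN. Focus improvement actions on items with the highest RPN, and on items with high severity regardless of RPN.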
- Recommended actions: Propose specific improvements: add verification steps, install limit switches, simplify procedure language, provide additional training, formalize how interlock bypasses are authorized and tracked.
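The risk-ranking element above can be sketched in a few lines of Python. This is a minimal illustration, not a standard worksheet format: the `FmeaRow` class, the `prioritize` helper, the action thresholds (RPN of 100, severity of 9), and every rating in the sample rows are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class FmeaRow:
    step: str
    failure_mode: str
    severity: int    # 1 (negligible) to 10 (catastrophic)
    occurrence: int  # 1 (remote) to 10 (almost certain)
    detection: int   # 1 (almost certainly detected) to 10 (effectively undetectable)

    def __post_init__(self):
        # Reject ratings outside the 1-10 scale described in the text.
        for name in ("severity", "occurrence", "detection"):
            value = getattr(self, name)
            if not 1 <= value <= 10:
                raise ValueError(f"{name} must be 1-10, got {value}")

    @property
    def rpn(self) -> int:
        # Risk Priority Number: product of the three ratings.
        return self.severity * self.occurrence * self.detection


def prioritize(rows, rpn_threshold=100, severity_threshold=9):
    """Return rows needing action: high RPN, or high severity regardless of RPN."""
    flagged = [r for r in rows
               if r.rpn >= rpn_threshold or r.severity >= severity_threshold]
    return sorted(flagged, key=lambda r: (r.rpn, r.severity), reverse=True)


# Hypothetical worksheet rows; the ratings are illustrative, not plant data.
worksheet = [
    FmeaRow("Open nitrogen purge valve", "Valve left closed", 9, 3, 4),
    FmeaRow("Start feed pump", "Started against closed discharge", 6, 4, 3),
    FmeaRow("Ramp reactor temperature", "Ramp rate too fast", 8, 5, 6),
    FmeaRow("Log stabilized readings", "Reading not logged", 3, 5, 5),
    FmeaRow("Align relief valve", "Relief path left blocked", 10, 1, 5),
]

for row in prioritize(worksheet):
    print(f"RPN={row.rpn:4d}  S={row.severity:2d}  {row.step}: {row.failure_mode}")
```

The relief-valve row has a low RPN but is still flagged because of its severity, which mirrors the rule of acting on high-severity items regardless of RPN.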
Decomposing Startup and Shutdown: A Step-by-Step Framework
Because startup and shutdown involve many interdependent actions, the FMEA team must decide the appropriate level of granularity. Too coarse, and subtle failure modes are missed. Too fine, and the analysis becomes unwieldy. A practical middle ground is to use the five-phase model common in chemical batch processes:
- Pre-start preparation: Including utility verification, equipment inspection, and permit to work. Failure modes: missing lockout/tagout, incomplete vessel inspection, incorrect nitrogen purity.
- Initial energizing: Applying power, opening isolation valves, starting pumps. Failure modes: starting pump against a closed discharge, overcurrent from wrong breaker, reversing rotation.
- Process conditioning: Heating, cooling, pressurizing, or purging to reach operating conditions. Failure modes: thermal shock, under-purge leaving oxygen in flammable atmosphere, overpressure due to blocked relief path.
- Transition to steady state: Introducing feedstock, adjusting controls, bringing online. Failure modes: feed too quickly causing reactor runaway, controller integral windup, alarm flooding.
- Final stabilization: Verifying all parameters within limits, tuning loops, monitoring first holds. Failure modes: delay in recognizing non-normal conditions, failure to log key readings.
For shutdown, the reverse phases (reduction, isolation, depressurization, cleaning/purging, and lockout) should each be analyzed similarly.
Common Failure Modes in Detail
Five generic failure modes recur in startup and shutdown procedures. Each is expanded below with operational context, real-world impact, and suggested controls.
Incorrect Valve Operation
This is the most cited procedural failure in chemical incidents. It includes opening the wrong valve, opening it too quickly, leaving a valve partially open, or failing to open/close a critical block valve. Consequences range from product contamination to catastrophic pressure surges. A 2018 incident at a petrochemical plant involved an operator opening a manual drain valve instead of a bypass valve during startup, releasing a cloud of flammable hydrocarbon that ignited.
Controls: Color-coded valve tags, sequential numbering on procedures, lockout devices for critical valves, operator training with simulation, and in some cases smart valve position sensors that verify state before proceeding.
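As a rough sketch of how position feedback could gate a procedure step, the following compares reported valve states against the lineup the step expects. The tag names (`XV-101` and so on), the state strings, and the `lineup_deviations` helper are hypothetical; a real system would read positions from the control system rather than a dictionary.

```python
# Expected valve lineup before a hypothetical "introduce feed" step.
EXPECTED_LINEUP = {
    "XV-101": "open",    # feed block valve
    "XV-102": "closed",  # drain valve
    "XV-103": "closed",  # bypass valve
}


def lineup_deviations(expected, reported):
    """Return {tag: (expected_state, reported_state)} for every mismatch.

    A tag with no reported position is treated as a deviation rather than
    assumed correct, so missing feedback blocks the step.
    """
    deviations = {}
    for tag, want in expected.items():
        got = reported.get(tag, "unknown")
        if got != want:
            deviations[tag] = (want, got)
    return deviations


# Example: the drain valve is open and the bypass valve reports no position.
reported = {"XV-101": "open", "XV-102": "open"}
problems = lineup_deviations(EXPECTED_LINEUP, reported)
permit_next_step = not problems

for tag, (want, got) in problems.items():
    print(f"{tag}: expected {want}, reported {got}")
print("Proceed" if permit_next_step else "Hold: resolve valve lineup first")
```

Treating missing feedback as a deviation is a deliberate choice here: it fails toward holding the step instead of proceeding on an unverified lineup.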
Failure to Purge Lines
Purging removes residual materials (flammable gases, toxic substances, or incompatible reagents) that can react violently when process conditions change. A common failure is omitting the purge step entirely because it was not clearly listed in the procedure. Another is using an insufficient purge volume or an incorrect purge medium (e.g., using air when an inert gas such as nitrogen is required for oxygen-sensitive chemistry).
Controls: Written purge criteria (time, flow rate, oxygen concentration target), verification of the end point with portable gas detectors, and interlocks that prevent pressurization until purge is confirmed.
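To show how time, flow rate, and oxygen target relate in a written purge criterion, here is a minimal sketch based on the textbook sweep-through purging relation for a perfectly mixed vessel, Q·t = V·ln((C1 − C0)/(C2 − C0)), where C1 is the starting concentration, C2 the target, and C0 the concentration in the purge gas. Perfect mixing is an idealization, and the vessel size and flow rate below are made-up values, so the result is an estimate to confirm with a gas measurement, not a substitute for one.

```python
import math


def sweep_purge_time(volume_m3, flow_m3_per_min, c_initial, c_target, c_purge_gas=0.0):
    """Estimate sweep-through purge time in minutes for a well-mixed vessel.

    Concentrations are volume fractions of the species being removed
    (for example oxygen). Assumes perfect mixing and constant temperature
    and pressure, which real vessels only approximate.
    """
    if volume_m3 <= 0 or flow_m3_per_min <= 0:
        raise ValueError("volume and flow must be positive")
    if not c_purge_gas < c_target < c_initial:
        raise ValueError("need c_purge_gas < c_target < c_initial")
    ratio = (c_initial - c_purge_gas) / (c_target - c_purge_gas)
    return (volume_m3 / flow_m3_per_min) * math.log(ratio)


# Hypothetical case: 10 m3 vessel, 2 m3/min of pure nitrogen,
# reducing oxygen from 21 vol% (air) to 2 vol%.
minutes = sweep_purge_time(10.0, 2.0, c_initial=0.21, c_target=0.02)
vessel_volumes = minutes * 2.0 / 10.0
print(f"Estimated purge time: {minutes:.1f} min ({vessel_volumes:.2f} vessel volumes)")
```

For these inputs the estimate comes out just under 12 minutes. Dead legs and poor mixing lengthen the real purge, which is why the control above pairs the written criteria with detector verification.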
Sensor Malfunctions During Transition
Startup and shutdown place sensors under stress: temperature swings, rapid pressure changes, vibration from pumps starting, or steam hammer. A pressure transmitter that was calibrated for steady-state operation may drift or fail entirely during a rapid depressurization. Similarly, thermocouples can lag behind the true process temperature during heat-up, giving falsely low readings that lead operators to overshoot the target temperature.
Controls: Require a functional check of all critical sensors before startup, set high- and low-limit alarms tight enough to detect abnormal rates of change, and use redundancy (e.g., 2oo3 voting on critical pressure indicators).
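The 2oo3 (two-out-of-three) voting mentioned above can be illustrated with a short sketch. The function names, the 10 bar trip limit, and the readings are assumptions for the example; in a plant this logic would live in a safety system, not in application code.

```python
def two_out_of_three_trip(readings, trip_limit):
    """Return True when at least two of three transmitters exceed the limit."""
    if len(readings) != 3:
        raise ValueError("2oo3 voting needs exactly three readings")
    return sum(r > trip_limit for r in readings) >= 2


def voted_value(readings):
    """Return the median of three readings, which ignores one outlier."""
    if len(readings) != 3:
        raise ValueError("median-of-three needs exactly three readings")
    return sorted(readings)[1]


# Hypothetical pressure readings in bar; the second transmitter has failed low.
readings = [10.2, 3.1, 10.4]
print("Trip:", two_out_of_three_trip(readings, trip_limit=10.0))
print("Voted pressure:", voted_value(readings))
```

With one transmitter failed low, the two healthy readings still produce a trip and the median stays close to the true pressure. A single transmitter failing high, on the other hand, cannot cause a spurious trip on its own.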
Inadequate Safety Checks
Pre-startup safety reviews (PSSR) are mandatory in many jurisdictions, but the procedure itself may omit required checks. For example, failing to verify that relief valves are aligned, that interlocks are bypassed only with written authorization, or that fire protection systems are in service. During a shutdown, inadequate checks can leave equipment energized or hazardous materials in place.
Controls: Incorporate mandatory checklists tied to a shift log or computerized maintenance management system (CMMS). Use a "stop sign" step in the procedure that requires a supervisor signature or barcode scan before proceeding.
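A "stop sign" step of the kind described above amounts to a gate that stays closed until every check is complete and a sign-off is recorded. The sketch below assumes a simple dictionary checklist and a free-text signature; the item names and the `may_proceed` helper are illustrative only, and a real implementation would sit in the CMMS or procedure system.

```python
def may_proceed(checklist, supervisor_signoff):
    """Evaluate a hold point.

    Returns (allowed, incomplete_items). The step is allowed only when every
    checklist item is complete and a supervisor sign-off has been recorded.
    """
    incomplete = [item for item, done in checklist.items() if not done]
    allowed = (not incomplete) and bool(supervisor_signoff)
    return allowed, incomplete


# Hypothetical pre-startup checks drawn from the examples in the text.
checklist = {
    "Relief valves aligned": True,
    "Interlock bypasses authorized in writing": True,
    "Fire protection systems in service": False,
}

allowed, incomplete = may_proceed(checklist, supervisor_signoff="J. Doe")
if allowed:
    print("Hold point cleared")
else:
    print("Hold point NOT cleared; outstanding:", incomplete)
```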
Delayed Equipment Shutdown
During a controlled shutdown, timing is critical. If a pump is left running after its suction valve closes, cavitation can destroy the impeller. If a stack of trays is not drained before shutdown, thermal expansion can warp internals. Delays also occur when operators must manually trip equipment in sequence but are distracted or under time pressure.
Controls: Sequence interlocks that automatically shut down equipment based on logical conditions (e.g., "if vessel pressure drops below X, close feed valve"). Also, clear time limits written into the procedure with alarm reminders.
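The sequence-interlock idea above can be expressed as condition-action rules evaluated against the current plant state. This is a sketch under assumptions: the state keys, the action names, and the 1.0 bar limit are hypothetical, and the two rules simply restate the examples from this section (closing the feed valve on low vessel pressure, and tripping a pump whose suction valve has closed).

```python
def shutdown_actions(state, low_pressure_limit_bar):
    """Return the automatic actions called for by the current plant state."""
    actions = []
    # Rule 1: vessel pressure below the limit while feed is still lined up.
    if state["vessel_pressure_bar"] < low_pressure_limit_bar and state["feed_valve_open"]:
        actions.append("close_feed_valve")
    # Rule 2: pump running with its suction valve closed risks cavitation.
    if state["pump_running"] and not state["suction_valve_open"]:
        actions.append("trip_pump")
    return actions


# Hypothetical snapshot taken partway through a controlled shutdown.
state = {
    "vessel_pressure_bar": 0.8,
    "feed_valve_open": True,
    "pump_running": True,
    "suction_valve_open": False,
}
print(shutdown_actions(state, low_pressure_limit_bar=1.0))
```

Keeping each rule as an explicit condition makes it straightforward to list the same conditions in the FMEA worksheet as the "current control" for the corresponding failure mode.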
Advanced FMEA Techniques for Complex Procedures
For large-scale facilities with dozens of procedures, a traditional spreadsheet FMEA can become cumbersome. Several enhancements improve efficiency and depth:
- Layer of Protection Analysis (LOPA) integration: After the FMEA identifies a failure mode with high severity, LOPA can quantify the likelihood and the risk reduction provided by each protective layer. This allows the team to decide if additional independent layers are needed.
- Time-sequence FMEA: Many startup failures occur precisely because a step is performed out of sequence. A timestamped FMEA maps each step to a timeline and checks for conflicts — e.g., a pre-heat step that must finish before a feed step can begin. Gaps in the timeline represent opportunities for mis-sequence.
- Human factors integration: Include a separate column for human error probability based on tasks (reading, selecting, verifying). Use techniques like SHERPA (Systematic Human Error Reduction and Prediction Approach) to identify error-likely situations such as similar-looking valves, heavy workload, or poor lighting.
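Two of the techniques above lend themselves to a short sketch. The first function shows the basic LOPA arithmetic: the mitigated event frequency is the initiating-event frequency multiplied by the probability of failure on demand (PFD) of each independent protection layer. The second checks a timestamped procedure for steps that start before a prerequisite has finished. The frequencies, PFD values, step names, and timings are all invented for illustration.

```python
from math import prod


def mitigated_frequency(initiating_per_year, layer_pfds):
    """LOPA-style estimate: initiating frequency times each layer's PFD.

    Only valid when the layers are independent of each other and of the
    initiating event.
    """
    if any(not 0 < p <= 1 for p in layer_pfds):
        raise ValueError("each PFD must be in (0, 1]")
    return initiating_per_year * prod(layer_pfds)


def sequence_conflicts(steps):
    """Return (step, prerequisite) pairs where the step starts too early.

    steps maps a name to {"start": minute, "end": minute, "after": [names]}.
    """
    conflicts = []
    for name, step in steps.items():
        for prereq in step["after"]:
            if step["start"] < steps[prereq]["end"]:
                conflicts.append((name, prereq))
    return conflicts


# Hypothetical numbers: operator error 0.1/yr, alarm PFD 0.1, interlock PFD 0.01.
print(f"Mitigated frequency: {mitigated_frequency(0.1, [0.1, 0.01]):.1e} per year")

# Hypothetical timeline in minutes from the start of the procedure.
timeline = {
    "purge":   {"start": 0,  "end": 20, "after": []},
    "preheat": {"start": 20, "end": 60, "after": ["purge"]},
    "feed":    {"start": 55, "end": 90, "after": ["preheat"]},
}
print("Conflicts:", sequence_conflicts(timeline))
```

In this timeline the feed step is scheduled five minutes before pre-heat ends, so the check reports it as a conflict for the team to resolve.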
Incorporating FMEA into the Procedure Management Lifecycle
FMEA should not be a one-time event. The procedure lifecycle includes:
- Initial procedure drafting: The process engineer writes the startup/shutdown procedure based on design documents. A draft FMEA is performed simultaneously to validate steps and identify gaps.
- Pre-implementation review: The FMEA team (including operators) reviews the procedure during a tabletop exercise. They walk through every step, possibly with a simulator, and update the FMEA.
- Field validation: During a dry run or actual startup (under extra supervision), a team member observes compliance and notes any deviations. Those observations feed back into the FMEA.
- Periodic revalidation: At least every five years (or after a significant incident, change in equipment, or personnel turnover), the FMEA is revisited. New failure modes may emerge from equipment aging, new chemicals, or updated codes.
Digital procedure management platforms (e.g., from Directus) can host the FMEA alongside the procedure text, linking each failure mode to the specific step number. This allows operators to access the risk analysis directly from the procedure screen, reinforcing why each step exists.
Case Study: FMEA Preventing a Startup Explosion
Consider a batch polymerization unit that had experienced near-misses during reactor heating. The FMEA team performed a step-by-step analysis of the startup procedure. For the step "Heat reactor to 80°C at 2°C/min," the team identified a failure mode: "Heating rate exceeded due to operator selecting manual heat instead of ramping." The effect: the monomer could explode at local hot spots above 100°C. The only existing control was a high-temperature alarm at 95°C, which sounded in the control room but took no automatic action. The team recommended adding a heating rate interlock that would automatically cut power if the rate exceeded 2.5°C/min for 30 seconds. They also required a second operator to verify the setpoint. During a later startup, a surge in steam pressure occurred, but the rate interlock tripped and prevented an incident. The FMEA had turned an unmeasured risk into a protected one.
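The heating rate interlock recommended in this case study (trip if the rate exceeds 2.5°C/min for 30 seconds) could be sketched as follows. The class name, the sample-to-sample rate calculation, and the 10-second sampling interval are assumptions for illustration; a production interlock would typically filter the signal and run in a safety-rated controller.

```python
class HeatRateInterlock:
    """Latch a trip when the heating rate stays above a limit for a hold time."""

    def __init__(self, max_rate_c_per_min=2.5, hold_s=30.0):
        self.max_rate = max_rate_c_per_min
        self.hold_s = hold_s
        self._last = None          # (time_s, temp_c) of the previous sample
        self._exceed_since = None  # start of the current over-limit period
        self.tripped = False

    def update(self, time_s, temp_c):
        """Feed one temperature sample; return True once the trip has latched."""
        if self._last is not None:
            prev_time, prev_temp = self._last
            dt = time_s - prev_time
            if dt <= 0:
                raise ValueError("samples must arrive in increasing time order")
            rate = (temp_c - prev_temp) / dt * 60.0  # degrees C per minute
            if rate > self.max_rate:
                if self._exceed_since is None:
                    self._exceed_since = prev_time
                if time_s - self._exceed_since >= self.hold_s:
                    self.tripped = True  # stays latched until manually reset
            else:
                self._exceed_since = None
        self._last = (time_s, temp_c)
        return self.tripped


# Hypothetical samples every 10 s: about 2 C/min at first, then about 4 C/min.
samples = [(0, 25.00), (10, 25.33), (20, 25.67),
           (30, 26.33), (40, 27.00), (50, 27.67)]
interlock = HeatRateInterlock()
results = [interlock.update(t, temp) for t, temp in samples]
print(results)
```

The trip latches only after the excessive rate has persisted for the full 30 seconds, so a single noisy sample does not cut power, and the over-limit timer resets whenever the rate drops back under the limit.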
Benefits Beyond Safety: Efficiency and Reliability
While the primary driver for FMEA is safety, the analysis also reveals opportunities to shorten startup time, reduce waste, and improve product quality. For instance, an FMEA may show that a particular purge step takes 30 minutes but has no safety relevance — the residual gas concentration is already below the lower flammable limit after 10 minutes. The step can be optimized without increasing risk. Similarly, by identifying the most frequent procedural errors, training can be targeted to those specific tasks, reducing costly rework and off-spec product.
Documenting the FMEA also creates a knowledge base that survives personnel turnover. New operators can review the history of why certain steps exist, reducing the temptation to bypass steps for convenience.
Best Practices for Conducting Startup/Shutdown FMEA
- Use a trained facilitator who is not part of the direct operating team to avoid groupthink.
- Include at least two experienced operators — one current and one recently retired if possible — to capture "tribal knowledge" about past failures.
- Set a clear scope boundary: define which specific startup or shutdown scenario (e.g., normal startup after turnaround vs. emergency restart after power loss) is being analyzed.
- Document all assumptions, such as "nitrogen supply pressure is stable at 6 bar" or "all relief valves were tested within the last year."
- Do not ignore minor failure modes. A small steam leak during startup may not cause an injury, but it can obscure a critical gauge reading and lead to a larger failure later.
- Update the RPN after actions are taken. A recommended action that reduces Occurrence from 5 to 2 and lowers the Detection rating from 8 to 4 (a lower rating means the failure is easier to detect) should be reflected in a new, lower RPN.
- Share findings across the organization. A failure mode found in one unit may be present in identical units elsewhere.
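The RPN update called for in the list above is simple arithmetic, shown here with hypothetical ratings. The example assumes the common convention that a lower Detection rating means the failure is easier to catch, so an improved control lowers both Occurrence and Detection while Severity stays the same.

```python
def rpn(severity, occurrence, detection):
    """Risk Priority Number: product of the three 1-10 ratings."""
    return severity * occurrence * detection


# Hypothetical ratings before and after a recommended action is implemented.
before = rpn(severity=8, occurrence=5, detection=8)
after = rpn(severity=8, occurrence=2, detection=4)
reduction_pct = 100 * (before - after) / before

print(f"RPN before: {before}, after: {after}, reduction: {reduction_pct:.0f}%")
```

Severity is unchanged in this example because the action makes the failure less likely and easier to detect without altering its consequences. That is why high-severity items stay on the watch list even after their RPN falls.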
Conclusion: Embedding FMEA into Operational Discipline
Failure Mode and Effects Analysis is not a form-filling exercise; it is a proactive tool that reshapes how organizations think about procedural risk. By systematically examining each step of a startup or shutdown, chemical facilities can move from a culture of incident response to one of incident prevention. The investment in time — typically several hours per procedure — pays dividends through fewer process upsets, lower insurance costs, and stronger compliance with regulations such as OSHA PSM and the EPA's Risk Management Program.
As operations become increasingly automated, FMEA must also evolve. Smart sensors, digital twins, and procedure execution systems (such as those enabled by Directus' flexible data modeling) allow real-time tracking of procedural compliance and automated risk scoring. However, the human element remains central: the FMEA process forces teams to ask "what if" and to answer with concrete, verifiable safeguards. For any chemical plant aiming for high reliability, periodic FMEA of startup and shutdown procedures is not optional — it is a core operational discipline.
For further reading on FMEA methodology, consult the Center for Chemical Process Safety (CCPS) guidelines. The standard OSHA 29 CFR 1910.119 includes requirements for operating procedures and process hazard analysis that FMEA helps satisfy. Free FMEA templates are available from the American Society for Quality (ASQ).