Introduction

Process capability studies are a cornerstone of statistical quality control, providing a quantitative measure of how well a manufacturing or service process can meet its specifications. While traditional univariate studies examine one quality characteristic at a time, many real-world processes involve multiple interrelated variables that must be considered together. Conducting a process capability study in a multivariate context is essential when characteristics are correlated, because evaluating each variable separately can lead to misleading conclusions about overall process performance. This article provides a comprehensive, step-by-step guide to performing an effective multivariate process capability study, from foundational concepts through advanced analysis techniques and practical interpretation.

Understanding Multivariate Process Capability

Multivariate process capability extends the principles of univariate capability analysis to situations where two or more quality characteristics are simultaneously important and often correlated. For example, in a chemical manufacturing process, both the concentration and the temperature of a product may need to stay within specific limits, and these variables frequently interact. In such cases, assessing capability using only individual Cp or Cpk indices for each variable can miss the joint behavior that defines product quality.

The key difference between univariate and multivariate capability is the treatment of correlation. A process might appear capable for each variable individually, but the combination of variables could produce a high proportion of nonconforming units that fall outside a multivariate specification region. Multivariate capability indices such as the multivariate Cp (CpMV) and Cpk (CpkMV) account for the joint distribution of the variables, often using the concept of a multivariate normal distribution and a specification region defined by an ellipsoid or a rectangular box in the variable space.

Another important concept is the use of principal component analysis (PCA) to reduce dimensionality while preserving the essential structure of the data. Analysts can transform the original correlated variables into uncorrelated principal components and compute capability indices on those components, or they can work directly in the original space using Hotelling’s T² statistic. The choice of method depends on the nature of the specifications and the underlying assumptions of normality and linearity.

Steps to Conduct a Multivariate Process Capability Study

1. Define the Process and Variables

Begin by clearly identifying the process boundaries and the key quality characteristics that must be controlled. Engage with process engineers, operators, and quality managers to select variables that are both critical to customer requirements and likely to exhibit correlation. Document the specification limits for each variable, noting whether they are bilateral (upper and lower) or unilateral. When specifications are defined as a region (e.g., a rectangle or ellipse in two dimensions), record the exact boundaries.

It is also important to determine whether the process is stable and in a state of statistical control before attempting capability analysis. Use control charts (such as Hotelling’s T² control chart) to verify that the process does not exhibit special cause variation during the data collection period.
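The stability check described above can be sketched as follows. This is a minimal illustration, assuming individual observations (no subgroups) stored one row per unit; the function name `hotelling_t2_phase1` and the default false-alarm rate are choices made for this example, not part of any standard API. The Phase I limit uses a beta distribution because each observation also contributes to the estimated mean and covariance.

```python
import numpy as np
from scipy import stats


def hotelling_t2_phase1(X, alpha=0.0027):
    """Return T² for each row of X and a Phase I upper control limit.

    X is an (n, p) array: one row per unit, one column per characteristic.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    mean = X.mean(axis=0)
    S = np.cov(X, rowvar=False)  # sample covariance, divisor n - 1
    diff = X - mean
    # T²_i = (x_i - mean)' S^{-1} (x_i - mean)
    t2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(S), diff)
    # Phase I limit for individual observations (beta-based).
    ucl = (n - 1) ** 2 / n * stats.beta.ppf(1 - alpha, p / 2, (n - p - 1) / 2)
    return t2, ucl
```

Rows whose T² exceeds the limit are candidates for investigation; whether to exclude them before estimating capability is a judgment that depends on finding an assignable cause.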

2. Collect Representative Data

Data collection for multivariate capability requires careful planning to ensure that the sample adequately represents the process variation under normal operating conditions. A common rule of thumb is to collect at least 100 to 150 observations when the number of variables is small (e.g., 2–5), but larger samples become necessary as dimensionality increases. The sample should span a period long enough to capture common cause variation, such as shifts in raw materials, environmental changes, or tool wear.

Record data in time order to enable checks for autocorrelation and trends. If the process is batch-oriented, consider sampling across batches and within batches to capture both short-term and long-term variation. Document any process adjustments or maintenance events that occur during sampling, as they can help explain anomalies later.
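A quick screen for autocorrelation in time-ordered data might look like the sketch below. The function name `lag_autocorrelation` is hypothetical, and the code assumes rows are already sorted by collection time.

```python
import numpy as np


def lag_autocorrelation(X, lag=1):
    """Sample autocorrelation at the given lag for each column of X.

    Rows of X must be in time order; columns are quality characteristics.
    """
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    # Sum of products of each value with the value `lag` steps earlier.
    num = (centered[lag:] * centered[:-lag]).sum(axis=0)
    den = (centered ** 2).sum(axis=0)
    return num / den
```

As a rough guide, values larger in magnitude than about 2/√n suggest that successive observations are not independent, in which case capability estimates that assume independence should be treated with caution.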

3. Assess Data Quality and Assumptions

Before building any multivariate model, inspect the data for outliers, missing values, and departures from normality. For multivariate normality, you can use the Mardia test or a quantile–quantile plot of the squared Mahalanobis distances. If the data significantly deviate from normality, consider transformations (e.g., Box-Cox) or nonparametric approaches.
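Mardia's test can be sketched directly from its definitions. The version below is an illustration under the usual large-sample approximations (chi-square for skewness, normal for kurtosis); the function name `mardia_test` and the returned dictionary keys are assumptions of this example. It builds an n × n matrix, so it is intended for samples of a few thousand rows at most.

```python
import numpy as np
from scipy import stats


def mardia_test(X):
    """Mardia's multivariate skewness and kurtosis with asymptotic p-values."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    diff = X - X.mean(axis=0)
    S = diff.T @ diff / n                  # ML covariance estimate (divisor n)
    G = diff @ np.linalg.inv(S) @ diff.T   # g_ij = d_i' S^{-1} d_j
    b1 = (G ** 3).sum() / n ** 2           # multivariate skewness
    b2 = (np.diag(G) ** 2).sum() / n       # multivariate kurtosis
    skew_stat = n * b1 / 6
    skew_df = p * (p + 1) * (p + 2) / 6
    kurt_z = (b2 - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n)
    return {
        "skewness": b1,
        "kurtosis": b2,
        "p_skewness": stats.chi2.sf(skew_stat, skew_df),
        "p_kurtosis": 2 * stats.norm.sf(abs(kurt_z)),
    }
```

Small p-values for either component indicate a departure from multivariate normality. The diagonal of G holds the squared Mahalanobis distances, which are the quantities plotted against chi-square quantiles in the Q–Q check.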

Check for multicollinearity using correlation matrices and variance inflation factors (VIFs). High multicollinearity can destabilize capability index calculations and may require dimensionality reduction via PCA or variable subset selection. Also verify that the process is stable by plotting the data in time order and using multivariate control charts.
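One compact way to obtain VIFs is from the inverse of the correlation matrix, whose diagonal entries equal the VIFs. The sketch below assumes complete data with no missing values; `vif_from_data` is a hypothetical helper name.

```python
import numpy as np


def vif_from_data(X):
    """Return (VIFs, condition number) from the correlation matrix of X."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    vifs = np.diag(np.linalg.inv(R))   # VIF_j is the j-th diagonal of R^{-1}
    return vifs, np.linalg.cond(R)     # a large condition number signals near-singularity
```

A near-singular correlation matrix matters here because most multivariate indices use the inverse or the determinant of the covariance matrix, and both become numerically unstable as variables approach linear dependence.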

4. Analyze Variable Relationships

Understanding the correlation structure is central to multivariate capability. Compute the sample correlation matrix and visualize it using scatterplot matrices or heatmaps. If the variables are highly correlated, consider principal component analysis (PCA) to transform them into uncorrelated components. PCA not only simplifies the analysis but also reveals which directions in the variable space contribute most to overall variation.

Another useful technique is factor analysis, which can identify latent factors driving the correlations. However, for most capability studies, PCA is preferred because it directly relates to variance accounted for and can be used to compute capability indices on the component scores.
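A PCA of this kind can be sketched with an eigen-decomposition of the sample covariance matrix. The function name `pca_from_covariance` is illustrative. The example works on the covariance matrix, which assumes the variables share comparable units; if they do not, standardizing the columns first (equivalently, using the correlation matrix) is the usual alternative.

```python
import numpy as np


def pca_from_covariance(X):
    """Return eigenvalues, loadings, scores, and explained-variance ratios."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)
    S = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centered @ eigvecs            # uncorrelated component scores
    return eigvals, eigvecs, scores, eigvals / eigvals.sum()
```

The columns of the loading matrix show how each original variable contributes to each component, and the explained-variance ratios indicate how many components are worth retaining.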

5. Model the Process

Several modeling approaches exist for multivariate capability analysis:

  • Hotelling’s T² statistic: For a process that follows a multivariate normal distribution, T² measures the squared distance of an observation (or subgroup mean) from the process mean or target, scaled by the covariance matrix. The proportion of observations that fall within a specified T² limit can then serve as a capability measure.
  • Principal component analysis (PCA): After PCA, you can compute univariate capability indices on each principal component, because the components are uncorrelated (and, under multivariate normality, independent). However, the original specification limits must be transformed to the component space, which can be complex if the limits are not elliptical.
  • Multivariate capability indices (CpMV and CpkMV): These indices are extensions of Cp and Cpk that use the concept of a multivariate tolerance region (often an ellipsoid) compared to the process variation ellipsoid defined by the covariance matrix. Several formulations exist; the most common is based on the ratio of the volumes of the specification ellipsoid and the process ellipsoid.

When specifications form a rectangular region (the most common case in practice), a common approach is to use the proportion of nonconforming units estimated from the multivariate normal distribution. This proportion can be converted into a capability index analogous to Cpk using an inverse normal transformation.
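The rectangular-region approach can be sketched as below. Monte Carlo simulation is used here in place of numerical integration of the multivariate normal density, purely to keep the example short; the function names, the number of simulated draws, and the seed are all assumptions of this illustration.

```python
from statistics import NormalDist

import numpy as np


def nonconforming_fraction_mc(mean, cov, lsl, usl, n_sim=200_000, seed=0):
    """Estimate the fraction of output outside the box [lsl, usl] by simulation."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(mean, cov, size=n_sim)
    inside = np.all((draws >= np.asarray(lsl)) & (draws <= np.asarray(usl)), axis=1)
    return 1.0 - inside.mean()


def capability_from_fraction(p_nc):
    """Convert a fraction nonconforming to an index Z / 3, Z = Phi^{-1}(1 - p_nc)."""
    if not 0.0 < p_nc < 1.0:
        raise ValueError("p_nc must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - p_nc) / 3.0
```

With the estimated mean vector and covariance matrix as inputs, `capability_from_fraction(nonconforming_fraction_mc(...))` gives a single number on the familiar Cpk scale. When the true fraction is very small, the simulation may return exactly zero, and a larger `n_sim` or a numerical integration routine is needed.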

6. Calculate Capability Indices

Let’s examine the calculation of CpMV more closely. Suppose we have two variables (X1, X2) with a bivariate normal distribution. The specification region is defined by lower and upper specification limits (LSL1, USL1, LSL2, USL2). The process variation is described by the covariance matrix Σ. The CpMV index is computed as:

CpMV = (Volume of specification region) / (Volume of process variation region)

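The process variation region is defined as the smallest ellipsoid that contains a certain proportion (usually 99.73%) of the process output, the multivariate analogue of the univariate ±3σ interval. For a multivariate normal process, this is the set of points whose squared Mahalanobis distance from the mean does not exceed χ²(p; 0.9973). The volume of this p-dimensional ellipsoid is (π × χ²(p; 0.9973))^(p/2) × √|Σ| / Γ(p/2 + 1), which is proportional to the square root of the determinant of the covariance matrix. Therefore, CpMV can be expressed as:

CpMV = Γ(p/2 + 1) × ∏ (USL_i − LSL_i) / [ (π × χ²(p; 0.9973))^(p/2) × √|Σ| ]

where the product runs over i = 1, …, p, χ²(p; 0.9973) is the 0.9973 quantile of the chi-square distribution with p degrees of freedom, and Γ is the gamma function. For p = 1 the expression reduces to the familiar (USL − LSL) / 6σ.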
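The volume-ratio calculation can be sketched as follows. The example takes the process region to be the 99.73% ellipsoid of a multivariate normal distribution, with volume (π c)^(p/2) √|Σ| / Γ(p/2 + 1), where c is the chi-square quantile, and the specification region to be a rectangular box. The function name `cp_mv_volume_ratio` is hypothetical.

```python
import math

import numpy as np
from scipy import stats


def cp_mv_volume_ratio(cov, lsl, usl, coverage=0.9973):
    """Ratio of the specification-box volume to the process-ellipsoid volume."""
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    p = cov.shape[0]
    widths = np.asarray(usl, dtype=float) - np.asarray(lsl, dtype=float)
    spec_volume = float(np.prod(widths))
    c = stats.chi2.ppf(coverage, df=p)     # squared radius of the ellipsoid
    process_volume = (
        (math.pi * c) ** (p / 2)
        * math.sqrt(np.linalg.det(cov))
        / math.gamma(p / 2 + 1)
    )
    return spec_volume / process_volume
```

For a single variable with σ = 1 and limits at ±3, the function returns approximately 1, matching the univariate Cp. For two uncorrelated unit-variance variables with the same limits it returns a value slightly below 1, because a 99.73% ellipse is larger than the circle of radius 3.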

For CpkMV, which accounts for process centering, several formulations exist. Chan, Cheng, and Spiring (1991) proposed a multivariate extension of their Cpm index that penalizes the distance of the observations from the target vector relative to the specification region. Alternatively, you can use the proportion of nonconforming units approach: estimate the total fraction nonconforming p, convert it to a Z-value with Z = Φ⁻¹(1 − p), and report CpkMV = Z / 3, so that p = 0.00135 corresponds to an index of 1.00.

7. Interpret Results and Identify Improvement Areas

Interpretation of multivariate capability indices requires care. A CpMV value greater than 1 indicates that the process variation occupies a smaller region than the specification region, but it does not guarantee that the process is centered. CpkMV should be used to assess centering. When CpkMV is significantly lower than CpMV, the process mean needs adjustment.

Examine which variables contribute most to the lack of capability. PCA loadings can indicate which original variables drive the principal components with low capability. For example, if the first principal component (which explains the most variation) shows a low capability index, the variables with high loadings on that component are prime candidates for process improvement.

Plot the data together with the specification region and the estimated process region (e.g., overlay the specification rectangle and a 99.73% coverage ellipse on a scatter plot) to visually assess how the process compares to specifications. Look for clusters of points near one specification limit or outside the region.
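The boundary of a two-dimensional coverage ellipse can be generated as in the sketch below, ready to pass to any plotting library. The function name `coverage_ellipse` is illustrative, and the code is limited to two variables, for which the chi-square quantile has the closed form −2 ln(1 − coverage).

```python
import math

import numpy as np


def coverage_ellipse(mean, cov, coverage=0.9973, n_points=200):
    """Return (n_points, 2) boundary points of a bivariate normal coverage ellipse."""
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    c = -2.0 * math.log(1.0 - coverage)   # chi-square quantile, exact for 2 d.o.f.
    L = np.linalg.cholesky(cov)           # cov = L @ L.T
    theta = np.linspace(0.0, 2.0 * math.pi, n_points)
    circle = np.vstack([np.cos(theta), np.sin(theta)])
    # Map the unit circle through L, scale by sqrt(c), and shift to the mean.
    return mean + math.sqrt(c) * (L @ circle).T
```

Drawing these points as a closed curve over the scatter plot, along with the specification rectangle, shows at a glance whether the process region fits inside the specifications and in which direction it is closest to a limit.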

Tools and Techniques

Principal Component Analysis (PCA)

PCA transforms a set of correlated variables into a smaller set of uncorrelated principal components that capture the maximum variance. It is especially useful when the number of variables is large (e.g., 10 or more) or when multicollinearity exists. After PCA, you can compute univariate capability indices on the first few components that account for most of the variation. However, interpreting these indices in terms of original specifications requires back-transformation.

Multivariate Control Charts

Before performing capability analysis, use multivariate control charts such as Hotelling’s T² chart (for phase I analysis) and MEWMA or MCUSUM charts (for phase II monitoring). A stable process is a prerequisite for valid capability estimates. The T² chart plots the covariance-scaled distance of each observation (or subgroup mean) from the process mean or target; a point above the control limit suggests a multivariate outlier or a process shift.
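A MEWMA statistic for phase II monitoring can be sketched as follows, assuming the in-control target vector and covariance matrix are already known from phase I. The function name `mewma_statistics` and the default smoothing constant are choices made for this example. The control limit depends on the smoothing constant, the number of variables, and the desired in-control run length, so it is left to the user to supply from published tables or simulation.

```python
import numpy as np


def mewma_statistics(X, target, cov, lam=0.2):
    """Return the MEWMA chart statistic for each time-ordered row of X."""
    X = np.asarray(X, dtype=float) - np.asarray(target, dtype=float)
    cov = np.asarray(cov, dtype=float)
    z = np.zeros(X.shape[1])               # smoothed vector, starts at the target
    out = np.empty(X.shape[0])
    for i, x in enumerate(X, start=1):
        z = lam * x + (1.0 - lam) * z
        # Exact covariance of z at step i (approaches lam / (2 - lam) * cov).
        scale = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * i))
        out[i - 1] = z @ np.linalg.solve(scale * cov, z)
    return out
```

Setting `lam=1` removes the smoothing, and the statistic reduces to the covariance-scaled squared distance of each observation from the target.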

Multivariate Capability Indices

Besides CpMV and CpkMV, other indices exist, such as the multivariate capability index proposed by Wierda (1992) based on the proportion of nonconforming units. Software such as Minitab, JMP, and R can support these calculations; in R, for example, the qualityTools package addresses capability analysis, and the mvtnorm package evaluates multivariate normal probabilities, which is the core computation for indices based on the nonconforming proportion. Excel add-ins are also available but often limited.

Interpreting Multivariate Capability Results

When interpreting the output of a multivariate capability study, consider the following guidelines:

  • CpMV > 1.33: The process variation is well within the specification region, indicating excellent potential capability.
  • 1.00 ≤ CpMV ≤ 1.33: The process is capable but may require monitoring to prevent deterioration.
  • CpMV < 1.00: The process variation exceeds the specification region; immediate improvement is needed.
  • CpkMV vs. CpMV: A large gap indicates the process is off-center. Focus on adjusting the mean toward the target.
  • Univariate vs. multivariate indices: If univariate Cp/Cpk values are all high but multivariate indices are low, correlation among variables is the likely culprit. Reducing correlation through design changes or better control of inputs may help.

Keep in mind that multivariate capability indices are sensitive to the assumption of multivariate normality. If the data are not normal, consider using nonparametric methods that estimate the proportion of nonconforming units directly from the empirical distribution, such as the multivariate approach described by Shore (2005).
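The simplest distribution-free estimate is a direct count of observed units outside the specification region, with a confidence interval on the proportion. The sketch below shows that basic count with a Wilson score interval; it is not the method of Shore (2005), and the function name `empirical_nonconforming` is hypothetical. Unilateral limits can be represented with `-inf` or `inf`.

```python
import math
from statistics import NormalDist

import numpy as np


def empirical_nonconforming(X, lsl, usl, confidence=0.95):
    """Observed fraction outside the box [lsl, usl] and its Wilson interval."""
    X = np.asarray(X, dtype=float)
    outside = ~np.all((X >= np.asarray(lsl)) & (X <= np.asarray(usl)), axis=1)
    n = X.shape[0]
    p_hat = int(outside.sum()) / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    denom = 1.0 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return p_hat, (centre - half, centre + half)
```

Because capable processes produce very few nonconforming units, the interval is usually wide unless the sample is large, which is the main practical limitation of a purely empirical estimate.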

Best Practices and Common Pitfalls

Best Practices

  • Collect sufficient data: Aim for at least 100 observations for low-dimensional cases, more for higher dimensions. Base the sample size on the precision you need, for example the acceptable width of a confidence interval on the capability index.
  • Verify assumptions: Always test for multivariate normality and correlation structure before computing indices. Use transformations if necessary.
  • Use appropriate software: Leverage statistical packages that support multivariate capability analysis, such as Minitab, JMP, R, or Python with libraries like scipy.stats and statsmodels. The NIST/SEMATECH e-Handbook of Statistical Methods provides detailed guidance and examples.
  • Collaborate with domain experts: Work with process engineers and statisticians to ensure that the chosen variables and specifications are meaningful and that the analysis aligns with practical constraints.
  • Validate the model: Use hold-out data or cross-validation to confirm that the capability estimates are reliable and stable over time.

Common Pitfalls

  • Ignoring correlation: This is the most frequent mistake. Univariate capability may look fine, but correlated variables can cause a high percentage of nonconforming units. Always perform a multivariate analysis when variables are known or suspected to be correlated.
  • Using insufficient data: Small samples lead to large sampling errors in covariance matrix estimates, resulting in unreliable capability indices.
  • Overlooking data quality: Outliers and missing data can severely distort multivariate indices. Perform rigorous data cleaning.
  • Misinterpreting indices: CpMV does not account for centering; always use CpkMV in conjunction. Also, remember that many indices rely on the specification region shape; rectangular specifications may require a different approach than elliptical ones.
  • Applying multivariate methods to unstable processes: Capability analysis is meaningless if the process is not in statistical control. Control the process first, then assess capability.
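The effect of sample size on covariance-based quantities can be made visible with a percentile bootstrap, sketched below. The function name `bootstrap_interval` and its defaults are assumptions of this example. Resampling whole rows keeps the correlation between variables intact but assumes that observations are independent over time.

```python
import numpy as np


def bootstrap_interval(X, statistic, n_boot=1000, confidence=0.95, seed=0):
    """Percentile bootstrap interval for any statistic of the data matrix X."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Resample rows with replacement and recompute the statistic each time.
    reps = np.array(
        [statistic(X[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    )
    tail = (1.0 - confidence) / 2.0
    lower, upper = np.quantile(reps, [tail, 1.0 - tail])
    return float(lower), float(upper)
```

Passing a function that computes a capability index from a data matrix gives an interval for that index; a wide interval is a sign that more data are needed before drawing conclusions.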

Conclusion

Conducting a process capability study in a multivariate context is more complex than its univariate counterpart, but it provides a far more accurate picture of process performance when variables interact. By defining variables carefully, collecting representative data, verifying assumptions, using appropriate statistical tools like PCA and Hotelling’s T², and computing robust multivariate capability indices, organizations can identify the true state of process capability and target improvements effectively. Multivariate capability analysis is a powerful tool for achieving high-quality output and reducing waste in modern manufacturing and service processes. For further study, the American Society for Quality (ASQ) offers courses and resources on process capability, and Montgomery’s “Introduction to Statistical Quality Control” provides in-depth treatment of multivariate methods.