Using Machine Learning Algorithms to Improve Bearing Capacity Estimations

Estimating the bearing capacity of soil is a critical aspect of geotechnical engineering. Accurate estimations ensure the safety and stability of foundations for buildings, bridges, and other structures. Traditionally, engineers rely on empirical formulas and laboratory tests, which can be time-consuming and sometimes imprecise due to the inherent variability of soil properties. Over the past decade, machine learning (ML) algorithms have emerged as powerful tools to enhance the accuracy and efficiency of bearing capacity estimations, enabling engineers to leverage large datasets and uncover complex patterns that traditional methods might overlook.

The Role of Machine Learning in Geotechnical Engineering

Machine learning algorithms are particularly well-suited for geotechnical problems because they can model nonlinear relationships between soil parameters and bearing capacity without requiring a predefined mathematical form. This flexibility allows engineers to incorporate a wide range of input variables—such as soil type, density, moisture content, shear strength, and compressibility—into a single predictive model. By training on historical field and laboratory data, ML models learn to generalize across different soil conditions, often achieving higher prediction accuracy than conventional empirical equations like Terzaghi’s bearing capacity formula or Meyerhof’s method. Moreover, ML techniques can be updated continuously as new data become available, making them adaptive to site-specific conditions.
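
For reference, the kind of closed-form baseline that ML models are compared against can be sketched in a few lines. This is a minimal illustration of the classical three-term bearing capacity equation for a strip footing. It assumes the Prandtl-Reissner expressions for Nc and Nq and Vesic's approximation for Nγ, not Terzaghi's original tabulated factors. The function names and the input values are hypothetical.

```python
import math


def bearing_capacity_factors(phi_deg):
    """Return (Nc, Nq, Ngamma) for a friction angle in degrees (phi > 0)."""
    phi = math.radians(phi_deg)
    # Prandtl-Reissner expressions for Nq and Nc
    nq = math.exp(math.pi * math.tan(phi)) * math.tan(math.pi / 4 + phi / 2) ** 2
    nc = (nq - 1.0) / math.tan(phi)
    # Vesic's approximation for Ngamma (an assumption; other authors use other forms)
    ngamma = 2.0 * (nq + 1.0) * math.tan(phi)
    return nc, nq, ngamma


def ultimate_bearing_capacity_strip(c, phi_deg, gamma, width, depth):
    """q_u = c*Nc + q*Nq + 0.5*gamma*B*Ngamma for a strip footing.

    c in kPa, gamma in kN/m^3, width and depth in m; result in kPa.
    """
    nc, nq, ngamma = bearing_capacity_factors(phi_deg)
    q = gamma * depth  # overburden pressure at the footing base
    return c * nc + q * nq + 0.5 * gamma * width * ngamma


# Illustrative inputs only, not taken from any real site
nc, nq, ngamma = bearing_capacity_factors(30.0)
q_u = ultimate_bearing_capacity_strip(c=10.0, phi_deg=30.0, gamma=18.0, width=2.0, depth=1.0)
print(f"Nc={nc:.2f}, Nq={nq:.2f}, Ngamma={ngamma:.2f}, q_u={q_u:.0f} kPa")
```

The fixed functional form is the point of contrast: every input enters through a prescribed term, whereas an ML model learns the mapping from data.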

Comparative studies, including work published in Computers and Geotechnics, report that ML-based bearing capacity predictions can reduce prediction error by roughly 20–40% relative to traditional approaches, although the gain depends on the dataset and on the baseline method. This improvement is especially valuable for large-scale infrastructure projects where even small errors in foundation design can lead to significant cost overruns or safety risks.

Types of Machine Learning Algorithms Used

Several families of ML algorithms have been applied to bearing capacity estimation. Each offers distinct strengths depending on the data volume, feature complexity, and interpretability needs.

Regression Algorithms: Linear Regression and Support Vector Regression (SVR) are common starting points. SVR uses kernel functions to implicitly map input data into a higher-dimensional space, capturing nonlinear dependencies while remaining practical for the small-to-medium datasets typical of geotechnical studies. Both models output continuous bearing capacity values, and the coefficients of a linear model give a direct view of each feature's influence.
Decision Trees and Random Forests: Decision trees partition the data based on feature thresholds, creating interpretable rules. Random Forests aggregate many trees to reduce overfitting and improve generalization. They cope well with categorical soil classifications and, in implementations that support it, with missing values, making them practical for field applications where not all parameters are measured.
Neural Networks: Deep learning models, from simple feedforward networks to more complex architectures like convolutional neural networks (CNNs), can learn highly nonlinear representations. They are particularly effective when large datasets are available, though they require careful hyperparameter tuning and are less interpretable than tree-based methods.
Gradient Boosting Machines (XGBoost, LightGBM): These ensemble methods have become popular due to their high accuracy and built-in regularization. They often outperform other algorithms on tabular geotechnical data and can handle mixed data types.

Choosing the appropriate algorithm depends on the dataset size, noise level, and the engineer’s tolerance for model complexity. In practice, a combination of algorithms through ensemble learning often yields the best results.
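
One way to make this choice concrete is to score several candidate algorithms on the same data with cross-validation. The sketch below assumes scikit-learn and uses a purely synthetic dataset. The four features, the target formula, and the hyperparameters are illustrative assumptions, and scikit-learn's GradientBoostingRegressor stands in for XGBoost or LightGBM.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n = 300

# Hypothetical features: friction angle (deg), cohesion (kPa),
# unit weight (kN/m^3), footing width (m)
X = np.column_stack([
    rng.uniform(25, 40, n),
    rng.uniform(0, 50, n),
    rng.uniform(16, 20, n),
    rng.uniform(1, 3, n),
])

# Synthetic nonlinear target with noise; this is NOT real test data
phi = np.radians(X[:, 0])
y = 20 * X[:, 1] + X[:, 2] * X[:, 3] * np.exp(3 * np.tan(phi)) + rng.normal(0, 20, n)

models = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "svr": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1000.0)),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

# Mean 5-fold cross-validated R^2 for each candidate
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in models.items()
}

for name, score in scores.items():
    print(f"{name}: mean R^2 = {score:.3f}")
```

The ranking on synthetic data says nothing about real soils; the useful part is the pattern of evaluating every candidate under the same folds and metric.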

Advantages of Using Machine Learning

Improved accuracy: ML models often achieve lower prediction errors than empirical formulas, especially when site-specific data are incorporated.
Efficient processing of large datasets: With the growth of digital geotechnical databases, ML can analyze thousands of borehole logs and laboratory tests in minutes—a task that would take weeks manually.
Adaptability to different soil types: Models trained on diverse global datasets can be fine-tuned for local conditions, reducing the need for extensive new testing.
Potential for real-time estimations: Once deployed, ML models provide instant predictions during site investigations, allowing rapid decision-making.

However, these advantages come with responsibilities. Overfitting remains a concern if models are trained on small or biased samples. Rigorous validation using unseen data is essential to ensure reliability in practice. Additionally, engineers must understand the limitations of any black-box model and use domain knowledge to interpret outputs critically.

Data Preparation and Model Training

The success of ML bearing capacity models hinges on the quality and relevance of the training data. Key steps include:

Data collection: Gather soil parameters from standard penetration tests (SPT), cone penetration tests (CPT), laboratory triaxial tests, and in-situ measurements. Target variables (bearing capacity) are typically derived from plate load tests or well‑established calculations.
Feature engineering: Combine raw measurements into meaningful predictors—for example, normalized blow counts, relative density, or effective stress ratios. Domain expertise is critical here; including irrelevant features can degrade performance.
Splitting and validation: Partition data into training (70–80%), validation (10–15%), and test (10–15%) sets. Use k-fold cross-validation to assess model stability. To avoid data leakage, keep all samples from the same site in a single fold (grouped cross-validation), since measurements from one site are correlated.
Model selection and hyperparameter tuning: Use grid search or Bayesian optimization to find optimal hyperparameters (e.g., number of trees, learning rate, network depth). Monitor training and validation loss to detect overfitting.
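Uncertainty quantification: Most ML models return a single point estimate with no measure of confidence. Techniques such as Monte Carlo dropout in neural networks (keeping dropout active at prediction time) or quantile regression forests can provide prediction intervals, helping engineers assess risk.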
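
The splitting and tuning steps above can be sketched with scikit-learn as follows. This is a minimal illustration, not a recommended configuration: the data are synthetic, the three predictors and the site effect are invented, and the parameter grid is deliberately small. The key point is that whole sites, not individual samples, are assigned to the test set and to each cross-validation fold.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, GroupKFold, GroupShuffleSplit

rng = np.random.default_rng(42)
n_sites, per_site = 20, 15
site_id = np.repeat(np.arange(n_sites), per_site)  # one group label per sample
n = site_id.size

# Hypothetical predictors: corrected SPT blow count, depth (m), water content (%)
X = np.column_stack([
    rng.uniform(5, 50, n),
    rng.uniform(1, 20, n),
    rng.uniform(10, 40, n),
])

# Synthetic target with a shared per-site offset, so samples within a site are correlated
site_effect = rng.normal(0, 15, n_sites)[site_id]
y = 8 * X[:, 0] + 3 * X[:, 1] - 1.5 * X[:, 2] + site_effect + rng.normal(0, 10, n)

# Hold out whole sites (about 20% of them) as the final test set
outer = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(outer.split(X, y, groups=site_id))

# Grid search with grouped 5-fold cross-validation on the training sites only
search = GridSearchCV(
    GradientBoostingRegressor(random_state=0),
    param_grid={
        "n_estimators": [100, 300],
        "learning_rate": [0.05, 0.1],
        "max_depth": [2, 3],
    },
    cv=GroupKFold(n_splits=5),
    scoring="neg_root_mean_squared_error",
)
search.fit(X[train_idx], y[train_idx], groups=site_id[train_idx])

# R^2 of the refitted best model on sites it has never seen
test_r2 = search.best_estimator_.score(X[test_idx], y[test_idx])
print("best parameters:", search.best_params_)
print(f"held-out R^2: {test_r2:.3f}")
```

Bayesian optimization could replace the grid search without changing the grouped splitting logic.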

An example workflow using Python's scikit-learn and XGBoost is described in the Geotechnical Machine Learning Guide published by the International Society for Soil Mechanics and Geotechnical Engineering. Following such guidelines supports reproducibility and trust in the results.
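
For the uncertainty quantification step listed above, one option that needs only scikit-learn is gradient boosting with a quantile loss. Note that this is a substitute technique, not the quantile regression forests named earlier. The sketch fits separate models for the 5th, 50th, and 95th percentiles on synthetic data whose noise grows with the predictor. The single feature and all numbers are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 500

# Hypothetical single predictor: corrected SPT blow count
X = rng.uniform(5, 50, size=(n, 1))
# Synthetic capacity (kPa) with noise that increases with the blow count
y = 10 * X[:, 0] + rng.normal(0, 0.5 * X[:, 0] + 5)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# One model per quantile; alpha is the target quantile for loss="quantile"
quantiles = {"lower": 0.05, "median": 0.5, "upper": 0.95}
models = {
    name: GradientBoostingRegressor(loss="quantile", alpha=q, random_state=0).fit(X_train, y_train)
    for name, q in quantiles.items()
}
pred = {name: model.predict(X_test) for name, model in models.items()}

# Fraction of held-out observations inside the nominal 90% interval
coverage = float(np.mean((y_test >= pred["lower"]) & (y_test <= pred["upper"])))
print(f"empirical coverage of the 5-95% interval: {coverage:.2f}")
```

Checking empirical coverage on held-out data, as in the last step, is a simple way to see whether the intervals are too narrow or too wide before relying on them.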

Case Studies and Applications

Several research groups have demonstrated the practical utility of ML for bearing capacity estimation:

Offshore wind farm foundations: A study in Ocean Engineering used a random forest model trained on CPT data from 50 offshore sites, achieving an R² of 0.94 when predicting axial capacity of monopiles, significantly outperforming traditional CPT-based methods.
Shallow foundations on clay: Researchers applied a deep neural network to laboratory test results from 200 clay samples. The model predicted ultimate bearing capacity within ±5% of measured values, compared to ±20% for the Skempton formula.
Road embankment design: A transportation agency used gradient boosting to estimate bearing capacity along a 10 km highway alignment, using a combination of LiDAR topography, soil maps, and limited borehole data. The model enabled a 30% reduction in costly soil investigation points while maintaining design confidence.

These examples suggest that ML models can be deployed reliably in foundation design, provided they are trained on representative data and validated against independent measurements.

Future Directions and Challenges

Despite promising results, several challenges must be overcome to integrate ML into routine geotechnical practice:

Data quality and standardization: Many historical datasets are incomplete or recorded in inconsistent formats. Establishing open, standardized geotechnical databases (like the GeoTransform Initiative) is essential for training robust models.
Model interpretability: Black‑box models are often met with skepticism by regulatory bodies. Techniques like SHAP (SHapley Additive exPlanations) and partial dependence plots can help explain predictions and build trust.
Incorporating spatial variability: Bearing capacity is inherently spatially correlated. Combining ML with geostatistics (e.g., kriging) and GIS layers—such as remote sensing data, digital elevation models, and soil maps—can improve spatial predictions.
Real-time integration: Embedding ML into field-testing instruments (e.g., CPT cones with onboard processing) would allow immediate bearing capacity estimates while the test is still in progress. Edge computing and lightweight models (e.g., quantized neural networks) make this feasible.
Regulatory acceptance: Engineering codes and standards currently rely on deterministic or simplified probabilistic methods. Demonstrating that ML models meet or exceed reliability targets will require collaboration between researchers, practitioners, and code committees.
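
As a small illustration of the interpretability point above, the sketch below ranks features by permutation importance, a model-agnostic technique available in scikit-learn. It is used here as a stand-in for SHAP, which requires a separate package. The data are synthetic, and the feature names, including a deliberately irrelevant "noise_feature", are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
n = 400
feature_names = ["friction_angle_deg", "cohesion_kpa", "unit_weight_kn_m3", "noise_feature"]

X = np.column_stack([
    rng.uniform(25, 40, n),   # friction angle
    rng.uniform(0, 50, n),    # cohesion
    rng.uniform(16, 20, n),   # unit weight
    rng.normal(0, 1, n),      # irrelevant feature, never used in the target
])
# Synthetic target; NOT real test data
y = 30 * X[:, 0] + 10 * X[:, 1] + 5 * X[:, 2] + rng.normal(0, 10, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# Shuffle one feature at a time on held-out data and record the drop in R^2
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
importance = dict(zip(feature_names, result.importances_mean))

for name, value in sorted(importance.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:.3f}")
```

A feature that the model ranks highly but that has no geotechnical justification is a warning sign worth investigating before the model is used in design.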

Looking ahead, the fusion of ML with physics-based models, for example through physics-informed neural networks (PINNs), may offer the best of both worlds: data-driven flexibility with adherence to governing geomechanical equations. This approach could handle sparse data scenarios more gracefully and produce more physically consistent predictions.
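
A full PINN embeds the governing equations in the network's loss function and is beyond a short example. A simpler hybrid that conveys the same idea is residual learning: a closed-form estimate supplies the physics, and an ML model learns only the correction. The sketch below is such a hybrid, not a PINN. It assumes the undrained strip-footing expression q_u = 5.14 c_u + q as the baseline and generates synthetic "measurements" that include a Skempton-type shape factor (1 + 0.2 B/L) which the baseline ignores. All data and names are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 400
cu = rng.uniform(20, 150, n)     # undrained shear strength (kPa)
q = rng.uniform(0, 40, n)        # overburden pressure at footing base (kPa)
ratio = rng.uniform(0, 1, n)     # footing width-to-length ratio B/L
X = np.column_stack([cu, q, ratio])

# Synthetic "measured" capacity including a shape effect; NOT real test data
measured = 5.14 * cu * (1 + 0.2 * ratio) + q + rng.normal(0, 10, n)


def physics_baseline(features):
    """Closed-form strip-footing estimate (kPa) that ignores footing shape."""
    return 5.14 * features[:, 0] + features[:, 1]


def rmse(observed, predicted):
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


X_train, X_test, y_train, y_test = train_test_split(X, measured, test_size=0.25, random_state=0)

# The ML model is trained on the residual between measurement and baseline
residual_model = GradientBoostingRegressor(random_state=0)
residual_model.fit(X_train, y_train - physics_baseline(X_train))

# Hybrid prediction = physics estimate + learned correction
hybrid_pred = physics_baseline(X_test) + residual_model.predict(X_test)

rmse_baseline = rmse(y_test, physics_baseline(X_test))
rmse_hybrid = rmse(y_test, hybrid_pred)
print(f"baseline RMSE: {rmse_baseline:.1f} kPa, hybrid RMSE: {rmse_hybrid:.1f} kPa")
```

Because the baseline already carries most of the signal, the learned component has less to fit, which is one reason hybrid approaches are expected to behave better when data are sparse.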

Conclusion

Machine learning algorithms represent a significant advancement in the estimation of bearing capacity, offering higher accuracy, adaptability, and efficiency compared to traditional empirical methods. By carefully curating training data, selecting appropriate algorithms, and validating results against field measurements, geotechnical engineers can harness ML to design safer and more economical foundations. Continued research, data sharing, and cross‑disciplinary collaboration will be key to overcoming current limitations and fully integrating machine learning into standard geotechnical practice. As these tools become more accessible, practitioners who embrace them will gain a competitive edge in delivering robust infrastructure solutions.