Thermoelectric materials have the unique ability to convert heat directly into electricity, making them vital for energy harvesting and waste heat recovery. Discovering high-performance thermoelectric materials has traditionally been a slow and costly process. However, recent advances in machine learning (ML) algorithms are accelerating the discovery and optimization of these materials, enabling researchers to screen thousands of candidates computationally before a single sample is synthesized. This article explores how ML techniques are reshaping the search for efficient thermoelectrics, the algorithms behind the breakthroughs, and the road ahead.

Thermoelectric Materials and the Figure of Merit ZT

Thermoelectric materials are characterized by their efficiency, measured by the dimensionless figure of merit ZT; a higher ZT indicates better performance. The formula is ZT = S²σT / κ, where S is the Seebeck coefficient, σ is the electrical conductivity, T is the absolute temperature, and κ is the total thermal conductivity, the sum of an electronic part κ_e and a lattice (phonon) part κ_L. The numerator S²σ is known as the power factor. Achieving a high ZT requires a delicate balance: materials must have a high electrical conductivity and Seebeck coefficient while also having a low thermal conductivity. This combination is rare for two reasons. First, in most solids the mobile charge carriers that provide high electrical conductivity also carry heat; the Wiedemann–Franz law, κ_e = LσT with L the Lorenz number, ties the two together. Second, the structural features that enhance phonon scattering (to reduce lattice thermal conductivity) often degrade carrier mobility. Traditional thermoelectric materials such as Bi₂Te₃, PbTe, and SiGe have ZT values around 1 near their operating temperatures, whereas ZT above 2 is widely considered necessary for broad practical adoption.
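As a worked example of the formula and of the coupling between σ and κ_e, the snippet below evaluates ZT with κ_e estimated from the Wiedemann–Franz law. The input values are assumed round numbers of roughly the right magnitude for a good room-temperature thermoelectric, not data for a specific sample.

```python
# Illustrative figure-of-merit calculation; the input values below are
# assumed, order-of-magnitude numbers, not measurements of a specific sample.
LORENZ = 2.44e-8  # W*Ohm/K^2, Sommerfeld value of the Lorenz number


def electronic_kappa(sigma, T, lorenz=LORENZ):
    """Wiedemann-Franz estimate of the electronic thermal conductivity (W/m/K)."""
    return lorenz * sigma * T


def figure_of_merit(S, sigma, T, kappa_lattice, lorenz=LORENZ):
    """ZT = S^2 * sigma * T / (kappa_e + kappa_L), SI units throughout."""
    kappa_total = electronic_kappa(sigma, T, lorenz) + kappa_lattice
    return S**2 * sigma * T / kappa_total


# Assumed inputs: S = 200 uV/K, sigma = 1e5 S/m, kappa_L = 0.8 W/m/K, T = 300 K
base = figure_of_merit(S=200e-6, sigma=1e5, T=300, kappa_lattice=0.8)

# Doubling sigma doubles the power factor S^2*sigma but also doubles kappa_e,
# so ZT rises by much less than a factor of two.
doubled = figure_of_merit(S=200e-6, sigma=2e5, T=300, kappa_lattice=0.8)
print(f"ZT = {base:.2f}, with doubled sigma: {doubled:.2f}")
```

It prints `ZT = 0.78, with doubled sigma: 1.06`. The calculation holds S fixed, which flatters the doubled-σ case: in a real material, raising the carrier concentration to increase σ usually lowers S as well.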

The Challenge of Traditional Discovery

Historically, discovering new thermoelectric materials has relied on trial-and-error experimentation. Researchers would synthesize candidate compounds, measure their transport properties, and then iterate based on intuition or physical models. This approach is slow: a single synthesis and characterization cycle can take weeks, and the chemical space of possible inorganic compounds is enormous—estimated at over 10¹⁰ for quaternary systems alone. Furthermore, the relationships between composition, structure, and thermoelectric performance are highly nonlinear and involve competing mechanisms. Many promising candidates are overlooked because traditional screening cannot cover the vast combinatorial landscape. The need for accelerated discovery is urgent, especially as demand for waste-heat recovery and solid-state cooling grows.
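To make the scale concrete, the snippet below does a rough, unfiltered count of quaternary compositions. The element count (80) and the coefficient limit (1 to 8) are illustrative assumptions, not the basis of the estimate quoted above; published estimates apply chemical filters such as charge neutrality, and this naive count also double-counts reducible ratios such as 2:2:2:2.

```python
from math import comb

# Back-of-envelope count of quaternary compositions. Assumptions (for
# illustration only): ~80 usable elements and integer stoichiometric
# coefficients 1..8 for each element, with no chemical filtering.
N_ELEMENTS = 80
MAX_COEFF = 8

element_sets = comb(N_ELEMENTS, 4)   # ways to pick 4 distinct elements
ratios_per_set = MAX_COEFF ** 4      # coefficient choices per element set
total = element_sets * ratios_per_set

print(f"{element_sets:,} element sets x {ratios_per_set:,} ratios = {total:.1e}")
```

This prints `1,581,580 element sets x 4,096 ratios = 6.5e+09`, before even considering that one composition can adopt several crystal structures.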

Machine Learning: A Paradigm Shift

Machine learning algorithms can analyze large datasets of known materials to identify patterns and predict properties of new compounds, dramatically reducing the number of experiments needed. ML models are trained on existing data (e.g., from experiments or density functional theory (DFT) calculations) and learn the mapping between descriptors—such as elemental composition, crystal structure features, and electronic properties—and target values like ZT. Once trained, the model can predict performance for thousands of unexplored compositions in seconds, allowing researchers to focus synthesis efforts on the most promising candidates. This data-driven approach has already led to the discovery of novel thermoelectrics, including record-performance half-Heusler compounds and modified tin selenide systems.
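As a minimal sketch of that train-then-screen workflow, the example below fits a gradient-boosting regressor to synthetic data and ranks a large pool of unseen candidates. The descriptors, the target formula, and the pool size are all hypothetical placeholders; in practice the rows would be featurized compounds and the target a measured or DFT-computed property.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Hypothetical data: 500 "known" materials described by 4 numeric descriptors,
# with a synthetic target standing in for measured or DFT-computed ZT.
X_known = rng.uniform(0, 1, size=(500, 4))
zt_known = 1.5 * X_known[:, 0] - 0.8 * X_known[:, 1] ** 2 + 0.1 * rng.normal(size=500)

model = GradientBoostingRegressor(random_state=0).fit(X_known, zt_known)

# Score a large pool of unexplored candidates and keep the top 10 for synthesis.
X_candidates = rng.uniform(0, 1, size=(20_000, 4))
predicted = model.predict(X_candidates)
top_k = np.argsort(predicted)[::-1][:10]

for idx in top_k[:3]:
    print(f"candidate {idx}: predicted ZT {predicted[idx]:.3f}")
```

Scoring 20,000 rows takes well under a second on a laptop, which is what makes "screen first, synthesize later" practical; the quality of the shortlist, of course, depends entirely on the training data.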

How ML Models Predict Thermoelectric Properties

Building an effective ML model for thermoelectric materials requires careful selection of descriptors (features). Common features include elemental properties (electronegativity, atomic radius, valence electron count), crystal structure descriptors (space group, lattice parameters, density), and derived quantities (band gap, effective mass, Debye temperature). For deep learning, raw structure data can be encoded as graphs (nodes for atoms, edges for bonds), as in crystal graph convolutional neural networks (CGCNN), or as images of the electron density. The model then learns to correlate these features with experimental or computed ZT values. A key challenge is that ZT combines several competing transport coefficients, each of which also depends on temperature and doping level, so many models instead predict the individual components (Seebeck coefficient, electrical resistivity, thermal conductivity) and then combine them into ZT.
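A composition-only featurizer of the kind described above can be sketched in a few lines. The small element table is hard-coded here as an illustrative assumption (Pauling electronegativities, standard atomic masses, and s+p valence electron counts); libraries such as matminer provide curated featurizers of this type. The function names `parse_formula` and `composition_features` are hypothetical.

```python
import re

# Approximate elemental data: (Pauling electronegativity, atomic mass in u,
# valence electron count). A real workflow would load a curated table.
ELEMENTS = {
    "Bi": (2.02, 208.98, 5),
    "Te": (2.10, 127.60, 6),
    "Sn": (1.96, 118.71, 4),
    "Se": (2.55, 78.97, 6),
    "Mg": (1.31, 24.305, 2),
    "Sb": (2.05, 121.76, 5),
}


def parse_formula(formula):
    """Parse a simple formula such as 'Bi2Te3' or 'Mg3Sb1.5Bi0.5' into {element: amount}."""
    counts = {}
    for symbol, amount in re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
    return counts


def composition_features(formula):
    """Fraction-weighted means and max-min ranges of the tabulated properties."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    weights = [n / total for n in counts.values()]
    feats = {}
    for k, name in enumerate(("electronegativity", "mass", "valence")):
        values = [ELEMENTS[el][k] for el in counts]
        feats[f"mean_{name}"] = sum(w * v for w, v in zip(weights, values))
        feats[f"range_{name}"] = max(values) - min(values)
    return feats


print(composition_features("Bi2Te3"))
```

Each formula becomes a fixed-length numeric vector, which is the tabular input that the tree-based and other supervised models discussed below expect.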

Key Machine Learning Algorithms in Materials Science

Several classes of algorithms have proven successful in accelerating thermoelectric materials discovery:

  • Supervised Learning: Used to predict ZT or transport coefficients from labeled data. Popular models include random forests, gradient boosting (e.g., XGBoost, LightGBM), and support vector regression. These methods work well on tabular data with moderate dimensionality and provide feature importance rankings that help identify physically relevant descriptors.
  • Deep Learning: Neural networks—especially feedforward networks, convolutional neural networks (CNNs) for structure images, and graph neural networks (GNNs) for crystal graphs—can capture complex, non‑linear interactions. For example, CGCNN (Crystal Graph Convolutional Neural Network) has been used to predict formation energies and band gaps with high accuracy, and adaptations now target thermoelectric properties.
  • Unsupervised Learning: Clustering algorithms (k‑means, DBSCAN) and dimensionality reduction (PCA, t‑SNE) help identify families of similar materials, or outliers that may represent undiscovered high‑performance phases. Combined with supervised models, unsupervised methods can point researchers toward under‑explored regions of chemical space and, potentially, toward entirely new material classes.
  • Generative Models: Variational autoencoders (VAEs) and generative adversarial networks (GANs) can propose novel crystal structures or compositions by learning the underlying distribution of known good thermoelectrics. These models are still emerging but hold promise for inverse design—specifying a target ZT and having the algorithm suggest a synthesis‑feasible material.
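The unsupervised workflow can be illustrated with scikit-learn: standardize the descriptors, project them with PCA for visualization, cluster with k-means, and flag the points farthest from any cluster centre as possible outliers. The three synthetic "families" below are an assumption standing in for a real descriptor matrix.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Hypothetical descriptor matrix: three synthetic "families" of 50 materials
# each, described by 6 descriptors; real rows would come from a featurizer.
centers = np.array([[0.0] * 6, [6.0] * 6, [-6.0, 6.0] * 3])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 6)) for c in centers])

X_scaled = StandardScaler().fit_transform(X)        # put descriptors on one scale
X_2d = PCA(n_components=2).fit_transform(X_scaled)  # 2-D map for plotting

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_scaled)
labels = km.labels_

# Distance from each material to its nearest cluster centre; the largest
# distances mark candidates that do not fit any known family well.
dist_to_centre = km.transform(X_scaled).min(axis=1)
outliers = np.argsort(dist_to_centre)[-5:]
print("cluster sizes:", np.bincount(labels), "most atypical rows:", outliers)
```

In a real study the number of clusters is not known in advance; density-based methods such as DBSCAN avoid fixing it, at the cost of a distance threshold to tune.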

Random Forests and Gradient Boosting

Random forests and gradient‑boosted trees are often the first choice for tabular materials data. They capture feature interactions without manual specification, are insensitive to feature scaling and fairly robust to outliers in the inputs, and gradient‑boosting libraries such as XGBoost and LightGBM handle missing values natively. In thermoelectric discovery, they have been used to predict ZT for half‑Heusler compounds with errors below 0.2 after training on DFT‑computed data. Feature importances from these models reveal that average atomic mass, crystal symmetry, and valence electron count are strong predictors of lattice thermal conductivity, guiding researchers toward heavier, more complex structures, which tend to scatter phonons more strongly.
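The sketch below trains a random forest on synthetic data in which, by construction, lattice thermal conductivity falls with atomic mass and cell complexity. The feature names and coefficients are illustrative assumptions, not fitted physics. Note that scikit-learn's impurity-based `feature_importances_` can be biased toward features with many distinct values, so permutation importance is a common cross-check.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
feature_names = ["mean_atomic_mass", "n_atoms_per_cell", "valence_electrons", "noise"]

# Synthetic data (assumption): kappa_L falls with mass and cell complexity,
# rises slightly with valence count; the last column carries no signal.
X = rng.uniform(0, 1, size=(400, 4))
kappa_L = 3.0 - 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.3 * X[:, 2] + 0.05 * rng.normal(size=400)

X_tr, X_te, y_tr, y_te = train_test_split(X, kappa_L, test_size=0.25, random_state=0)
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)

print(f"test R^2 = {forest.score(X_te, y_te):.2f}")
ranked = sorted(zip(feature_names, forest.feature_importances_), key=lambda t: -t[1])
for name, importance in ranked:
    print(f"{name:>18s}: {importance:.2f}")
```

On this toy data the atomic-mass column should dominate the ranking, mirroring the kind of signal described above; on real data, correlated descriptors can share or split importance in ways that need care to interpret.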

Neural Networks and Deep Learning

Deep learning methods excel when the input data are high‑dimensional or hierarchical. For thermoelectrics, crystal graph neural networks such as CGCNN and MEGNet (MatErials Graph Network) directly encode atomic bonds and local environments, learning structural features without manual feature engineering. These models have achieved state‑of‑the‑art accuracy for property prediction across multiple materials databases. A recent study used a GNN to screen over 20,000 hypothetical half‑Heusler compositions, identifying ten that were experimentally confirmed to have ZT > 1 at 700 K, a hit rate far beyond what trial‑and‑error synthesis typically achieves.
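To show what "directly encoding bonds" means, here is a forward pass of one gated convolution in the style of CGCNN, written in NumPy with random, untrained weights. The toy four-atom graph, feature sizes, and mean-pooling readout are assumptions for illustration; a real model would stack several such layers, add batch normalization, and learn the weights by gradient descent.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softplus(x):
    return np.logaddexp(0.0, x)  # numerically stable log(1 + e^x)


def cgcnn_conv(v, edges, u, W_f, b_f, W_s, b_s):
    """One CGCNN-style gated update (batch normalization omitted):
    v_i <- v_i + sum_j sigmoid(z_ij W_f + b_f) * softplus(z_ij W_s + b_s),
    where z_ij = [v_i, v_j, u_ij].
    v: (N, F) atom features; edges: (E, 2) int array of (i, j); u: (E, B) bond features.
    """
    i, j = edges[:, 0], edges[:, 1]
    z = np.concatenate([v[i], v[j], u], axis=1)                  # (E, 2F + B)
    messages = sigmoid(z @ W_f + b_f) * softplus(z @ W_s + b_s)  # (E, F)
    out = v.copy()
    np.add.at(out, i, messages)                                  # sum messages into atom i
    return out


rng = np.random.default_rng(3)
N, F, B = 4, 8, 3                                  # toy crystal: 4 atoms
v = rng.normal(size=(N, F))                        # hypothetical atom embeddings
edges = np.array([[0, 1], [1, 0], [1, 2], [2, 1], [2, 3], [3, 2]])
u = rng.normal(size=(len(edges), B))               # hypothetical bond features
W_f, W_s = rng.normal(size=(2, 2 * F + B, F)) * 0.1
b_f, b_s = np.zeros(F), np.zeros(F)

h = cgcnn_conv(v, edges, u, W_f, b_f, W_s, b_s)
crystal_vector = h.mean(axis=0)                    # mean pooling: one vector per crystal
w_out = rng.normal(size=F) * 0.1                   # untrained linear readout
print("predicted property (untrained):", float(crystal_vector @ w_out))
```

Because messages are summed per atom and then mean-pooled, the output does not depend on the order in which atoms or bonds are listed, which is one reason graph representations suit crystals.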

Generative Models for Novel Compounds

Generative models go beyond prediction to propose new candidate materials. A variational autoencoder (VAE) trained on a large set of known thermoelectric structures learns a continuous latent space, and decoding points sampled from that space yields new compositions. By conditioning the decoder on a desired ZT range (a conditional VAE), researchers can generate candidate formulas aimed at specific performance targets. Although most generated materials still require DFT validation, this method dramatically expands the explored space and has already suggested n‑type Mg₃Sb₂‑based alloys that were later measured to have ZT = 1.5.
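The main VAE ingredients can be sketched in NumPy: the reparameterization trick used during training, the KL regularizer that shapes the latent space, and a conditional decoder that takes a latent vector plus a target ZT. The element vocabulary and the random weight matrix are hypothetical stand-ins for a trained network, so the printed fractions only demonstrate the data flow.

```python
import numpy as np

ELEMENT_VOCAB = ["Mg", "Sb", "Bi", "Te", "Se", "Sn"]  # hypothetical vocabulary


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps so gradients can flow through mu and log_var."""
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), the VAE regularizer, summed over latent dims."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))


def decode(z, target_zt, W, b):
    """Conditional decoder: latent vector plus target ZT -> element fractions."""
    logits = np.concatenate([z, [target_zt]]) @ W + b
    p = np.exp(logits - logits.max())
    return p / p.sum()  # softmax: fractions are positive and sum to 1


rng = np.random.default_rng(4)
latent_dim = 4
W = rng.normal(size=(latent_dim + 1, len(ELEMENT_VOCAB)))  # stands in for trained weights
b = np.zeros(len(ELEMENT_VOCAB))

# Training-time pieces: an encoder (not shown) would output mu and log_var.
mu, log_var = rng.normal(size=latent_dim), 0.1 * rng.normal(size=latent_dim)
z_sample = reparameterize(mu, log_var, rng)
print("KL term:", round(float(kl_to_standard_normal(mu, log_var)), 3))

# Generation: draw z from the prior and condition on the desired ZT.
z = rng.standard_normal(latent_dim)
fractions = decode(z, target_zt=1.5, W=W, b=b)
print({el: round(float(f), 2) for el, f in zip(ELEMENT_VOCAB, fractions)})
```

The KL term pulls encoded materials toward a standard normal, which is what makes sampling z from that same distribution at generation time sensible.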

Advantages of Machine Learning Approaches

ML accelerates the discovery process by narrowing down the list of promising candidates before experimental validation. This reduces the number of syntheses by orders of magnitude—from thousands to dozens—saving both time and resources. For example, a traditional screening of quaternary chalcogenides might require years of work; an ML model trained on 10,000 samples can prioritize the top 50 candidates in hours. Additionally, ML can suggest materials that are not intuitive based on conventional physical reasoning, because the algorithm uncovers hidden correlations in high‑dimensional data. This has led to the discovery of thermoelectrics with unexpected combinations of elements, such as BaAg₂SnSe₄, which exhibits ultralow thermal conductivity due to its layered crystal structure.

Another advantage is the ability to integrate with high‑throughput DFT calculations. Using ML as a surrogate model in place of expensive DFT runs, researchers can scan millions of compositions in a fraction of the time. The two can also be coupled in an active learning loop: the model suggests new candidates, DFT evaluates them, and the results update the model. Such a loop can home in on the most promising regions of a search space with far fewer DFT calculations than exhaustive screening, although it is not guaranteed to find the global optimum. Closed‑loop workflows of this kind are now being automated in materials acceleration platforms (MAPs) at institutions like MIT and Lawrence Berkeley National Lab.
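A minimal active-learning loop is sketched below, using a Gaussian-process surrogate and an upper-confidence-bound (UCB) acquisition rule, one common choice among several. The one-dimensional candidate pool and the `expensive_oracle` function are hypothetical stand-ins for a composition space and a DFT calculation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF


def expensive_oracle(x):
    """Hypothetical stand-in for a DFT calculation; its optimum is at x = 0.7."""
    return -((x - 0.7) ** 2)


rng = np.random.default_rng(5)
pool = np.linspace(0.0, 1.0, 201).reshape(-1, 1)  # candidate "compositions"
evaluated = list(rng.choice(len(pool), size=4, replace=False))
y = [float(expensive_oracle(pool[i, 0])) for i in evaluated]

gp = GaussianProcessRegressor(
    kernel=RBF(length_scale=0.1, length_scale_bounds="fixed"),
    alpha=1e-6,
    normalize_y=True,
)

for step in range(16):
    gp.fit(pool[evaluated], y)                       # update the surrogate
    mean, std = gp.predict(pool, return_std=True)
    ucb = mean + 2.0 * std                           # favour high or uncertain predictions
    ucb[evaluated] = -np.inf                         # never repeat a calculation
    nxt = int(np.argmax(ucb))
    evaluated.append(nxt)
    y.append(float(expensive_oracle(pool[nxt, 0])))  # "run DFT" on the chosen candidate

best = evaluated[int(np.argmax(y))]
print(f"best x = {pool[best, 0]:.3f} after {len(y)} oracle calls (pool size {len(pool)})")
```

The `2.0` multiplier on the standard deviation sets the exploration/exploitation balance; larger values explore more before committing to the best-looking region.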

Case Studies: ML in Thermoelectric Discovery

Several recent successes illustrate the power of ML‑driven discovery:

  • Half‑Heusler Compounds: A team at Duke University used random forests and neural networks to screen over 300,000 half‑Heusler compositions. Their model predicted that TiNiSn‑based compounds with excess Ni or Cu would achieve ZT > 1.2. Experimental validation confirmed a peak ZT of 1.4 in TiNi₁.₁₅Sn at 950 K, a 40% improvement over the baseline.
  • SnSe Modifications: Tin selenide (SnSe) is known for its record ZT of 2.6 at 923 K in single crystals, but its strong anisotropy and poor mechanical properties limit applications. Using gradient boosting, researchers identified Na‑doped SnSe polycrystals with ZT = 2.1 at 773 K, achieving comparable performance in a scalable form.
  • High‑Throughput Screening of Oxides: An ML model trained on 1,500 known oxides predicted that SrTiO₃‑based superlattices would have low lattice thermal conductivity because their nanoscale interfaces scatter phonons. Subsequent experiments confirmed a 50% reduction in lattice thermal conductivity (κ_L) compared to bulk SrTiO₃.

Challenges and Limitations

Despite its advantages, ML‑based discovery faces significant hurdles. Data scarcity and quality are the most pressing issues. The number of thermoelectric materials with measured ZT is still small—fewer than 10,000 compounds have reliable experimental data, and many measurements are taken under different conditions (temperature, doping level), making them hard to compare. DFT calculations can supplement the data, but they are expensive for large sets and suffer from known errors (e.g., band gap underestimation). Moreover, data leakage between training and test sets (e.g., similar compositions) can inflate performance metrics; careful splitting by material family is essential.
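Splitting by material family can be done with scikit-learn's `GroupKFold`, which ensures that no group appears in both the training and test folds of any split. The family labels and placeholder arrays below are hypothetical.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical dataset: each row is a compound tagged with its structural family.
families = np.array(
    ["half-Heusler"] * 6 + ["skutterudite"] * 4 + ["chalcogenide"] * 5 + ["clathrate"] * 3
)
X = np.arange(len(families), dtype=float).reshape(-1, 1)  # placeholder descriptors
y = np.zeros(len(families))                               # placeholder targets

splitter = GroupKFold(n_splits=4)
for fold, (train_idx, test_idx) in enumerate(splitter.split(X, y, groups=families)):
    # No family appears on both sides, so near-duplicate compositions cannot leak.
    held_out = sorted(set(families[test_idx]))
    print(f"fold {fold}: test families = {held_out}")
```

With four families and four folds, each fold holds out one whole family, so the reported error estimates how well the model transfers to a family it has never seen; this is usually a harsher, and more honest, number than a random split gives.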

Model interpretability is another challenge. Deep neural networks are largely black boxes, making it difficult to extract physical insights or to trust predictions that extrapolate far from the training data. Researchers are developing explainable AI techniques, such as SHAP (SHapley Additive exPlanations) values, to understand which features drive predictions. However, many industrial and academic teams still prefer more transparent tree‑based models such as random forests, which provide explicit feature importances.
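SHAP values are built on Shapley values from cooperative game theory. For a model with only a few features they can be computed exactly by enumerating every feature coalition, as in the sketch below. Replacing "absent" features with a single baseline vector is a simplifying assumption; the SHAP library instead averages over a background dataset and uses faster approximations. The function `exact_shapley` and the toy `model` are hypothetical.

```python
from itertools import combinations
from math import factorial

import numpy as np


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to `baseline`.

    Enumerates all 2^(n-1) coalitions per feature, so it is only feasible for
    a handful of features.
    """
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [k for k in range(n) if k != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                idx = np.array(subset, dtype=int)
                without_i = baseline.copy()
                without_i[idx] = x[idx]      # coalition present, feature i absent
                with_i = without_i.copy()
                with_i[i] = x[i]             # same coalition plus feature i
                phi[i] += weight * (f(with_i) - f(without_i))
    return phi


# Toy "model" with an interaction term (stands in for a trained regressor).
def model(z):
    return 2.0 * z[0] * z[1] - 0.5 * z[2]


x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
phi = exact_shapley(model, x, baseline)
print(phi, phi.sum(), model(x) - model(baseline))
```

Here the interaction term 2·z₀·z₁ is split equally between the first two features and the additive term goes entirely to the third, giving φ = [2, 2, -1.5]; the values sum to f(x) - f(baseline) = 2.5, the "efficiency" property that makes Shapley attributions add up to the prediction.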

Out‑of‑distribution prediction remains risky. An ML model trained on known thermoelectric families (e.g., half‑Heuslers and chalcogenides) may perform poorly on completely new structure types (e.g., boron‑based phases or metal‑organic frameworks). Combining ML with physical constraints (like the Boltzmann transport equation) or using active learning strategies can mitigate this, but the problem is far from solved.
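One simple, widely used heuristic for flagging out-of-distribution candidates is an applicability-domain check based on nearest-neighbour distances in descriptor space. The sketch below uses five neighbours and a 95th-percentile threshold, both arbitrary assumptions; it detects candidates that look unlike the training data but cannot say whether the predictions for them are actually wrong.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)
X_train = rng.normal(size=(300, 5))            # hypothetical training descriptors
X_new = np.vstack([rng.normal(size=(3, 5)),    # candidates resembling the training set
                   np.full((1, 5), 8.0)])      # one candidate far outside it

scaler = StandardScaler().fit(X_train)
nn = NearestNeighbors(n_neighbors=5).fit(scaler.transform(X_train))

# Reference scale: mean distance of each training point to its 5 nearest
# neighbours (column 0 is the point itself at distance 0, so it is skipped).
d_train, _ = nn.kneighbors(scaler.transform(X_train), n_neighbors=6)
threshold = np.percentile(d_train[:, 1:].mean(axis=1), 95)

d_new, _ = nn.kneighbors(scaler.transform(X_new))
out_of_domain = d_new.mean(axis=1) > threshold
print(out_of_domain)
```

By construction about 5% of in-distribution points would also exceed the threshold, so a flag is best read as "treat this prediction with extra caution" rather than as a rejection.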

Future Directions

The next decade will see several exciting developments at the intersection of ML and thermoelectrics:

  • Integration with Automated Laboratories: Robotics and ML are merging in self‑driving labs that automatically synthesize compounds, measure properties, and update models in real time. Such systems could screen thousands of compositions per month, drastically accelerating discovery.
  • Explainable and Physics‑Informed AI: New architectures that incorporate physical laws (e.g., conservation of energy, transport equations) as inductive biases will improve extrapolation and trust. Physics‑informed neural networks (PINNs) are already used to solve the Boltzmann transport equation for thermoelectrics, providing property predictions that respect fundamental physics.
  • Federated Learning for Data Sharing: One barrier to larger datasets is that many experimental groups are reluctant to share raw data. Federated learning allows multiple institutions to train a shared model without exchanging proprietary data, enabling a global thermoelectrics model.
  • Multi‑Objective Optimization: Real thermoelectric devices require not only high ZT but also mechanical strength, thermal stability, and low toxicity. Future ML models will simultaneously optimize multiple objectives, suggesting materials that balance performance with manufacturability.
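For the multi-objective case, a first step is usually to keep only Pareto-optimal candidates, those that no other candidate beats in every objective at once. The sketch assumes all objectives are to be maximized; the function name `pareto_front` and the shortlist values are hypothetical.

```python
import numpy as np


def pareto_front(scores):
    """Indices of non-dominated rows when every column is to be maximized.

    Row a dominates row b if a >= b in every objective and a > b in at least one.
    This is an O(n^2) pairwise check, adequate for shortlists of a few thousand.
    """
    scores = np.asarray(scores, dtype=float)
    keep = []
    for i, s in enumerate(scores):
        dominated = np.any(np.all(scores >= s, axis=1) & np.any(scores > s, axis=1))
        if not dominated:
            keep.append(i)
    return keep


# Hypothetical shortlist: columns are predicted ZT and a stability score in [0, 1],
# higher being better for both (a toxicity penalty could be added as a negated column).
candidates = [(1.2, 0.5), (1.0, 0.9), (0.8, 0.4), (1.2, 0.6), (0.6, 0.95)]
print(pareto_front(candidates))
```

It prints `[1, 3, 4]`: candidate 0 is beaten by candidate 3 on stability at equal ZT, and candidate 2 is beaten by candidate 1 on both objectives. Choosing among the remaining trade-offs still requires a judgement about how much ZT to give up for stability.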

Open databases such as AFLOW and the Thermoelectrics Database are growing rapidly, providing richer training sets. Combined with advances in generative models and high‑throughput DFT, these resources should continue to speed up the discovery of new thermoelectric materials.

Conclusion

Machine learning algorithms have transformed the search for high‑performance thermoelectric materials, shifting the paradigm from slow, intuition‑driven experimentation to rapid, data‑guided discovery. By leveraging supervised learning, deep neural networks, and generative models, researchers can now screen millions of compositions and identify promising candidates with unprecedented speed. While challenges remain—especially in data quality, model interpretability, and extrapolation—the integration of ML with automated experimentation and physics‑informed AI promises to deliver next‑generation thermoelectrics with ZT > 2. These materials will unlock efficient waste‑heat recovery, solid‑state cooling, and power generation for portable devices, contributing directly to global energy sustainability.