t-SNE (t-distributed Stochastic Neighbor Embedding) is a popular technique for visualizing high-dimensional data in two or three dimensions. It reveals patterns and clusters that are hard to see in the original space, and understanding the mathematics behind it makes the method easier to apply and its output easier to interpret.
Core Concepts of t-SNE
t-SNE converts high-dimensional data points into a probability distribution that reflects their similarities. It then seeks a low-dimensional embedding that preserves these similarities as closely as possible. The process involves two main steps: computing pairwise similarities and minimizing a divergence between distributions.
Mathematical Foundations
In the high-dimensional space, the similarity between two points is modeled with a Gaussian kernel centered on each point. The bandwidth \sigma_i is set per point, typically by binary search, so that the resulting distribution has a user-specified perplexity (roughly, an effective number of neighbors). The probability that point j is a neighbor of point i is given by:
p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}
This defines, for each point, a probability distribution over its potential neighbors. The joint probability p_{ij} is then symmetrized as:
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}
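The two steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation: the function name is mine, and the per-point bandwidths sigma_i are assumed to be given (in the full algorithm each one is tuned by binary search to hit a target perplexity).

```python
import numpy as np

def joint_probabilities(X, sigmas):
    """High-dimensional similarities: conditional p_{j|i}, then symmetric p_ij.

    X      : (N, D) array of data points.
    sigmas : (N,) array of per-point Gaussian bandwidths (assumed given here;
             the real algorithm finds them by binary search on perplexity).
    """
    # Squared Euclidean distances ||x_i - x_j||^2 for all pairs.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian affinities with a per-row bandwidth sigma_i.
    affinities = np.exp(-sq_dists / (2.0 * sigmas[:, None] ** 2))
    np.fill_diagonal(affinities, 0.0)  # the normalizing sum excludes k = i
    # Row-normalize to get the conditional distribution p_{j|i}.
    P_cond = affinities / affinities.sum(axis=1, keepdims=True)
    # Symmetrize: p_ij = (p_{j|i} + p_{i|j}) / 2N.
    N = X.shape[0]
    return (P_cond + P_cond.T) / (2.0 * N)
```

Dividing by 2N makes the symmetrized P a single probability distribution over all ordered pairs, so its entries sum to one.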
where N is the total number of points. In the low-dimensional space, similarities are modeled with a Student's t-distribution with one degree of freedom; its heavy tails let moderately dissimilar points sit far apart in the embedding, which alleviates the crowding problem:
q_{ij} = \frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|y_k - y_l\|^2)^{-1}}
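The low-dimensional similarities are simpler, since the t-kernel has no per-point bandwidth. A minimal sketch (again with a function name of my choosing):

```python
import numpy as np

def low_dim_similarities(Y):
    """Low-dimensional similarities q_ij from a Student's t kernel (1 dof).

    Y : (N, d) array of embedding coordinates (d is typically 2 or 3).
    """
    # Squared Euclidean distances ||y_i - y_j||^2 for all pairs.
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    inv = 1.0 / (1.0 + sq_dists)  # (1 + ||y_i - y_j||^2)^{-1}
    np.fill_diagonal(inv, 0.0)    # self-pairs are excluded
    # Normalize over all pairs k != l so Q sums to one.
    return inv / inv.sum()
```

Note that the normalization here is global (over all pairs), unlike the per-point normalization used for p_{j|i}.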
Optimization Process
The goal is to find low-dimensional points yi that minimize the Kullback-Leibler divergence between the high- and low-dimensional distributions:
\mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
This is minimized by gradient descent, moving the low-dimensional points to reduce the divergence. The gradient with respect to each y_i has a simple closed form driven by the mismatches p_{ij} - q_{ij} between the two distributions; practical implementations also add momentum and an early-exaggeration phase to speed up and stabilize convergence.
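The standard t-SNE gradient is dC/dy_i = 4 \sum_j (p_{ij} - q_{ij}) (y_i - y_j) (1 + \|y_i - y_j\|^2)^{-1}. Below is a minimal sketch of this gradient and a plain descent step; the function names and learning rate are mine, and refinements such as momentum and early exaggeration are deliberately omitted for clarity.

```python
import numpy as np

def kl_gradient(P, Y):
    """Gradient of KL(P || Q) w.r.t. the embedding Y, using the standard
    t-SNE gradient: dC/dy_i = 4 * sum_j (p_ij - q_ij)(y_i - y_j)(1+||y_i-y_j||^2)^-1.
    """
    # Recompute Q from the current embedding.
    sq_dists = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    inv = 1.0 / (1.0 + sq_dists)
    np.fill_diagonal(inv, 0.0)
    Q = inv / inv.sum()
    # Pairwise weights (p_ij - q_ij) * (1 + ||y_i - y_j||^2)^{-1}.
    W = (P - Q) * inv
    # sum_j W_ij (y_i - y_j) in matrix form: (diag(row sums) - W) @ Y.
    return 4.0 * (np.diag(W.sum(axis=1)) - W) @ Y

def gradient_descent_step(P, Y, learning_rate=100.0):
    """One plain gradient-descent update on the embedding."""
    return Y - learning_rate * kl_gradient(P, Y)
```

Because the weight matrix W is symmetric, the per-point gradients sum to zero, so plain descent does not drift the embedding's center of mass.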
Conclusion
Understanding the mathematical basis of t-SNE involves grasping how similarities are modeled and how the optimization aligns these similarities across dimensions. This foundation helps in tuning parameters and interpreting visualizations effectively.