Understanding the Mathematics Behind T-sne for Visualizing High-dimensional Data

t-SNE (t-Distributed Stochastic Neighbor Embedding) is a popular technique for visualizing high-dimensional data in two or three dimensions. It helps reveal patterns and clusters that are not easily observable in the original data. Understanding the mathematical principles behind t-SNE can improve its application and interpretation.

Core Concepts of t-SNE

t-SNE converts high-dimensional data points into a probability distribution that reflects their similarities. It then seeks a low-dimensional embedding that preserves these similarities as closely as possible. The process involves two main steps: computing pairwise similarities and minimizing a divergence between distributions.

Mathematical Foundations

In the high-dimensional space, the similarity between two points is modeled using a Gaussian distribution. The probability that point j is a neighbor of point i is given by:

pj|i = frac{exp(-|xi – xj|^2 / 2sigma_i^2)}{sum_{k neq i} exp(-|xi – xk|^2 / 2sigma_i^2)}

This defines a probability distribution over neighbors for each point. The joint probability pij is symmetrized as:

pij = frac{pj|i + pi|j}{2N}

where N is the total number of points. In the low-dimensional space, similarities are modeled using a Student’s t-distribution with one degree of freedom:

qij = frac{(1 + |yi – yj|^2)^{-1}}{sum_{k neq l} (1 + |yk – yl|^2)^{-1}}

Optimization Process

The goal is to find low-dimensional points yi that minimize the Kullback-Leibler divergence between the high- and low-dimensional distributions:

KL(P || Q) = sum_{i neq j} pij log frac{pij}{qij}

This is achieved through gradient descent, adjusting the positions of points in the low-dimensional space to reduce divergence. The gradients are computed based on the differences between pij and qij.

Conclusion

Understanding the mathematical basis of t-SNE involves grasping how similarities are modeled and how the optimization aligns these similarities across dimensions. This foundation helps in tuning parameters and interpreting visualizations effectively.