In this post, we connect the Siegel upper half-space to denoising diffusion probabilistic models (DDPMs), and explore a way of navigating the Siegel upper half-space.
Denoising diffusion probabilistic models
Denoising diffusion probabilistic models (DDPMs) generate data by iteratively removing small amounts of Gaussian noise, typically over hundreds of time steps. This generation process is the time-reversal of the training process. The idea is to start with a given datapoint $\mathbf{x}_0$ from the ground truth data distribution, and destroy the structure in $\mathbf{x}_0$ until all that is left is random noise. During this process, we use a parameterized model (a deep network) to learn how the structure was destroyed, so that the model can reverse the noising process and generate the original image.
In other words, with DDPMs, we attempt to learn a generative model that can generate samples as close as possible to the true distribution $q(\mathbf{x})$ of the ground truth dataset. To do this, we start with a point $\mathbf{x}_0$ in the dataset, and then iteratively destroy the structure in $\mathbf{x}_0$ over $T$ timesteps, from $t = 1$ to $t = T$, until the final output $\mathbf{x}_T$ is isotropic Gaussian noise, $\mathbf{x}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. Generation is then the reverse process, where we start with $\mathbf{x}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$, and then iteratively compute $\mathbf{x}_{t-1}$ from $\mathbf{x}_t$ until we arrive at the output generated datapoint $\mathbf{x}_0$.
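To make the forward process concrete, here is a minimal numpy sketch; the linear variance schedule and the specific `beta` values are common illustrative choices, not details from this post.

```python
import numpy as np

def forward_diffusion(x0, T=1000, beta_start=1e-4, beta_end=0.02, rng=None):
    """Iteratively destroy the structure in x_0 over T timesteps.

    At each step t, a small amount of Gaussian noise is mixed in:
        x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise,
    so that for large T, x_T is approximately isotropic Gaussian noise.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    betas = np.linspace(beta_start, beta_end, T)  # linear variance schedule
    x = np.asarray(x0, dtype=float)
    for beta_t in betas:
        noise = rng.standard_normal(x.shape)
        x = np.sqrt(1.0 - beta_t) * x + np.sqrt(beta_t) * noise
    return x  # x_T, approximately N(0, I)

# A structured datapoint (a sine wave) is reduced to near-isotropic noise.
x0 = np.sin(np.linspace(0, 4 * np.pi, 64))
xT = forward_diffusion(x0)
```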
The Siegel upper half-space and multivariate normal distributions
The diffusion process as described above can be viewed as a path through (a subspace of) the parameter space of multivariate normal distributions. In other words, it is a trajectory through the moduli space of multivariate normal distributions. We can view this moduli space as the Siegel upper half-space $\mathcal{H}_n$ for genus $n$, which is the space of $n \times n$ complex symmetric matrices with positive-definite imaginary part. The boundary $\partial \mathcal{H}_n$ contains the matrices whose imaginary part is only positive semi-definite.
The reasoning is as follows. Covariance matrices are symmetric and always positive semi-definite; a covariance matrix is positive-definite when no feature is an exact linear combination of the others, and merely positive semi-definite (singular) when such an exact linear dependence exists. We can then view the covariance matrix $\Sigma$ as the imaginary part of a matrix $Z$, and the mean vector $\mu$, placed on the diagonal of an otherwise-zero matrix, as the real part of $Z$. In other words, $\operatorname{Re}(Z) = \operatorname{diag}(\mu)$ and $\operatorname{Im}(Z) = \Sigma$, so $Z = \operatorname{diag}(\mu) + i\Sigma$. The space of these matrices $Z$, which we will denote by $\mathcal{N}_n$, is then a subspace of $\mathcal{H}_n$ (specifically, $\mathcal{N}_n$ is the subspace of matrices with diagonal real part). Then, changes in the multivariate normal distribution’s parameters are exactly changes of coordinates in this subspace $\mathcal{N}_n$ of $\mathcal{H}_n$. Thus the diffusion process is a trajectory through this subspace $\mathcal{N}_n$ of $\mathcal{H}_n$.
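As a quick illustration of this embedding, here is a small numpy sketch that builds $Z = \operatorname{diag}(\mu) + i\Sigma$ from a mean and covariance and checks that it lands in $\mathcal{H}_n$; the function names are hypothetical.

```python
import numpy as np

def to_siegel_point(mu, Sigma):
    """Map the parameters (mu, Sigma) of a multivariate normal
    to the matrix Z = diag(mu) + i*Sigma described above."""
    return np.diag(mu) + 1j * np.asarray(Sigma)

def in_siegel_upper_half_space(Z, tol=1e-10):
    """Check membership in H_n: Z is symmetric and Im(Z) is
    positive-definite (all eigenvalues strictly positive)."""
    symmetric = np.allclose(Z, Z.T, atol=tol)
    positive_definite = np.all(np.linalg.eigvalsh(Z.imag) > tol)
    return symmetric and positive_definite

mu = np.array([0.5, -1.0])
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])  # symmetric, positive-definite covariance
Z = to_siegel_point(mu, Sigma)
assert in_siegel_upper_half_space(Z)  # Z lies in the subspace N_n of H_n
```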
Diffusion models typically use isotropic covariance matrices, for reasons of computational efficiency and tractability, ease of optimization, and theoretical convention. However, if we can capture more of the noise structure by using general (non-isotropic) covariance matrices, and if we can navigate between these distributions intentionally, then we should be able to use fewer time steps in the diffusion process. This would allow us to considerably speed up diffusion modeling.
How can we learn trajectories through the Siegel upper half-space $\mathcal{H}_n$? Here is one idea.
Consider the Siegel-Jacobi space, which we will denote by $\mathcal{H}_{n,1}$. It is the product $\mathcal{H}_{n,1} = \mathcal{H}_n \times \mathbb{C}^n$ of the Siegel upper half-space $\mathcal{H}_n$ and the space $\mathbb{C}^n$ on which the Heisenberg group $H_{\mathbb{R}}^{(n,1)}$ acts, and it has complex dimension $\frac{n(n+1)}{2} + n$. The Heisenberg group $H_{\mathbb{R}}^{(n,1)}$ consists of elements $(\lambda, \mu, \kappa)$ with $\lambda, \mu \in \mathbb{R}^n$ and $\kappa \in \mathbb{R}$; for genus $n = 1$, these can be realized as upper-triangular $3 \times 3$ matrices with 1’s on the diagonal and 3 free parameters. We can think of the Heisenberg group as defining a position, and the Siegel upper half-space as defining the content located at that position.
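For concreteness, here is a small numpy check of the genus-1 picture: matrix multiplication of these upper-triangular matrices realizes the (non-commutative) Heisenberg group law. The parameter names follow the $(\lambda, \mu, \kappa)$ notation above.

```python
import numpy as np

def heisenberg(lam, mu, kappa):
    """Genus-1 Heisenberg group element: an upper-triangular matrix
    with 1's on the diagonal and 3 free parameters."""
    return np.array([[1.0, lam, kappa],
                     [0.0, 1.0, mu],
                     [0.0, 0.0, 1.0]])

g = heisenberg(1.0, 2.0, 3.0)
h = heisenberg(0.5, -1.0, 0.25)

# Matrix multiplication realizes the group law:
# (l1, m1, k1) * (l2, m2, k2) = (l1 + l2, m1 + m2, k1 + k2 + l1 * m2)
assert np.allclose(g @ h, heisenberg(1.5, 1.0, 3.25 + 1.0 * (-1.0)))
assert not np.allclose(g @ h, h @ g)  # the group is non-abelian
```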
Now consider the natural group action on the Siegel-Jacobi space by the Jacobi group $G^J$, which is the semi-direct product of the symplectic group $\mathrm{Sp}(2n, \mathbb{R})$ (the group of isometries of $\mathcal{H}_n$) and the Heisenberg group: $G^J = \mathrm{Sp}(2n, \mathbb{R}) \ltimes H_{\mathbb{R}}^{(n,1)}$. This action is given by the mapping

$$g \cdot (\tau, z) = \left( (A\tau + B)(C\tau + D)^{-1},\ (z + \lambda\tau + \mu)(C\tau + D)^{-1} \right),$$

where $g = (M, (\lambda, \mu, \kappa)) \in G^J$ for $M = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \in \mathrm{Sp}(2n, \mathbb{R})$ and $(\tau, z) \in \mathcal{H}_n \times \mathbb{C}^n$.
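Here is a minimal numpy sketch of this action for genus $n = 2$, assuming the block decomposition above; the particular symplectic element and translation vectors are arbitrary illustrative choices, and $\lambda$, $\mu$, $z$ are treated as row vectors.

```python
import numpy as np

def jacobi_action(A, B, C, D, lam, mu, tau, z):
    """Apply g = (M, (lam, mu, kappa)) to (tau, z):
        tau -> (A tau + B)(C tau + D)^{-1}
        z   -> (z + lam tau + mu)(C tau + D)^{-1}
    kappa does not appear in the action, so it is omitted."""
    J = np.linalg.inv(C @ tau + D)      # (C tau + D)^{-1}
    return (A @ tau + B) @ J, (z + lam @ tau + mu) @ J

n = 2
tau = 1j * np.eye(n)      # an isotropic normal distribution: diag(0) + i*I
z = np.zeros((1, n))      # row vector in C^n

# A simple symplectic element ((I, S), (0, I)) with S symmetric
# translates the real part of tau; (lam, mu) translates z.
S = np.array([[0.5, 0.1],
              [0.1, -0.2]])
A, B, C, D = np.eye(n), S, np.zeros((n, n)), np.eye(n)
lam = np.array([[1.0, 0.0]])
mu = np.array([[0.0, 0.5]])

tau_new, z_new = jacobi_action(A, B, C, D, lam, mu, tau, z)
# Im(tau_new) stays positive-definite: we remain in the Siegel space.
assert np.all(np.linalg.eigvalsh(tau_new.imag) > 0)
```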
We now have a way to navigate the moduli space of normal distributions and, in principle, to run diffusion models, with deep connections to number theory, automorphic forms, conformal field theory, and many other areas of math and physics. The form of this action is also strikingly similar to that of structured state space models (S4) for very long sequence learning. Given the roots of diffusion models in thermodynamics, these are intriguing connections.