SeFa — Finding Semantic Vectors in Latent Space for GANs

Originally Posted on: https://medium.com/mlearning-ai/sefa-finding-semantic-vectors-in-latent-space-for-gans-9573c557f21e

Paper Explained: SeFa — Closed-Form Factorization of Latent Semantics in GANs

Motivation

the generator of a GAN

The generator in GANs usually takes a randomly sampled latent vector as the input and generates a high-fidelity image. By changing the latent vector z, we can change the output image.

a change in latent space in a certain direction results in a change in the output image

However, in order to change a specific attribute in the output image (e.g. hair color, facial expression, pose, gender, etc.), we need to know the specific direction for us to move our latent vector z.

Some previous works have tried to interpret the latent semantics in a supervised fashion: they label a dataset, train an attribute classifier to predict the labels of the images, and then compute a direction vector in the latent space for each attribute. Unsupervised methods for this task also exist, but most of them still require model training and data sampling.

In contrast, this paper proposes SeFa, a closed-form and unsupervised method that finds these direction vectors for altering different attributes in the output image, without any data sampling or model training.

A closed-form solution is a mathematical expression with a finite number of standard operations.

The word “unsupervised” means that we don’t need to label the dataset.

Moving Latent Code

latent space

To change the latent code meaningfully, we need to first identify a semantically meaningful direction vector n. The new latent code is calculated as z’ = z + αn, where α is the number of steps to move along the direction n.

edit(G(z)) = G(z’) = G(z + αn), where edit(…) denotes the editing operation
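As a toy illustration of this editing step (the generator G and the direction n below are hypothetical placeholders, not a trained model or a real semantic direction), the edit is just a vector addition in latent space:

```python
import numpy as np

# Toy illustration: move a latent code along a direction n by alpha steps.
# In practice, n would be a semantically meaningful direction and G a trained generator.
latent_dim = 512
z = np.random.randn(latent_dim)        # randomly sampled latent code
n = np.random.randn(latent_dim)
n /= np.linalg.norm(n)                 # direction vectors are unit-length
alpha = 3.0                            # number of steps along the direction

z_edited = z + alpha * n               # z' = z + alpha * n
# edited_image = G(z_edited)           # feed the edited code back into the generator
```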

The problem is how do we find out the semantically meaningful direction vector n?

Related Work — PCA Approach

In a previously published paper GANSpace: Discovering Interpretable GAN Controls, Härkönen et al. performed Principal Component Analysis (PCA) on the sampled data to find out the primary direction vectors in the latent space.

Recall that PCA is a tool for finding the axes of largest variation in the data.

Let’s take the generator in StyleGAN as an example. The latent code z is fed into a Fully-Connected layer (FC) before going into each intermediate layer.

The proposed method is as follows: we first sample N random vectors {z₁, z₂, …, zₙ} and feed them into the FC layer to get the projected outputs {w₁, w₂, …, wₙ}. Then, we apply PCA to these {w₁, w₂, …, wₙ} values to get the k-dimensional basis V.

performing PCA on sampled latent vectors

Given a new image defined by w, we can edit it by varying PCA coordinates x before feeding it to the synthesis network as follows.

w’ = w + Vx, where each entry xᵢ of x is a separate control parameter. The entries xᵢ are initially zero.
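A minimal sketch of this procedure (the `mapping` function below is a hypothetical stand-in for StyleGAN’s FC mapping network, and scikit-learn’s PCA is used purely for convenience):

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_basis(mapping, num_samples=10000, latent_dim=512, k=20):
    """Sample latent codes, project them through the FC layer, and run PCA."""
    z = np.random.randn(num_samples, latent_dim)   # {z_1, ..., z_N}
    w = mapping(z)                                 # {w_1, ..., w_N}, shape (N, w_dim)
    pca = PCA(n_components=k).fit(w)
    return pca.components_                         # basis V, shape (k, w_dim)

def edit(w, V, x):
    """Edit a projected code w by moving along the PCA basis: w' = w + V^T x."""
    return w + x @ V                               # x holds the control parameters x_i
```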

Although this PCA method is unsupervised, it requires data sampling, which is inefficient. I mention this approach in this article because it bears a resemblance to today’s topic — SeFa.

SeFa — Semantic Factorization

State-of-the-art GAN models typically consist of multiple layers. Each layer learns a transformation from one space to another. This paper focuses on examining the first transformation, which can be formulated as an affine transformation as follows.

G₁(z) = y = Az + b, where A and b denote the weight and bias of the first transformation respectively

If we apply z’=z+αn to the input latent code, the first transformation formula can be simplified as follows.

G₁(z + αn) = A(z + αn) + b = (Az + b) + αAn = y + αAn (recall that G₁(z) = y)

Since G₁(z + αn) = G₁(z) + αAn, we know that given a latent code z and a direction vector n, the editing can be achieved by simply adding αAn to the projected code y after the transformation.

where ⊕ denotes the addition operation
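A quick numerical check of this identity with an arbitrary affine layer (the dimensions here are chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((1024, 512))        # weight of the first transformation
b = rng.standard_normal(1024)               # bias
z = rng.standard_normal(512)                # latent code
n = rng.standard_normal(512)
n /= np.linalg.norm(n)                      # unit direction vector
alpha = 2.5

left = A @ (z + alpha * n) + b              # G1(z + alpha*n)
right = (A @ z + b) + alpha * (A @ n)       # G1(z) + alpha * A n
print(np.allclose(left, right))             # True: the edit can be applied after the projection
```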

From this perspective, the weight parameter A should contain the essential knowledge of the image variation. Thus we aim to discover important latent directions by decomposing A.

The SeFa algorithm is similar to the PCA approach above, but instead of applying PCA to the projected latent codes G₁(z) = y, it applies a very similar process directly to the weights of the projection layer (the weights of G₁).

Just like PCA, this process aims to find the direction vectors that cause large variations after the projection by A. It is formulated as the following optimization problem.

n* = argmax{n : nᵀn = 1} ‖An‖₂², where ‖…‖₂ denotes the L2-norm and n is restricted to be a unit vector

To find the k most important directions {n₁, n₂, …, nₖ}, we solve:

N* = argmax{N : nᵢᵀnᵢ = 1} Σᵢ₌₁ᵏ ‖Anᵢ‖₂², where N = [n₁, n₂, …, nₖ] corresponds to the top-k semantics

To prevent the equation from producing a trivial solution with ‖nᵢ‖ → ∞, we restrict each nᵢ to be a unit vector and introduce the Lagrange multipliers {λ₁, λ₂, …, λₖ} into the objective: N* = argmax_N Σᵢ₌₁ᵏ (‖Anᵢ‖₂² − λᵢ(nᵢᵀnᵢ − 1)).

By taking the partial derivative with respect to each nᵢ and setting it to zero, we have 2AᵀAnᵢ − 2λᵢnᵢ = 0.

As you can tell, this is very similar to PCA. The only difference is that the SeFa method replaces the covariance matrix S with AᵀA, where A is the weight matrix of G₁.

AᵀAnᵢ = λᵢnᵢ, where the λᵢ are eigenvalues and the nᵢ are eigenvectors of AᵀA

Instead of computing the eigenvectors of a covariance matrix, SeFa computes the eigenvectors of AᵀA. By virtue of this, we don’t need to sample any data to estimate the covariance matrix of the projected vectors, which makes the algorithm simpler, faster, and closed-form.
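A minimal sketch of this factorization, following the derivation above (this is not the official SeFa implementation; A is assumed to be the weight matrix of the first transformation, with shape (out_dim, latent_dim)):

```python
import numpy as np

def sefa_directions(A, k=5):
    """Return the top-k latent directions as the eigenvectors of A^T A."""
    eigenvalues, eigenvectors = np.linalg.eigh(A.T @ A)   # A^T A is symmetric, so use eigh
    order = np.argsort(eigenvalues)[::-1]                  # sort eigenvalues in descending order
    return eigenvectors[:, order[:k]].T                    # shape (k, latent_dim), unit-norm rows

# Usage: directions = sefa_directions(A)
#        z_edited = z + alpha * directions[0]              # move along the most important direction
```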

Generalizability

The paper shows how the SeFa algorithm is applied to three types of GAN models: PGGAN, StyleGAN, and BigGAN. The diagrams below illustrate how each of them feeds the latent vector z into its generator.

generators of PGGAN, StyleGAN, and BigGAN

PGGAN

The PGGAN generator is just like the traditional generator, in which the latent code z is fed into a Fully-Connected layer (FC) before going into the synthesis network.

For this kind of generator structure, SeFa studies the transformation from the latent code to the feature map (i.e., the weights of the first FC layer).

StyleGAN

In the StyleGAN generator, the latent code is first transformed into a style code, which is then fed into each convolution layer.

The SeFa algorithm is flexible: it supports interpreting all layers or any subset of them. For this purpose, we concatenate the weight parameters (i.e. the A matrices) from all target layers along the first axis, forming a larger transformation matrix, as in the sketch below.
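A sketch of this multi-layer variant (again a hedged illustration, assuming each per-layer weight Aₗ has shape (out_dimₗ, latent_dim)):

```python
import numpy as np

def multi_layer_directions(layer_weights, k=5):
    """Concatenate the per-layer weights along the output axis and factorize the result."""
    A = np.concatenate(layer_weights, axis=0)            # shape (sum of out_dims, latent_dim)
    eigenvalues, eigenvectors = np.linalg.eigh(A.T @ A)  # eigendecomposition of A^T A
    order = np.argsort(eigenvalues)[::-1]                # sort descending by eigenvalue
    return eigenvectors[:, order[:k]].T                  # top-k directions shared by the target layers
```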

BigGAN

In the BigGAN generator, the latent code is fed into both the initial feature map and each convolutional layer.

Hence, the analysis of BigGAN can be viewed as a combination of the above two types of GANs.

Results

References

[1] E. Härkönen, A. Hertzmann, J. Lehtinen, and S. Paris, “GANSpace: Discovering Interpretable GAN Controls”, arXiv:2004.02546, 2020. https://arxiv.org/abs/2004.02546

[2] Y. Shen and B. Zhou, “Closed-Form Factorization of Latent Semantics in GANs”, arXiv:2007.06600, 2020. https://arxiv.org/abs/2007.06600
