Generative Image Models and Representation Learning

Vision is the problem of mapping visual data to representations. This section examines that mapping and its inverse—mapping representations to data—and forges a tight link between them. This part of the book covers some of the same topics as in other parts but with a new set of tools that revolve primarily around deep neural nets. These tools are highly effective and provide a simple and unified framework for dealing with a large family of problems in computer vision.

Outline

Chapter 30 Representation Learning introduces the idea of representation learning, where the goal is to train a model that produces good representations of the raw data, such as vector embeddings.
Chapter 31 Perceptual Grouping zooms in on the particular problem of identifying perceptual groups, which are an important kind of visual representation with a long history in vision science.
Chapter 32 Generative Models describes generative models that synthesize images.
Chapter 33 Generative Modeling Meets Representation Learning forges a connection between representation learning and generative modeling, describing these as inverses of each other.
Chapter 34 Conditional Generative Models extends generative models to the conditional setting, where some data is synthesized based on other data.

Notation

This part deals extensively with random variables and probability distributions. See the Notation section before chapter 1 for our conventions. A few reminders follow.

\(X\) and \(Y\) are random variables, while \(x\) and \(y\) are realizations of those variables. \(\mathcal{X}\) and \(\mathcal{Y}\) are the domains of those variables.
\(p(X)\) is a distribution over \(X\); that is, \(p(X)\) is a function over the domain \(\mathcal{X}\). Conversely, \(p(\mathbf{x})\) is the probability of a realization \(X=\mathbf{x}\). It is short for \(p(X = \mathbf{x})\), and it is a scalar.