Foundations of Learning

Learning is one of the key components of a computer vision system. In these chapters, we cover the foundations of machine learning from a general perspective but explore examples using vision problems.

Outline

Notation

  • The algorithms we will see apply to many kinds of signals, not just images. Therefore, in this part we will use \(\mathbf{x}\) to represent model inputs rather than \({\boldsymbol\ell}\). A model’s final output will usually be represented by \(\mathbf{y}\).

  • Neural networks consist of a sequence of layers that perform a sequence of transformations \(\mathbf{x}_0 \rightarrow \mathbf{x}_1 \rightarrow \ldots \rightarrow \mathbf{y}\). When we consider a single layer in isolation, we will generically refer to its input as \({\mathbf{x}_{\texttt{in}}}\) and its output as \({\mathbf{x}_{\texttt{out}}}\). We will also use the variables \(\mathbf{h}\) and \(\mathbf{z}\) to represent certain kinds of intermediate representations in neural nets, which will be defined when they are first used.