Throughout this book, we adhere to the following notational conventions. Note that some of these symbols are placeholders, while others refer to specific objects. As a rule of thumb, the indefinite article “a” indicates that the symbol is a placeholder and that similarly formatted symbols can denote other objects of the same type. For example, “\(x\): a scalar” means that lowercase letters generally represent scalar values.

Numerical Objects

  • \(x\): a scalar

  • \(\mathbf{x}\): a vector

  • \(\mathbf{X}\): a matrix

  • \(\mathsf{X}\): a general tensor

  • \(\mathbf{I}\): an identity matrix (square, with \(1\) on all diagonal entries and \(0\) on all off-diagonal entries)

  • \(x_i\), \([\mathbf{x}]_i\): the \(i^\mathrm{th}\) element of vector \(\mathbf{x}\)

  • \(x_{ij}\), \(x_{i,j}\), \([\mathbf{X}]_{ij}\), \([\mathbf{X}]_{i,j}\): the element of matrix \(\mathbf{X}\) at row \(i\) and column \(j\)
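
As a rough illustration of the objects listed above, the following sketch represents a scalar, a vector, a matrix, and an identity matrix with plain Python lists. The list-based representation and the helper names `identity` and `matmul` are assumptions made for this example only. Python positions start at 0, so the code uses 0-based indices.

```python
# Sketch only: nested Python lists stand in for vectors and matrices.
x = 3.0                          # a scalar
vec = [1.0, 2.0, 3.0]            # a vector with three elements
X = [[1.0, 2.0],
     [3.0, 4.0]]                 # a matrix with 2 rows and 2 columns


def identity(n):
    """Return the n-by-n identity matrix: 1 on the diagonal, 0 elsewhere."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Multiply matrices A (m-by-k) and B (k-by-n) given as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


I = identity(2)

second_element = vec[1]          # element of the vector at 0-based position 1
row1_col0 = X[1][0]              # element of X at 0-based row 1, column 0
product = matmul(X, I)           # multiplying by the identity leaves X unchanged
```

Here `product` equals `X`, which is the defining property of an identity matrix.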

Set Theory

  • \(\mathcal{X}\): a set

  • \(\mathbb{Z}\): the set of integers

  • \(\mathbb{Z}^+\): the set of positive integers

  • \(\mathbb{R}\): the set of real numbers

  • \(\mathbb{R}^n\): the set of \(n\)-dimensional vectors of real numbers

  • \(\mathbb{R}^{a\times b}\): the set of matrices of real numbers with \(a\) rows and \(b\) columns

  • \(|\mathcal{X}|\): cardinality (number of elements) of set \(\mathcal{X}\)

  • \(\mathcal{A}\cup\mathcal{B}\): union of sets \(\mathcal{A}\) and \(\mathcal{B}\)

  • \(\mathcal{A}\cap\mathcal{B}\): intersection of sets \(\mathcal{A}\) and \(\mathcal{B}\)

  • \(\mathcal{A}\setminus\mathcal{B}\): set subtraction of \(\mathcal{B}\) from \(\mathcal{A}\) (contains only those elements of \(\mathcal{A}\) that do not belong to \(\mathcal{B}\))
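
The finite-set operations above map directly onto Python's built-in `set` type, as the following sketch suggests. The example sets and the helper `is_positive_integer` are hypothetical. Infinite sets such as \(\mathbb{Z}^+\) cannot be enumerated, so the sketch models membership with a predicate instead.

```python
# Sketch only: two small finite sets chosen for illustration.
A = {1, 2, 3, 4}
B = {3, 4, 5}

cardinality = len(A)     # |A|: number of elements of A
union = A | B            # elements in A or in B
intersection = A & B     # elements in both A and B
difference = A - B       # A \ B: elements of A that do not belong to B


def is_positive_integer(z):
    """Membership test standing in for the infinite set of positive integers."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(z, int) and not isinstance(z, bool) and z > 0
```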

Functions and Operators

  • \(f(\cdot)\): a function

  • \(\log(\cdot)\): the natural logarithm (base \(e\))

  • \(\log_2(\cdot)\): logarithm with base \(2\)

  • \(\exp(\cdot)\): the exponential function

  • \(\mathbf{1}(\cdot)\): the indicator function, which evaluates to \(1\) if the boolean argument is true and \(0\) otherwise

  • \(\mathbf{1}_{\mathcal{X}}(z)\): the set-membership indicator function, which evaluates to \(1\) if the element \(z\) belongs to the set \(\mathcal{X}\) and \(0\) otherwise

  • \((\cdot)^\top\): transpose of a vector or a matrix

  • \(\mathbf{X}^{-1}\): inverse of matrix \(\mathbf{X}\)

  • \(\odot\): Hadamard (elementwise) product

  • \([\cdot, \cdot]\): concatenation

  • \(\|\cdot\|_p\): \(L_p\) norm

  • \(\|\cdot\|\): \(L_2\) norm

  • \(\langle \mathbf{x}, \mathbf{y} \rangle\): dot product of vectors \(\mathbf{x}\) and \(\mathbf{y}\)

  • \(\sum\): summation over a collection of elements

  • \(\prod\): product over a collection of elements

  • \(\stackrel{\mathrm{def}}{=}\): an equality asserted as a definition of the symbol on the left-hand side
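
Several of the operators above can be written in a few lines of plain Python, which may help pin down what each symbol computes. The sketch below assumes the usual definition \(\|\mathbf{x}\|_p = (\sum_i |x_i|^p)^{1/p}\), which this list does not spell out. All function names are hypothetical.

```python
import math


def indicator(condition):
    """Indicator function: 1 if the boolean argument is true, else 0."""
    return 1 if condition else 0


def set_indicator(z, X):
    """Set-membership indicator: 1 if z belongs to the set X, else 0."""
    return 1 if z in X else 0


def transpose(M):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


def hadamard(x, y):
    """Hadamard (elementwise) product of two equal-length vectors."""
    return [a * b for a, b in zip(x, y)]


def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def lp_norm(x, p=2):
    """L_p norm under the assumed definition; p=2 gives the default norm."""
    return sum(abs(a) ** p for a in x) ** (1.0 / p)


x = [3.0, 4.0]
y = [1.0, 2.0]

concatenated = x + y            # [x, y]: concatenation of two vectors
norm_l2 = lp_norm(x)            # L_2 norm of x
norm_l1 = lp_norm(x, p=1)       # L_1 norm of x
natural_log = math.log(math.e)  # log is the natural logarithm
log_base_2 = math.log2(8.0)     # logarithm with base 2
```

With these inputs, `hadamard(x, y)` gives `[3.0, 8.0]` while `dot(x, y)` sums those same products into the single scalar `11.0`.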


Calculus

  • \(\frac{dy}{dx}\): derivative of \(y\) with respect to \(x\)

  • \(\frac{\partial y}{\partial x}\): partial derivative of \(y\) with respect to \(x\)

  • \(\nabla_{\mathbf{x}} y\): gradient of \(y\) with respect to \(\mathbf{x}\)

  • \(\int_a^b f(x) \;dx\): definite integral of \(f\) from \(a\) to \(b\) with respect to \(x\)

  • \(\int f(x) \;dx\): indefinite integral of \(f\) with respect to \(x\)
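
The derivative, gradient, and definite-integral notation can be made concrete with simple numerical approximations. The sketch below uses central finite differences and the midpoint rule. These techniques are chosen here purely for illustration, and the function names, step size `h`, and number of subintervals `n` are assumptions.

```python
def derivative(f, x, h=1e-5):
    """Approximate dy/dx at x for y = f(x) with a central difference."""
    return (f(x + h) - f(x - h)) / (2 * h)


def gradient(f, x, h=1e-5):
    """Approximate the gradient of a scalar function f at the vector x.

    Each component is the partial derivative with respect to one coordinate,
    estimated by perturbing only that coordinate.
    """
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


def integrate(f, a, b, n=1000):
    """Approximate the definite integral of f from a to b (midpoint rule)."""
    width = (b - a) / n
    return sum(f(a + (k + 0.5) * width) for k in range(n)) * width


def square(x):
    return x ** 2


slope = derivative(square, 3.0)                          # exact value: 6
grad = gradient(lambda v: v[0] ** 2 + 3 * v[1], [1.0, 2.0])  # exact: [2, 3]
area = integrate(square, 0.0, 3.0)                       # exact value: 9
```

The results are approximations, so they match the exact values only up to a small numerical error.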

Probability and Information Theory

  • \(X\): a random variable

  • \(P\): a probability distribution

  • \(X \sim P\): the random variable \(X\) has distribution \(P\)

  • \(P(X=x)\): the probability assigned to the event where random variable \(X\) takes value \(x\)

  • \(P(X \mid Y)\): the conditional probability distribution of \(X\) given \(Y\)

  • \(p(\cdot)\): a probability density function (PDF) associated with distribution \(P\)

  • \(E[X]\): expectation of a random variable \(X\)

  • \(X \perp Y\): random variables \(X\) and \(Y\) are independent

  • \(X \perp Y \mid Z\): random variables \(X\) and \(Y\) are conditionally independent given \(Z\)

  • \(\sigma_X\): standard deviation of random variable \(X\)

  • \(\mathrm{Var}(X)\): variance of random variable \(X\), equal to \(\sigma^2_X\)

  • \(\mathrm{Cov}(X, Y)\): covariance of random variables \(X\) and \(Y\)

  • \(\rho(X, Y)\): the Pearson correlation coefficient between \(X\) and \(Y\), equal to \(\frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}\)

  • \(H(X)\): entropy of random variable \(X\)

  • \(D_{\mathrm{KL}}(P\|Q)\): the KL-divergence (or relative entropy) from distribution \(Q\) to distribution \(P\)
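
For discrete distributions, the quantities above reduce to short sums, sketched below in plain Python. The sketch assumes the standard definitions \(H(X) = -\sum_x P(x) \log P(x)\) and \(D_{\mathrm{KL}}(P\|Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}\) with the natural logarithm, none of which this list states explicitly. The function names and example distributions are hypothetical.

```python
import math


def expectation(values, probs):
    """E[X] for a discrete X taking values[i] with probability probs[i]."""
    return sum(v * p for v, p in zip(values, probs))


def variance(values, probs):
    """Var(X): expected squared deviation from the mean."""
    mean = expectation(values, probs)
    return sum(p * (v - mean) ** 2 for v, p in zip(values, probs))


def covariance(joint):
    """Cov(X, Y) for a joint distribution given as {(x, y): probability}."""
    ex = sum(p * x for (x, _), p in joint.items())
    ey = sum(p * y for (_, y), p in joint.items())
    return sum(p * (x - ex) * (y - ey) for (x, y), p in joint.items())


def entropy(probs):
    """H(X) in nats (assumed natural log); zero-probability terms are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def kl_divergence(p, q):
    """D_KL(P || Q), assuming q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


values = [0, 1]
P = [0.5, 0.5]                          # a fair coin
Q = [0.25, 0.75]                        # a biased coin

mean = expectation(values, P)
var = variance(values, P)
std = math.sqrt(var)                    # sigma_X, the square root of Var(X)
joint = {(0, 0): 0.5, (1, 1): 0.5}      # Y always equals X
rho = covariance(joint) / (std * std)   # Pearson correlation of X and Y
h = entropy(P)
kl = kl_divergence(P, Q)
```

Because \(Y\) always equals \(X\) in the joint distribution, the correlation comes out as \(1\), and `kl_divergence(P, P)` is \(0\) since the two distributions coincide.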