Optimal dimensionality reduction methods are proposed for the Bayesian
inference of a Gaussian linear model with additive noise in the presence of
overabundant data. Three different optimal projections of the observations are
proposed based on information theory: the projection that minimizes the
Kullback-Leibler divergence between the posterior distributions of the original
and the projected models, the one that minimizes the expected Kullback-Leibler
divergence between the same distributions, and the one that maximizes the
mutual information between the parameter of interest and the projected
observations. The first two optimization problems are formulated as the
determination of an optimal subspace, and their solutions are therefore computed
using Riemannian optimization algorithms on the Grassmann manifold. For
the maximization of the mutual information, it is shown that there exists an
optimal subspace that minimizes the entropy of the posterior distribution of
the reduced model; that a basis of this subspace can be computed as the solution
to a generalized eigenvalue problem; that an a priori error estimate on the
mutual information is available for this particular solution; and that the
dimensionality of the subspace required to exactly conserve the mutual
information between the input and the output of the models is less than the
number of parameters to be inferred. Numerical applications to linear and nonlinear
models are used to assess the efficiency of the proposed approaches, and to
highlight their advantages over standard approaches based on principal
component analysis of the observations.
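
As an illustration of the mutual-information result, the following minimal
sketch may help. It assumes a model y = A x + e with a zero-mean Gaussian prior
on x (covariance cov_x) and independent zero-mean Gaussian noise e (covariance
cov_e); these symbols, the function names, and the random test problem are
assumptions made for the example and are not taken from the paper. Under these
assumptions, a projection W maximizing I(x; W^T y) can be built from the leading
generalized eigenvectors of the pair (A cov_x A^T, cov_e), which is the textbook
form of such a result and not necessarily the authors' exact algorithm.

```python
import numpy as np
from scipy.linalg import eigh


def mutual_information(W, A, cov_x, cov_e):
    """I(x; W^T y) in nats for y = A x + e with Gaussian x and e."""
    signal = A @ cov_x @ A.T              # covariance of the noise-free output
    num = W.T @ (signal + cov_e) @ W      # covariance of the projected data
    den = W.T @ cov_e @ W                 # covariance of the projected noise
    return 0.5 * (np.linalg.slogdet(num)[1] - np.linalg.slogdet(den)[1])


def mi_optimal_projection(A, cov_x, cov_e, r):
    """Top-r generalized eigenpairs of (A cov_x A^T) w = lam * cov_e w."""
    signal = A @ cov_x @ A.T
    eigvals, eigvecs = eigh(signal, cov_e)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:r]     # keep the r largest
    return eigvecs[:, order], eigvals[order]


# Hypothetical test problem: 3 parameters, 50 correlated observations.
rng = np.random.default_rng(0)
n, m = 3, 50
A = rng.standard_normal((m, n))
cov_x = np.eye(n)
B = rng.standard_normal((m, m))
cov_e = B @ B.T / m + 0.1 * np.eye(m)         # symmetric positive definite

W, lam = mi_optimal_projection(A, cov_x, cov_e, r=n)
mi_full = mutual_information(np.eye(m), A, cov_x, cov_e)
mi_reduced = mutual_information(W, A, cov_x, cov_e)
W1, lam1 = mi_optimal_projection(A, cov_x, cov_e, r=1)
mi_rank1 = mutual_information(W1, A, cov_x, cov_e)
```

In this sketch A cov_x A^T has rank at most n, so all but n generalized
eigenvalues vanish and the n-dimensional projection retains the mutual
information of the full data set up to round-off, while the one-dimensional
projection retains only the contribution 0.5 * log(1 + lam1[0]).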

DOI: 10.1016/j.csda.2018.03.002