Title: Universal Loss and Gaussian Learning Bounds
In this talk I address two fundamental predictive modeling problems: choosing a universal loss function, and approaching non-linear learning problems with linear means.
A loss function quantifies the difference between the true values and the estimated fits for a given instance of data. Different loss functions emphasize different merits, and the choice of a "correct" loss is often debatable. Here, I show that for binary classification problems, the Bernoulli log-likelihood loss (log-loss) is universal with respect to practical alternatives. In other words, I show that by minimizing the log-loss we minimize an upper bound on any smooth, convex and unbiased binary loss function. This property justifies the broad use of log-loss in regression, in decision trees, as an InfoMax criterion (cross-entropy minimization) and in many other applications.
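As a small numerical illustration of the flavor of this claim (not the general result of the talk), the sketch below compares the log-loss with one familiar alternative, the squared loss. It assumes a Bernoulli label with true probability p and a predicted probability q, and checks on a grid that the excess expected log-loss (the KL divergence, in nats) is at least twice the excess expected squared loss, which is Pinsker's inequality for two Bernoulli distributions. The function names and the grid are illustrative choices, not taken from the talk.

```python
import math

def log_loss(y, q):
    """Bernoulli log-loss (in nats) of predicting probability q for label y in {0, 1}."""
    return -math.log(q) if y == 1 else -math.log(1.0 - q)

def squared_loss(y, q):
    """Squared (Brier) loss of predicting probability q for label y in {0, 1}."""
    return (y - q) ** 2

def expected_loss(loss, p, q):
    """Expected loss of prediction q when the label is Bernoulli(p)."""
    return p * loss(1, q) + (1.0 - p) * loss(0, q)

def regret(loss, p, q):
    """Excess expected loss of predicting q instead of the true probability p."""
    return expected_loss(loss, p, q) - expected_loss(loss, p, p)

# Grid of probabilities strictly inside (0, 1) so the logarithms stay finite.
grid = [i / 100 for i in range(1, 100)]

# Smallest value of (log-loss regret) - 2 * (squared-loss regret) over the grid.
# A non-negative value means the log-loss regret dominates on every grid point.
worst_gap = min(
    regret(log_loss, p, q) - 2.0 * regret(squared_loss, p, q)
    for p in grid
    for q in grid
)
print(f"smallest gap on the grid: {worst_gap:.3e}")
```

The squared loss is only one example of a smooth, convex binary loss; the talk's statement covers the whole class, with a normalization that this sketch does not attempt to reproduce.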
I then address a Gaussian representation problem that uses the log-loss. In this problem we look for an embedding of arbitrary data that maximizes its "Gaussian part" while preserving the original dependence between the variables and the target. This embedding provides an efficient (and practical) representation, as it allows us to exploit the favorable properties of the Gaussian distribution. I introduce different methods and show that the optimal Gaussian embedding is governed by the non-linear canonical correlations of the data. This result provides a fundamental limit on our ability to Gaussianize arbitrary datasets and solve complex problems by linear means.
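To make the role of canonical correlations concrete, here is a minimal sketch of the linear, jointly Gaussian special case only. It estimates the sample canonical correlations between two sets of variables and evaluates the standard Gaussian mutual-information formula, I = -1/2 * sum(log(1 - rho_i^2)). The non-linear canonical correlations discussed in the talk would require first searching over transformations of the data, which this sketch does not attempt; the function names and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def canonical_correlations(x, y):
    """Sample canonical correlations between the columns of x and y (rows are samples)."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    cxx = x.T @ x / (n - 1)
    cyy = y.T @ y / (n - 1)
    cxy = x.T @ y / (n - 1)
    # Whiten both sides with Cholesky factors: M = Lx^{-1} Cxy Ly^{-T}.
    lx = np.linalg.cholesky(cxx)
    ly = np.linalg.cholesky(cyy)
    m = np.linalg.solve(lx, cxy)
    m = np.linalg.solve(ly, m.T).T
    # The singular values of M are the canonical correlations.
    rho = np.linalg.svd(m, compute_uv=False)
    return np.clip(rho, 0.0, 1.0)

def gaussian_mutual_information(rho):
    """Mutual information (nats) of jointly Gaussian vectors with canonical correlations rho."""
    rho = np.asarray(rho)
    return float(-0.5 * np.sum(np.log1p(-rho ** 2)))

# Synthetic example: y depends linearly on the first coordinate of x plus unit noise,
# so the single population canonical correlation is 1 / sqrt(2).
rng = np.random.default_rng(0)
x = rng.standard_normal((20000, 2))
y = x[:, :1] + rng.standard_normal((20000, 1))

rho = canonical_correlations(x, y)
mi = gaussian_mutual_information(rho)
print("canonical correlations:", rho)
print("Gaussian mutual information (nats):", mi)
```

In this example a purely linear analysis captures all of the dependence because the data are Gaussian by construction; for non-Gaussian data the linear canonical correlations can understate the dependence, which is where the non-linear version becomes relevant.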