Title: Spectral methods for unsupervised ensemble learning and latent variable models
Abstract
With the availability of huge amounts of unlabeled data, unsupervised learning methods
are gaining increasing popularity and importance. We focus on
"unsupervised ensemble learning", where one obtains the predictions of multiple
classifiers over a set of unlabeled instances. The classifiers may be human
experts as in crowdsourcing, or prediction algorithms developed by research
groups worldwide. The challenge is to estimate the accuracies of the different classifiers
and combine them into an accurate meta-learner. To tackle this
problem, we show how it relates to latent variable models, and derive simple
estimates for the classifiers' accuracies based on a spectral analysis of the
observed data. On the experimental side, we apply our methods to a problem in
Computational Biology, where for various classification tasks one combines the
results of multiple algorithms for improved accuracy.
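As a rough illustration of what a spectral estimate of this kind can look like, the sketch below assumes binary labels in {-1, +1} and classifiers that err independently given the true label. Under those assumptions the off-diagonal entries of the classifiers' covariance matrix form a rank-one matrix whose generating vector is proportional to 2*pi_i - 1, where pi_i is the balanced accuracy of classifier i. The function names, the iterative diagonal fill, the sign convention, and the simulated data are all illustrative choices and are not taken from the talk.

```python
import numpy as np


def spectral_accuracy_scores(preds, n_iter=50):
    """Estimate a score per classifier from unlabeled predictions.

    preds: (m, n) array of +/-1 predictions (m classifiers, n instances).
    Returns u with u_i roughly proportional to 2*pi_i - 1, assuming
    conditionally independent classifiers (an assumption of this sketch).
    """
    R = np.cov(preds)            # m x m sample covariance, rows = classifiers
    np.fill_diagonal(R, 0.0)     # the diagonal does not follow the rank-one model
    for _ in range(n_iter):
        w, V = np.linalg.eigh(R)                     # eigenvalues in ascending order
        u = np.sqrt(max(w[-1], 0.0)) * V[:, -1]      # leading rank-one factor
        np.fill_diagonal(R, u ** 2)                  # refill diagonal, then repeat
    # Eigenvector sign is arbitrary; assume most classifiers beat random guessing.
    if u.sum() < 0:
        u = -u
    return u


def weighted_vote(preds, u):
    """Combine predictions with a linear vote weighted by the spectral scores."""
    return np.where(u @ preds >= 0, 1, -1)


# Simulated data (hypothetical): 7 classifiers of varying quality, balanced classes.
rng = np.random.default_rng(0)
n, m = 20000, 7
y = rng.choice([-1, 1], size=n)               # hidden labels, used only for checking
acc = np.linspace(0.55, 0.90, m)              # true accuracies, unknown in practice
correct = rng.random((m, n)) < acc[:, None]
preds = np.where(correct, y, -y)

u = spectral_accuracy_scores(preds)
meta = weighted_vote(preds, u)

print("estimated scores:", np.round(u, 3))
print("true 2*acc - 1:  ", np.round(2 * acc - 1, 3))
print("meta-learner accuracy:", (meta == y).mean())
```

With balanced classes the scores should track 2*acc - 1 closely, so sorting the classifiers by `u` ranks them without any labels. The weighted vote is only one simple way to turn such scores into a meta-learner.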
In the second part of the talk, I will focus on extending the techniques developed
for unsupervised ensemble learning to a specific family of linear latent
variable models. For cases where the latent layer is binary, we derive an interesting
relation between the model parameters and the relatively recent notion
of tensor eigenvectors of the data's higher-order moments. We apply our methods
to the problem of inferring global ancestry in population genetics.