Online Learning and Prediction with Eventual Almost Sure Guarantees

Wu, Changlong
Santhanam, Narayana Prasad
Electrical Engineering
In this dissertation, we introduce a general prediction paradigm in which a learner predicts properties of an underlying model, and of future samples from the model, using past observations from that model. The prediction game continues for infinitely many steps in an online fashion as the sample grows with new observations (not necessarily i.i.d.). After each prediction, the predictor incurs a binary (0-1) loss. The probability model underlying a sample is unknown except that it belongs to a known class of models. The goal of the learner is to make only finitely many errors (i.e., incur loss 1 only finitely often) with probability 1 under the generating model, no matter which model in the known class generated the data. A model class and loss pair that admits a predictor making only finitely many errors in this fashion is called eventually almost surely (e.a.s. for short) predictable. The main contributions of this dissertation are general characterizations of the e.a.s.-predictable class and loss pairs. Using these characterizations, we establish tight necessary and sufficient conditions for a wide range of prediction problems to be e.a.s.-predictable, including problems from hypothesis testing, online learning, and risk management. Moreover, our results establish striking connections between e.a.s.-predictability and the notion of regularization. While e.a.s.-predictable classes admit predictors that make only finitely many errors, when the final error was made may remain unknown. We therefore say a class and loss pair is e.a.s.-learnable if it is e.a.s.-predictable and, in addition, we can identify the last error with any given confidence using a universal stopping rule. We provide general characterizations of e.a.s.-learnability that are tight in many natural settings.
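The central definition above can be stated formally; the following is a sketch in standard notation (the symbols $\mathcal{P}$, $\Phi$, and $\ell$ are our own labels for the model class, predictor, and loss, not necessarily the dissertation's):

% Hedged formalization of e.a.s.-predictability.
% A pair (\mathcal{P}, \ell) is e.a.s.-predictable if there exists a
% single predictor \Phi = (\Phi_n)_{n \ge 1} such that, for every
% model p \in \mathcal{P},
\[
  \mathbb{P}_p\!\left(
    \ell\bigl(\Phi_n(X_1,\dots,X_n)\bigr) = 1
    \ \text{for infinitely many } n
  \right) = 0,
\]
% i.e., with probability 1 under p, the predictor incurs loss 1 only
% finitely often, no matter which p \in \mathcal{P} generated the data.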
While the above results bring out broad principles, to develop the framework in finer detail we study three broad categories of applications: hypothesis testing, online classification and learning, and risk prediction. Our characterization of hypothesis testing covers testing general properties of distributions from i.i.d. samples, including entropy properties of discrete distributions and properties of random matrices with Bernoulli entries. Our general results in the e.a.s.-prediction framework also strengthen and extend prior results of Dembo and Peres (1994) with simple, elementary proofs, and provide a partial resolution of an open problem posed there. In our approach to online classification, a classifier obtains training data in an online fashion and predicts the label of the next instance, but is required to make only finitely many errors over an infinite horizon. Extending the classical bounded-error scenario of Littlestone (1988), we show that a binary-labeled hypothesis class can be learned online with finitely many errors almost surely using i.i.d. samples from a given distribution µ iff the class is effectively countable w.r.t. µ. We also characterize the setting where µ is unknown, and show that corrupting the labels with independent Bernoulli(η) noise does not change learnability so long as η < 1/2. We extend these results to the case where labels need not be binary. Going past prior results on learning recursive functions in Zeugmann and Zilles (2008), we show that the class of all binary-valued computable functions on the naturals can be learned online with finitely many errors almost surely by a computable predictor, given samples from certain non-degenerate distributions. We then bound the computational complexity of the predictor, studying classes of functions computable in exponential and polynomial time, respectively.
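The countability condition above has a classical learning-by-enumeration flavor. The following minimal Python sketch (our own illustration using a hypothetical class of threshold functions, not the dissertation's construction) shows why a countable, noiseless class always admits a predictor with finitely many mistakes: the learner advances past a hypothesis only when it errs, so the mistake count is bounded by the index of the true hypothesis in the enumeration.

```python
import random

# Hypothetical countable class: threshold functions on the naturals,
#   h_k(x) = 1 if x >= k else 0, enumerated as k = 0, 1, 2, ...
def h(k, x):
    return 1 if x >= k else 0

def run_enumeration_learner(true_k, xs):
    """Learning by enumeration: predict with the first hypothesis in the
    enumeration that is consistent with all past labeled examples.  Each
    mistake forces the index past an inconsistent hypothesis, and the true
    hypothesis is never passed, so total mistakes <= true_k (finite)."""
    history = []   # past (x, label) pairs
    idx = 0        # current hypothesis index; consistent with history
    mistakes = 0
    for x in xs:
        label = h(true_k, x)            # noiseless label from the true model
        if h(idx, x) != label:
            mistakes += 1
            history.append((x, label))
            # advance to the next hypothesis consistent with the history
            idx += 1
            while any(h(idx, px) != pl for px, pl in history):
                idx += 1
        else:
            history.append((x, label))
    return mistakes

random.seed(0)
xs = [random.randrange(100) for _ in range(2000)]
m = run_enumeration_learner(true_k=7, xs=xs)
```

After finitely many mistakes the learner settles on a hypothesis agreeing with the true one on everything it will ever see; the i.i.d. stream plays no special role here, which is why the noiseless countable case is the easy direction of the characterization.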
Lastly, we study the problem of predicting upper bounds on the next draw of an unknown probability distribution after observing a sample generated by it. We show that a prediction rule violating the bounds only finitely often almost surely exists if and only if the class can be decomposed into a countable union of tight classes. This implies, for example, that the class of all monotone distributions cannot be predicted in this sense.
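For a single, fully known distribution the upper-bound game is straightforward, and the mechanism hints at why tightness matters: choose bounds whose tail probabilities are summable and invoke the Borel-Cantelli lemma. The Python sketch below (our own illustration with a Geometric(1/2) source; the names and parameters are ours) demonstrates this one-distribution case only, not the class-level decomposition result.

```python
import math
import random

def geometric_quantile(p, level):
    """Smallest b with P(X <= b) >= level for X ~ Geometric(p) on {1,2,...},
    using P(X <= b) = 1 - (1-p)^b."""
    return math.ceil(math.log(1 - level) / math.log(1 - p))

random.seed(0)
p = 0.5
violations = 0
for n in range(2, 20001):
    # Predict a bound whose tail probability is at most 1/n^2; since
    # sum_n 1/n^2 < infinity, Borel-Cantelli gives only finitely many
    # violations almost surely.
    b_n = geometric_quantile(p, 1 - 1.0 / n**2)
    # Draw the next sample from Geometric(p).
    x = 1
    while random.random() >= p:
        x += 1
    if x > b_n:
        violations += 1
```

The hard part of the dissertation's result is the converse direction for an unknown distribution in a class: when the class decomposes into countably many tight pieces, quantile-style bounds can be hedged across the pieces, and when it does not (as for all monotone distributions), no rule can succeed.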
Statistics, Computer science, Mathematics, Decomposition, Enumeration, Eventually Almost Surely, Finitely many errors, Online Prediction, Regularization
148 pages
All UHM dissertations and theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission from the copyright owner.