Markov Processes: Estimation and Sampling in Undersampled Regime

Date
2015-08
Authors
Paravitorghabeh, Ramezan
Contributor
Advisor
Department
Instructor
Depositor
Speaker
Researcher
Consultant
Interviewer
Annotator
Journal Title
Journal ISSN
Volume Title
Publisher
[Honolulu] : [University of Hawaii at Manoa], [August 2015]
Volume
Number/Issue
Starting Page
Ending Page
Alternative Title
Abstract
This work concentrates on estimation and sampling aspects in slow mixing Markov Processes. When a process has slow mixing property, a large number of observations is needed before exploring the state space of the process properly. Empirical properties of finite sized samples from Markov processes need not reflect stationary properties. When empirical counts of samples eventually reflect the stationary properties, we say that the process has mixed. The contributions of this work revolve around interpretation of samples before mixing has occurred. In first part of the work, we deal with estimation of parameters of the process, while in the second part, we will show how the evolution of the samples obtained from the process can be exploited to identify subsets of state space which communicate well together. This insight will be translated into algorithmic rules for detecting communities in graphs. Estimation : We observe a length-n sample generated by an unknown, stationary ergodic Markov process (model ) over a finite alphabet A. Given any string w of symbols from A we want estimates of the conditional probability distribution of symbols following w (model parameters), as well as the stationary probability of w. Two distinct problems that complicate estimation in this setting are (i) long memory, and (ii) slow mixing which could happen even with only one bit of memory. Any consistent estimator can only converge pointwise over the class of all ergodic Markov models. Namely, given any estimator and any sample size n, the underlying model could be such that the estimator performs poorly on a sample of size n with high probability. But can we look at a length-n sample and identify if an estimate is likely to be accurate ? Since the memory is unknown a-priori, a natural approach is to estimate a potentially coarser model with memory kn = O(log n). As n grows, estimates get refined and this approach is consistent with the above scaling of kn also known to be essentially optimal. But while effective asymptotically, the situation is quite different when we want the best answers possible with a length-n sample, rather than just consistency. Combining results in universal compression with Aldous' coupling arguments, we obtain su cient conditions on the length-n sample (even for slow mixing models) to identify when naive (i) estimates of the model parameters and (ii) estimates related to the stationary probabilities are accurate ; and also bound the deviations of the naive estimates from true values. Sampling : We introduce a new randomized algorithm for community detection in graphs by defining Markov random walks on them. The mixing properties of the random walks we construct are used to identify communities. The more polarized the communities are, the slower mixing the random walk is. Our algorithm creates a random walk on the nodes of the graph. We start different, coupled random walks from different nodes. We then adapt the coupling from the past approach to identify clusters before the chain mixes (rather than sample from the stationary distribution). The Markov random walks are built such that the restriction of the random walk to within any one cluster mixes much faster than the overall walk itself. Finally, we analyze the performance of the algorithm on specific graph structures, including the Stochastic Block Models, LFR benchmark random graphs and real world networks. The number of communities is not known to our algorithm in advance, nor is any generative model we may have used. Where relevant, we used cluster edit distance (number of addition and deletion of edges needed to turn a graph into disjoint cliques) and compared the results with the state-of-the-art Correlation Clustering CC-PIVOT algorithm.
Description
Ph.D. University of Hawaii at Manoa 2015.
Includes bibliographical references.
Keywords
Citation
Extent
Format
Geographic Location
Time Period
Related To
Theses for the degree of Doctor of Philosophy (University of Hawaii at Manoa). Electrical Engineering
Table of Contents
Rights
Rights Holder
Local Contexts
Email libraryada-l@lists.hawaii.edu if you need this content in ADA-compliant format.