Markov Processes: Estimation and Sampling in Undersampled Regime

Paravitorghabeh, Ramezan

Markov Processes: Estimation and Sampling in Undersampled Regime

dc.contributor.author	Paravitorghabeh, Ramezan
dc.date.accessioned	2017-12-18T21:25:55Z
dc.date.available	2017-12-18T21:25:55Z
dc.date.issued	2015-08
dc.description	Ph.D. University of Hawaii at Manoa 2015.
dc.description	Includes bibliographical references.
dc.description.abstract	This work concentrates on estimation and sampling aspects in slow mixing Markov Processes. When a process has slow mixing property, a large number of observations is needed before exploring the state space of the process properly. Empirical properties of finite sized samples from Markov processes need not reflect stationary properties. When empirical counts of samples eventually reflect the stationary properties, we say that the process has mixed. The contributions of this work revolve around interpretation of samples before mixing has occurred. In first part of the work, we deal with estimation of parameters of the process, while in the second part, we will show how the evolution of the samples obtained from the process can be exploited to identify subsets of state space which communicate well together. This insight will be translated into algorithmic rules for detecting communities in graphs. Estimation : We observe a length-n sample generated by an unknown, stationary ergodic Markov process (model ) over a finite alphabet A. Given any string w of symbols from A we want estimates of the conditional probability distribution of symbols following w (model parameters), as well as the stationary probability of w. Two distinct problems that complicate estimation in this setting are (i) long memory, and (ii) slow mixing which could happen even with only one bit of memory. Any consistent estimator can only converge pointwise over the class of all ergodic Markov models. Namely, given any estimator and any sample size n, the underlying model could be such that the estimator performs poorly on a sample of size n with high probability. But can we look at a length-n sample and identify if an estimate is likely to be accurate ? Since the memory is unknown a-priori, a natural approach is to estimate a potentially coarser model with memory kn = O(log n). As n grows, estimates get refined and this approach is consistent with the above scaling of kn also known to be essentially optimal. But while effective asymptotically, the situation is quite different when we want the best answers possible with a length-n sample, rather than just consistency. Combining results in universal compression with Aldous' coupling arguments, we obtain su cient conditions on the length-n sample (even for slow mixing models) to identify when naive (i) estimates of the model parameters and (ii) estimates related to the stationary probabilities are accurate ; and also bound the deviations of the naive estimates from true values. Sampling : We introduce a new randomized algorithm for community detection in graphs by defining Markov random walks on them. The mixing properties of the random walks we construct are used to identify communities. The more polarized the communities are, the slower mixing the random walk is. Our algorithm creates a random walk on the nodes of the graph. We start different, coupled random walks from different nodes. We then adapt the coupling from the past approach to identify clusters before the chain mixes (rather than sample from the stationary distribution). The Markov random walks are built such that the restriction of the random walk to within any one cluster mixes much faster than the overall walk itself. Finally, we analyze the performance of the algorithm on specific graph structures, including the Stochastic Block Models, LFR benchmark random graphs and real world networks. The number of communities is not known to our algorithm in advance, nor is any generative model we may have used. Where relevant, we used cluster edit distance (number of addition and deletion of edges needed to turn a graph into disjoint cliques) and compared the results with the state-of-the-art Correlation Clustering CC-PIVOT algorithm.
dc.identifier.uri	http://hdl.handle.net/10125/51117
dc.language.iso	eng
dc.publisher	[Honolulu] : [University of Hawaii at Manoa], [August 2015]
dc.relation	Theses for the degree of Doctor of Philosophy (University of Hawaii at Manoa). Electrical Engineering
dc.title	Markov Processes: Estimation and Sampling in Undersampled Regime
dc.type	Thesis
dc.type.dcmi	Text

Files

Original bundle

Now showing 1 - 2 of 2

Name:: 2015-08-phd-paravitorghabeh_r.pdf
Size:: 3.47 MB
Format:: Adobe Portable Document Format
Description:: Version for non-UH users. Copying/Printing is not permitted

Download

Name:: 2015-08-phd-paravitorghabeh_uh.pdf
Size:: 3.47 MB
Format:: Adobe Portable Document Format
Description:: For UH users only

Download

Collections

Ph.D. - Electrical Engineering