Covariance Selection Quality and Approximation Algorithms

Tafaghodi Khajavi, Navid
Kuh, Anthony
Electrical Engineering
Journal Title
Journal ISSN
Volume Title
Starting Page
Ending Page
Alternative Title
This dissertation conducts a study of graphical models, discusses the quality of statistical model selection approximation, and proposes algorithms for model approximation. Graphical models are useful tools for describing the geometric structure of networks in applications that deal with high dimensional data. Learning from these high dimensional data requires large computation power which is not always available due to hardware limitation for different applications. Thus, we need to compromise between model complexity and its accuracy by using the best possible approximation algorithm that chooses a simpler, yet informative model. The first problem we study in this work is the quality of statistical model selection. We consider the problem of quantifying the quality of approximation model. The statistical model selection often uses a distance measure such as the Kullback-Leibler (KL) divergence between the original distribution and the model distribution to quantify the quality of approximated model distribution. Although the KL divergence is minimized to obtain model approximation in many cases, there are other measures and divergences that can be used to do so. We extend the body of research by formulating the model approximation as a parametric detection problem between the original distribution and the model distribution. The proposed detection framework results in the computation of symmetric closeness measures such as receiver operating characteristic (ROC) and area under the curve (AUC). In particular, we focus on statistical model selection for Gaussian distributions, i.e. the covariance selection [1]. In the case of covariance selection, closeness measures such as KL divergence, reverse KL divergence, ROC, and AUC depend on the eigenvalues of the correlation approximation matrix (CAM). We find expressions for the KL divergence, the log-likelihood ratio, and the AUC as a function of the CAM. We present a simple integral to compute the AUC. In addition, easily computable upper and lower bounds are also found for the AUC to assess the quality of an approximated model. Through some examples and simulation for real and synthetic data, we investigate the quality of the covariance selection for both tree-structured models and non-tree structured models. The second problem we target in this work is to formulate a general framework and algorithms to perform covariance selection. We develop a multistage framework for graphical model approximation using a cascade of models such as trees. In particular, we look at the model approximation problem for Gaussian distributions as linear transformations of tree models. This is a new way to decompose the covariance matrix. Here, we propose an algorithm which incorporates the Cholesky factorization method to compute the decomposition matrix and thus can approximate a simple graphical model using a cascade of the Cholesky factorization of the tree approximation transformations. The Cholesky decomposition enables us to achieve a tree structure factor graph at each cascade stage of the algorithm which facilitates the use of the message passing algorithm since the approximated graph has fewer loops compared to the original graph. The overall graph is a cascade of factor graphs with each factor graph being a tree. This is a different perspective on the approximation model, and algorithms such as Gaussian belief propagation can be used on this overall graph. Here, we present theoretical results that guarantee the convergence of the proposed model approximation using the cascade of tree decompositions. In the simulations, we look at synthetic and real data and measure the performance of the proposed framework by comparing the KL divergences.
Electrical engineering, Covariance Matrix Approximation, Covariance Matrix decomposition, Covariance Selection, Detection problem, Gaussian Graphical Model, Tree Approximation
130 pages
Geographic Location
Time Period
Related To
Table of Contents
All UHM dissertations and theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission from the copyright owner.
Rights Holder
Local Contexts
Email if you need this content in ADA-compliant format.