Bayesian learning framework with kernel-imbedded Gaussian processes applied to microarray analysis

Zhao, Xin
Thesis (Ph.D.)--University of Hawaii at Manoa, 2008.
DNA microarray technology has provided researchers with a high-throughput means to simultaneously measure expression levels for thousands of genes in an experiment. With a probit regression setting, and assuming that the link function between the significant gene expression data and the latent variable for the response label is a Gaussian process, a kernel-induced hierarchical Bayesian framework is built for a cancer classification problem using microarray gene expression data.
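As a rough illustration of the probit setting with a Gaussian process prior on the latent function, the sketch below runs a two-class Gibbs sampler using the standard latent-variable data augmentation for probit models. It is not the thesis's KIGP algorithm: there is no gene selection, no kernel-parameter sampling, and no multi-class handling. The RBF kernel, the unit noise variance, and the names `rbf_kernel`, `sample_truncated`, and `gibbs_gp_probit` are all assumptions made for this example.

```python
import numpy as np
from statistics import NormalDist

_STD = NormalDist()  # standard normal, used for cdf / inverse cdf


def rbf_kernel(X, length_scale=1.0):
    """Squared-exponential kernel matrix (an assumed kernel choice)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale ** 2)


def sample_truncated(mean, positive, rng):
    """Draw z ~ N(mean, 1) restricted to z > 0 (positive) or z <= 0."""
    p0 = _STD.cdf(-mean)  # P(z <= 0) under N(mean, 1)
    lo, hi = (p0, 1.0) if positive else (0.0, p0)
    u = rng.uniform(lo, hi)
    # Clip so inv_cdf never receives exactly 0 or 1.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mean + _STD.inv_cdf(u)


def gibbs_gp_probit(K, y, n_iter=300, burn_in=100, seed=0):
    """Gibbs sampler for y_i = 1[z_i > 0], z = f + e, e ~ N(0, I), f ~ N(0, K).

    Returns the post-burn-in samples of f, one row per kept iteration.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    # f | z ~ N(A z, S) with A = K (K + I)^-1 and S = K - A K.
    A = np.linalg.solve(K + np.eye(n), K)
    S = K - A @ K
    L = np.linalg.cholesky(0.5 * (S + S.T) + 1e-6 * np.eye(n))  # jitter for stability
    f = np.zeros(n)
    kept = []
    for it in range(n_iter):
        # Step 1: latent z_i given f_i and the label y_i (truncated normal).
        z = np.array([sample_truncated(f[i], y[i] == 1, rng) for i in range(n)])
        # Step 2: latent function values f given z (multivariate normal).
        f = A @ z + L @ rng.standard_normal(n)
        if it >= burn_in:
            kept.append(f)
    return np.array(kept)
```

Averaging the kept samples of `f` and taking the sign gives a simple point classifier for the training samples; a fuller treatment would average the probit probabilities over samples and predict at unseen points.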
Targeting a multi-class classification problem and adopting a variable selection approach with a Gibbs sampler at its core, we developed an algorithm, kernel-imbedded Gaussian Process (KIGP), to analyze microarray data under a Bayesian framework. Through a feature projection procedure and using a univariate ranking scheme as the gene-selection strategy, we further designed an alternative microarray analysis model, natural kernel-imbedded Gaussian Process (NKIGP). Finally, by embedding a reversible jump Markov chain Monte Carlo (RJMCMC) model, we present an efficient algorithm with a cascading structure that unifies the methods proposed in this study.
The simulated examples demonstrate that our method almost always performs close to the Bayesian bound, both in cases with linear Bayesian classifiers and in cases with highly nonlinear Bayesian classifiers. The method remains robust even with mislabeled training samples, which suggests it is broadly applicable to microarray analysis problems on which linear methods may perform unreliably.
Six published microarray datasets were analyzed in this study. The results show that the predictive performance of our method on all of these datasets is better than, or at least as good as, that of other state-of-the-art microarray analysis methods. Our method is especially strong on one dataset that contains multiple samples suspected of being mislabeled. For each dataset, we identified a set of significant genes that can be used for further biological inspection at the genome level.
In summary, this study presents a kernel-induced hierarchical Bayesian framework, built on a Gaussian process model, that uses microarray gene expression data for a multi-class cancer classification problem. Our main contribution is a fully automated learning algorithm to solve this Bayesian model. Satisfactory results were achieved in both the simulated examples and the real-world data studies.
Includes bibliographical references (leaves xxx-xxx).
Also available by subscription via World Wide Web
179 leaves, bound 29 cm
Related To
Theses for the degree of Doctor of Philosophy (University of Hawaii at Manoa). Computer Science; no. 5137
All UHM dissertations and theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission from the copyright owner.