Kernel-based empirical Bayesian classification methods with applications to protein phosphorylation and non-coding RNA

Date
2014-08
Authors
Menor, Mark Soroten
Publisher
[Honolulu] : [University of Hawaii at Manoa], [August 2014]
Abstract
With the advancement of high-throughput sequencing technologies, a new era of "big data" biological research has dawned. However, the abundance of biological data presents many challenges for analysis, and it has proven very difficult to extract important information from the data. One approach to this problem is to use machine learning methods. In this dissertation, we describe novel probabilistic kernel-based learning methods and demonstrate their practical applicability by solving major bioinformatics problems at the transcriptome and proteome levels; the resulting tools are expected to help biologists further elucidate the important information contained in their data. The proposed binary classification method, the Classification Relevance Units Machine (CRUM), employs the theory of kernel and empirical Bayesian methods to achieve non-linear classification and high generalization. We demonstrate the practical applicability of CRUM by applying it to the prediction of protein phosphorylation sites, which helps explain the mechanisms that control many biochemical processes. We then extend CRUM to multiclass problems with the Multiclass Relevance Units Machine (McRUM). McRUM uses the error-correcting output codes framework to decompose a multiclass problem into a set of binary problems. We devise a linear-time algorithm to aggregate the binary results into the final probabilistic multiclass prediction, which allows predictions in large-scale applications. We demonstrate the practical applicability of McRUM by applying it to the identification of mature microRNA (miRNA) and piwi-interacting RNA (piRNA) in small RNA sequencing datasets. This provides biologists with a tool to help discover novel miRNA and piRNA and further understand the molecular processes of the organisms they study.
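The error-correcting output codes idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the dissertation's algorithm: the one-vs-rest code matrix, the function names `one_vs_rest_code` and `decode`, the three hypothetical classes, and the naive product-and-normalize decoding are all assumptions chosen for illustration, and the binary probabilities are made-up inputs rather than outputs of CRUM.

```python
import numpy as np

def one_vs_rest_code(n_classes):
    """Code matrix (classes x binary problems): +1 on the diagonal, -1 elsewhere."""
    return 2 * np.eye(n_classes, dtype=int) - 1

def decode(code, p_pos):
    """Combine binary outputs into class probabilities (illustrative decoding only).

    p_pos[l] is binary classifier l's estimate that its label is +1.
    Each class is scored by the product, over its codeword, of the
    probability assigned to each bit; scores are then normalized to sum to 1.
    """
    p_pos = np.asarray(p_pos, dtype=float)
    per_bit = np.where(code == 1, p_pos, 1.0 - p_pos)
    scores = per_bit.prod(axis=1)
    return scores / scores.sum()

# Three hypothetical classes and made-up binary classifier outputs.
code = one_vs_rest_code(3)
probs = decode(code, [0.8, 0.3, 0.1])
print(probs)
```

Each row of `code` is one class's codeword and each column defines one binary problem, so any binary probabilistic classifier could in principle supply the entries of `p_pos`.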
Description
Ph.D. University of Hawaii at Manoa 2014.
Includes bibliographical references.
Keywords
Classification Relevance Units Machine
Related To
Theses for the degree of Doctor of Philosophy (University of Hawaii at Manoa). Computer Science.
Rights
All UHM dissertations and theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission from the copyright owner.