Kernel-based empirical Bayesian classification methods with applications to protein phosphorylation and non-coding RNA

dc.contributor.author: Menor, Mark Soroten
dc.date.accessioned: 2015-10-02T20:39:58Z
dc.date.available: 2015-10-02T20:39:58Z
dc.date.issued: 2014-08
dc.description.abstract: With the advancement of high-throughput sequencing technologies, a new era of "big data" biological research has dawned. However, the abundance of biological data presents many challenges for analysis, and extracting important information from the data has proven very difficult. One approach to this problem is machine learning. In this dissertation, we describe novel probabilistic kernel-based learning methods and demonstrate their practical applicability by solving major bioinformatics problems at the transcriptome and proteome levels; the resulting tools are expected to help biologists further elucidate the important information contained in their data. The proposed binary classification method, the Classification Relevance Units Machine (CRUM), draws on the theory of kernel methods and empirical Bayesian methods to achieve non-linear classification and high generalization. We demonstrate the practical applicability of CRUM by applying it to the prediction of protein phosphorylation sites, which helps explain the mechanisms that control many biochemical processes. We then extend CRUM to multiclass problems with the Multiclass Relevance Units Machine (McRUM). McRUM uses the error-correcting output codes framework to decompose a multiclass problem into a set of binary problems. We devise a linear-time algorithm that aggregates the binary results into the final probabilistic multiclass prediction, allowing predictions in large-scale applications. We demonstrate the practical applicability of McRUM through a solution to the identification of mature microRNA (miRNA) and piwi-interacting RNA (piRNA) in small RNA sequencing datasets. This provides biologists with a tool to help discover novel miRNAs and piRNAs and to further understand the molecular processes of the organisms they study.
dc.description.degree: Ph.D.
dc.identifier.uri: http://hdl.handle.net/10125/100402
dc.language: eng
dc.publisher: University of Hawaii at Manoa
dc.relation: Theses for the degree of Doctor of Philosophy (University of Hawaii at Manoa). Computer Science.
dc.rights: All UHM dissertations and theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission from the copyright owner.
dc.subject: Classification Relevance Units Machine
dc.title: Kernel-based empirical Bayesian classification methods with applications to protein phosphorylation and non-coding RNA
dc.type: Thesis
dc.type.dcmi: Text
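
The abstract says McRUM uses error-correcting output codes (ECOC) to split a multiclass problem into binary problems and then aggregates the binary outputs into a probabilistic multiclass prediction. The sketch below is a generic illustration of that idea under stated assumptions, not code from the dissertation. The one-vs-rest code matrix for three classes, the hypothetical function names `binary_labels` and `decode`, and the decoding rule are all assumptions. The rule treats the binary outputs as independent and normalizes the product of the probabilities that agree with each class codeword. It is not the linear-time aggregation algorithm devised for McRUM, and the per-problem binary classifiers (CRUM in McRUM) are not shown.

```python
import math

# Hypothetical one-vs-rest ECOC code matrix (an assumption for illustration):
# rows are classes, columns are binary problems. +1 puts the class in the
# positive group of that binary problem, -1 puts it in the negative group.
CODE_MATRIX = [
    [+1, -1, -1],  # class 0
    [-1, +1, -1],  # class 1
    [-1, -1, +1],  # class 2
]


def binary_labels(y, code_matrix):
    """Relabel multiclass targets y into one list of +1/-1 labels per binary problem."""
    n_problems = len(code_matrix[0])
    return [[code_matrix[c][l] for c in y] for l in range(n_problems)]


def decode(probs, code_matrix, eps=1e-12):
    """Turn binary outputs probs[l] = P(positive group | x) into class probabilities.

    Hypothetical rule: assume the binary outputs are independent, score each
    class by the product of the probabilities that agree with its codeword,
    then normalize. The loop touches each code-matrix entry once.
    """
    log_scores = []
    for row in code_matrix:
        total = 0.0
        for m, p in zip(row, probs):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total += math.log(p if m == +1 else 1.0 - p)
        log_scores.append(total)
    # Subtract the maximum log score before exponentiating for numerical stability.
    top = max(log_scores)
    scores = [math.exp(s - top) for s in log_scores]
    z = sum(scores)
    return [s / z for s in scores]


if __name__ == "__main__":
    print(binary_labels([0, 1, 2, 1], CODE_MATRIX))
    class_probs = decode([0.8, 0.3, 0.1], CODE_MATRIX)
    print([round(p, 3) for p in class_probs])
```

Using a code matrix with more columns than classes adds the redundancy that gives ECOC its error-correcting character; the decoding loop itself does not change.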

Files

Original bundle

Name: Menor_Mark_r.pdf
Size: 2.26 MB
Format: Adobe Portable Document Format
Description: Version for non-UH users. Copying/Printing is not permitted

Name: Menor_Mark_uh.pdf
Size: 2.21 MB
Format: Adobe Portable Document Format
Description: Version for UH users