
Kernel-based empirical Bayesian classification methods with applications to protein phosphorylation and non-coding RNA

File | Description | Size | Format
Menor_Mark_r.pdf | Version for non-UH users. Copying/Printing is not permitted | 2.31 MB | Adobe PDF
Menor_Mark_uh.pdf | Version for UH users | 2.26 MB | Adobe PDF

Item Summary

Title: Kernel-based empirical Bayesian classification methods with applications to protein phosphorylation and non-coding RNA
Authors: Menor, Mark Soroten
Keywords: Classification Relevance Units Machine
Issue Date: Aug 2014
Publisher: [Honolulu] : [University of Hawaii at Manoa], [August 2014]
Abstract: With the advancement of high-throughput sequencing technologies, a new era of "big data" biological research has dawned. However, the abundance of biological data presents many challenges in their analysis and it has proven very difficult to extract important information out of the data. One approach to this problem is to use the methods of machine learning.
In this dissertation, we describe novel probabilistic kernel-based learning methods and demonstrate their practical applicability by solving major bioinformatics problems at the transcriptome and proteome levels. The resulting tools are expected to help biologists further elucidate the important information contained in their data.
The proposed binary classification method, the Classification Relevance Units Machine (CRUM), employs the theory of kernel and empirical Bayesian methods to achieve non-linear classification and high generalization. We demonstrate the practical applicability of CRUM by applying it to the prediction of protein phosphorylation sites, which helps explain the mechanisms that control many biochemical processes.
Then we develop an extension of CRUM to solve multiclass problems, called the Multiclass Relevance Units Machine (McRUM). McRUM uses the error-correcting output codes framework to decompose a multiclass problem into a set of binary problems. We devise a linear-time algorithm to aggregate the results into the final probabilistic multiclass prediction, which allows for predictions in large-scale applications. We demonstrate the practical applicability of McRUM through a solution to the identification of mature microRNA (miRNA) and piwi-interacting RNA (piRNA) in small RNA sequencing datasets. This provides biologists with a tool to help discover novel miRNA and piRNA and to further understand the molecular processes of the organisms they study.
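As a rough illustration of the error-correcting output codes idea mentioned in the abstract, the sketch below decomposes a three-class problem into three binary problems and combines the binary probabilities into class probabilities. Everything here is an assumption made for illustration: the one-vs-rest code matrix, the `decode` function, and the example probabilities are hypothetical, and the naive product-and-normalize decoding shown is a simple stand-in, not the linear-time aggregation algorithm developed in the dissertation.

```python
from math import prod

# Hypothetical one-vs-rest code matrix for 3 classes.
# Rows are classes, columns are binary problems.
# +1: the class is a positive example, -1: negative, 0: ignored.
CODE_MATRIX = [
    [+1, -1, -1],
    [-1, +1, -1],
    [-1, -1, +1],
]

def decode(code_matrix, binary_probs):
    """Combine binary posteriors P(y_j = +1 | x) into class probabilities.

    Naive decoding (an illustrative assumption): score each class by the
    product of the probabilities its codeword assigns to each binary
    outcome, then normalize the scores so they sum to one.
    """
    scores = []
    for codeword in code_matrix:
        terms = []
        for bit, p in zip(codeword, binary_probs):
            if bit == +1:
                terms.append(p)
            elif bit == -1:
                terms.append(1.0 - p)
            # bit == 0: this binary problem says nothing about the class
        scores.append(prod(terms))
    total = sum(scores)
    if total == 0.0:
        # Degenerate case: fall back to a uniform distribution
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]

# Made-up outputs of the three binary classifiers for one input
probs = decode(CODE_MATRIX, [0.9, 0.2, 0.1])
predicted = max(range(len(probs)), key=probs.__getitem__)
```

In this sketch each binary classifier could be any probabilistic model; the abstract indicates that McRUM builds its binary problems on CRUM, but the details of that construction are not given here.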
Description: Ph.D. University of Hawaii at Manoa 2014.
Includes bibliographical references.
Rights: All UHM dissertations and theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission from the copyright owner.
Appears in Collections: Ph.D. - Computer Science

Items in ScholarSpace are protected by copyright, with all rights reserved, unless otherwise indicated.