DATA SCIENCE FOR MOLECULAR GENETICS AND COMMUNICATION IN THE NATURAL SCIENCES

Date

2022

Abstract

“By 2025, it’s estimated that 463 exabytes of data will be created each day globally – that’s the equivalent of 212,765,957 DVDs per day!” (World Economic Forum)

Data science refers to the study of increasingly large and complex datasets. Data that are too large for standard tools (e.g., Excel, Google Sheets) to analyze are often referred to as “big data.” Although big data exist across many fields and are widely viewed as the path to answering many questions, there is still no consensus on the fundamental principles and skills needed to work with them. Further, the skills needed to study big data are not universally or systematically taught at the college level. The resulting skills gap leaves students unable to analyze the very data touted as the key to answering complex questions. This dissertation proposes a plan to close the big data knowledge gap by incorporating data science principles from diverse disciplines into a biology curriculum. Specifically, I distilled essential information from three independent study systems: cancer diagnostics, plant genomics, and academic publishing. Each system contributed a different perspective on the skills and knowledge needed to analyze big data. From these systems, I identified three critical areas central to using big data effectively. Drawing on these diverse perspectives, I developed a model to help instructors construct curricula that work in many different biological contexts. I piloted these principles in a summer course and found that, by incorporating instruction developed across knowledge areas, meaningful data science instruction can occur in any curriculum at any student level.

Keywords

Bioinformatics

Extent

115 pages

Rights

All UHM dissertations and theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission from the copyright owner.
