Ph.D. - Computer Science
Permanent URI for this collection: https://hdl.handle.net/10125/20034
Recent Submissions
Domain-specific foundation models for science applications: Self-supervised learning with SAR and DXA (University of Hawai'i at Manoa, 2025) Glaser, Yannik; Sadowski, Peter; Computer Science

This dissertation explores the use of self-supervised pre-training in non-natural image domains and provides three main contributions to the literature: 1) it adapts current self-supervised learning frameworks to pre-train a model specific to synthetic aperture radar Wave mode imagery; 2) it adapts current self-supervised learning frameworks to pre-train a model specific to dual-energy x-ray absorptiometry; 3) it analyzes embedding characteristics of both models to identify representation quality metrics effective beyond natural image applications. The immediate goal of this work is to provide embedding models that generalize effectively to a range of downstream tasks in their respective domains. These models serve as highly specific foundation models: they generalize well to in-domain tasks, they are robust to training settings and hyperparameter choices, and they are extremely labeled-data-efficient. Training models with self-supervised methods that are tuned to the characteristics of the data domain is important because most self-supervised frameworks are highly tuned for optimal performance on natural images. By adapting these frameworks to respect domain-specific characteristics of the data, or simply removing natural-image-focused biases, downstream task performance and generalizability can be improved. The secondary goal is to add to the body of literature exploring representation characteristics, searching for embedding space qualities that indicate a well-performing model without access to labeled data for direct evaluation.
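Label-free embedding-quality metrics of the kind described above can be illustrated with the alignment and uniformity measures from the self-supervised learning literature (Wang & Isola, 2020); the dissertation's own metrics are not specified here, so the following is a generic sketch with invented toy embeddings:

```python
import math

# Two label-free embedding diagnostics: alignment (positive pairs stay
# close) and uniformity (embeddings spread over the unit hypersphere).
# The toy unit vectors below are invented for illustration only.

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def alignment(pairs):
    """Mean squared distance between positive (augmented) pairs; lower is better."""
    return sum(sq_dist(a, b) for a, b in pairs) / len(pairs)

def uniformity(zs, t=2.0):
    """Log of the mean pairwise Gaussian potential; lower means more spread out."""
    vals = [math.exp(-t * sq_dist(zs[i], zs[j]))
            for i in range(len(zs)) for j in range(i + 1, len(zs))]
    return math.log(sum(vals) / len(vals))

# Toy unit-norm embeddings: each pair is two augmented views of one input.
pairs = [((1.0, 0.0), (0.8, 0.6)),
         ((0.0, 1.0), (-0.6, 0.8)),
         ((-1.0, 0.0), (-0.8, -0.6))]
zs = [z for pair in pairs for z in pair]

a_score = alignment(pairs)   # 0.4 for these toy pairs
u_score = uniformity(zs)     # negative, since all pairwise potentials < 1
```

Metrics like these can be computed on unlabeled data during pre-training, which is exactly the setting the dissertation targets: ranking candidate models without running downstream evaluations.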
This addresses a crucial bottleneck for similar domain-specific pre-training efforts, where architecture search, hyperparameter tuning, and comparison between self-supervised methods are all hindered by the need to train each candidate model to completion and evaluate its performance on specific downstream tasks. By training novel embedding models for two separate vision domains and extensively analyzing intermediate representations of successful and unsuccessful models, this study seeks to establish a foundation for future research attempting similar pre-training efforts for other computer vision domains beyond natural images.

Enhancing data exploration through a pragmatic voice assistant (University of Hawai'i at Manoa, 2024) Tabalba, Roderick; Leigh, Jason; Computer Science

Recent advancements in Natural Language Interfaces (NLIs), driven by powerful natural language models such as BERT, LLaMA, GPT, and ChatGPT, have generated considerable interest in enhancing human-computer interaction. However, a significant gap persists in the ability of these systems to facilitate genuinely natural conversations. Current NLIs often rely on users to initiate dialogue with wake words like "Alexa" or "Hey Siri," leading to interactions that lack the spontaneity and fluidity of human exchanges. This limitation is particularly problematic when considering the role of individual personality traits in communication. According to the attraction paradigm, users tend to prefer interaction styles that align with their personalities, suggesting that NLIs could benefit from being more attuned to users' unique conversational preferences. To address these challenges, this dissertation explores the concept of always-listening capabilities as a foundation for developing proactive AI systems.
Drawing on literature from pragmatics, the psychology of spontaneous thought, and personality research, I examine the subtleties of human communication, including contextual awareness, interruptions, the capacity for spontaneous engagement, and the relationship between characteristic personality traits and desired proactive behavior. This research highlights the potential for NLIs to achieve natural human interactions and engage users in deeper, more meaningful dialogues. In response to this need, I developed Articulate+, an NLI designed to investigate always-listening functionality, which eliminates the requirement for wake words and fosters ongoing conversations that reflect natural human interaction. Building on the insights gained from this exploration, I subsequently created ArticulatePro, a more advanced system focused on studying proactive behavior in voice assistants. ArticulatePro continuously listens to user interactions and proactively generates visualizations to enhance data exploration tasks. Through user studies comparing interactions with ArticulatePro and its non-proactive counterpart, I found that participants demonstrated higher levels of engagement and generated more data insights. The proactive nature of the system not only enhanced the reliability of user insights but also improved the system's learnability and efficiency. Additionally, users made greater use of diverse chart types, leading to richer data analysis experiences.
This research not only contributes to the design and functionality of NLIs but also deepens our understanding of how artificial intelligence can emulate human-like interaction, ultimately redefining user engagement with technology.

Human-Centered AI Development for Augmented Cognition (University of Hawai'i at Manoa, 2024) Doliashvili, Mariam; Crosby, Martha E.; Computer Science

This dissertation examines the integration of artificial intelligence (AI) into daily life, emphasizing human-centered AI development for augmented cognition (AC), which focuses on enhancing rather than replacing human capabilities. It explores how individuals perceive and interact with AI systems compared to how they interact with other humans. In addition, it studies how AI systems can interpret and adapt to humans' cognitive states. During the development of AI systems, the focus has been on creating fully autonomous systems that completely replace manual human labor. However, there are tasks where humans outperform AI systems, and there are also tasks where human-AI systems (HAIS) can outperform both traditional human labor and autonomous AI systems. Rapid replacement of humans with AI systems raises security and privacy concerns and increases operational complexity. In addition, AI models have limited personalization, marginalizing user groups from diverse languages and cultures. This research examines designing AI systems to complement human capabilities: how they could detect, adapt to, and personalize responses based on a human's cognitive state, encompassing intentions, languages, and cultural backgrounds. The following studies identify user intent through biometric markers (e.g., mouse or pen pressure), detect native and non-native users, assess the impact of linguistic and cultural diversity on user preferences and performance, and evaluate Large Language Models (LLMs) as tools for human enhancement.
This dissertation underscores the importance of designing AI systems that consider and adapt to users' diverse backgrounds, ensuring inclusivity and personalization and delivering these systems to people in ways that enhance their capabilities.

Follow-Up Questions Improve Generative AI Output and User Experience: Working Towards a Collaborative Model of Human-AI Interaction (University of Hawai'i at Manoa, 2024) Tix, Bernadette Jzexoia; Binsted, Kim; Computer Science

This research investigates the impact of Large Language Models (LLMs) generating follow-up questions in response to user requests for short (one-page) text documents. This dissertation argues that there are clear benefits to LLMs asking follow-up questions and engaging users in thought-provoking and context-clarifying dialogue before producing documents and other outputs. Two experiments support this research: a pilot study and a larger full study with an improved design based on insights from the pilot. In both experiments, users interacted with a novel web-based AI system designed to ask follow-up questions. Users requested documents they would like the AI to produce. The AI then generated follow-up questions to clarify the user's needs or offer additional insights before generating the requested documents. After answering the questions, users were shown one document generated using both the initial request and the questions and answers, and another generated using only the initial request. Users indicated which document they preferred and gave feedback about their experience with the question-answering process.
The findings of these experiments show clear benefits to question-asking, both in document preference and in the qualitative user experience, and further show that users found more value in questions that were thought-provoking, open-ended, or offered unique insights into the user's request than in simple information-gathering questions. These results point to the need to incorporate follow-up questions and collaborative dialogue into LLMs as part of the human-AI interaction experience.

The Makawalu Visualization Environment: Defining Design Practice of Spatial Augmented Realities for Place-Based Learning in Indigenous Education (University of Hawai'i at Manoa, 2024) Noe, Kari; Leigh, Jason; Computer Science

This research investigates how immersive technologies can effectively support place-based learning approaches to benefit Indigenous students. Employing a research through design (RtD) approach, the dissertation aims to identify design principles and metrics that measure the impact of immersive technologies within student development programs, particularly when they are adapted to unique Indigenous educational frameworks. The iterative design and development of the spatial augmented reality (SAR) system, the Makawalu Visualization Environment (VE), serves as a case study to evaluate how these technologies facilitate meaningful interactions with place, culture, and ancestral knowledge. The key contributions of this work include a set of design principles, an equitable co-design framework, and a rubric for assessing the effectiveness of immersive technology in facilitating place-based learning activities.
The findings of this dissertation contribute to the field of human-computer interaction (HCI) by underscoring the need for Indigenous-centered design processes that prioritize the unique learning contexts of Indigenous communities.

Text Summarization in Quantum Computing (University of Hawai'i at Manoa, 2024) Mohamed, Muzamil; Crosby, Martha E.; Computer Science

This study explores the classification of paradigms in natural language processing (NLP) tasks, emphasizing the distinction between compositional and distributional approaches. While compositional methods prioritize structural understanding, distributional approaches focus on contextual behaviors. DisCo, a hybrid model integrating both paradigms, shows some promise in overcoming the limitations of traditional compositional and distributional models by incorporating grammatical structures as inputs and leveraging quantum computing principles. This promise rests on the analogy of composing words as entangled states and extracting meaning as information flow between those states. In this dissertation, we show that this promise might not be realized on current and near-future quantum hardware. Our practical implementation reveals challenges in handling longer texts and issues with semantic composition, especially in distinguishing whether a simple sentence is a summary of a simple paragraph. We offer insights into why such a task is hard for near-future quantum hardware in general, and why, even if some solutions are realized, they may not outperform classical NLP in terms of computational resources.

Model-agnostic Trajectory Abstraction and Visualization Method for Explainability in Reinforcement Learning (University of Hawai'i at Manoa, 2024) Takagi, Yoshiki; Leigh, Jason; Computer Science

Reinforcement learning (RL) has evolved rapidly in the past decade and is now capable of achieving human-level performance in applications such as self-driving cars.
Moreover, in the last few years, the performance of deep RL, which applies deep neural networks to RL, has surpassed that of skilled human players in video games, chess, and Go. However, as deep RL models become more complex, understanding and interpreting these models poses significant challenges. Explainable AI (XAI) research has shown the potential to close the gap between humans and a deep RL agent by providing explanations that help users understand how the agent works. XAI approaches have been tailored for both RL experts and non-experts. For RL experts, visualizations of internal agent parameters reveal the learning mechanisms of deep RL agents, offering precise insights into agent behavior. However, this approach is less accessible to users who do not have RL expertise (non-RL experts). The communication gap between RL experts and non-experts thus remains a critical issue. For example, in discussions about the decision boundaries of autonomous Unmanned Aerial Vehicles (UAVs) between RL practitioners and pilots, the following issues arise: pilots, who are non-RL experts, have domain knowledge but cannot use XAI interfaces designed for RL experts to assess the RL model; to obtain feedback from pilots, RL experts need to explain the behavior of the RL model while minimizing the use of RL terminology; and pilots may use domain-specific terminology during the assessment, which the RL expert must interpret and apply to the model. Therefore, the central question is: how can both RL experts and non-RL experts understand the behavior of an agent? In other words, how can humans naturally build a mental model of an agent? A promising approach is the 'familiarization effect' from cognitive psychology, where exposure to an agent's behavior in various scenarios helps users intuitively understand the agent; this effect has since been applied to human-robot interaction.
For instance, one research group observed that watching a robot's trajectory in videos enables users to predict the robot's future trajectory. Another study pointed out that short video clips of an agent's game-play can effectively build mental models of the agent's performance. However, this strategy may be less effective with multiple agents or in complex, extended tasks due to human limitations in short-term visual memory. Therefore, this dissertation addresses this problem by proposing a trajectory visualization that gives a high-level view of agents' behaviors through abstraction. This research opens up new directions: domain experts who are unfamiliar with RL can become more involved in RL development, which can help identify important agent behavior patterns that RL experts alone cannot recognize, and general users may become able to assess the capabilities and limitations of agents, for instance when monitoring a self-driving agent as its driver.

Parallel Cache-Efficient Algorithms on GPUs (University of Hawaii at Manoa, 2023) Berney, Kyle Mitsuo; Sitchinava, Nodari S.; Computer Science

Graphics Processing Units (GPUs) have emerged as a highly attractive architecture for general-purpose computing due to their numerous programmable cores, low-latency memory units, and efficient thread context switching capabilities. However, theoretical research on parallel algorithms for GPUs is challenging due to the multitude of interdependent factors influencing overall runtime. Computational models are commonly employed to provide simplified abstractions of computing system architectures. However, developing a computational model that is both simple and accurate, encompassing all performance-affecting aspects of GPU algorithms, is a seemingly impossible task.
Existing GPU models often incorporate numerous variables to account for specific performance factors, rendering them less accessible to researchers. This dissertation sidesteps the lack of a widely accepted model of computation for GPUs by instead employing multiple classical parallel models to capture both parallel computational complexity and cache-efficiency. Namely, we leverage existing knowledge and algorithmic techniques from the Parallel Random Access Machine (PRAM), Parallel External Memory (PEM), and Distributed Memory Machine (DMM) models to aid in the design and analysis of GPU algorithms at various levels of detail. We validate and demonstrate our approach through case studies on specific problems (e.g., sorting, searching, and single source shortest paths), providing both theoretical analysis and corresponding empirical results. Our results highlight the applicability of the selected parallel models of computation to GPUs and illustrate how theoretical research can expose valuable insights into the performance of GPU algorithms in practice.

Novel algorithms to account for uncertainties in the sequencing of genetic material with skewed abundance (University of Hawaii at Manoa, 2022) Arisdakessian, Cedric; Belcaid, Mahdi; Poisson, Guylaine; Computer Science

The sequencing of genetic material (microbial DNA or RNA) is essential in biological experiments. However, while the cost of sequencing has decreased substantially, the highly skewed distribution of genetic material makes it challenging to accurately represent the genetic content of a sample. For instance, in DNA-based metagenomic experiments, DNA fragments are randomly sampled and used to identify and quantify organisms present in an environmental sample. Rare species are sampled less frequently, thus challenging subsequent bioinformatic analyses.
Given the prevalence and the drastic implications of the uneven distribution of genetic material on bioinformatic analyses, our research focuses on new graph- and deep-learning-based methods to address these issues in three different contexts. Specifically, we propose (1) an imputation method that can accurately recover the abundance of under-represented genetic material in single-cell RNA-seq experiments, (2) a binning method to reduce genome fragmentation in viral metagenome sequencing experiments, and (3) a tool to explore and cluster viral populations based on their genomic structure. Our contributions focus on three popular biological contexts in which the issue of abundance hampers bioinformatic analyses. Furthermore, the last two chapters focus on understanding viral diversity and modeling the genesis of novel virus strains through recombinations. Despite its role in the current COVID-19 crisis, recombination remains understudied, and few tools exist to model how viral populations evolve through it.

Decentralized Multi-robot SLAM and Ad Hoc Network for Exploration in a Remote and Enclosed Environment (University of Hawaii at Manoa, 2022) Idota, Tetsuya; Baek, Kyungim; Computer Science

Robotic exploration in an enclosed, unknown, and unstructured environment such as a cave is a challenging problem, as the robots cannot directly communicate with any exterior facilities. In such an environment, the robots need to perform Simultaneous Localization and Mapping (SLAM) without any external support, e.g., GPS, a prior map, or Wi-Fi access points. To cope with this situation, the proposed method establishes a decentralized multi-robot system forming an ad hoc network. Each robot is deployed at the entrance of the environment and moves into deeper areas while maintaining distances to its neighboring robots so they can communicate locally with each other.
In this formation, they perform decentralized cooperative localization and distributed submap building. The decentralized cooperative localization enables the robots to localize themselves with respect to the global reference frame by taking mutual measurements of the relative poses between robots. Communicating with neighboring robots, each robot updates its estimated location based on sensory data about relative poses with respect to its neighbors and their estimated locations. To deal with the overconfidence problem, in which cyclic updates lead to inconsistent estimates, the proposed method performs conservative data exchange. When the robots pass their estimates to each other, they reduce the confidence in those estimates by applying fractional exponents to the probability distributions. As a result, the distributions become "smoother," with less intense peaks. Thereby, the robots can avoid inconsistent amplification of confidence and update their estimates based on each other's information without knowledge of the entire network topology. The distributed submap building directs the robots to cooperatively build submaps, each of which represents a small local area in the environment. Each robot builds a series of submaps along its trajectory. When a robot detects that its neighboring robots have entered areas where submaps were built previously, and it can no longer update those submaps itself, it passes the submaps to those robots so that the submaps can be used by the receiving robots. Thus, each robot does not have to hold all the submaps that it has created since the beginning of the operation, and hence saves memory for the mapping process. Consequently, the robots can deliver consistent mapping and localization results in a decentralized manner while holding only the submaps associated with their local areas.
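The fractional-exponent smoothing described above can be illustrated on a discrete probability distribution. This is a minimal sketch of the general idea only; the exponent value and the belief below are invented for illustration and are not taken from the dissertation:

```python
# Conservative "smoothing" of a discrete probability distribution by a
# fractional exponent w in (0, 1]: p_i -> p_i**w, then renormalize.
# Smaller w flattens the distribution, i.e., reduces confidence.

def discount(p, w):
    """Raise each probability to the power w and renormalize to sum to 1."""
    assert 0 < w <= 1
    powered = [x ** w for x in p]
    total = sum(powered)
    return [x / total for x in powered]

belief = [0.70, 0.20, 0.10]       # a confident (peaked) belief
smoothed = discount(belief, 0.5)  # exchange-ready, less confident copy

# The peak shrinks and the tail grows, so cyclic exchanges cannot
# amplify confidence without bound.
assert max(smoothed) < max(belief)
```

This is the discrete analogue of the exchange step: before a robot forwards its estimate, it deliberately weakens the peak so that information received back through a cycle cannot masquerade as independent evidence.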
The proposed methods are implemented in the Robot Operating System (ROS), and the communication protocols are also designed. For experiments, demonstration, and evaluation, the implemented system is simulated using the Gazebo simulator.

Research through Design of Bendable Interactive Playing Cards (University of Hawaii at Manoa, 2021) Kirshenbaum, Nurit; Robertson, Scott P.; Computer Science

Computer interaction has become second nature for almost all modern people, and touch interaction on smart devices has likewise become ubiquitous. It is easy to forget how new the touch interaction paradigms are and the path it took to develop them. Nowadays, interaction designers are looking for even more novel interaction techniques with previously unheard-of input and output channels. One direction for this search is bendable interfaces: interfaces that require their users to bend them as a form of interaction. In this work, I overview and analyze a collection of prior academic research relating to bendable devices. Researchers often wonder: what will work well with bend interactions? In this dissertation I offer the answer "bendable interactive playing cards," and I frame my work on this word-salad using the Research through Design methodology. Ultimately, I hope to answer the question: is bending interaction suitable, feasible, and expressive for interactive playing cards? My interactive playing card devices, which I call PEPA (Paper-like Entertainment Platform Agents), are inspired by my love of both paper-based and digital card games. By combining computational capabilities in multiple stand-alone physical devices, I can offer more than the two media forms can offer separately. I describe six possible scenarios where such a system can be used, as well as other hybrid digital-physical game systems inspired by card and board games.
Of course, the concept of interactive playing cards does not automatically lend itself to bend interaction, so I justify this integration of ideas via a study of the literature and my observations of card players. Following my arguments for incorporating bending and interactive cards, I created a proof-of-concept prototype. In true Research through Design form, this was a situation where one has to build an object before one can understand what research directions to take. In this case, the prototype led to further user studies regarding the timing of actions during the bend gesture and a model for bend events. At a different point, I used design as a research activity when I conducted a workshop for designing games for interactive cards. I report the procedure, results, and analysis from this workshop to illustrate the design space of possible games. Research through Design is a research approach within the field of HCI that has multiple, sometimes conflicting, interpretations. It is mostly agreed that such research involves the creation of some prototype and has the end goal of extracting and disseminating knowledge. In this work I present the different approaches for documenting RtD, as well as my own contribution: the Designer's Reframe and Refine Diagram. This is a method that uses a diagram as a tool to reflect on the design process as a whole in a prototype-centric way. I show how I use this method to systematically document five versions of the prototype in the PEPA project.

SageXR - Design, Development and Study of Efficacy and User Behavior in Virtual and Augmented Reality Project Rooms (University of Hawaii at Manoa, 2021) Kobayashi, Dylan; Leigh, Jason; Computer Science

Recent years have seen a decrease in cost along with significant improvements in the quality of XR (Augmented and Virtual Reality) headsets.
XR devices are no longer limited to research facilities and have made their way into the consumer market, where they are known for entertainment. Over the decades, research has shown that the immersion provided by XR devices improves user memory and performance, and that their nature is especially suited to 3D and multidimensional data. XR devices are traditionally used in a support role to provide a focused experience or act as an inspection apparatus, similar to the role of a microscope. But such a role prevents XR devices from supporting information analysis from start to finish. Part of this is related to the bias towards 3D and the frequent exclusion of 2D applications, which are common in our daily work. However, XR devices have great potential for supporting project room usage: ideation, brainstorming, and information analysis from start to finish. This thesis conducted research to test hypotheses regarding working within immersive environments. The contributions are: COVACh, a framework to guide the design of a system based on user interaction states with information media; SageXR, a working prototype designed using the framework; and the discoveries made while testing SageXR to verify the hypotheses. COVACh was found to be successful in guiding the design of SageXR; all user study participants were able to submit their task work for evaluation and formed a favorable opinion regarding XR's potential to support project room usage. All participants were observed to use at least twice as much virtual area as physical area by the end of their task work, indicating that, given the opportunity, participants wanted and were able to incorporate more space into their workflow than was physically accessible. Participants not only used more space but also created information structures which, in some cases, would not be physically possible to replicate due to accessibility.
Interestingly, there seems to be an underlying desire for layout support best described as dynamic tiled display wall structures within the virtual environment. These are just some of the discoveries made, which also include some caveats found during usage of SageXR. Discussion regarding future development directions and how to address these caveats is covered, describing how SageXR can support usage beyond the project room.

Perceive: Proactive Exploration of Risky Concept Emergence for Identifying Vulnerabilities & Exposures (University of Hawaii at Manoa, 2021) Paradis, Carlos Vinicius; Kazman, Rick; Computer Science

National databases that collect various kinds of textual threat reports, such as ASRS, CERT, and NVD, manually process their reports individually. They then offer data products to disseminate the aggregate information, like newsletters, alerts, or individual report searching. The goal of this research is to connect these individual reports thematically and temporally to identify emerging or recurring threats, by analyzing large collections of text, source code, collaboration, and communication patterns. This capability, I argue, enables us to identify the emergence and recurrence of such themes, and the contexts in which they recur, facilitating faster and more capable mitigation. I propose two models to shed light on this goal: an empirical model of vulnerabilities as bugs (the commit flow model), and one of vulnerabilities and aviation safety threats as topics (the topic flow model). I use existing manual workflows in both domains as a gold standard, reflected in the existing data products offered by these organizations, and empirically evaluate whether the automated models can match or outperform existing manual practices.

Exploratory Analysis of Research Publications Collections with Human-Steerable Black-box Models.
Towards Generalizing Inverse Computations for Semantic Interaction (University of Hawaii at Manoa, 2021) Gonzalez Martinez, Alberto; Leigh, Jason; Computer Science

Understanding high-dimensional data sets is a complex task for many scientists, engineers, and intelligence analysts. Traditionally, this problem has been tackled with linear pipelines that rely on mathematical models and algorithms to summarize relationships and structure, producing a visual representation of the data in a collapsed, low-dimensional form. The main issue with these traditional pipelines is that they are driven solely by algorithms or models, and without a human in the loop, they can potentially limit sense-making by masking expected or known structure in the data. In recent years, Semantic Interaction has become a promising user interaction methodology for model steering in Visual Analytics systems, as it provides mechanisms with which to adjust the parameter space, explore data, and test hypotheses. Under the paradigm of Semantic Interaction, users can steer model parameters and explore data representations without leaving the visual space, thus combining algorithms and models with expert human judgment. Semantic Interaction systems need to invert the computation of one or more mathematical models to support a bidirectional structure within their pipelines and facilitate this interaction modality. For example, dimensionality reduction and clustering are frequently used to explore multidimensional data in Visual Analytics systems and are typically present in Semantic Interaction systems. Since users interact with clustered data in its compressed form, the system needs to link this compressed form to the original high-dimensional representation to affect the model and algorithms from within the visualization.
The necessity of this reverse link from the low-dimensional representation to the high-dimensional input space requires that Semantic Interaction pipelines be bidirectional. Most Semantic Interaction systems make use of simple and interpretable linear models for dimensionality reduction and clustering, such as LDA (Latent Dirichlet Allocation) and PCA (Principal Component Analysis), to provide a straightforward bidirectional pipeline. By contrast, the state-of-the-art techniques for dimensionality reduction and clustering in visual analytics, such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP), are "black-box" models, which are neither linear nor directly interpretable. Furthermore, these techniques are computationally expensive, suffer from out-of-sample stability problems, and are complex to retrain for new instances, requiring precise hyper-parameter tuning. A novel Deep Surrogate model approach is proposed in this thesis to perform backward and forward computations within semantic interaction pipelines that were previously implemented with "black-box" models. This approach allows for the efficient "merging" of new instances into a previously trained model without retraining. It also provides a reverse link, allowing a trained model's parameters to be affected by user interactions with the visual representation of the data. To demonstrate this approach's usefulness, I present the Zexplorer system, a tool for exploring large document collections of research papers with Semantic Interaction, as well as a user study to validate the approach.
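As a rough illustration of the surrogate idea (not the dissertation's actual deep model), a simple linear map can stand in for an expensive black-box projection: once fitted to the black-box's outputs, it projects new points forward without retraining, and its pseudo-inverse provides an approximate backward link. All data and the linear form here are invented for illustration:

```python
import numpy as np

# Sketch: fit a cheap surrogate to reproduce a "black-box" 2-D
# projection (think t-SNE/UMAP output). A linear map is the simplest
# possible stand-in; the dissertation's deep surrogate plays this role
# with a neural network.

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))                    # high-dimensional inputs
Y = X[:, :2] + 0.01 * rng.normal(size=(100, 2))   # pretend black-box 2-D output

# Forward surrogate: least-squares map W such that X @ W ~= Y.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Out-of-sample: project a new point without re-running the black-box.
x_new = rng.normal(size=(1, 10))
y_new = x_new @ W

# Backward link: the pseudo-inverse maps an edited 2-D position back
# toward the high-dimensional space, as semantic interaction requires.
x_back = y_new @ np.linalg.pinv(W)
```

The forward pass solves the out-of-sample problem (new instances "merge" into the trained projection cheaply), and the inverse map is what makes the pipeline bidirectional; the thesis's contribution is making both directions work for nonlinear black-box models.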
The Zexplorer system is built as an extension to Zotero, a widely used open-source bibliography system.Item type: Item , Energy-Efficient Mobile Sensing(University of Hawaii at Manoa, 2020) Mo, Tianli; Lim, Lipyeow; Computer ScienceMobile sensing has emerged rapidly in the past few years as a promising avenue for collecting, leveraging, and querying various information around users. One reason is that mobile devices have increasingly become central computing devices in people's daily lives. The other reason is that more affordable sensors have been embedded in smartphones. Since mobile devices are battery-powered, we face the critical problem of how to reduce the energy consumption of mobile sensing, especially the processing of continuous queries on the sensed data, in order to extend the operational lifetime of mobile devices. In this dissertation, we first propose the ACQUA framework to reduce the energy overhead of sensor data acquisition and processing for individual mobile devices. ACQUA reduces the energy consumption of individual mobile devices by modifying both the ordering and the segments of data streams that are retrieved by the continuous query evaluation. We then propose CloQue, a framework that exploits correlation among the different sensing data streams. It can 1) reorder predicate processing to preferentially evaluate the most promising predicates, and 2) intelligently propagate query evaluation results to dynamically update the confidence values of other correlated context predicates, in order to maximally reduce the energy consumption on individual mobile devices. For energy savings across multiple mobile devices, we propose a collaborative query processing framework called CQP. 
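The predicate-reordering intuition behind frameworks like ACQUA and CloQue can be sketched with a toy cost model: for a short-circuited conjunctive query, evaluating cheap predicates that are likely to fail first minimizes expected sensing energy. The predicate names, energy costs, and selectivities below are invented for illustration:

```python
# Hypothetical context predicates for a conjunctive continuous query.
predicates = [
    {"name": "accelerometer_still", "energy_mj": 5.0,  "p_true": 0.30},
    {"name": "gps_on_campus",       "energy_mj": 50.0, "p_true": 0.10},
    {"name": "mic_quiet",           "energy_mj": 15.0, "p_true": 0.60},
]

def rank(p):
    # Classic rank for short-circuit AND evaluation:
    # cost divided by the probability the predicate fails.
    return p["energy_mj"] / (1.0 - p["p_true"])

ordered = sorted(predicates, key=rank)

def expected_energy(order):
    # Each predicate is only evaluated if every earlier one held.
    total, p_reach = 0.0, 1.0
    for p in order:
        total += p_reach * p["energy_mj"]
        p_reach *= p["p_true"]
    return total
```

With these numbers, the reordered plan evaluates the cheap accelerometer predicate first and defers the expensive GPS probe, cutting expected energy per evaluation relative to the original order.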
CQP exploits the overlap (in both the sensor sources and the query predicates) across multiple executing queries, and then reduces the energy consumption of repetitive execution and data transmissions by having a set of `leader' mobile device nodes execute and disseminate these shareable partial results. CQP also utilizes lower-energy short-range wireless links to disseminate such results directly among proximate mobile devices. We also propose an improved framework called C2QP, which not only exploits the correlation among multiple mobile devices and their sensed data streams, but also processes the continuous queries collaboratively by sharing partial results and shareable sensed data.Item type: Item , CNN-based Plant Species Categorization Using Natural Images(University of Hawaii at Manoa, 2020) Krause, Jonas; Lim, Lipyeow; Baek, Kyungim; Computer ScienceAutomatic identification of plants from natural images is a challenging problem that is relevant to both the disciplines of Botany and Computer Science. The classification of plant images at the species level is a computer vision task called fine-grained categorization. This categorization problem is particularly complicated due to the large number of plant species, the inter-species similarity, the large-scale variation in appearance, and the lack of annotated data. Despite the availability of dozens of plant identification mobile applications, categorizing plant species from natural images remains an unsolved problem - e.g., most of the existing applications do not address the multi-scale nature of these images. Furthermore, an automated system capable of addressing the complexity of this computer vision problem has important implications for society at large, not only in preserving ecosystem biodiversity and public education but also in numerous agricultural activities such as detecting abnormalities in plants and analyzing food crops. 
In this dissertation, I present a new approach to the problem of automatically categorizing plant species using photos taken in nature. Essentially, this approach assembles a collection of Convolutional Neural Networks (CNNs) to create a plant categorization system that I named WTPlant (What's That Plant?). One of the novelties of this system is a preprocessing method that extracts multi-scale samples from natural images, making the classification models more robust to variations in the scale of the plant. A comprehensive experimental evaluation of this new preprocessing method compares its performance with frequently used data augmentation techniques over different classification models of the system. WTPlant also enables the categorization of multiple plant components simultaneously by employing distinct classification pipelines for plants (leaves, branches, bushes, and trees) and flowers. The combination of these multi-organ analyses ensures a broader categorization process. It can be further extended by adding pipelines for fruits, bark, roots, etc., depending on the availability of annotated images. In summary, this new approach locates multiple plant organs in a natural image and guides the extraction of representative samples at various scales used to train and test state-of-the-art CNN classification models. To apply the WTPlant system in a real-world environment, I implement a scale-up process that adapts the classification models. In this process, models have their top classification layers replaced to accommodate a larger number of plant species. Due to a lack of training data, these models must be pre-trained to achieve satisfactory performance. As a result, I also implement the integration of domain-specific knowledge to create plant and flower expert classification models. 
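The multi-scale sampling step can be illustrated with a small sketch: take centered crops of the input image at several scales and resample each to a fixed CNN input size. This is not the WTPlant implementation, just the general idea, using nearest-neighbor resampling for brevity (the scales and output size are invented):

```python
import numpy as np

def multiscale_samples(image, scales=(1.0, 0.5, 0.25), out=64):
    """Extract centered crops at several scales and resize each to a
    fixed out x out size by nearest-neighbor index sampling."""
    h, w = image.shape[:2]
    samples = []
    for s in scales:
        ch, cw = max(int(h * s), 1), max(int(w * s), 1)
        y0, x0 = (h - ch) // 2, (w - cw) // 2
        crop = image[y0:y0 + ch, x0:x0 + cw]
        ys = np.arange(out) * ch // out   # nearest-neighbor row indices
        xs = np.arange(out) * cw // out   # nearest-neighbor column indices
        samples.append(crop[np.ix_(ys, xs)])
    return np.stack(samples)
```

Each call yields one fixed-size sample per scale, so a single photograph contributes several training or test inputs covering the plant at different apparent sizes.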
Initially focusing on the University of Hawai'i at Manoa campus plants, this research aims to produce the most accurate system for classifying Hawaiian plants and to make it available to botanists, tourists, and the entire community. As a case study, I create a mobile version of the WTPlant system to categorize plant species from the Harold L. Lyon Arboretum, a University of Hawai'i research unit located at the upper end of Manoa Valley.Item type: Item , Laha: a Framework for Adaptive Optimization of Distributed Sensor Frameworks(University of Hawaii at Manoa, 2020) Christe, Anthony James; Johnson, Philip M.; Computer ScienceDistributed Sensor Networks (DSNs) face a myriad of technical challenges. This dissertation examines two important DSN challenges. One problem is converting "primitive" sensor data into actionable products and insights. For example, a DSN for power quality (PQ) might gather primitive data in the form of raw voltage waveforms and produce actionable insights in the form of the ability to predict when PQ events are going to occur by observing cyclical data. As another example, a DSN for infrasound might gather primitive data in the form of microphone counts and produce actionable insight in the form of determining what the signal was, and when and where it came from. To make progress towards this problem, DSNs typically implement one or more of the following strategies: detecting signals in the primitive data (deciding if something is there), classification of signals from primitive data (deciding what is there), and localization of signals (when and from where the signals came). Further, DSNs make progress towards this problem by forming relationships between primitive data, finding correlations between spatial and temporal attributes, and associating metadata with primitive data to provide contextual information not collected by the DSN. These strategies can be employed recursively. 
As an example, the result of aggregating typed primitive data provides a new, higher level of typed data which contains more context than the data from which it was derived. This new typed data can itself be aggregated into new, higher-level types and also participate in relationships. A second important challenge is managing data volume. Most DSNs produce large amounts of (increasingly multimodal) primitive data, of which only a tiny fraction (the signals) is actually interesting and useful. The DSN can utilize one of two strategies: keep all of the information and primitive data forever, or employ some strategy for systematically discarding (hopefully uninteresting and not useful) data. As sensor networks scale in size, the first strategy becomes infeasible. Therefore, DSNs must find and implement a strategy for managing large amounts of sensor data. The difficult part is finding an effective and efficient strategy for deciding which data is interesting and must be kept and which data to discard. This dissertation investigates the design, implementation, and evaluation of the Laha framework, which provides new insight into both of these problems. First, the Laha framework provides a multi-leveled representation for structuring and processing DSN data. The structure and processing at each level is designed with the explicit goal of turning low-level data into actionable insights. Second, each level in the framework implements a "time-to-live" (TTL) strategy for data within the level. This strategy states that data must either "progress" upwards through the levels towards more abstract, useful representations within a fixed time window, or be discarded and lost forever. The TTL strategy is useful because, when implemented, it allows DSN designers to calculate upper bounds on data storage at each level of the framework and supports graceful degradation of DSN performance. 
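The TTL strategy can be sketched as a per-level store in which each item either gets promoted into a higher-level abstraction or expires after its time window. This is an illustrative toy, not the Laha implementation; the method names and TTL value are invented:

```python
class Level:
    """One level of a multi-leveled DSN store with a time-to-live policy."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.items = {}                    # item_id -> (payload, arrival_time)

    def insert(self, item_id, payload, now):
        self.items[item_id] = (payload, now)

    def promote(self, item_id):
        # The item became part of a higher-level abstraction: remove it here
        # and hand its payload to the next level up.
        return self.items.pop(item_id, (None, None))[0]

    def expire(self, now):
        # Discard anything that failed to progress within the TTL window.
        dead = [k for k, (_, t) in self.items.items() if now - t > self.ttl]
        for k in dead:
            del self.items[k]
        return len(dead)
```

Because a level can hold at most (arrival rate x TTL) items before expiry clears them, storage per level is bounded, which is the upper-bound property the TTL strategy provides.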
There are several smaller, but still important, problems that exist within the context of these two larger problems. Examples of the smaller problems that Laha hopes to overcome on the way to the larger goals include optimization of triggering, detection, and classification; building a model of sensing field topology; optimizing sensor energy use; optimizing bandwidth; and providing predictive analytics for DSNs. Laha provides four contributions to the area of DSNs. First, the Laha design, a novel abstract distributed sensor network framework that provides useful properties relating to data management. Second, an evaluation of the Laha abstract framework through the deployment of two Laha-compliant reference implementations, validated data collection, and several experiments used to confirm or refute the benefits claimed for Laha. Third, two Laha-compliant reference implementations, OPQ and Lokahi, which can be used to form DSNs for the collection of distributed power quality signals and the distributed collection of infrasound signals. Fourth, a set of implications for modern distributed sensor networks resulting from the evaluation of Laha. The major claim of this dissertation is that the Laha framework provides a generally useful representation for real-time, high-volume DSNs that addresses several major issues modern DSNs face.Item type: Item , Inferring human personality from written media(University of Hawaii at Manoa, 2020) Wright, William Reynolds; Chin, David; Computer ScienceThis work explores the association between human personality and language features consisting of sequences of tokens. My work reveals that there are such features that are predictive of personality over multiple corpora taken from different populations of English speakers. I gathered written text authored by 50 individuals who participated on a bodybuilding web forum (the Forum corpus). 
I also administered a personality questionnaire following the protocol provided by the International Personality Item Pool (IPIP). For comparison across other populations, I also obtained text corpora from three other research groups, along with the results of personality assessments: the EAR corpus, consisting of transcripts of the speech of 96 participants as they went about their daily lives; Essays written by 2,588 undergraduates at the University of Texas; and posts by 244 Facebook users. After performing part-of-speech (POS) tagging on the text for all the participants in these corpora, I extracted unigrams, bigrams, and trigrams (n-grams) of tokens and their POS tags, and counted every word/tag permutation that appeared. I considered only features appearing one or more times per 1000 words in the Forum corpus because there was not enough data to consider sparser features. I found 766 such features. From among those features I explored which were relevant across both my Forum corpus and at least one of the borrowed corpora, since those are the most promising, robust features that illustrate the possibility of building models across various corpora using the same language features. Seventy-five of the features were associated with one or more personality dimensions across both the Forum corpus and at least one additional corpus. I devised explanations as to why some of the features are correlated with a given personality dimension. That task establishes that although some of the features may have arisen randomly, one can confidently proceed with the conclusion that English speakers consistently express their personalities through their language usage. In addition, to show that it is possible to use these features for prediction, I generated multiple linear regression models for each corpus and personality dimension combination; in the best case (Openness with the Forum corpus) I obtained an R² of 0.686 and S (standard error of the estimate) of 0.561. 
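The word/tag n-gram counting step can be sketched as follows. The tagged tokens here are a made-up toy example; in practice a POS tagger would produce the (word, tag) pairs first:

```python
from collections import Counter

def ngram_features(tagged_tokens, n_max=3):
    """Count unigram, bigram, and trigram features over word/TAG items,
    mirroring the word/tag-permutation counting described above."""
    feats = Counter()
    items = [f"{w.lower()}/{t}" for w, t in tagged_tokens]
    for n in range(1, n_max + 1):
        for i in range(len(items) - n + 1):
            feats[" ".join(items[i:i + n])] += 1
    return feats
```

To apply the frequency threshold mentioned above, the raw counts would then be normalized per 1000 words and features below one occurrence per 1000 words dropped.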
My work sets a foundation for more robust, accurate models of personality. I hope that others will find additional principled explanations of why the features I found are associated with personality. In the future I anticipate that suitable language-analytical techniques will deepen insight both for English speakers and for speakers of additional world languages.Item type: Item , Design, Implementation, And Evaluation Of Napali: A Novel Distributed Sensor Network For Improved Power Quality Monitoring(University of Hawaii at Manoa, 2020) Negrashov, Sergey; Johnson, Philip M.; Computer ScienceToday’s big data world relies heavily upon precise, timely, and actionable intelligence, while being burdened by the ever-increasing need for data cleaning and preprocessing. While this problem is unavoidable when ingesting large quantities of unstructured data, in sensor networks built for a specific purpose, such as anomaly detection, some of that computation can be moved to the edge of the network. This thesis concerns the special case of sensor networks tailored for monitoring the power grid for anomalous behavior. These networks monitor power delivery infrastructure with the intent of finding deviations from the nominal steady state, across multiple geographical locations. These deviations, known as power quality anomalies, may originate at, and be localized to, the location of the sensor, or may affect a sizable portion of the power grid. The difficulty of evaluating the extent of a power quality anomaly stems directly from its short temporal and variable geographical impact. I present a novel distributed power quality monitoring system called Napali which relies on metrics extracted from individual meters and their temporal locality in order to intelligently detect anomalies and extract raw data within temporal windows and geographical areas of interest. 
The claims of this thesis are that Napali outperforms existing grid-wide power quality event detection methods in resource utilization and sensitivity, and that Napali's residential monitoring is capable of power grid monitoring without deployment on high-voltage transmission lines. The final claim of this thesis is that Napali's ability to extract portions of events that did not cross the critical thresholds used in other detection methods allows for better localization of power quality disturbances. These claims were validated through a deployment at the University of Hawaii. Fifteen OPQ Box devices, designed specifically to operate with Napali, were located in various locations on campus. Data collected from these monitors was compared with smart meters already deployed across the University. Additionally, Napali was compared with standard methods of power quality event detection running alongside the Napali system. The Napali methodology outperformed the standard methods of power quality monitoring in resource consumption, event quality, and sensitivity. Additionally, I was able to validate that residential utility monitoring is capable of event detection and localization without monitoring higher levels of the power grid hierarchy. Finally, as a demonstration of Napali's capabilities, I showed how data collected by my framework can be used to partition the power delivery infrastructure without prior knowledge of the power grid topology.Item type: Item , Fundamental Design Issues in Anonymous Peer-to-peer Distributed Hash Table Protocols(University of Hawaii at Manoa, 2019) Baumeister, Todd; Dong, Yingfei; Computer ScienceAnonymous communication protocols can be used to protect diminishing user privacy online. One family of those protocols is anonymous peer-to-peer (ANP2P) distributed hash table (DHT) protocols. These protocols are considered efficient, scalable, decentralized, and practical. 
However, the anonymity properties of these protocols are not well understood and are difficult to quantify. As a result, users may have a false sense of anonymity when using these protocols. This motivates our study of the anonymity properties of ANP2P DHT protocols. In this study, we analyzed the Freenet and GNUnet systems. We cataloged the main design decisions and identified three vulnerabilities in these ANP2P DHT protocols. The first vulnerability was the Traceback attack, which enables an adversary to determine which subset of nodes routed a given message. The second vulnerability was the Routing Table Insertion attack, which can be used by an adversary to place themselves in a victim's routing table. We developed this attack to support the Traceback attack, and we present two potential mitigations: routing randomness and Look Ahead Hint. The third vulnerability was an attack on GNUnet's message bloom filter. Similar to the Traceback attack, the GNUnet bloom filter vulnerability can be used to determine which subset of nodes routed a given message. We then developed an empirical methodology for modeling a generic adversary and evaluating the performance and anonymity of ANP2P DHT design decisions. We created a novel adversarial model that uses protocol behaviors and shared states to sample the entropy in an ANP2P DHT system. The adversarial model implements a routing path Walk-back Ranking Algorithm that can be used to identify potential sender nodes for a given message. The methodology was then applied to an extension of the peer-to-peer network simulator PeerSim. We extended PeerSim to implement various ANP2P DHT design decisions. We then used the extended PeerSim and the methodology to evaluate anonymity and performance for structured, small-world, and random topologies using look-ahead values of one hop and two hops. These experiments also validated the accuracy of the methodology. 
Next, we used the output of the methodology to provide a quantitative comparison of the performance and anonymity of the design decisions. The methodology was also able to identify several network sub-graphs that degraded anonymity in small-world topologies. Protocol designers can use our methodology to evaluate the anonymity and performance of their protocols, and to identify the existence of network sub-graphs that degrade anonymity. Once these sub-graphs have been identified, protocol designers can create appropriate controls to prevent their formation. Future work includes applying our empirical methodology to the mitigations we proposed for the Routing Table Insertion attack and evaluating the methodology on a real-world protocol.
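The class of bloom filter weakness described above is easiest to see with a minimal filter: because membership tests are cheap, anyone holding a per-message filter can probe candidate node identifiers against it and learn which nodes likely handled the message. A generic sketch, with parameters and usage invented for illustration rather than taken from GNUnet:

```python
import hashlib

class BloomFilter:
    """Minimal bloom filter: k hashed bit positions per item, no deletion."""

    def __init__(self, size_bits=4096, n_hashes=3):
        self.size = size_bits
        self.k = n_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive k deterministic bit positions from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def __contains__(self, item):
        # May return false positives, but never false negatives.
        return all(self.bits >> p & 1 for p in self._positions(item))
```

An adversary who obtains such a filter can test each known node identifier for membership; every hit (modulo false positives) narrows the set of nodes that routed the message, which is the information leak the attack exploits.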
