EXPLORATORY ANALYSIS OF RESEARCH PUBLICATIONS COLLECTIONS WITH HUMAN STEERABLE BLACK-BOX MODELS. TOWARDS GENERALIZING INVERSE COMPUTATIONS FOR SEMANTIC INTERACTION.

dc.contributor.advisor: Leigh, Jason
dc.contributor.author: Gonzalez Martinez, Alberto
dc.contributor.department: Computer Science
dc.date.accessioned: 2021-07-29T23:15:52Z
dc.date.available: 2021-07-29T23:15:52Z
dc.date.issued: 2021
dc.description.degree: Ph.D.
dc.identifier.uri: http://hdl.handle.net/10125/75925
dc.subject: Computer science
dc.subject: Analytics
dc.subject: Human in the Loop
dc.subject: Machine Learning
dc.subject: Semantic Interaction
dc.subject: Visual Analytics
dc.subject: Visualization
dc.title: EXPLORATORY ANALYSIS OF RESEARCH PUBLICATIONS COLLECTIONS WITH HUMAN STEERABLE BLACK-BOX MODELS. TOWARDS GENERALIZING INVERSE COMPUTATIONS FOR SEMANTIC INTERACTION.
dc.title.alternative: Análisis de publicaciones de investigación con modelos manipulables. Generalizando computaciones inversas en modelos de Inteligencia Artificial.
dc.type: Thesis
dcterms.abstract: Understanding high-dimensional data sets is a complex task for many scientists, engineers, and intelligence analysts. Traditionally, this problem has been tackled with linear pipelines that rely on mathematical models and algorithms to summarize relationships and structure, producing a visual representation of the data in a collapsed, low-dimensional form. The main issue with these traditional pipelines is that they are driven solely by algorithms or models; without a human in the loop, they can limit sense-making by masking expected or known structure in the data. In recent years, Semantic Interaction has emerged as a promising user interaction methodology for model steering in Visual Analytics systems, as it provides mechanisms to adjust the parameter space, explore data, and test hypotheses. Under the paradigm of Semantic Interaction, users can steer model parameters and explore data representations without leaving the visual space, thus combining algorithms and models with expert human judgment. To facilitate this interaction modality, Semantic Interaction systems need to invert the computation of one or more mathematical models so that their pipelines support a bidirectional structure. For example, dimensionality reduction and clustering are frequently used to explore multidimensional data in Visual Analytics systems and are almost always present in Semantic Interaction systems. Since users interact with clustered data in its compressed form, the system needs to link this compressed form to the original high-dimensional representation to affect the model and algorithms from within the visualization. This reverse link from the low-dimensional representation to the high-dimensional input space is what requires Semantic Interaction pipelines to be bidirectional.
Most Semantic Interaction systems use simple and interpretable linear models for dimensionality reduction and clustering, such as LDA (Latent Dirichlet Allocation) and PCA (Principal Component Analysis), in order to provide a straightforward bidirectional pipeline. By contrast, the state-of-the-art techniques for dimensionality reduction and clustering in visual analytics, such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP), are "black-box" models, which are neither linear nor directly interpretable. Furthermore, these techniques are computationally expensive, suffer from out-of-sample stability problems, and are complex to retrain for new instances, requiring precise hyper-parameter tuning. This thesis proposes a novel Deep Surrogate model approach to perform the backward and forward computations within semantic interaction pipelines that were previously implemented with "black-box" models. This approach allows for the efficient "merging" of new instances into a previously trained model without retraining. It also provides a reverse link, allowing a trained model's parameters to be affected by user interactions with the visual representation of data. To demonstrate this approach's usefulness, I present the Zexplorer system, a tool for exploring large document collections of research papers with Semantic Interaction, as well as a user study to validate the approach. The Zexplorer system is built as an extension to Zotero, a widely used open-source bibliography system.
dcterms.extent: 113 pages
dcterms.language: en
dcterms.publisher: University of Hawai'i at Manoa
dcterms.rights: All UHM dissertations and theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission from the copyright owner.
dcterms.type: Text
local.identifier.alturi: http://dissertations.umi.com/hawii:10921

Files

Original bundle
Name: GonzalezMartinez_hawii_0085A_10921.pdf
Size: 4.17 MB
Format: Adobe Portable Document Format