EXPLORATORY ANALYSIS OF RESEARCH PUBLICATIONS COLLECTIONS WITH HUMAN STEERABLE BLACK-BOX MODELS. TOWARDS GENERALIZING INVERSE COMPUTATIONS FOR SEMANTIC INTERACTION.

Gonzalez Martinez, Alberto

dc.contributor.advisor	Leigh, Jason
dc.contributor.author	Gonzalez Martinez, Alberto
dc.contributor.department	Computer Science
dc.date.accessioned	2021-07-29T23:15:52Z
dc.date.available	2021-07-29T23:15:52Z
dc.date.issued	2021
dc.description.degree	Ph.D.
dc.identifier.uri	http://hdl.handle.net/10125/75925
dc.subject	Computer science
dc.subject	Analytics
dc.subject	Human in the Loop
dc.subject	Machine Learning
dc.subject	Semantic Interaction
dc.subject	Visual Analytics
dc.subject	Visualization
dc.title	EXPLORATORY ANALYSIS OF RESEARCH PUBLICATIONS COLLECTIONS WITH HUMAN STEERABLE BLACK-BOX MODELS. TOWARDS GENERALIZING INVERSE COMPUTATIONS FOR SEMANTIC INTERACTION.
dc.title.alternative	Análisis de publicaciones de investigación con modelos manipulables. Generalizando computaciones inversas en modelos de Inteligencia Artificial.
dc.type	Thesis
dcterms.abstract	Understanding high-dimensional data sets is a complex task for many scientists, engineers, and intelligence analysts. Traditionally, this problem has been tackled with linear pipelines that rely on mathematical models and algorithms to summarize relationships and structure, producing a visual representation of the data in a collapsed, low-dimensional form. The main issue with these traditional pipelines is that they are driven solely by algorithms or models; without a human in the loop, they can limit sense-making by masking expected or known structure in the data. In recent years, Semantic Interaction has become a promising user interaction methodology for model steering in Visual Analytics systems, as it provides mechanisms with which to adjust the parameter space, explore data, and test hypotheses. Under the paradigm of Semantic Interaction, users can steer model parameters and explore data representations without leaving the visual space, thus combining algorithms and models with expert human judgment. To facilitate this interaction modality, Semantic Interaction systems need to invert the computation of one or more mathematical models so that their pipelines support a bidirectional structure. For example, dimensionality reduction and clustering are frequently used to explore multidimensional data in Visual Analytics systems and are almost always present in Semantic Interaction systems. Since users interact with clustered data in its compressed form, the system needs to link this compressed form to the original high-dimensional representation to affect the model and algorithms from within the visualization. This reverse link from the low-dimensional representation to the high-dimensional input space requires that Semantic Interaction pipelines be bidirectional.
Most examples of Semantic Interaction systems use simple, interpretable models for dimensionality reduction and clustering, such as LDA (Latent Dirichlet Allocation) and PCA (Principal Component Analysis), so that they can provide a straightforward bidirectional pipeline. By contrast, the state-of-the-art techniques for dimensionality reduction and clustering in visual analytics, such as t-distributed stochastic neighbor embedding (t-SNE) and uniform manifold approximation and projection (UMAP), are "black-box" models, which are neither linear nor directly interpretable. Furthermore, these techniques are computationally expensive, suffer from out-of-sample stability problems, and are complex to retrain for new instances, requiring precise hyper-parameter tuning. This thesis proposes a novel Deep Surrogate model approach to perform backward and forward computations within Semantic Interaction pipelines that were previously implemented with "black-box" models. This approach allows for the efficient "merging" of new instances into a previously trained model without retraining. It also provides a reverse link, allowing a trained model's parameters to be affected by user interactions with the visual representation of data. To demonstrate this approach's usefulness, I present the Zexplorer system, a tool for exploring large document collections of research papers with Semantic Interaction, as well as a user study to validate the approach. The Zexplorer system is built as an extension to Zotero, a widely used open-source bibliography system.
dcterms.extent	113 pages
dcterms.language	en
dcterms.publisher	University of Hawai'i at Manoa
dcterms.rights	All UHM dissertations and theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission from the copyright owner.
dcterms.type	Text
local.identifier.alturi	http://dissertations.umi.com/hawii:10921

Files

Original bundle

Name: GonzalezMartinez_hawii_0085A_10921.pdf
Size: 4.17 MB
Format: Adobe Portable Document Format

Collections

Ph.D. - Computer Science