IRIS-DS: A New Approach for Identifiers and References Discovery in Document Stores
Date
2021-01-05
Contributor
Advisor
Department
Instructor
Depositor
Speaker
Researcher
Consultant
Interviewer
Narrator
Transcriber
Annotator
Journal Title
Journal ISSN
Volume Title
Publisher
Volume
Number/Issue
Starting Page
970
Ending Page
Alternative Title
Abstract
NoSQL stores offer a new cost-effective and schema-free system. Although it is widely accepted today, Business Intelligence & Analytics (BI&A) remains associated with relational databases. Exploiting schema-free data for analytical purposes is issuing a challenge since it requires reviewing all the BI&A phases, particularly the Extract-Transform-Load (ETL) process, to fit big data sources as document stores. In the ETL process, the join of several collections, with a lack of explicitly known join fields, is a significant challenge. Detecting these fields manually is time and effort consuming, and even infeasible in large-scale datasets. In this paper, we study the problem of discovering join fields automatically, and introduce an algorithm to detect both identifiers and references on several document stores. The modus operandi of our approach underscores two core stages: (i) discovery of identifier candidates; and (ii) identifying candidate pairs of identifier and reference fields. We use scoring features and pruning rules based on both syntactic and semantic aspects to efficiently discover true candidates from a huge number of initial ones. Finally, we report our experimental findings that show very promising results.
Description
Keywords
Big Data and Analytics: Pathways to Maturity, document-oriented stores, extract-transform-load, identifier discovery, join, nosql, reference discovery
Citation
Extent
10 pages
Format
Geographic Location
Time Period
Related To
Proceedings of the 54th Hawaii International Conference on System Sciences
Related To (URI)
Table of Contents
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International
Rights Holder
Local Contexts
Collections
Email libraryada-l@lists.hawaii.edu if you need this content in ADA-compliant format.