DOCUSAGE: HARNESSING HIERARCHICAL CLUSTERING IN SALIENCE-DRIVEN NARRATIVE SYNTHESIS
Date
2024
Abstract
Text summarization remains a crucial yet challenging task in natural language processing, especially as the volume of text data grows exponentially. This thesis introduces Sumsage, a new optimization-based text summarization method that synthesizes concise yet informative summaries, and makes three contributions. First, we build the Syn-D-sum dataset from the CNN/DailyMail dataset as a resource for training and evaluating summarization models. Second, we propose the Sumsage algorithm, which uses hierarchical clustering to extract key sentences and assemble them into coherent summaries, closely emulating human summarizers. Third, we design two new evaluation measures, the Symphony penalty and the Captured Importance Quantification score, which assess the quality of generated summaries by considering both narrative structure and sentence order. Sumsage’s dynamic tree structure and hierarchical clustering enable efficient, scalable summarization while maintaining contextual relevance and minimizing hallucination. In our experiments, Sumsage outperforms GPT-3.5-turbo, generating summaries similar to those written by humans and capturing more essential information. Sumsage thus offers a robust and interpretable method for generating high-quality summaries and lays a foundation for future work in narrative synthesis and evaluation.
Keywords
Artificial intelligence, Computer science, Dataset synthesis, Narrative synthesis, Natural language processing, Text summarization
Extent
67 pages
Rights
All UHM dissertations and theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission from the copyright owner.