DOCUSAGE: HARNESSING HIERARCHICAL CLUSTERING IN SALIENCE-DRIVEN NARRATIVE SYNTHESIS

dc.contributor.advisor: Belcaid, Mahdi
dc.contributor.author: Sadmanee, Akib
dc.contributor.department: Computer Science
dc.date.accessioned: 2024-10-09T23:45:57Z
dc.date.available: 2024-10-09T23:45:57Z
dc.date.issued: 2024
dc.description.degree: M.S.
dc.identifier.uri: https://hdl.handle.net/10125/108680
dc.subject: Artificial intelligence
dc.subject: Computer science
dc.subject: Dataset synthesis
dc.subject: Narrative synthesis
dc.subject: Natural language processing
dc.subject: Text summarization
dc.title: DOCUSAGE: HARNESSING HIERARCHICAL CLUSTERING IN SALIENCE-DRIVEN NARRATIVE SYNTHESIS
dc.type: Thesis
dcterms.abstract: Text summarization remains a crucial yet challenging task in natural language processing, especially as the volume of text data grows exponentially. This thesis introduces Sumsage, a new optimization-based text summarization method that synthesizes concise yet informative summaries. Our work makes several contributions to the field. We developed the Syn-D-sum dataset from the CNN/DailyMail dataset, creating a robust resource for training and evaluating summarization models. We also propose the Sumsage algorithm, which leverages hierarchical clustering to extract key sentences and construct coherent summaries, closely emulating human summarizers. Additionally, we designed two new evaluation methods, the Symphony penalty and the Captured Importance Quantification scores, which assess the quality of generated summaries by considering both narrative structure and sentence order. Sumsage’s dynamic tree structure and hierarchical clustering approach enable efficient and scalable summarization while maintaining contextual relevance and minimizing hallucination. Our experiments also show that Sumsage outperforms GPT-3.5-turbo, generating summaries similar to those written by humans and capturing more essential information. Sumsage offers a robust and interpretable method for generating high-quality summaries. This approach not only addresses current challenges but also lays the foundation for future innovations in narrative synthesis and evaluation.
dcterms.extent: 67 pages
dcterms.language: en
dcterms.publisher: University of Hawai'i at Manoa
dcterms.rights: All UHM dissertations and theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission from the copyright owner.
dcterms.type: Text
local.identifier.alturi: http://dissertations.umi.com/hawii:12294

Files

Original bundle
Name: Sadmanee_hawii_0085O_12294.pdf
Size: 1.46 MB
Format: Adobe Portable Document Format