Evaluating Summarization Quality of Locally Hosted 3-Billion-Parameter Large Language Models
Starting Page
1746
Abstract
This study evaluates the summarization performance of 3-billion-parameter large language models that can run locally on consumer-grade hardware. Using a corpus of 1,000 articles from the XSum dataset, models from the LLaMA, Phi, and Qwen families generated single-sentence summaries with a unified zero-shot prompt. The resulting 4,000 summaries, consisting of model-generated outputs and human-authored references, were analyzed using 74 extracted features capturing linguistic abstractiveness, extractiveness, and informativeness. From these, 41 metrics were selected for nonparametric statistical comparison of model performance against the human-written summaries. Results show that Phi and LLaMA frequently outperformed the human baseline in informativeness and extractiveness but struggled with abstraction, while Qwen retained content well yet was less consistent overall. These findings suggest that small-scale models can achieve near-human summarization quality on several metrics while still falling short on abstraction, underscoring their promise for privacy-centric, resource-limited environments and the importance of transparent, multidimensional evaluation.
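
To make the workflow in the abstract concrete, the minimal sketch below illustrates its two core steps in Python: prompting a locally hosted ~3B model with a unified zero-shot, single-sentence instruction, and comparing one feature distribution of the resulting summaries against the human references with a nonparametric test. The checkpoint ID, prompt wording, compression-ratio feature, and the choice of the Mann-Whitney U test are illustrative assumptions, not details confirmed by the paper, which extracts 74 features and statistically tests 41 of them.

# A minimal sketch (not the paper's released pipeline) of the workflow the
# abstract describes: zero-shot single-sentence summarization with a locally
# hosted ~3B model, followed by a nonparametric comparison of one summary
# feature against human-written references.

from transformers import pipeline
from scipy.stats import mannwhitneyu

# Assumed checkpoint; the paper names only the model families (LLaMA, Phi, Qwen).
MODEL_ID = "meta-llama/Llama-3.2-3B-Instruct"
generator = pipeline("text-generation", model=MODEL_ID, device_map="auto")

def summarize(article: str) -> str:
    """Apply one unified zero-shot prompt asking for exactly one sentence."""
    messages = [{
        "role": "user",
        "content": f"Summarize the following article in a single sentence:\n\n{article}",
    }]
    result = generator(messages, max_new_tokens=64, do_sample=False)
    # Chat-style input returns the full conversation; the last turn is the reply.
    return result[0]["generated_text"][-1]["content"].strip()

def compression_ratio(summary: str, article: str) -> float:
    """One illustrative feature; the study extracts 74 such features."""
    return len(summary.split()) / max(len(article.split()), 1)

articles = ["<XSum article text>"]                       # placeholder corpus
references = ["<human-written one-sentence summary>"]    # placeholder references

model_scores = [compression_ratio(summarize(a), a) for a in articles]
human_scores = [compression_ratio(r, a) for r, a in zip(references, articles)]

# Nonparametric comparison of the two feature distributions (no normality assumed).
stat, p_value = mannwhitneyu(model_scores, human_scores, alternative="two-sided")
print(f"Mann-Whitney U = {stat:.1f}, p = {p_value:.4f}")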
Extent
10 pages
Type
Conference Paper
Related To
Proceedings of the 59th Hawaii International Conference on System Sciences
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International
