Zero-shot Comparison of Large Language Models (LLMs) Reasoning Abilities on Long-text Analogies
dc.contributor.author | Combs, Kara
dc.contributor.author | Bihl, Trevor
dc.contributor.author | Howlett, Spencer
dc.contributor.author | Adams, Yuki
dc.date.accessioned | 2024-12-26T21:05:46Z
dc.date.available | 2024-12-26T21:05:46Z
dc.date.issued | 2025-01-07
dc.description.abstract | In recent years, large language models (LLMs) have made substantial strides in mimicking human language and presenting information coherently. However, researchers continue to debate the accuracy and robustness of LLMs’ reasoning abilities. The reasoning abilities of thirteen LLMs were tested on two long-text analogy datasets, named Rattermann and Wharton, which required them to rank a series of stories from most to least analogous to a source story. On the Rattermann dataset, GPT-4 obtained the highest accuracy at 70%. Overall, LLMs appear to over-emphasize similar surface-level story entities (characters and settings) and to lack awareness of the higher-order relationship(s) between stories. LLMs struggled more with the Wharton dataset, where the highest accuracy, 46.4%, was achieved by GPT-4o, and four of the thirteen models performed below random-chance accuracy. Although LLMs are improving, they still struggle with higher-order cognitive tasks such as analogical reasoning.
dc.format.extent | 10
dc.identifier.doi | 10.24251/HICSS.2025.194
dc.identifier.isbn | 978-0-9981331-8-8
dc.identifier.other | f0d5f5dd-f687-4c3c-bcf6-13976f52e3a9
dc.identifier.uri | https://hdl.handle.net/10125/109034
dc.relation.ispartof | Proceedings of the 58th Hawaii International Conference on System Sciences
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject | Natural Language Processing and Large Language Models Supporting Data Analytics for System Sciences
dc.subject | analogical reasoning, artificial intelligence, generative ai, large language models, zero-shot learning
dc.title | Zero-shot Comparison of Large Language Models (LLMs) Reasoning Abilities on Long-text Analogies
dc.type | Conference Paper
dc.type.dcmi | Text
prism.startingpage | 1610 |
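
For context, the evaluation protocol the abstract describes can be pictured as a zero-shot ranking prompt. The sketch below is purely illustrative and is not taken from the paper: the prompt wording, the rank_analogies helper, the use of the openai Python client, and the gpt-4o model choice are all assumptions made for demonstration, and the paper's actual prompts and scoring are not given in this record.

    # Illustrative zero-shot analogy-ranking probe (NOT the paper's actual prompt).
    # Assumes the `openai` Python package (v1+) and OPENAI_API_KEY in the environment.
    from openai import OpenAI

    client = OpenAI()

    def rank_analogies(source_story: str, candidates: list[str],
                       model: str = "gpt-4o") -> list[int]:
        """Ask the model to rank candidate stories from most to least analogous
        to the source story; return the ranking as 1-based story indices."""
        numbered = "\n\n".join(f"Story {i + 1}:\n{c}" for i, c in enumerate(candidates))
        prompt = (
            "Read the source story and the candidate stories below. "
            "Rank the candidate stories from MOST analogous to LEAST analogous "
            "to the source story, judging by shared higher-order relational "
            "structure rather than surface similarity of characters or settings. "
            "Answer with only the story numbers, separated by commas.\n\n"
            f"Source story:\n{source_story}\n\n{numbered}"
        )
        response = client.chat.completions.create(
            model=model,
            temperature=0,  # keep the output as deterministic as possible
            messages=[{"role": "user", "content": prompt}],
        )
        text = response.choices[0].message.content
        # Parse "3, 1, 2"-style answers; malformed replies would need sturdier handling.
        return [int(tok) for tok in text.replace(",", " ").split() if tok.isdigit()]

A study of this kind would then compare the returned ordering against a human gold ordering to produce accuracy figures like those quoted in the abstract; how exactly partial or malformed rankings were scored is not specified here.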