Zero-shot Comparison of Large Language Models (LLMs) Reasoning Abilities on Long-text Analogies

dc.contributor.author: Combs, Kara
dc.contributor.author: Bihl, Trevor
dc.contributor.author: Howlett, Spencer
dc.contributor.author: Adams, Yuki
dc.date.accessioned: 2024-12-26T21:05:46Z
dc.date.available: 2024-12-26T21:05:46Z
dc.date.issued: 2025-01-07
dc.description.abstract: In recent years, large language models (LLMs) have made substantial strides in mimicking human language and coherently presenting information. However, researchers continue to debate the accuracy and robustness of LLMs’ reasoning abilities. The reasoning abilities of thirteen LLMs were tested on two long-text analogy datasets, named Rattermann and Wharton, which required them to rank a series of stories from most analogous to least analogous compared to a source story. On the Rattermann dataset, GPT-4 obtained the highest accuracy of 70%. As a whole, LLMs seem to over-emphasize similar story entities (characters and settings) and to lack awareness of higher-order relationship(s) between stories. LLMs struggled more with the Wharton dataset, with the highest accuracy achieved being 46.4% by GPT-4o, and all but nine LLMs performing below random chance accuracy. Although LLMs are improving, they still struggle with higher-cognitive tasks such as analogical reasoning.
dc.format.extent: 10
dc.identifier.doi: 10.24251/HICSS.2025.194
dc.identifier.isbn: 978-0-9981331-8-8
dc.identifier.other: f0d5f5dd-f687-4c3c-bcf6-13976f52e3a9
dc.identifier.uri: https://hdl.handle.net/10125/109034
dc.relation.ispartof: Proceedings of the 58th Hawaii International Conference on System Sciences
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject: Natural Language Processing and Large Language Models Supporting Data Analytics for System Sciences
dc.subject: analogical reasoning, artificial intelligence, generative ai, large language models, zero-shot learning
dc.title: Zero-shot Comparison of Large Language Models (LLMs) Reasoning Abilities on Long-text Analogies
dc.type: Conference Paper
dc.type.dcmi: Text
prism.startingpage: 1610

Files

Original bundle
Name: 0158.pdf
Size: 464.73 KB
Format: Adobe Portable Document Format