Zero-shot Comparison of Large Language Models (LLMs) Reasoning Abilities on Long-text Analogies
dc.contributor.author | Combs, Kara
dc.contributor.author | Bihl, Trevor
dc.contributor.author | Howlett, Spencer
dc.contributor.author | Adams, Yuki
dc.date.accessioned | 2024-12-26T21:05:46Z
dc.date.available | 2024-12-26T21:05:46Z
dc.date.issued | 2025-01-07
dc.description.abstract | In recent years, large language models (LLMs) have made substantial strides in mimicking human language and presenting information coherently. However, researchers continue to debate the accuracy and robustness of LLMs’ reasoning abilities. The reasoning abilities of thirteen LLMs were tested on two long-text analogy datasets, named Rattermann and Wharton, which required them to rank a series of stories from most to least analogous to a source story. On the Rattermann dataset, GPT-4 obtained the highest accuracy at 70%. Overall, LLMs appear to over-emphasize similar surface-level story entities (characters and settings) and to lack awareness of the higher-order relationship(s) between stories. LLMs struggled more with the Wharton dataset, where the highest accuracy, 46.4%, was achieved by GPT-4o, and four of the thirteen models performed below random-chance accuracy. Although LLMs are improving, they still struggle with higher-order cognitive tasks such as analogical reasoning.
dc.format.extent | 10
dc.identifier.doi | 10.24251/HICSS.2025.194
dc.identifier.isbn | 978-0-9981331-8-8
dc.identifier.other | f0d5f5dd-f687-4c3c-bcf6-13976f52e3a9
dc.identifier.uri | https://hdl.handle.net/10125/109034
dc.relation.ispartof | Proceedings of the 58th Hawaii International Conference on System Sciences
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject | Natural Language Processing and Large Language Models Supporting Data Analytics for System Sciences
dc.subject | analogical reasoning, artificial intelligence, generative ai, large language models, zero-shot learning
dc.title | Zero-shot Comparison of Large Language Models (LLMs) Reasoning Abilities on Long-text Analogies
dc.type | Conference Paper
dc.type.dcmi | Text
prism.startingpage | 1610 |
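
For context, the evaluation protocol the abstract describes can be pictured as a zero-shot ranking prompt. The sketch below is purely illustrative and is not taken from the paper: the prompt wording, the rank_analogies helper, the use of the openai Python client, and the gpt-4o model choice are all assumptions made for demonstration, and the paper's actual prompts and scoring are not given in this record.

    # Illustrative zero-shot analogy-ranking probe (NOT the paper's actual prompt).
    # Assumes the `openai` Python package (v1+) and OPENAI_API_KEY in the environment.
    from openai import OpenAI

    client = OpenAI()

    def rank_analogies(source_story: str, candidates: list[str],
                       model: str = "gpt-4o") -> list[int]:
        """Ask the model to rank candidate stories from most to least analogous
        to the source story; return the ranking as 1-based story indices."""
        numbered = "\n\n".join(f"Story {i + 1}:\n{c}" for i, c in enumerate(candidates))
        prompt = (
            "Read the source story and the candidate stories below. "
            "Rank the candidate stories from MOST analogous to LEAST analogous "
            "to the source story, judging by shared higher-order relational "
            "structure rather than surface similarity of characters or settings. "
            "Answer with only the story numbers, separated by commas.\n\n"
            f"Source story:\n{source_story}\n\n{numbered}"
        )
        response = client.chat.completions.create(
            model=model,
            temperature=0,  # keep the output as deterministic as possible
            messages=[{"role": "user", "content": prompt}],
        )
        text = response.choices[0].message.content
        # Parse "3, 1, 2"-style answers; malformed replies would need sturdier handling.
        return [int(tok) for tok in text.replace(",", " ").split() if tok.isdigit()]

A study of this kind would then compare the returned ordering against a human gold ordering to produce accuracy figures like those quoted in the abstract; how exactly partial or malformed rankings were scored is not specified here.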