The Visual Analogs of Linguistic Concepts and Their Implications on Generative AI

Date

2025-01-07

Starting Page

713

Abstract

Many visual generative artificial intelligence (AI) models take textual “prompts” as input to guide the generation of the resulting images. Converting text to images draws on pragmatics and semantics, both of which can affect the output. To facilitate more precise prompting, we propose a three-dimensional vector space of textual similarity whose axes are textual representation, auditory representation, and meaning similarity. Next, we show that meaning similarity between two words does not necessarily yield visual similarity between the corresponding AI-generated images of those words. We justify this quantitatively by using eight image generators to generate images for abstract and concrete synonym, antonym, and hypernym-hyponym pairs, and comparing their image-image CLIPScores to the corresponding text-text CLIPScores. Averaged across all models, similarity decreased from text-text to image-image for every relationship type: from 92.8% to 70.1% for synonyms, from 89% to 58.9% for antonyms, and from 85.6% to 68.1% for hypernym-hyponym pairs.
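
The comparison described in the abstract can be illustrated with a minimal sketch. It assumes that a pairwise similarity score is the cosine similarity between two embedding vectors, expressed as a percentage; the abstract does not state the exact CLIPScore formula, so this is an assumption, and the function names `cosine_similarity_pct` and `similarity_drop` are hypothetical. The only data used are the averages reported in the abstract.

```python
import math


def cosine_similarity_pct(a, b):
    """Cosine similarity of two embedding vectors, as a percentage.

    Assumption: stands in for a text-text or image-image score; a real
    pipeline would obtain the vectors from a CLIP text or image encoder.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 100.0 * dot / (norm_a * norm_b)


# Averages reported in the abstract: (text-text %, image-image %).
reported = {
    "synonyms": (92.8, 70.1),
    "antonyms": (89.0, 58.9),
    "hypernym-hyponym": (85.6, 68.1),
}


def similarity_drop(text_text, image_image):
    """Drop from text-text to image-image similarity, in percentage points."""
    return round(text_text - image_image, 1)


drops = {name: similarity_drop(t, i) for name, (t, i) in reported.items()}

for name, drop in drops.items():
    print(f"{name}: {drop} percentage points")
```

By this arithmetic, antonym pairs show the largest average drop and hypernym-hyponym pairs the smallest.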

Keywords

Technological Advancements in Digital Collaboration with Generative AI and Large Language Models, generative ai, human computer interaction, large language models, prompt engineering, visual analogies

Extent

10

Related To

Proceedings of the 58th Hawaii International Conference on System Sciences

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International
