Article: Possible Fashion Images. Operative Ekphrasis and the Reduction of Fashion Through Multimodal AI
Abstract
With the rise of multimodal AI tools such as DALL-E, Midjourney, or Stable Diffusion, it has never been easier or faster to produce compelling fashion images. While the proliferation of AI-generated fashion images can be seen as the ultimate triumph of the fashion image over both fashion text and fashionable garments, this paper critically examines how these images are produced, what exactly they show, and ultimately, how this impacts our understanding of fashion. Building on current research on multimodal AI, specifically text-to-image models, I argue that AI-generated fashion images are produced through textual means, that is, through “operative ekphrasis” (Bajohr 2024: 83). This means that the generation of fashion images via multimodal AI is based on textual prompts that extract statistically probable images from a latent space, which, in turn, is the result of previously indexed and labeled image-text pairs. Because the indexing of images requires a reduction of complexity, images generated through multimodal AI are often generic. In the case of fashion images, this manifests as a lack of texture and a reproduction of fashion photography’s conventions. Since fashion photography is not necessarily a representation of ‘the world’ or even of fashionable garments, it resembles AI-generated images more closely than other photographic genres do. AI-generated fashion images are thus not images of fashionable garments but images about fashion images: a statistically probable permutation of fashion photographs from the past (cf. Meyer 2023: 108). As such, AI-generated fashion images are not only the product of a mediated collective imaginary but also feed back into it. The view of fashion shaped by AI-generated fashion images is therefore based on reduction and stylized normativity.
The item has been published with the following license: In Copyright (Unter Urheberrechtsschutz)
