Prompt Engineering

Driven by the rapid development of artificial intelligence, new possibilities for image production are emerging at enormous speed. Reactions are equally strong, ranging from creators' fear of being replaced by machines to fascination with the democratization of design processes and the emergence of new genres and self-designations ("AI Art / AI Artist"). This wide range of responses demonstrates the enormous power of new technologies, such as text-to-image models, that will fundamentally change the way creatives work.

To understand and actively shape this change, the interdisciplinary Prompt Engineering Project aims to gather different positions and foster exchange between them. We want to connect researchers and practitioners from different disciplines, from tech to art and beyond, in order to find answers and solutions.

Prompt Engineering?

In computer science, a prompt is commonly understood as the instruction you give a program to make it carry out a task. Established in natural language processing, prompt engineering means designing your input so that the machine learning model produces an output that matches the user’s expectations. This also applies to text-to-image models, where prompt engineering is already being discussed as a new profession.
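To make the idea of "designing your input" concrete, here is a minimal sketch of how a text-to-image prompt might be assembled from separate parts. The function name build_prompt, its parameters, and the comma-separated layout are assumptions chosen for illustration; they reflect a convention many users follow, not a requirement of any particular model.

```python
def build_prompt(subject, style=None, modifiers=()):
    """Join a subject, an optional style and optional modifiers into one prompt.

    Hypothetical helper: the comma-separated layout is a common user
    convention for text-to-image prompts, not a rule of any model.
    """
    parts = [subject.strip()]
    if style:
        parts.append(style.strip())
    # Skip empty modifiers so the prompt contains no dangling commas.
    parts.extend(m.strip() for m in modifiers if m.strip())
    return ", ".join(parts)


prompt = build_prompt(
    "a lighthouse at dusk",
    style="oil painting",
    modifiers=["soft light", "muted colors"],
)
print(prompt)
```

Keeping subject, style, and modifiers separate makes it easy to change one element at a time and compare the resulting images, which is much of what prompt engineering involves in practice.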

Key Concepts

"Good" images - alignment to what and to whom

Rapid technical progress makes it possible to generate better and better images. But what does "better" mean here? An accurate reproduction of the training data? An image that is as photorealistic as possible? A precise or literal translation of text prompts into images? From amateurs who can suddenly create works that would once have taken years of training to produce, to professional creators who want to incorporate the algorithms into their workflow, to the tech companies developing the models: what ideas of a "good picture" do these target groups have?

"New" images - agents or models

A program that can generate an infinite number of new images - that's what text-to-image models promise. But what does it mean to create something new? Because the models are trained on billions of existing images, they reproduce established styles and content. How can we escape this aesthetic echo chamber? Can something innovative actually be created, for example through unexpected combinations of concepts never before associated? Is this even a necessary criterion for such technologies?

Human-Computer Interaction - Ease of Use or Control

"The computer painted a picture" or "the machine produced a painting": such statements give the impression that programs can autonomously produce a work of art. However, the prompt, i.e. the instruction to the program, still has to be conceptualized, entered, evaluated and adapted by a human being. So what might meaningful collaboration between humans and computers look like? How much control do we need over the output, and how many decisions do we leave to the program? What might an interface look like, and what brushes or tools do we need, if any, besides the text prompt? Is our language powerful enough to give us sufficient control?

Semiotics / Multimodality - Tokenization or Cultural Complexity

If we show the same picture to a hundred different people and ask them to describe it, we are likely to get a hundred different answers. Both image and language have numerous socio-cultural dimensions and, as two distinct modalities, differ in structure. Can the complexity of our world of images and language be modeled algorithmically at all? Many models are trained on the "alt" texts of images published on websites, resulting in simplified, literal descriptions. What other image-text relations would be conceivable, helpful, and implementable?
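As an illustration of how literal such image-text pairs tend to be, the following sketch collects pairs of image source and alt text from an HTML snippet using Python's standard html.parser module. The class AltTextCollector, the function collect_pairs, and the sample HTML are hypothetical names and data made up for this example; real training datasets are built with far larger pipelines and additional filtering.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collect (src, alt) pairs from <img> tags (illustrative sketch only)."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        alt = attributes.get("alt")
        # Keep only images that have both a source and a non-empty description.
        if src and alt and alt.strip():
            self.pairs.append((src, alt.strip()))


def collect_pairs(html):
    """Return all (src, alt) pairs found in an HTML string."""
    collector = AltTextCollector()
    collector.feed(html)
    return collector.pairs


# Made-up sample page: only the first image carries a usable description.
sample_html = """
<img src="cat.jpg" alt="A cat sitting on a windowsill">
<img src="spacer.gif" alt="">
<img src="logo.png">
"""
print(collect_pairs(sample_html))
```

Each pair reduces an image to a single short caption, which is exactly the kind of one-dimensional image-text relation the questions above call into doubt.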

Escalation - hurdle or way opener

Escalation of our image canon: Whose images are reproduced, and who is represented? Escalation of the speed of production: Are we drowning in a flood of images with the same aesthetics, or are we unlocking new creative potential through the quick and easy production of image variants? Escalation of the question of originality: What role does an image play when seemingly everyone can produce a painting in the style of van Gogh or credibly present alternative facts in formerly reliable media such as photography or video?

Now that we have all the questions, it’s time to find answers!

We are looking for people from various fields who engage with multimodal AI models, in order to build a database of practitioners, artists, designers, researchers and developers and to map out their answers and thoughts. Want to participate? Write us an email or contact us via social media!


This project follows up on the final thesis "Algorithms designing our future – Critical engagement with text-to-image algorithms and call for interdisciplinary research" by Katharina Mumme at the Department of Design at Hamburg University of Applied Sciences. Feel free to check out the research on this website, or get in contact to read the thesis.

This project is part of the research project at the Department of Design at Hamburg University of Applied Sciences.

The underlined words are used as prompts in Stable Diffusion for the images.