Diffusions


“The real is produced from miniaturized cells, matrices, and memory banks, models of control – and it can be reproduced an indefinite number of times from these. It no longer needs to be rational, because it no longer measures itself against an ideal or negative instance. It is no longer anything but operational. In fact, it is no longer really the real, because no imaginary envelops it anymore. It is a hyperreal, produced from a radiating synthesis of combinatory models in a hyperspace without atmosphere.”


Since last year, with the introduction of Stable Diffusion, Midjourney and DALL·E 2, several computer programs have become available on the internet that produce sophisticated images from a text one enters. The images are detailed and show complex shapes, structures and textures, seemingly without effort. Working with these programmes can be a lot of fun, especially because of the surprising combinations of the known and the unknown that the software produces, because you don't know what is really happening, and because it delivers results easily and incredibly quickly.[D] Measured by their ability to depict an apparent or possible reality convincingly and in detail, the results are improving almost every month. They will at least produce new aesthetic sensibilities, and maybe even more.

Pictorial Turn

Web magazines now reach far more readers than traditional architectural magazines ever did through subscriptions. ArchDaily boasts 285 million monthly page views, 17.9 million monthly visits, 3.4 million Facebook fans and 4.2 million Instagram followers. Not even in their heyday did print magazines reach such numbers. The largest architectural magazines had perhaps 30,000 to 60,000 subscriptions, never more, and today few are left with more than 10,000. The new web magazines have a global reach, both in terms of content and in terms of readers. Whether through photographs or renderings, images – digital images – are the dominant and most important medium for communicating architecture today. The sophistication with which these images depict detail, sharpness, textures, weather, and atmosphere is high and incredibly seductive, whether the project has been realized or not.

In reaction to the smoothness of most of these images, collage, as a technique for communicating architectural ideas, has made an unexpected comeback. As early as 2013, Pedro Gadanho curated the exhibition Cut ’n’ Paste: From Architectural Assemblage to Collage City at the Museum of Modern Art in New York, and ever since there has been a steady flow of exhibitions, publications, and symposia on the theme of collage in architecture. Sam Jacob has even spoken of its return with a vengeance.1

According to Jacob, the comeback of collage is a return to drawing after rendering software had taken drawing out of architects’ hands. It is part of a fight against the alienation that new technologies cause. “Growing computational power was harnessed to produce rendered images - glossy visions of soon-to-be-built projects, usually blue-skyed, lush-leafed, and populated by groups of groomed and grinning clip-art figures, where buildings appeared with a polished sheen and lens flares proliferated.”2 This may very well be true, even if the software Jacob refers to is also increasingly used to produce staggering dystopian fantasies about, and dark comments on, our built-up reality, not least in the special effects industry of filmmaking. This is where the software originally came from, before it fell into architects’ hands in the famous Paperless Studio at Columbia University in the early 1990s. These fantasies may very well be considered a contemporary equivalent of drawing and painting. In this respect, the cinematic, dystopian architectural and urban worlds of Liam Young[A] are a kind of homecoming.

Several Italian architects, too, have recently taken up collage again to communicate their ideas about architecture. The best-known examples are Carmelo Baglivo, Luca Galofaro and Beniamino Servino, but others, like Davide Trabucco, follow in their footsteps.3 One of the advantages of collage is that one can introduce (parts of) photographic representations of reality, including styles, materials, and textures. But the biggest advantage is that a collage can be made, and spread, relatively quickly. Even if Baglivo, Galofaro and Servino also produce books, they post these collages first on social media platforms like Facebook and Instagram, where they seem more at home. They often take the form of memes: quickly remixed graphic ideas combined with simple texts that work within particular cultural discourses. In the Italian context, one can see different authors communicating with each other through collages. Text plays a minor role.

Diffusion

It is in this context that AI programmes have appeared that can generate digital imagery as sophisticated as photographs and renderings, and produce it even more quickly than one could make a collage. Stable Diffusion, Midjourney and DALL·E 2 do this for you as soon as you enter a text. The image appears in less than sixty seconds. The images are detailed and show seemingly complex shapes, structures, styles, and textures. Manifestations of nature – skin, hair, and greenery – are effortlessly depicted. However, this does not mean that diffusion software is well suited to depicting an existing reality. When one tries, for example, to portray an existing person or city, it works only to a certain extent, and only with examples that are famous in the United States – say, Donald Trump, or Manhattan seen from the Brooklyn Bridge[B] – and even these appear with major faults. Images of less famous people or scenes sometimes bear only a faint resemblance to the original. Hands and text within images are notoriously problematic, but Midjourney in particular is becoming more realistic with every new version, as it works with the largest datasets.

Even if we speak of Artificial Intelligence, we should remember that the current text to image models, or diffusion models, are forms of “machine learning”. They are trained on extremely large datasets of captioned images, but these images are not simply used as they are. During training, noise is added to them step by step until the originals are effectively destroyed, and the model learns to reverse this process by removing the noise again. To generate an image, it starts from pure noise and, guided by the prompt, gradually “recognizes” the prompted image in it, based on what it has learned from its training material. If we want a realistic depiction of someone or something, we are still better off going to Google Image Search. Here we are far removed from programs like ChatGPT, which, with some reservations, may deliver results that compete with Google and Wikipedia in their ability to build up arguments or hold conversations that go beyond those of their predecessors. At the same time, we should not understand these AI programs as forms of intelligence that can generate something truly new and unexpected, but only as producing other orders of existing things. Only every now and then does something go wrong, and something unknown may emerge by chance. It is then said that the programmes hallucinate.
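To make the adding and removing of noise described above more tangible, here is a toy sketch in Python. It assumes the textbook formulation of denoising diffusion (often called DDPM), a linear noise schedule of 1,000 steps, and an “image” reduced to a short list of pixel values; it is not the code of Stable Diffusion, Midjourney or DALL·E 2, whose internals differ and are partly unpublished. All function names are hypothetical. Because the sketch knows exactly which noise it added, it can undo it perfectly; a trained model has to estimate that noise from the noisy image and the prompt, so its reconstruction is only approximate.

```python
import math
import random

# Toy sketch of the forward ("noising") process of a DDPM-style diffusion model.
# Assumptions: linear beta schedule, 1,000 steps, an "image" as a flat list of floats.
# All function names are hypothetical illustrations, not any product's API.


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Variance of the Gaussian noise added at each step, rising linearly."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar[t] = product of (1 - beta) up to step t:
    the share of the original signal's variance that survives at step t."""
    alpha_bar, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def add_noise(x0, t, alpha_bar, rng):
    """Jump straight to step t: x_t = sqrt(alpha_bar[t]) * x0 + sqrt(1 - alpha_bar[t]) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    keep, mix = math.sqrt(alpha_bar[t]), math.sqrt(1.0 - alpha_bar[t])
    return [keep * x + mix * e for x, e in zip(x0, eps)], eps


def remove_noise(xt, eps, t, alpha_bar):
    """Invert the formula above. Here eps is known exactly; a trained network
    would only predict it from x_t (and the prompt), so real denoising is approximate."""
    keep, mix = math.sqrt(alpha_bar[t]), math.sqrt(1.0 - alpha_bar[t])
    return [(x - mix * e) / keep for x, e in zip(xt, eps)]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bar = cumulative_alphas(linear_betas())
    image = [0.2, -0.5, 0.9, 0.0]  # a toy four-pixel "image"
    for t in (0, 250, 999):
        noisy, eps = add_noise(image, t, alpha_bar, rng)
        # Amplitude of the original signal left at step t, and the noisy pixels.
        print(t, round(math.sqrt(alpha_bar[t]), 4), [round(v, 2) for v in noisy])
```

The printed amplitude of the original signal falls from nearly 1 at the first step to well below 1 percent at the last, which is the sense in which the added noise destroys the originals. Generation runs this chain backwards, starting from pure noise.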

When working with diffusion models, one tries to understand how the technology can be controlled after all, but that is not so simple. Everything revolves around the "prompt": the text that sets everything in motion. But prompts do more than that: they also frame the result and can, to a certain degree, control its content and aesthetics. "Prompt engineers" have already become very skilled at achieving results that come close to what they expected, or even go beyond it. Websites like PromptHero show examples and offer courses in formulating prompts to achieve ever more perfect results, whatever perfection may mean in this case. If the aim is a detailed, seemingly realistic image, that might soon be possible. If the aim is an image that comes close to one the user had in mind from the beginning, that might remain problematic. But one of the more amusing aspects may very well be that the model produces something unexpected.

Learning and unlearning

The learning material defines many of the biases of diffusion models. The datasets are provided by organizations like the German non-profit LAION and the American non-profit Common Crawl. The latter collects 3 billion web pages per month. According to The Guardian, “Researchers at LAION took a chunk of the Common Crawl data and pulled out every image with an ‘alt’ tag, a line or so of text meant to be used to describe images on web pages. After some trimming, links to the original images and the text describing them are released in vast collections: LAION-5B, released in March 2022, contains more than five billion text-image pairs. These images are ‘public’ images in the broadest sense: any image ever published on the internet may be gathered up into them, with exactly the kind of strange effects one may expect.”4
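The step quoted from The Guardian, pulling every image with an alt tag out of crawled web pages and keeping links plus descriptions, can be sketched with Python's standard-library HTML parser. This is a minimal illustration under stated assumptions, not LAION's pipeline: the input is the HTML of a single page, relative image links are resolved against a given page URL, and the “trimming” rule (dropping alt texts shorter than three words and exact duplicates) is a hypothetical stand-in for whatever filtering the real datasets apply. The class and function names are likewise hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Minimal sketch: collect (image URL, alt text) pairs from one HTML page.
# Class and function names, and the trimming rule, are hypothetical.


class AltTextCollector(HTMLParser):
    """Records every <img> that has both a src and a non-empty alt attribute."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        alt = (attributes.get("alt") or "").strip()  # attribute value may be None
        if src and alt:
            # Resolve relative links so the pair points at a fetchable image.
            self.pairs.append((urljoin(self.page_url, src), alt))


def extract_pairs(html, page_url):
    collector = AltTextCollector(page_url)
    collector.feed(html)
    collector.close()
    return collector.pairs


def trim(pairs, min_words=3):
    """Hypothetical trimming: drop very short alt texts and exact duplicates."""
    seen, kept = set(), []
    for pair in pairs:
        if len(pair[1].split()) >= min_words and pair not in seen:
            seen.add(pair)
            kept.append(pair)
    return kept


if __name__ == "__main__":
    page = """
    <h1>Projects</h1>
    <img src="/img/villa.jpg" alt="A concrete villa on a cliff at sunset">
    <img src="logo.png" alt="">
    <img src="https://cdn.example.org/tower.jpg" alt="Tower">
    """
    for url, text in trim(extract_pairs(page, "https://example.org/projects/")):
        print(url, "|", text)
```

Only the villa survives here: the logo has an empty alt text, and “Tower” is too short for the hypothetical trimming rule. Whatever the alt text says, accurate or not, becomes the description the model learns from, which is one route by which the biases of the learning material enter the images.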

Still, Midjourney's learning material is clearly focused on American examples, followed by European ones, and only then by the rest of the world. It affirms the biases of Western society and shows clear racial and gender prejudices. If one wants a female professional or a person of color in an image, one must say so explicitly in the prompt. On top of that, all representations have crucial flaws. It is not without reason that the first thing you type into the Midjourney bot is /imagine. It produces an imaginary world, a possible world, a proto-surrealist world, whose laws are those of Alfred Jarry’s (1873–1907) ’pataphysics, a physics of the possible beyond metaphysics. It’s a world without moral impetus, apart from the biases and censorship introduced by its makers. The censorship reflects a morality currently dominant in the US: violence is allowed, while any word that might vaguely allude to love is forbidden, even in names. Still, it’s an endless source of creativity.

In fact, the resulting images circulate mainly on the Internet and function much like memes in social networks. Intriguingly, Midjourney is even accessed through a social platform, Discord, which was originally developed for online gaming. All images one produces with Midjourney also appear automatically on Discord. They can be downloaded for other purposes from one’s own personal homepage at Midjourney.com. Since these images will themselves become part of the datasets, this might produce worrying biases, much as opinions on social media can grow into echo chambers.

Blind eyes

The new text to image software attracts an immense amount of attention, not just in professional magazines and universities, but also in the daily press. At the time of writing, there is hardly a day without an article on AI in the press. There are probably countless other ways in which Artificial Intelligence is changing and could change our world, some obvious, some more hidden, but it is the strong visual impact of Stable Diffusion, Midjourney and DALL·E 2, together with their easy accessibility and use, that makes people jump on them. Ever since Le Corbusier (1887-1965) accused architects, a hundred years ago, of having “eyes that do not see”, architects have done their best to be early adopters of new technologies.5 The visual aspect is crucial here. Le Corbusier thought new technologies would mainly change the way buildings were organized and constructed, and therefore the way they looked. It was only much later that new technologies also forced him to change the organization of his office. Today, it seems to be the other way around. Developments in computerization since the 1990s have completely changed every architectural practice, even if this is not always obvious from the way architecture looks, apart from those cases in which architects consciously introduced computation at the earliest stage of the design process. Although there are celebrated exceptions, the inherent conservatism of the building industry still slows down the realization of such projects. Moreover, designing such projects still demands considerable work from skilled architects. Software that generates sophisticated images of architectural designs from the very start makes architects eager to speculate about the possible impact of AI on their work, so as to keep up with developments.

Some problems must still be solved before architects get the results they are waiting for, notably the current impossibility of relating the imagery to plans and sections. Nor is it yet possible to insert the AI-generated project into a concrete situation.[C] No doubt there are, and will be, solutions for that. The fear that AI will take away work and make many architects superfluous seems to have vanished.

Text prompts

One of the most intriguing aspects of text to image software is the new relation between text and image. The prompt is a shorter or longer text. It is a command that generates the image, no longer a description of an image that already exists. Similar phenomena play a role in illustration and in conceptual art. Of course, illustrations for a scientific text or a manual are supposed to be as precise as possible, but those made for a newspaper article, a children’s book, or a comic are much more open to the personal interpretation of the artist. This seems to be one of the most promising fields in which text to image software can find a use. Cartoons and caricatures, which exaggerate a certain situation, are other options.

In art, the title or description is usually added after the visual work has been realized. The idea is that the visual work speaks for itself, even if that is not necessarily the case. From the end of the 19th century, and particularly in the 20th century, the title and even longer texts relating to the visual work became more important. In conceptual art, the complex relationship between image and text became a recurring issue. This is already evident in the work of Marcel Duchamp (1887-1968), who changed the meaning of everyday objects by putting them in an art context and adding a title, often in the form of a pun. Duchamp’s Green Box from 1934 is more ambivalent, as it contains notes and sketches related to his magnum opus The Bride Stripped Bare by Her Bachelors, Even, or Large Glass, on which he worked between 1915 and 1923. Some of the notes and sketches in the Green Box anticipate parts of the Large Glass, some are facsimiles, some describe or depict parts of the Large Glass that were never realized, and some relate to other works and thus embed the Large Glass in an even larger universe. The combination of the Large Glass (which remained unfinished and was accidentally broken) and the Green Box produces a complex world of ideas, open to different interpretations. But Duchamp also collected his puns as works in themselves, published them, and recorded a spoken version of them, triggering the audience’s imagination in another way. In the 1960s and 1970s, artists as different as Joseph Kosuth, Robert Barry, Lawrence Weiner (1942-2021), Marcel Broodthaers (1924-1976), Sol LeWitt (1928-2007), Joseph Beuys (1921-1986) and many others produced works that consisted either solely of texts or of texts by means of which someone else could realize a work, possibly even in different contexts.

In his book The Second Digital Turn, Mario Carpo reminds us that before the Renaissance, “the main vehicle for recording and transmission of visual data was verbal, not visual: images were described using words; written words were forwarded in space and time, images were not.” He also refers to Isidore of Seville (560-636), who epitomized the ancient mistrust of all forms of visual communication and stated that “images are always deceitful, never reliable, and never true to reality.”6 If it is true that, as Carpo writes, “the rapid progress of contemporary digital technologies from verbal to visual to spatial media in the course of the last thirty years curiously reenacts, in a telescoped timeline, the entire development of Western cultural technologies,” then many of these issues will probably be solved.7

In my own experiments, I notice that some people can now be made to believe that the results are photos when I post them on Facebook or Instagram. For example, there is a series in which I prompted Midjourney to generate young versions of famous architects, with attributes loosely related to familiar narratives associated with them. Most people, of course, do not know what these architects looked like when they were young. Nevertheless, many accept Midjourney’s suggestion. Mostly, one finds only vague hints of the real person in the images, as many or as few as if one had commissioned a current or timeless portrait. The only difference is that people accept the images more readily when the architects are portrayed 'young' in a photorealistic way, because most people looked different when they were young. Conversely, I notice that people start doubting real photos when I post them after Midjourney images. This is understandable, as many photos have been photoshopped before they were posted or printed. This practice anticipated some of Midjourney’s aesthetic biases and has prepared us to accept them.

Acceptance may have a lot to do with the speed and superficiality of these media, with the textual descriptions, and not least with what people want to see or accept as true. The role of the descriptions is central here: they are not added afterwards; they are the origin of the images. In this way, they also challenge us to see the images as realisations of these commands. At the same time, Midjourney makes it clear that not everything can be understood as text, and that a linguistic summary of a reality or an idea is always a simplification. The images are much richer in information than the prompts.

Guilty Pleasures

The deceitfulness and unreliability of Midjourney images are also inherent to the very essence of diffusion models. They feed on the Internet and feed the Internet in turn, in an incestuous process. It’s all Simulacra and Simulation, as Jean Baudrillard (1929-2007) would say. As early as 1981, he wrote that “today abstraction is no longer that of the map, the double, the mirror, or the concept. Simulation is no longer that of a territory, a referential being, or a substance. It is the generation by models of a real without origin or reality: a hyperreal”.8 In the case of text to image software, a generated image may have millions or even billions of origins, but these are all blurred and deconstructed. Baudrillard defines the successive phases of the image as, first, the reflection of a profound reality; second, the masking and denaturing of a profound reality; third, the masking of the absence of a profound reality; and finally, the phase in which the image has no relation to any reality whatsoever and becomes its own pure simulacrum. This is obviously the phase we have reached now.9

By far the majority of images generated by the new AI programmes belong unmistakably to the categories of fantasy, science fiction and horror, including the scary psychedelic colours that go with them. These are genres, in other words, that traditionally consist of a mixture of exaggerated realism, historical references and complete nonsense. As Roland Barthes (1915-1980) already wrote about Martians, the whole psychosis is based on the myth of the identical, the double.10 Midjourney more than satisfies this. Its strength, its incredible detail and richness of texture, also becomes its weakness here. Precisely because of their abundance of clichés, details, textures and moods, the generated images inevitably become kitsch. And according to Umberto Eco (1932-2016), kitsch is "the ideal food for an indolent public that wants to access and enjoy beauty without having to try too hard."11

Does this mean that Midjourney is fundamentally useless? On the contrary. We are only at the beginning, even if, as the name suggests, we are in the middle of the journey. And this journey is as fascinating as it is dangerous. Instead of having Martians designed, the best thing to do is to enter this world as Martians ourselves, as if it were a foreign planet on which we try to get along in all our innocence. I suppose many people who enjoy working with Midjourney know it is a hyperreal world of simulacra, and they know that a large part of what it produces is kitsch. They treat it, tongue in cheek, as a guilty pleasure; in other words, as a form of Camp.

According to Susan Sontag (1933-2004), Camp is a style that is ironic, theatrical, and exaggerated, characterized by a love of the unnatural, of artifice and the artificial. She argues that Camp is a way of seeing things that goes beyond mere style or taste and that it involves a certain degree of aestheticism and frivolity. She also notes that Camp is closely related to the concept of "bad taste" and often involves an appreciation of things traditionally considered low or vulgar. In fact, many of the examples of Camp that Sontag gives in her famous essay could be Midjourney favorites. In version 3, results often combined a kind of impressionist painting style from around 1900 with a preference for Art Nouveau-like forms. Sontag calls Art Nouveau the most typical and fully developed Camp style. “Art Nouveau objects, typically, convert one thing into something else: the lighting fixtures in the form of flowering plants, the living room which is really a grotto.[D] A remarkable example: the Paris Metro entrances designed by Hector Guimard in the late 1890s in the shape of cast-iron orchid stalks.”12 Sontag argues that Camp is often most effective when it appropriates elements of low culture, transforming them into something both ridiculous and sublime. In most cases, this is exactly what diffusion models do. Sontag also sees Camp as a mode of cultural production that is both celebratory and critical, a way of embracing and reveling in the absurdity and excess of modern life while simultaneously exposing the artifice and artificiality that underlie it.

The enormous impact of text to image models will probably change aesthetic sensibilities in architecture and design. And perhaps one day we will be able to design with this confusing new AI-infused software and thus project it back into reality. After all, in the 1990s, when special effects software like Maya ran only on extremely expensive Silicon Graphics machines, its everyday use today could not have been anticipated either. And this development is accelerating. “The streets no longer lead to fashion’s future;[E] today trends break out on the internet,” wrote Dean Kissick of the fashion magazine i-D. The same will be true for architecture, design and probably the whole of visual culture.13

This essay was written for the book Diffusions in Architecture: Artificial Intelligence and Image Generators by Matias del Campo (ed.). The book will be published by John Wiley & Sons, London, in the fall of 2023. The author and publisher have kindly allowed this prepublication.


7/27/2023


Epigraph: Jean Baudrillard, 1981

1 Sam Jacob: “Architecture Enters the Age of Post-Digital Drawing”, in: Metropolis (16.07.2017).

2 Ibid.

Planet City

Sisyphus



Tempio Moderno – © Baglivo
Albergo per pellegrini – © Baglivo
Curzio Malaparte, Villa Malaparte, 1937-1943, Capri VS Carolyn Davidson, Nike Swoosh, 1971 – © Davide Trabucco
Spanish Riviera, Castalla VS Le Corbusier, Maison Dom-Ino, 1914/1915 – © Davide Trabucco
Ridolfian-Hollywoodian. Architettura e controarchitettura – © Beniamino Servino
Sironiana con gasometro – © Beniamino Servino
Stazione spaziale – © Luca Galofaro
Life on Mars – © Luca Galofaro

3 See Ferrando, Lootsma, Trakulyingcharoen: Italian Collage. Siracusa 2020.

4 James Bridle: “The stupidity of AI”, in: The Guardian (17.03.2023).

5 See Karen Michels: Der Sinn der Unordnung. Arbeitsformen im Atelier Le Corbusier. Braunschweig/Wiesbaden 1989.

6 Mario Carpo: The Second Digital Turn. Design Beyond Intelligence. Cambridge/London 2017, pp. 102–103.

7 Ibid.

Brooklyn Bridge

House for a family of four

Also, several Italian architects have recently taken up the technique of collage again to communicate their ideas about architecture. The best-known examples are Carmelo Baglivo, Luca Galofaro and Beniamino Servino, but others, like Davide Trabucco, follow in their footsteps.3 One of the advantages of collage is that one can introduce (parts of) photographic representations of reality, including styles, materials, and textures. But the biggest advantage is that making a collage spreads relatively quickly. Even if Baglivo, Galofaro and Servino also produce books, these collages are posted on social media platforms like Facebook and Instagram first, where they seem to be more at home. They often take the form of memes: combinations of quick remixed graphic ideas with simple texts that work within particular cultural discourses. In the Italian context, one can see different authors communicating with each other through collages. Text plays a minor role.

Diffusion

It is in this context that AI programmes appear on the scene that can generate equally sophisticated digital imagery as photographs and renderings and produce images even quicker than one could do with a collage. Stable Diffusion, Midjourney and Dall.E2, does that for you when you just enter a text. The image appears in less than sixty seconds. The images are detailed and show seemingly complex shapes, structures, styles, and textures. Manifestations of nature – skin, hair, and greenery – are effortlessly depicted. However, this does not mean diffusion software is well suited to depict an existing reality. When, for example, trying to represent a portrait of an existing person or city, it only works to a certain extent with examples that are famous in the United States – say, Donald Trump, or Manhattan seen from the Brooklyn Bridge – all appearing with major faults. Images of less famous people or scenes sometimes only have a faint resemblance to the original. Hands and texts are notoriously problematic, but Midjourney in particular is getting better in terms of realism with every new version, as they work with the largest datasets.  

Even if we speak of Artificial Intelligence, we should remember that the current text to image models or diffusion models are forms of “machine learning". They are trained on extremely large datasets of titled images, but these are not simply used as they are. Noise is added, which basically destroys the original images. After that, they can remove the noise and “recognize” the prompted image from selected material. If we want a realistic depiction of someone or something, we still better go to Google Image Search. We are far removed here from programs like ChatGPT, which – with some reservations – might come close to delivering results that may compete with Google and Wikipedia in their ability to build up arguments or have conversations that go beyond their predecessors. At the same time, we should not understand these AI programs as forms of intelligence that can generate something really new and unexpected, but only as other orders of existing things. Only every now and then something may go wrong and something unknown may emerge by chance. It is then said that the programmes hallucinate.

When working with diffusion models, one tries to understand how one can control the technology after all, but it's not that simple. It's always about the "prompt": the text that sets everything in motion. But they do more than that: prompts also frame the result and can control the content of the result and its aesthetics to a certain degree. "Prompt engineers" are already very experienced at achieving results that come close to the expected result or even beyond that. Websites like PromptHero show examples and offer courses in formulating prompts to achieve ever more perfect results: whatever perfection may be in this case. If it’s about achieving a detailed, seemingly realistic image, that might soon be possible. If it’s about realizing an image that comes close to an image one has in mind from the beginning, it might remain problematic. But one of the more amusing aspects may very well be that the model produces something unexpected.

Learning and unlearning

The learning material defines many of the biases of diffusion models. The datasets are provided by organizations like the German non-profit LAION and the American non-profit Common Crawl. The latter collects 3 billion Internet pages per month. According to The Guardian, “Researchers at LAION took a chunk of the Common Crawl data and pulled out every image with an ‘alt’ tag, a line or so of text meant to be used to describe images on web pages. After some trimming, links to the original images and the text describing them are released in vast collections: LAION-5B, released in March 2022, contains more than five billion text-image pairs. These images are ‘public’ images in the broadest sense: any image ever published on the internet may be gathered up into them, with exactly the kind of strange effects one may expect.”4
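The filtering step the quotation describes, keeping only images that carry an "alt" text, can be sketched with Python's standard html.parser module. This is a simplified illustration under assumed inputs (one HTML page and its base URL, both hypothetical here); it is not LAION's actual pipeline and omits the "trimming" mentioned in the quote.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AltTextCollector(HTMLParser):
    """Collects (image URL, alt text) pairs from <img> tags with a non-empty alt attribute."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.pairs: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        alt = (attributes.get("alt") or "").strip()
        # Skip images without a source or without a usable description.
        if src and alt:
            # Resolve relative paths against the page's (hypothetical) base URL.
            self.pairs.append((urljoin(self.base_url, src), alt))


def extract_pairs(html: str, base_url: str) -> list[tuple[str, str]]:
    """Return all text-image pairs found on one page."""
    parser = AltTextCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.pairs


if __name__ == "__main__":
    page = """
    <p>Project page</p>
    <img src="/img/villa.jpg" alt="Concrete villa at dusk">
    <img src="/img/spacer.gif">
    <img src="https://cdn.example.org/plan.png" alt="">
    """
    for url, text in extract_pairs(page, "https://example.org/projects/"):
        print(url, "|", text)
```

Images with an empty or missing alt attribute, such as spacer graphics, are dropped, which suggests how much the quality of the pairs depends on how carefully web authors described their images.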

Still, Midjourney's learning material clearly focuses on American examples, followed by European ones and, eventually, the rest of the world. It affirms the biases of Western society and shows clear racial and gender prejudices. If one wants a female professional or a person of color in an image, one must put that into the prompt explicitly. On top of that, all representations have crucial flaws. It is not without reason that the first thing you type into the Midjourney bot is /imagine. It produces an imaginary world, a possible world, a proto-surrealist world whose laws are those of Alfred Jarry’s (1873–1907) ’pataphysics, a physics of the possible beyond metaphysics. It’s a world without moral impetus, apart from the biases and censorship introduced by its makers. The censorship reflects a morality that is currently dominant in the US: violence is allowed, but any word that might vaguely point at love is forbidden, even in names. Still, it’s an endless source of creativity.

In fact, the resulting images are mainly communicated on the Internet and function much like memes in social networks. Intriguingly, Midjourney is even accessed through a social platform, Discord, which was originally developed for online gaming. All images one produces with Midjourney also appear automatically on Discord. They can be downloaded for other purposes from one’s own personal homepage at Midjourney.com. As these images will themselves become part of future datasets, they may produce worrying biases, much as opinions on social media can grow into echo chambers.

Blind eyes

The new text to image software attracts an immense amount of attention, not just in professional magazines and universities, but also in the daily press. At the time of writing, hardly a day goes by without an article on AI in the press. There are probably countless other ways in which Artificial Intelligence is changing and could change our world, some obvious, some more hidden, but it’s the strong visual impact of Stable Diffusion, Midjourney and Dall.E2, together with their easy accessibility and use, that makes people jump on them. Ever since Le Corbusier (1887-1965) accused architects, a hundred years ago, of having “eyes that do not see”, architects have done their best to be early adopters of new technologies.5 The visual aspect is crucial here. Le Corbusier thought new technologies would mainly change the way buildings were organized and constructed, and therefore the way they looked. It was only much later that new technologies also forced him to change the organization of his office. Today, it seems the other way around. Developments in computerization since the 1990s have completely changed every architectural practice, even if this is not always obvious from the way architecture looks, apart from those cases in which architects consciously introduced computation in the earliest phase of the design process. Although there are celebrated exceptions, the inherent conservatism of the building industry still slows down the realization of such projects. Also, designing such projects still takes a considerable amount of work from skilled architects. The introduction of software that generates sophisticated images of architectural designs from the very start now makes architects eager to speculate about the possible impact of AI on their work, so as to keep up with developments.

There are still some problems to be solved to achieve the results architects are waiting for, notably the current impossibility of relating the imagery to plans and sections. Nor is it yet possible to insert an AI-generated project into a concrete site. No doubt there are and will be solutions for that. The fear that AI will take away work and make many architects superfluous seems to have vanished.

Text prompts

One of the most intriguing aspects of text to image software is the new relationship between text and image. The prompt is a shorter or longer text. It’s a command that generates the image, no longer a description of an image that is already there. Similar phenomena play a role in illustration and in conceptual art. Of course, illustrations to a scientific text or a manual are supposed to be as precise as possible, but those made for a newspaper article, children’s books or comics are much more open to the personal interpretation of the artist. This seems to be one of the most promising fields in which text to image software can find a use. Cartoons and caricatures, which exaggerate a certain situation, are other options.

In art, the title or description is usually added after the visual work has been realized. The idea is that the visual work speaks for itself, even if that’s not necessarily the case. From the end of the 19th century, and particularly in the 20th century, the title and even longer texts relating to the visual work became more important. In conceptual art, the complex relationship between image and text became a recurring issue. This is already evident in the work of Marcel Duchamp (1887-1968), who changed the meaning of everyday objects by placing them in an art context and adding a title, often in the form of a pun. Duchamp’s Green Box from 1934 is more ambivalent, as it contains notes and sketches related to his magnum opus The Bride Stripped Bare by Her Bachelors, Even, or Large Glass, on which he worked between 1915 and 1923. Some of the notes and sketches in the Green Box anticipate parts of the Large Glass, some are facsimiles, some describe or depict parts of the Large Glass that were never realized, and some relate to other works and thus embed the Large Glass in an even larger universe. The combination of the Large Glass (which remained unfinished and was accidentally broken) and the Green Box produces a complex world of ideas, open to different interpretations. But Duchamp also collected his puns as works in themselves, published them, and recorded a spoken version of them, triggering the imagination of the audience in yet another way. In the 1960s and 1970s, artists as different as Joseph Kosuth, Robert Barry, Lawrence Weiner (1942-2021), Marcel Broodthaers (1924-1976), Sol LeWitt (1928-2007), Joseph Beuys (1921-1986) and many others produced works that consisted either solely of texts or of texts by means of which someone else could realize a work, perhaps even in different contexts.

In his book The Second Digital Turn, Mario Carpo reminds us that before the Renaissance, “the main vehicle for recording and transmission of visual data was verbal, not visual: images were described using words; written words were forwarded in space and time, images were not”. He also refers to Isidore of Seville (560-636), who epitomized the ancient mistrust of all forms of visual communication and stated that “images are always deceitful, never reliable, and never true to reality”.6 If it is true that, as Carpo writes, “the rapid progress of contemporary digital technologies from verbal to visual to spatial media in the course of the last thirty years curiously reenacts, in a telescoped timeline, the entire development of Western cultural technologies”, many of these issues will probably be solved.7

"Young Zaha Hadid as a diva", Midjourney – © Bart Lootsma
"Young Mies van der Rohe smoking a cigar", Midjourney – © Bart Lootsma
"Young Louis Kahn on the floor of a restroom in Penn Station", Midjourney – © Bart Lootsma
"Young Rem behind a movie camera", Midjourney – © Bart Lootsma
"Young Bjarke Ingels Reading Comics", Midjourney – © Bart Lootsma
"Frank Gehry as a punk", Midjourney – © Bart Lootsma
"Young Philip Johnson in a uniform", Midjourney – © Bart Lootsma

8 Jean Baudrillard: Simulacra and Simulation. Ann Arbor (Michigan) 1994, p. 1.

9 Ibid., p. 6.

10 Roland Barthes: "Marsmenschen", in: Mythen des Alltags. Berlin 2010, pp. 53–55, here p. 55.

11 Umberto Eco: "Die Struktur des schlechten Geschmacks", in: Im Labyrinth der Vernunft. Texte über Kunst und Zeichen. Leipzig 1990, p. 246.

12 Susan Sontag: "Notes on Camp", in: Against Interpretation and Other Essays. New York 1966, p. 279.

13 Dean Kissick: "Didn’t I see you on the cover of i-D?", in: i-D, No. 326, 2013.


I notice in my own experiments that some people can now be made to believe that the results are photos when I post them on Facebook or Instagram. For example, there’s a series in which I prompted Midjourney to generate young versions of famous architects, with attributes loosely related to familiar narratives associated with them. Most people, of course, do not know what these architects looked like when they were young. Nevertheless, many accept Midjourney’s suggestion. Mostly, one finds only vague hints of the real person in these images, as many or as few as in a current or timeless portrait. The only difference is that people accept such images more readily when someone is portrayed 'young' in a photorealistic way, because most people looked different when they were young. Conversely, I notice that people start doubting real photos when I post them after Midjourney images. This is understandable, as many photos have been retouched in Photoshop before they were posted or printed. That practice anticipated some of Midjourney’s aesthetic biases and has prepared us to accept them.

Acceptance may have a lot to do with the speed and superficiality of these media, with the textual descriptions, and not least with what people want to see or accept as true. The role of the descriptions is central here: they are not added later, but they are the origins of the images. In this way, they also challenge us to see the images as realisations of these commands. At the same time, Midjourney makes it clear that not everything is to be understood as text and that a linguistic summary of a reality or idea is always a simplification. The images are much richer in information than the prompts.

Guilty Pleasures

The deceitfulness and unreliability of Midjourney images are also inherent to the very essence of diffusion models. They feed on the Internet and feed the Internet themselves in an incestuous process. It’s all Simulacra and Simulation, as Jean Baudrillard (1929-2007) would say. As early as 1981, he wrote that “today abstraction is no longer that of the map, the double, the mirror, or the concept. Simulation is no longer that of a territory, a referential being, or a substance. It is the generation by models of a real without origin or reality: a hyperreal”.8 In the case of text to image software, there may be millions or even billions of origins to the produced image, but these are all blurred and deconstructed. Baudrillard defines the successive phases of the image as follows: first, it is the reflection of a profound reality; second, it masks and denatures a profound reality; third, it masks the absence of a profound reality; and finally, it has no relation to any reality whatsoever and becomes its own pure simulacrum. This is obviously the phase we have reached now.9

The vast majority of images generated by the new AI programmes belong unmistakably to the genres of fantasy, science fiction and horror, including the scary psychedelic colours that go with them: genres, in other words, that traditionally consist of a mixture of exaggerated realism, historical references and complete nonsense. As Roland Barthes (1915-1980) already wrote about Martians, the whole psychosis is based on the myth of the identical, the double.10 Midjourney more than satisfies this. Its strength, its incredible detail and richness of texture, also becomes a weakness here. Precisely because of the abundance of clichés, details, textures and moods, the generated images inevitably become kitsch. And according to Umberto Eco (1932-2016), kitsch is "the ideal food for an indolent public that wants to access and enjoy beauty without having to try too hard."11

Does this mean that Midjourney is fundamentally useless? On the contrary. We are only at the beginning, even if, as the name suggests, we are in the middle of the journey. And this journey is as fascinating as it is dangerous. Rather than having Martians designed, the best thing to do is to enter this world as Martians ourselves, as if it were a foreign planet on which we try to get along in all our innocence. I suppose many people who enjoy working with Midjourney know it is a hyperreal world of simulacra, and that a large part of its production is kitsch. They enjoy it, tongue in cheek, as a guilty pleasure, in other words: as a form of Camp.

According to Susan Sontag (1933-2004), Camp is a style that is ironic, theatrical and exaggerated, characterized by a love of the unnatural, of artifice and exaggeration. She argues that Camp is a way of seeing things that goes beyond mere style or taste and that it involves a certain degree of aestheticism and frivolity. She also notes that Camp is closely related to the concept of "bad taste" and that it often involves an appreciation for things that are traditionally considered low or vulgar. In fact, many of the examples of Camp that Sontag gives in her famous essay could be Midjourney favorites. Under Midjourney version 3, results often combined a kind of impressionist painting style from around 1900 with a preference for Art Nouveau-like forms. Sontag calls Art Nouveau the most typical and fully developed Camp style. “Art Nouveau objects, typically, convert one thing into something else: the lighting fixtures in the form of flowering plants, the living room which is really a grotto. A remarkable example: the Paris Metro entrances designed by Hector Guimard in the late 1890s in the shape of cast-iron orchid stalks.”12 Sontag argues that Camp is often most effective when it appropriates elements of low culture, transforming them into something that is both ridiculous and sublime. In most cases, this is exactly what the diffusion models do. Sontag also sees Camp as a mode of cultural production that is both celebratory and critical, a way of embracing and reveling in the absurdity and excess of modern life while simultaneously exposing the artifice and artificiality that underlie it.

The enormous impact of text to image models will probably change aesthetic sensibilities in architecture and design. And maybe someday we can design with this confusing new AI-infused software and thus project it back into reality. After all, in the late 1990s, when special effects software like Maya only ran on extremely expensive Silicon Graphics machines, its everyday use today could not have been anticipated either. And this development is accelerating. “The streets no longer lead to fashion’s future; today trends break out on the internet”, wrote Dean Kissick of the fashion magazine i-D.13 The same will be true for architecture, design and probably the whole of visual culture.

This essay was written for the book Diffusions in Architecture: Artificial Intelligence and Image Generators by Matias del Campo (ed.). The book will be published by John Wiley & Sons, London, in the fall of 2023. The author and publisher have kindly allowed this prepublication.

Article 23/07, 7/27/2023, Bart Lootsma: Diffusions. Text-based AI generates realistic images of diffuse origin. Imperfect and open-ended, they irritate our aesthetic sensibilities and change the entire visual culture.