Visual and Multimodal Rhetoric: How Images Persuade

The classical rhetorical tradition was developed for, and primarily theorized in terms of, oral and written verbal communication. Words were the medium; the speaker's voice and body were the instruments; and while delivery addressed the visual dimension of performance, rhetoric's analytical apparatus (topoi, figures, canons) was essentially linguistic. The 20th century's explosion of visual media (film, television, photography, graphic design, digital imagery) demanded an extension of rhetorical theory to account for how images, symbols, and multimodal texts persuade.

Visual and multimodal rhetoric addresses this challenge by asking: how do images work rhetorically? What are the modes of persuasion available to non-verbal communication? How do visual and verbal elements interact in multimodal texts? And what is the relationship between visual representation and ideology?

Roland Barthes and the Rhetoric of the Image

Roland Barthes's essay "Rhetoric of the Image" (1964) is a foundational text of visual rhetoric. Analyzing a Panzani pasta advertisement, Barthes identified three kinds of message at work in it: the linguistic message (text, captions, labels), the denoted image (the literal, documentary record of what is in front of the camera), and the connoted image (the ideological, cultural, and rhetorical meanings activated by the image's composition, content, and context).

Barthes's most significant contribution was his analysis of how images naturalize ideology: how connoted meanings (national identity, gender roles, class associations, aspirational lifestyles) are attached to a denoted image in ways that make them seem natural and inevitable rather than constructed and arguable. When a food advertisement makes its product seem connected to Italian warmth and family tradition, it is not making an argument that can be explicitly evaluated; it is creating an association that operates below the threshold of rational scrutiny. This is the distinctive rhetorical power of images: they can argue without appearing to argue.

Sonja Foss and Visual Rhetoric

Sonja Foss's systematic development of visual rhetoric as a rhetorical subdiscipline, particularly her work on the constitutive rhetoric of images, established the theoretical vocabulary for analyzing visual texts as full rhetorical acts. Foss identified two primary functions of visual rhetoric: the communicative function (images that primarily convey information or argument to audiences) and the constitutive function (images that primarily create or maintain identity, community, and shared reality).

The constitutive function is particularly significant for understanding how visual culture operates politically and socially. When national monuments, commemorative photography, or news images represent particular versions of a community's identity and history, they are not merely recording but constructing: they constitute the reality they appear to document. The Vietnam Veterans Memorial, the iconic photograph of the "Tank Man" near Tiananmen Square, and the images from Abu Ghraib all function as rhetorical acts that shape collective understanding and political reality.

Social Semiotics and the Grammar of Visual Design

Gunther Kress and Theo van Leeuwen's Reading Images: The Grammar of Visual Design (1996) provided the most systematic analytical framework for visual and multimodal rhetoric. Drawing on Michael Halliday's systemic functional linguistics, which holds that language serves three simultaneous functions (ideational, interpersonal, and textual), Kress and van Leeuwen developed a parallel analysis of how visual elements communicate through systems of choices with social and ideological implications.

Their framework identifies the visual equivalents of grammatical choices: how the spatial arrangement of elements in an image creates salience and hierarchy; how vectors (lines of implied movement or gaze) create narrative connections; how the represented and interactive participants in an image establish relationships of power and solidarity; and how framing, color, perspective, and modality (the degree of reality claimed) all contribute to the image's total communicative meaning.

Multimodality and Contemporary Communication

Contemporary communication is almost entirely multimodal: it combines verbal, visual, audio, gestural, and spatial modes in varying combinations across different media. A television advertisement, a website, a PowerPoint presentation, and a social media post all deploy multiple communicative modes simultaneously, and the rhetorical meaning emerges from the interaction between them, not from any single mode considered in isolation.

Multimodal rhetoric examines how these modes interact: how visual and verbal messages reinforce, extend, or contradict each other; how layout and design organize attention and create hierarchies of information; and how the affordances and constraints of specific media (the scroll of a webpage, the persistence of a photograph, the ephemerality of a story) shape what rhetorical moves are available.

The analysis of memes, arguably the dominant unit of digital visual rhetoric, requires exactly this multimodal framework. A meme's meaning is typically produced by the interaction between a recognizable visual template (which carries its own set of cultural associations), the new text added to it, and the intertextual references that both activate. Understanding why a meme is funny, cutting, or powerful requires understanding its visual rhetoric as much as its verbal content.
