Distant Viewing
Computational Exploration of Digital Images

Taylor Arnold and Lauren Tilton

The MIT Press
Cambridge, Massachusetts
London, England

© 2023 Massachusetts Institute of Technology

This work is subject to a Creative Commons CC-BY-NC-ND license. Subject to such license, all rights are reserved.

The open access edition of this book was made possible by generous funding from the National Endowment for the Humanities and the University of Richmond.

The MIT Press would like to thank the anonymous peer reviewers who provided comments on drafts of this book. The generous work of academic experts is essential for establishing the authority and quality of our publications. We acknowledge with gratitude the contributions of these otherwise uncredited readers.

This book was set in Stone Serif and Stone Sans by Westchester Publishing Services.

Library of Congress Cataloging-in-Publication Data
Names: Arnold, Taylor, author. | Tilton, Lauren, author.
Title: Distant viewing : computational exploration of digital images / Taylor Arnold and Lauren Tilton.
Description: Cambridge, Massachusetts : The MIT Press, [2023] | Includes bibliographical references and index.
Identifiers: LCCN 2022052202 (print) | LCCN 2022052203 (ebook) | ISBN 9780262546133 (paperback) | ISBN 9780262375177 (epub) | ISBN 9780262375160 (pdf)
Subjects: LCSH: Computer vision. | Image data mining. | Image processing—Digital techniques. | Visual sociology—Technique.
Classification: LCC TA1634 .A76 2023 (print) | LCC TA1634 (ebook) | DDC 006.4/2—dc23/eng/20230111
LC record available at https://lccn.loc.gov/2022052202
LC ebook record available at https://lccn.loc.gov/2022052203
Contents

Acknowledgments
Introduction
1 Distant Viewing: Theory
2 Distant Viewing: Method
3 Advertising in Color: Movie Posters and Genre
4 Seeing Is Believing: Themes in Great Depression and World War II Photography
5 Feast for the Eyes: Locating Visual Style in Network-Era American Situation Comedies
6 Opening the Archive: Visual Search and Discovery of the Met's Open Access Program
Conclusion
Glossary
Notes
Bibliography
Datasets
Index

Acknowledgments

Projects take time to percolate and refine. This one is no exception, and was easily over a decade in the making. For all our appreciation of the speed of algorithms, the time to think slowly and carefully was key. The final letters on the page are a fraction of the words typed, code processed, and ideas explored. And most importantly, this book was only possible because of the support, generosity, and patience of colleagues, friends, and family.

A significant amount of this book was written amid a global pandemic. Zoom calls, texts, and emails with Claudia Calhoun, Jordana Cox, Molly Fair, Joshua Glick, Eva Hageman, Devin McGeehen-Muchmore, Kristine Nolin, Jeri Wieringa, and Caroline Weist kept us grounded. Miriam Posner, Lauren Klein, and Jessica Marie Johnson have been constant guides, pioneering the intersection of digital humanities and data science.

Teaching and research are intimately connected. Working to understand and then explain to someone else one's theories and methods, and then grappling with the questions that follow, is a humbling and rewarding process. Thank you, Jennifer Guiliano and David Wrisley, for the opportunities to teach the concepts that became foundational to this book as well as to learn from the HILT (Humanities Intensive Learning and Teaching) and the NYU Abu Dhabi Winter Institute in Digital Humanities communities. Thank you to the students at the University of Richmond who shared their excitement to try out new methods and conduct research. Salar Ather and Aalok Sathe's work on sitcom laugh tracks as independent studies still has us giggling at some of the results.

Workshopping and presenting this work at various stages offered influential opportunities to develop this project.
Along with the great feedback at conferences, we are grateful to the colleagues who invited us to share our work, including Susan Aasman, Saskia Asser, Nicolas Ballier, Elisabeth Burr, James Connolly, Alexander Dunst, Jasmine van Gorp, Ann Hanlon, Tim van der Heijden, Vilja Hulden, Mike Kane, Ulf Otto, Nora Probst, Vincent Renner, Douglas Seefeldt, Thomas Smits, Stewart Varner, and Jun Yan. Thank you, and thanks to your colleagues, for thinking with us. A special thank you to our co-conspirator Mark Williams for his relentless efforts to amplify this project at every stage. Gratitude as well to the reviewers and editors of our first article on distant viewing in Digital Scholarship in the Humanities and our next article on sitcoms with co-author Annie Berke in the Journal of Cultural Analytics, which would become the basis for chapter 5. We appreciate Annie's support for our adapting this work for the book and her brilliant insights into post–World War II TV, which she expands on in her new book, Their Own Best Creations: Women Writers in Postwar Television. Finally, thank you to the peer reviewers, including Lev Manovich, for your insightful feedback that helped us clarify key ideas while helping us reach an interdisciplinary audience.

Research support made this project possible. The University of Richmond's institutional commitment to digital humanities and data science has been key. Deans Kathleen Skerrett and Patrice Rankine understood that hiring us together was critical to the success of our individual research, and that we would do even more together. Our departments helped us navigate being new professors, making sure we had time to pursue research and to integrate our scholarship into teaching. The Department of Rhetoric and Communication's embrace of digital humanities under the leadership of Chairs Nicole Maurantonio and Tim Barney, with the enthusiastic support of Mari Lee Mifsud and Paul Achter, has made for an environment that one could only dream of. With enthusiasm and patience, Brenda Thomas in the Foundation, Corporate, and Government Relations office navigated us through grant applications and management. Associate Provost Carol Parish's efforts to build an institution that supports interdisciplinary computational research steeped in the liberal arts have allowed us to scale up our research to the next level. Grant support from the Mellon Foundation funded the Collections as Data Initiative, which provided an invaluable opportunity to think and imagine with Carol Chiodo about how distant viewing could support museums and libraries. We are also appreciative of the visiting positions at the Université Paris-Diderot and the Collegium de Lyon made possible through Nicolas Ballier and Vincent Renner. We are grateful to Patricia Hswe and the Mellon Foundation for the opportunity to develop software to make distant viewing more accessible in the years to come.
Finally, the National Endowment for the Humanities' Office of Digital Humanities has been with us at each step over the past decade. We are incredibly grateful to Brett Bobley, Sheila Brennan, Perry Collins, and Jennifer Serventi for their support of Photogrammar (HAA-284853-22 and HD-51421-11) and the Distant Viewing Toolkit (HAA-261239-18). Along with sharing your excitement about innovation and a culture of openness, you provided us with the time, space, and resources, and, perhaps most importantly, built the interdisciplinary digital humanities community that made this open-access book possible.

As with so many exciting opportunities in our careers, Laura Wexler has been central. In addition to sharing her passion for the power of visual culture to explain our world, she connected us with a remarkable network of intersectional feminist media scholars including Elizabeth Losh. Our deepest gratitude goes to Liz, who took significant time to provide formative feedback on the first few chapters of this book before anyone else had seen a page. The chance to work with colleagues who take the time to help you make a work the best version of itself is one more reason we are also indebted to both of these pioneers of the digital humanities for connecting us with the MIT Press and Noah Springer, who has been a tireless advocate of this project. Thank you as well to Kathleen A. Caruso and Paula Woolley for their careful reading of this book. We are lucky to work with such a supportive and ambitious team.

Looking nationally and internationally, we are indebted to the individuals and institutions across the world that have advocated for open access, open data, and open source. We are grateful for the United States Library of Congress, particularly the Prints and Photographs Division and LC Labs. Colleagues such as Beverly Brannan and Meghan Ferriter have worked in the background to make our kind of research possible. Our scholarship would not have been possible without the continued work and support of the open-source software communities within data science and computer vision. Further, a future of computer vision that is explainable and committed to intersectional feminism and anti-racism continues to require relentless advocacy. Thank you to groups such as Data & Society, the AI Now Institute, and the Algorithmic Justice League for your work to hold all of us accountable while building a more just and equitable future.

Finally, we want to acknowledge our families. Along with their unconditional support and love, they kindly listened to us tease out the core ideas of this book over lobster rolls in Maine or red beans and old-fashioneds in New Orleans, for years. While the actual book project started shortly after we became an official family, the process leading to the first typed pages taught us that forging a collaboration would take a loving amount of work, kindness, and vulnerability. Only our dear dog Sargent, who passed away as this book came to completion, knows all the coffee, walks, and dinner debates that this has taken; Roux has big shoes to fill.
Introduction

In the fall of 2010, we began working on our first collaboration. In what would eventually become the public digital project Photogrammar, we set out to build an interactive space that allowed visitors to visualize the more than 170,000 digitized Farm Security Administration / Office of War Information (FSA-OWI) photographs produced by the US government between 1935 and 1944. Our collaboration served as an ideal mixture of our interests. Taylor was a graduate student in a statistics department with a research focus on exploring and visualizing complex datasets. Lauren was a graduate student in American Studies with a concentration in public humanities and a focus on twentieth-century American film and photography. Combining the FSA-OWI collection's rich metadata and meticulously digitized public-domain images to create a publicly accessible interface overlapped perfectly with both of our areas of research. A proof of concept developed in Laura Wexler's public humanities graduate seminar turned into a full-fledged digital humanities public project thanks to the National Endowment for the Humanities Office of Digital Humanities.

Our work became a popular public project that would welcome millions of visitors and encourage numerous extensions and revisions. We created interactive visualizations that allowed exploration of almost all the collection's available metadata. Visitors could view one of several interactive maps, follow the journeys of individual photographers, search by themes, and explore the photographic captions. The critical context and the contribution of these elements to visitors' understanding of the FSA-OWI collection should not be underestimated. However, something seemed missing. We were working with an extensive collection of documentary photography, and it was ultimately the photographs that drew us and others to this collection. Our work, however, facilitated an aggregative analysis of every element but the photographs. The images were only accessible by looking at them individually, with no way to search by visual themes or identify objects and people present within the frame. There was a disconnect between the main objects of interest and the affordances provided by our work.

The absence of image-based methods in our initial iterations of Photogrammar was driven by a scarcity of readily available tools, not a lack of interest. Digital images are challenging to work with computationally, for reasons that we interrogate in the following chapters. The best available methods we could find performed poorly on historic black-and-white photographs. Face detection methods missed more faces than they found, failing to find faces that were not at a particular angle and unable to detect anyone wearing a hat. Algorithms for detecting objects in images were more likely to produce comically bizarre predictions than usable information. Methods for aggregating based on dominant colors fared better but were not well suited to our collection of predominantly black-and-white photographs.
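For readers curious what such an off-the-shelf experiment looks like in practice, the short sketch below runs a classical frontal-face detector of the sort that was readily available at the time. It is only an illustration: the file name is a hypothetical stand-in for a digitized FSA-OWI photograph, and OpenCV's Haar-cascade detector is used as a representative example rather than the exact tools we tested.

import cv2

# Load a digitized photograph in grayscale; the cascade works on intensity values.
# The file name is a hypothetical stand-in for one of the FSA-OWI scans.
image = cv2.imread("fsa_owi_photograph.jpg", cv2.IMREAD_GRAYSCALE)

# Load the frontal-face cascade model that ships with OpenCV.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

# Each detection is returned as a bounding box: (x, y, width, height).
faces = detector.detectMultiScale(image, scaleFactor=1.1, minNeighbors=5)

print(f"Detected {len(faces)} face(s)")
for (x, y, w, h) in faces:
    print(f"  face at x={x}, y={y}, width={w}, height={h}")

Counting the detections on a handful of historic prints makes the limitations described above easy to see for oneself.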
Illustrations of these predictions were a consistent element of the earliest talks we gave on the work. Our favorite example came from the photograph featured in chapters 1 and 2 of a shepherd riding a horse in a field next to his sheepdog. Even though it was in vivid color and contained three distinct objects within the lexicon of popular computer vision algorithms, most failed to identify any element of the image correctly. The experience led us to ask more questions about exactly how these algorithms were built and for whom. We kept asking ourselves: what ways of seeing are these built to view, and what if we thought differently about why we should use them?

By 2016 the landscape of available tools for working computationally with images had undergone a dramatic expansion. Software libraries such as darknet (2016), TensorFlow (2015), keras (2015), PyTorch (2016), and Detectron (2017) suddenly provided out-of-the-box access to increasingly accurate and powerful computer vision algorithms.1 Scholars working with digital collections of images began to use this new set of approaches. Applications appeared in venues such as the Culture Analytics program at UCLA's Institute for Pure and Applied Mathematics, workshops held by the special interest group for audiovisual material within the Association of Digital Humanities Organizations (ADHO), and articles in the newly created International Journal for Digital Art History. Our work shifted as well. Presentations that previously ended by critiquing algorithmic results were replaced with forward-looking examples of how computer vision was helping us reimagine the FSA-OWI collection by providing approaches for a visual search and discovery interface. Rather than relying solely on existing computer vision algorithms, we began to customize and build algorithms that viewed in the ways that furthered our areas of interest.

Our excitement about the improvements in computer vision algorithms was tempered by our prior experiences that had highlighted the comparative difficulty of training computers to understand digital images. The tools seemed to be producing helpful information, but what features of the images continued to be lost through their algorithmic transformation? Many of these new tools were created or sponsored by large corporations and government entities. What are the implications of aligning our analyses with the interests of these organizations? Software for data exploration and visualization was not built around the study of digital images. How can our exploratory methods catch up with the new methods in computer vision? Numerous scholars in media and visual culture studies—such as John Berger, David Bordwell, Lisa Cartwright, Stuart Hall, Lev Manovich, Lisa Nakamura, Leigh Raiford, Marita Sturken, and Laura Wexler2—have stressed the importance of thinking carefully about how images are created, circulated, and interpreted. When applying complex computational approaches to the study of digital images, it is as vital as ever to consider these questions.
To enable the careful and critical computational exploration of digitized visual collections, we need a cohesive theory for how computer vision creates meaning and a methodological specificity that takes into account the intricacies of digital images as a form and format.

In this text, we present a theory and methodological description of what we refer to as distant viewing, the application of computer vision methods to the computational analysis of digital images. Our goal is to offer a constructive and generative critique of computer vision that focuses on enabling fruitful applications. To the best of our knowledge, this text is the first book-length treatment that approaches the application of computer vision to the study of visual messages as its own object of study. The distinction here is important because our approach allows for a critical understanding of the possibilities and limitations of existing computer vision techniques. It also provides a framework for a reflexive understanding of computer vision as a way of circulating and producing knowledge.

The focus of distant viewing on digital images is a pragmatic one, resulting from the fact that the application of computer vision requires machine-readable inputs. However, this does not limit our objects of study to born-digital materials. Distant viewing can be applied to digitized collections originally produced in almost any medium. For example, we can apply our approach to digitized collections of photographs, photographic negatives, newspapers, comics, and posters. We can also work with digital images of material culture, something we return to in chapter 6. Distant viewing is also not limited to still images; it can be used to study collections of objects from media such as television, film, and video games. An example of distant viewing applied to a pair of television series is illustrated in chapter 5. In most cases, when one is applying computer vision to a digital image, we argue that this is distant viewing.

Our terminology is motivated by the concept of distant reading from the field of computational literary studies. The specific meaning and importance of the term distant reading has been extensively discussed; it is not our goal to make specific connections or proclamations within these debates.3 Rather, our terminology signals a general interest in adapting the computational literary studies approach of applying computational and statistical techniques to large corpora in the service of humanistic research questions.4 While certainly not without their critics, these approaches have opened exciting new lines of scholarship.5 Our terminology also signals a departure from the textual focus of literary studies. The process of interpreting a visual message is semiologically and phenomenologically different from the act of reading a text, which we theorize in chapter 1. As we will explore in the following chapters, these differences lead to important changes in the way that we can apply and interpret the results of computational analyses.

In the tradition of visual culture studies and computer vision as well as our history of collaboration, we take a transdisciplinary perspective in our work.
Both of us were trained in interdisciplinary fields that taught us the power of thinking across boundaries of disciplines and fields. We primarily draw from and engage with scholarship from film and media studies, visual semiotics, digital humanities, information science, computer science, and data science. The text's structure and focus are designed to be legible and useful to audiences coming from any of these varied perspectives.

The first two chapters establish our main theoretical and methodological claims about distant viewing. Chapter 1 begins by investigating what it means to say that computer vision "understands" visual inputs. We draw from information science and semiotics to illustrate why the way that digital images convey information necessitates a different approach. Specifically, we see that this process involves creating annotations that capture some, though never all nor ever perfectly, of the information present in the images. We conclude the chapter by showing how the process of annotation can be seen as a machine-mediated way of viewing images; this leads us to understand how existing scholarship in media studies shapes the application of computer vision. In chapter 2, we engage with the methodological aspects of working with computer vision annotations. We investigate how standard approaches used in data science to explore data must be adjusted when working with computer vision, resulting in four phases of analysis. Namely, we must annotate our collection using computer vision algorithms, then organize the annotations and metadata, explore the data and our research questions, and finally communicate the results. The first two chapters engage in a close analysis of a single image, the FSA-OWI photograph of a shepherd mentioned above. We hope to model how computational analyses should also help highlight, rather than supplant, the close reading of individual images.
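To give a concrete, if deliberately simplified, sense of how the first three phases fit together in code, the sketch below annotates a pair of images with a single measurement, organizes the annotations alongside catalog metadata, and then explores the resulting table. The file names, metadata fields, and the brightness annotation itself are hypothetical stand-ins; this is not the Distant Viewing Toolkit, only an outline of the workflow's shape.

import numpy as np
import pandas as pd
from PIL import Image

# A hypothetical collection: image files paired with catalog metadata.
collection = [
    {"file": "photo_001.jpg", "photographer": "Lee", "year": 1942},
    {"file": "photo_002.jpg", "photographer": "Lange", "year": 1936},
]

def annotate(path):
    """Phase 1, annotate: compute a simple annotation for one image,
    here its mean brightness on a 0-255 scale."""
    pixels = np.asarray(Image.open(path).convert("L"), dtype=float)
    return {"brightness": pixels.mean()}

# Phase 2, organize: join the annotations with the existing metadata.
rows = [{**item, **annotate(item["file"])} for item in collection]
table = pd.DataFrame(rows)

# Phase 3, explore: aggregate the annotation by a metadata field.
print(table.groupby("photographer")["brightness"].mean())

In the chapters that follow the annotations become far richer, but the overall shape of the analysis remains the same.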
Chapters 3 through 6 present the use of distant viewing within four different application domains. As readers move from chapter to chapter, the complexity of the computer vision models builds. Each chapter is structured around the first three phases of the distant viewing method described in chapter 2: annotate, organize, and explore; the fourth phase, communication, is this book. After establishing a research question, we start by understanding one or more annotations provided by computer vision algorithms, organize other metadata attached to the collection, and finish by conducting an exploration of the organized data. Along the way, we discuss the limitations of these algorithms as we think carefully about exactly what these computer vision algorithms view, and do not view. Chapter 3 investigates the use of color in movie posters and its relationship to genre. We see how distant viewing can address complex research questions even when using relatively low-level annotations. In chapter 4, we apply a region segmentation algorithm to the photographs from the FSA-OWI archive. This chapter shows how computer vision annotations can both support and supplant the organizational logic of the archive. We illustrate in chapter 5 how distant viewing can also be used with moving images. We see how formal film elements can be applied to study issues of gender and power within a pair of network-era sitcoms. Finally, in chapter 6, we apply distant viewing to a collection of images from a large encyclopedic museum to see how computer vision can open digital collections through public interfaces.

Our excitement about the possibilities for the computational analysis of collections of digital images has been shared by many other research groups. Some of the earliest examples come from the manual annotation of film and television metrics by Barry Salt, Gunars Civjans, Yuri Tsivian, and Jeremy Butler.6 Recently, the journal Digital Humanities Quarterly (DHQ) sponsored a special issue focused on film and video analysis in 2020, with articles describing projects such as Barbara Flueckiger's FilmColors project and Masson et al.'s Sensory Moving Image Archive project (SEMIA).7 Along with Stefania Scagliola and Jasmijn Van Gorp, we edited another DHQ special issue titled "AudioVisual Data in DH" with over twenty research articles from a wide variety of disciplines and national contexts.8 Interest has also expanded from the growing field of digital art history, which has had several special issues, conferences, and a new journal that have included a significant amount of computational work. Notably, in the first issue of the International Journal for Digital Art History, K. Bender made the first use of the term "distant viewing" within his study of the iconography of Aphrodite/Venus.9 Numerous other exciting research papers have been published in other journals, such as Nanne Van Noord, Ella Hendriks, and Eric Postma's study of artistic style and Laure Thompson and David Mimno's analysis of the study of Dadaism.10 We hope that our work in this book further enables and encourages more developments in these and other areas.

The book is designed to be read and used. Along with being open access, the text is organized such that the chapters should be readable in any order. One reader might be interested in the theory and then an application. Another reader might be interested in a particular application and therefore wish to start with one of the applications before engaging with the more theoretical opening chapters. Many of the results in the following chapters are presented as tables. We chose to communicate results using numeric tables because of the limitations of other visualization types within the existing print form, such as the lack of interactivity and color. Other ways of visualizing these results are also given in the supplementary materials.

While making novel contributions to the fields of data science and digital humanities, we have avoided superfluous technical jargon.11 There is significant translation work to do when talking across fields and forging transdisciplinary scholarship. We aimed for a writing style that is inclusive yet precise, while the footnotes provide more technical descriptions.
A glossary of common terms, particularly for terms that may be used differently in different communities, is included at the end of the text to aid in this process. In addition, we have published datasets, code, and many additional visualizations under an open-source license that replicate and further explore the applications described in the text. All of these can be viewed and downloaded on the book's accompanying website, found here: https://distantviewing.org/book

Finally, we have developed the Distant Viewing Toolkit, open-source software made possible through generous funding from the National Endowment for the Humanities and the Mellon Foundation, that puts theory and method into practice. Information on how to install a current version of the Distant Viewing Toolkit can also be found at the link above.

By theorizing and offering a method, our approach of distant viewing participates in the call for a more careful use of algorithms in our society. When we understand computer vision as a way of seeing, we are then accountable to the histories of vision and the ways we train algorithms to see, look, and view. We are also accountable for what they do not see, look, and view. We have had a plethora of conversations with colleagues who attest to the neutrality of algorithms and resist ideas of algorithms as a technology of vision and mode of communication inculcated in social and cultural pasts, presents, and futures. Distant viewing challenges such claims and calls on us to ask, each time a computer vision algorithm looks at an image, what this algorithm is viewing, mislabeling, and missing, as well as why we designed it to view in this way. By doing so, we can more carefully engage with the computational analysis of digital images. Now, let's go distant viewing.

1 Distant Viewing: Theory

Whether gazing at a piece of art through the lens of an iPhone in the Louvre or tapping away at Instagram Stories among the flicker of the screens in Times Square, we are surrounded by images. Visual messages impact our daily lives and have done so for centuries. More recently, technologies that have enabled born-digital images are also facilitating mass digitization as institutions such as libraries, archives, and museums produce digital images stored on servers across the world. How does one go about analyzing the messages carried by these visual forms?

There is a plethora of different approaches for studying the messages conveyed by visual media. Often these consist of applying theories from fields such as art history, film studies, and media studies to a focused set of images.
These approaches can inform insights such as the topics depicted at a certain moment in time or contribute to an analysis of how formal elements lend the medium its claims to truth.1 Examples of powerful close analyses of visual messages include Elizabeth Abel's study of Jim Crow politics in the American South, John Berger's study of gender and western art, Herman Gray's study of race in 1980s and 1990s TV, and Laura Wexler's study of Alice Austen's photographs from Ellis Island.2 Many questions require studying a small set of images in relation to a larger whole, which can be accomplished by meticulously combing through and viewing images stored in a physical or digital archive.3 Combined with information from other archival sources, this approach captures a large portion of research methods in the study of visual messages and has frequently formed an important aspect of our own work.

Some questions regarding visual messages require identifying subtle patterns across a large collection of images. For example, we might want to understand how improving television quality during the twentieth century changed the way shot angles were used to tell a story. Or, how lighting decisions in Hollywood films from the 1970s were used to challenge or establish gendered or racial stereotypes. Similarly, we might be interested in understanding the themes in a photographic archive with hundreds of thousands of images or visualizing different themes in paintings held by a large encyclopedic art museum. Addressing these questions begins to exceed our ability to remember and to view all of the relevant images. One approach to working with these collections is to use a quantitative social science methodology, such as content analysis, in which a random subset of images is manually labeled according to the information being studied.4 However, this approach has the downside of viewing only part of a collection and, because of the labor involved in creating the labels, is only able to address a small set of predefined research questions. What we want, rather, is to view, explore, and interpret visual messages across a collection of digital images. This should be an iterative, exploratory process that mirrors the approaches that we turn to when working with a smaller collection of images. Such a process, it would seem, requires a different methodology.

Excitement has recently increased about the use of computer vision algorithms—computational methods that try to replicate elements of the human visual system—to assist in the study of visual collections. Algorithms applied to digital images through software can process large amounts of data in significantly less time than would be required to perform a similar task manually. Computers can aggregate patterns and surface connections that may be difficult to detect otherwise. These insights can lead to new ways of seeing and exploring visual data.
Melvin Wevers and Thomas Smits, for example, have called for a "turn toward the visual" within the digital humanities through the application of computer vision techniques that "open up a new world of intuitive and serendipitous exploration of the visual side of digital archives."5 Similarly, in his recent book Cultural Analytics, Lev Manovich argues for and demonstrates the use of computer vision and other computational approaches for the "exploration and analysis of contemporary culture at scale."6 As highlighted in the introduction, we have also been excited by the possibilities of using computer vision algorithms in our work on twentieth-century US documentary photography. A cornerstone of visual culture studies is that images make meaning differently than other forms of expression do; we must account for this fact when applying computational techniques to large collections of visual materials. By combining the growing body of applications of computer vision to the study of digital images with the work of visual culture studies, we will offer a cohesive theory that explores the methodological and epistemological implications of using computer vision as a tool for the study of visual messages.

In this chapter we present the theory of distant viewing as a way of understanding how computer vision works and enables the exploration of collections of digital images. Specifically, we will develop a theoretical understanding of how distant viewing—the application of computer vision methods to the computational analysis of digital images—works, and why it is needed. This work emerges from the intersection of theories from several fields, including visual semiotics, media studies, communication studies, information science, and data science. By drawing on work across the humanities, social sciences, and sciences, we offer an interdisciplinary theory that interweaves the ways of knowing and understanding that animate a range of fields to understand the relationship between computer vision and digital images.

Attention to features of image analysis becomes critical when using computational methods. Visual culture studies and visual semiotics have established that visual materials make and transmit meaning differently than other forms of communication.7 It is necessary, then, to consider how these differences affect the modes of research afforded by computational techniques and how the unique ways that images make meaning are accounted for within specific computational methods such as computer vision. At the same time, using algorithms to assist in the processing of visual materials mediates the task of understanding through a digital, computational process. The technologies of computer vision are being trusted to "view" collections of still and moving images. This raises several pressing questions about the ways of seeing that specific algorithms engage in and why. For example, applications to support the military and surveillance state have motivated many of the most studied and accessible computer vision methods, such as face detection and object tracking. Applying these methods is never culturally neutral.
Distant viewing responds to the need for a theory of how computer vision algorithms serve as a means of analysis to study visual materials, and can account for the ways that these algorithms mediate the interpretation of digital images.

Each of the terms distant and viewing signals a component of generating annotations from images. The first is the distance from the eye, for computational methods "see" by calculating images as numbers. The second is distance as scale. Through computer vision, interpretation of images can exceed a person's physical ability to view and remember. At the same time, distance is not objective. Viewing makes explicit that the computational processing of images is shaped by ways of seeing and practices of looking and, therefore, by a set of social and cultural decisions. Using a term explicitly linked to visuality also signals how images convey messages differently than forms such as text. By theorizing computer vision as a computational mode of communication for decoding messages, claims to objectivity through computational interpretation no longer hold. Instead, the terms together, distant and viewing, make explicit that computer vision is a technology of communication shaped by people and imbued with cultural and social values. Chapter 2, where we turn to a methodological understanding of distant viewing, provides a detailed analysis of how the process of annotation integrates with existing data analysis pipelines. Before addressing the practices of distant viewing, we consider the epistemological and semiological implications that come from exploring digital images computationally using algorithmically generated information.

In the following sections, we start by drawing on concepts from visual semiotics and information science to establish that the use of computer vision to create structured annotations is necessary because of the way digital images are stored and the way images convey meaning. We then establish a framework for the specific ways that distant viewing uses computer vision. We show how images are converted into structured annotations that serve as mediators throughout the process of computation. Then, we illustrate how the application of computer vision can be understood as machine-based ways of seeing. The process of seeing through a computer is subject to culturally influenced factors, often mirroring human-based ways of seeing. Recognizing how these influences affect what information is privileged and hidden by modern computer vision algorithms allows us to understand the possibilities and limitations of distant viewing.

Meaning Making through Images

The process of interpreting meanings encoded in an image is a part of our daily lives, often implicit, and occurring incredibly quickly. For example, what do we interpret when we view a photograph in an online newspaper of a sandy beach with palm trees bent over precariously in the same direction, large waves crashing onto the shore, and dark clouds on the horizon?
Many people, even before reading an accompanying caption or headline, will have a near-instantaneous realization that the news is covering a severe weather event. How is this kind of information transferred through an image, such as the watercolor or photograph in figure 1.1? Meaning is primarily interpreted based on elements that resemble a storm: the effects of dark skies and oncoming heavy winds. The same image could be meaningfully printed in newspapers around the world, regardless of the language of the newspaper. Attention to how images make meaning becomes a necessity to understand how to apply computer vision and interpret the results. Comparison with text demonstrates the challenges.

Figure 1.1
On the left, a Winslow Homer watercolor depicting a scene from Nassau, Bahamas, in 1898 (Metropolitan Museum of Art, 06.1234; Palm Tree, Nassau (1898), https://www.metmuseum.org/art/collection/search/11131). On the right, a photograph by Brigitte Werner taken at Hayman Island, Australia, in 2019 (https://pixabay.com/photos/hayman-island-australia-travel-745789/).

Textual data is described by characters, words, and syntax. Read within a particular cultural setting, these elements are interpreted as having meaning. Linguistic elements, such as words, serve as explicit signs that correspond to objects primarily by convention.8 The word pencil in English is typically used to refer to a long cylindrical object that contains inner solid marking material (such as graphite or charcoal) surrounded by an outer material (such as wood) for writing or drawing. Millions of English speakers induce the link between the six-letter word and the definition by their shared usage. That is, most words function as symbols, a socially agreed-upon connection between the word and the concept represented by the word.9 Grammatical constructs such as verb conjugation, plurality, and object-verb relationships operate similarly within a particular language to produce higher-level meanings between individual words.

Visual forms such as film and photographs convey meaning in a different way than text.10 They do not convey information primarily through agreed-upon relationships.11 A photograph, for example, in its most basic form is a measurement of light through the use of either chemically sensitive materials or a digital sensor.12 The objects represented within a photograph can typically be identified by people who speak different languages.13 As Roland Barthes argues, there is a "special status of the photographic image: it is a message without a code."14 In photography, it is not necessary to construct an explicit mapping between the visual representation and what is being represented. The relationship between the photograph and the object represented by the photograph is signified by shared features between the object and its photo.15 In other words, meaning is conveyed through the photograph's mimetic qualities.

A similar relationship holds for other visual forms.
Both paintings and photographs illustrate and circulate concepts through characteristics such as lines, color, shape, and size.16 Images serve as a link to the object being represented by sharing similar qualities. It is possible to recognize a painting of a particular person by noticing that the painted object and person in question share properties such as hairstyle, eye color, nose shape, and clothing. The representational strategies of images, therefore, differ from those of language. While often rendered meaningful in different ways through language, visual material is pre-linguistic.17 The French poet Paul Valéry eloquently described this phenomenon as "looking, in other words forgetting the name of the things that one sees."18

Interpreting images is further complicated by the amount of variation in images intended to convey the same or similar concepts. Photography offers an example again. The culturally coded elements of photographic images coexist with the raw measurements of light. The cultural elements are exposed through the productive act of photography—what Barthes refers to as the image's "connotation."19 Consider the images in figure 1.1. They were created over a century apart on nearly opposite sides of the world. On the left, the painting was created by pushing pigments suspended in water across a piece of paper. On the right, we have a digital image created by capturing the components of light observed over a small fraction of a second by an array of digital camera sensors. Yet, despite these differences, viewing each image conveys a similar scene of palm trees blowing in the wind in front of an ocean bay. Further connotations of the images can be built up from these elements. We may view either of these images as connoting scenes of idyllic luxury and relaxation, like the scenes one might post on social media from a vacation at a tropical beach. Or, possibly with some additional context, we might understand the movement of the palm trees as representing a destructive and dangerous oncoming storm. Interpreting the messages of a single image requires decoding its individual elements, an act that becomes even more important when working with a large collection.

Interpreting the connotation of visual messages is further shaped by people's beliefs and values.20 As media theorist Stuart Hall argues, the messages encoded and decoded from images are not objective but shaped by the cultural, social, and political meanings that people want to convey and are positioned to interpret. For example, the photographer may have taken the photograph shown in figure 1.2 to convey loyalty. Whether the viewer decodes that message is not necessarily a given, depending on their background. A viewer from a different position may interpret the image as being about the man's dominance of the landscape and, therefore, about ideas of masculinity. How one interprets an image is shaped by the larger cultural and social ideologies that inform how one interprets the world.21 The messages that are encoded may not be decoded, and messages that were not intentionally encoded may be decoded.
How information is encoded and decoded is shaped by cultural ideologies, embodied ways of knowing, social scripts, and grammars of everyday life from which we learn and which we rely on to interpret the world.22 How one views an image through computational processes is a part of the same process.

We can extend these considerations to the computational analysis of digital images. While a person looking at an image can decode the objects and meaning of the visual messages, the process of making these explicit decisions is what makes images so challenging to study computationally. We rely on learned semiotic and ideological systems to interpret the meaning of visual material. This interpretive process must be made explicit in computational processes. Therefore, the computational analysis of images requires an algorithmic interpretation of the meaning of digital images. Distant viewing asks that we acknowledge and take into account the interpretative, ideological work of algorithmically analyzing digital images. To further deduce how the process of computationally decoding the messages in an image works, we must understand how images are stored and analyzed by a computer.

Working with Digital Images

The unique features that humanistic inquiry has identified about working with visual materials apply to analyzing digital images. The way images are stored as pixels mirrors the semiotic differences between forms such as text and image. In lieu of the human eye, computational interpretation through computer vision is used to decode the meaning of images. Seeing through computer vision converts the act of viewing into a computational process. The speed of computational processes thus enables analysis at scale and the decoding of millions of images in a short period of time. Whether applied to a single image or at scale, the information captured by computer vision becomes a mode of communication for interpreting messages in digital images.

Figure 1.2
A color photograph digitized as part of the Farm Security Administration / Office of War Information (FSA-OWI) archive, held by the US Library of Congress. The photograph is credited to staff photographer Russell Lee and cataloged as being taken in August 1942. The item's caption reads: "Shepherd with his horse and dog on Gravelly Range, Madison County, Montana" (Library of Congress, https://www.loc.gov/pictures/item/2017878800/).

Digital images are stored in formats that make it possible to see images on a digital screen. A computer displays images as pixels, the "minute individual elements in a digital image."23 The word "pixel" itself reflects this relationship.
It is a combination of pix, the plural form of pic (which is short for picture), and el, which is an abbreviation of element.24 The term emerged alongside the terms pix and pel in the 1960s among researchers working on image processing who were trying to find ways to describe the basic elements of images. The digital image processing and artificial intelligence communities embraced the term pixel during the 1970s, followed by the television and image sensor communities in the 1980s.25 Debates and norms across research disciplines and industries over the last several decades have resulted in slight variations in definitions and uses of the term. The most common definition today is that a pixel is the smallest element of a digital image.

Computers work with digital images as a set of numbers that comprise pixels. As shown in figure 1.3, a pixel is stored as three numbers, indicating the amount of red, green, and blue light needed to represent the color of one point in the image. It is possible to create almost any color with this method.26 Adding the maximum amount of red and green and turning off the blue, for example, results in yellow. The complete digital image in figure 1.2 is represented by a computer by storing three rectangular grids corresponding to the red, green, and blue light intensities for every pixel in the image.

Figure 1.3
The upper-left figure is a cropped and lower-resolution version of the shepherd seen in figure 1.2. The other panels show the red (upper right), green (lower left), and blue (lower right) pixel intensities. The numbers indicate how bright each color channel is as an integer from 0 to 255.

Returning to figure 1.2 helps illustrate a disconnect between the computer's storage and our understanding of the image. Figure 1.4 shows a grayscale version of the same image at four different zoom levels. Each zoom level is centered on the left eye of the horse. When we look at the largest image, it seems apparent that this is a photograph of a horse and that its left eye is at the center of our cropped image. When we look at the two highest zoom levels in isolation, it seems impossible to guess that these are images of a horse's eye. However, these same pixels were identified as an eye in the lower zoom levels. How is it possible that the same numbers can be interpreted so differently?

When we look at the display of a digital image, we understand the pixels in context with one another. We can only identify the eye after putting it into perspective with other pixels that resemble the horse's ears, nose, hair, and neck. These features are similarly understood only by putting them in perspective with all the other pixels. The implication for analyzing digital images, then, is that a substantial gap exists between the numeric representation of the image by the computer and the parts of the image that one sees when viewing the image.
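A few lines of code make this numeric representation tangible. The sketch below assumes a hypothetical local copy of a digitized photograph and simply prints the grids of red, green, and blue intensities that the computer actually stores.

import numpy as np
from PIL import Image

# Load a hypothetical local copy of a digitized photograph as red, green,
# and blue channels.
pixels = np.asarray(Image.open("shepherd.jpg").convert("RGB"))

# One grid of rows and columns per color channel: (height, width, 3).
print(pixels.shape)

# Inspect a single pixel near the center of the frame: three integers
# between 0 and 255, one each for red, green, and blue.
row, col = pixels.shape[0] // 2, pixels.shape[1] // 2
red, green, blue = pixels[row, col]
print(f"pixel at ({row}, {col}): red={red}, green={green}, blue={blue}")

# Maximum red and green with no blue displays as yellow, as noted above.
swatch = np.zeros((50, 50, 3), dtype=np.uint8)
swatch[:, :, 0] = 255  # red channel at full intensity
swatch[:, :, 1] = 255  # green channel at full intensity
Image.fromarray(swatch).save("yellow_swatch.png")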
The pixels that represent figure 1.1's two images—from a digital scan of the watercolor image and the born-digital image on the right—provide another example. The specific pixels used and printed from the two images are completely different, yet we are able to understand the images as both representing scenes with elements of palm trees, an ocean, and the sky. With all of this variation, how do we tell the computer which pixels to look for and what those sets of pixels mean?

Figure 1.4
Four zoomed and grayscale versions of the image in figure 1.2. The highest level of zoom (upper left) also includes the grayscale pixel intensities, an integer from 0 (black) to 255 (white). All of the images are centered on the left eye of the horse in the image.

The challenge of interpreting pixels is further complicated by how images are stored compared to other data types. We return to the comparison between images and text, but this time focusing on how they are digitally stored. Text is written as a one-dimensional stream of characters. These characters are written in an encoding scheme that defines an agreed-upon mapping from 0s and 1s into symbols.27 Compression software can be used to store the encoded text using a smaller amount of disk space, but this compression must be reversible. Going from the compressed format back into the raw text must be possible without losing any information.

Since images are composed of pixels, they are in a different format. While displayed as an array of pixels on the screen, images can be stored in several compressed formats. Many standard compression formats, such as JPEG, use a lossy compression algorithm. These algorithms only approximately reconstruct the original image. Similarly, it is possible in any storage format to rescale an image to a lower resolution. This process saves storage space but results in a grainier version of the original. Differences in storage methods between text and image data correspond to the semiological differences argued by scholars in media studies and visual culture studies.

The fact that digital images can be scaled to a smaller size highlights the lack of explicitly coded elements within images. If an image consists of a code system, lossy compression will require losing some coded elements. However, images reproduced from compressed files have no detectable differences from the original file for a moderate amount of compression.28 An illustration of how lossy compression affects an image is given in figure 1.5. The colors, shapes, and objects within the frame remain discernable even under extreme forms of information compression.
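The lossy compression shown in figure 1.5 can be reproduced in a few lines. The sketch below follows the approach described in that figure's caption: a singular value decomposition is applied to each color channel and the image is rebuilt from a reduced number of components. The input file name is hypothetical, and keeping fewer components yields a smaller but only approximate reconstruction.

import numpy as np
from PIL import Image

def compress_channel(channel, k):
    """Rebuild one color channel from its first k singular components."""
    u, s, vt = np.linalg.svd(channel, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

def compress_image(path, k):
    """Apply the channel-wise compression and return a displayable image."""
    pixels = np.asarray(Image.open(path).convert("RGB"), dtype=float)
    channels = [compress_channel(pixels[:, :, i], k) for i in range(3)]
    stacked = np.clip(np.stack(channels, axis=2), 0, 255).astype(np.uint8)
    return Image.fromarray(stacked)

# Rebuild a hypothetical image at the five compression levels used in figure 1.5.
for k in [100, 50, 25, 15, 5]:
    compress_image("shepherd.jpg", k).save(f"shepherd_rank_{k}.jpg")

Even at the most aggressive settings, the large shapes and regions of color in the scene remain recognizable, which is the point the figure makes.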
The rectangular grid of pixels printed in different shades of gray in figure 1.4 conveys the image of a horse, for example, because looking at the resulting print shares similarities with the act of looking at a horse in real life. The complex ways that images make meaning are a large part of what makes visual forms popular kinds of cultural expression and what makes visual materials particularly exciting objects of study. These complexities, however, must be accounted for when applying a computational analysis to the study of visual materials, a task that we now turn toward to fully theorize distant viewing.

Figure 1.5: The image from figure 1.2 is shown (upper left) along with five levels of increased compression. The compression algorithm uses a singular value decomposition on the individual color channels, reducing the dimensionality of the matrix of pixel intensities to 100, 50, 25, 15, and 5, respectively. We used this approach because it does a good job of showing what happens to the image under extreme forms of compression.

Computational Exploration with Computer Vision

Returning to our primary task of working computationally with a collection of digital images, recall that standard methods for the visualization and exploration of data cannot be applied as-is to image data. Having seen the ways that visual materials convey meaning, we now have a more concrete way of understanding why applying computational methods to a collection of images presents certain difficulties. Fortunately, as is often the case, identifying the source of these obstacles will become the first step in producing a solution.

An example will help further illuminate the challenges that we need to solve. Consider the task of trying to automatically detect and characterize themes found within a large digital collection of newspaper articles. The explicit codes of written language provide a powerful tool for this task. Each of our texts can be split into individual words, the smallest linguistic unit meaningfully understood in isolation, and the number of times a word is used can serve to encode information about each document.29 Then, we could use the word counts to automatically find sets of terms that tend to co-occur within the same documents with a high probability. In other words, if one word from a detected topic is used in a document, other words within the same topic are also more likely to be used. With no explicit tagging of the dataset, this kind of model could, for example, detect a set of co-occurring words such as cloudy, cold, wind, and afternoon. Manual intervention is only needed to interpret this topic to conclude that these words all focus on the concept of "weather." Both the interpretive acts of determining the keywords to associate with each article and assigning a meaning to the co-occurring words may be delayed until after a model has been applied.
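A brief sketch may help make this delayed interpretation concrete. Assuming the scikit-learn library and a placeholder list of article texts (standing in for a much larger corpus), word counts and a topic model can be fit without any manual tagging; only the final step of deciding that a topic "means" weather is left to a human reader.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Placeholder corpus; in practice this would be thousands of articles.
articles = [
    "cloudy and cold with strong wind expected this afternoon",
    "the city council voted on the new budget last night",
]

# Encode each article by how many times each word appears.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(articles)

# Find sets of words that tend to co-occur across documents.
lda = LatentDirichletAllocation(n_components=5, random_state=0)
lda.fit(counts)

# Print the most heavily weighted words in each detected topic;
# naming a topic ("weather", "local politics") requires interpretation.
words = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [words[i] for i in weights.argsort()[-5:][::-1]]
    print(k, top)
```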
These are crucial features of many methods for textual analysis; the power of word counts to convey a reasonable approximation of document-level meaning has been one of the most important tools for the computational analysis of textual data.

Now, consider the parallel visual task of finding and identifying themes within a collection of digital newspaper photographs. As we have argued in the previous two sections, there is no equivalent way of grouping together and counting raw pixel values in the way that we were able to do by grouping letters into words. For this reason, before we can do a computational analysis of a collection of images, we need to interpret the messages encoded in the images. In other words, the images must first be viewed. There is no equivalent option, as there is with structured and textual data, to delay interpretation of the images' meaning(s) before applying computational methods. The first step, then, must be to interpret each image by creating a structured, or indexical, system that we can then aggregate and analyze. Because these interpretations will not perfectly capture all the information present in the image, care must be given to the ways that the images are interpreted relative to the guiding areas of inquiry.

To be more concrete, the computational analysis of a collection of digital images requires as a first step that the messages encoded within the images are interpreted through the construction of structured labels.30 We will refer to these labels as annotations and use annotation to describe the process by which these labels are created. Annotations can take a variety of different forms. They can be a single number, such as an indication of how many people are in an image. A single word can also be used as an annotation, such as an indication of the name of an object at the center of the image. Other annotations include automatically generated sentence-length captions or a large set of numbers representing the amounts of predefined colors present in the image's frame. Annotations can even take the form of images themselves, such as through the tagging of images with other images that contain the same people. Often, a collection of different kinds of annotations will need to be produced to address a specific question that one is exploring with a given collection of images.

How, then, do we go about producing these annotations? Manually creating structured annotations for a collection of digital images can be a laborious task. Even the production of a single, relatively clear label, such as a count of the number of people in the frame, can become prohibitively time-consuming when working with a large collection of images. More intricate annotations, such as outlining and describing every object in each image, are essentially impossible to construct manually for all but the smallest collections. Methods such as content analysis avoid these difficulties by labeling only a random sample of images and generally restricting the annotations to a small set of straightforward and relatively easy-to-produce categories.31 These approaches make it impossible to iteratively explore a collection of images and limit our analysis to only a subset of possible research questions.
Further, many more complex relationships between visual features and archival data can only be established by working with the entirety of a collection. Another approach is clearly needed.

The process of creating and interpreting digital images through annotations generated by computer vision is at the center of distant viewing.32 Using algorithms to generate annotations allows us to work with the entirety of a collection. It also allows us to create intricate annotations that can be visualized, modeled, remixed, and aggregated through an iterative exploratory analysis. The name of our approach to working with collections of digital images highlights the fact that the creation of annotations should be seen as a process of viewing, or interpreting, and that this viewing is done at a distance because it is mediated through the algorithmic process of computer vision. An understanding of the field of computer vision will further elucidate the possibilities and limitations of distant viewing.

The field of computer vision focuses on how computers automatically produce ways of understanding digital images. The field is interdisciplinary by design, drawing on research in areas as diverse as computer science, physics, engineering, statistics, physiology, and psychology. Most tasks in computer vision are oriented around algorithms that mimic the human visual system. Tasks include detecting and identifying objects, categorizing motion, and describing an image's context. While visual understanding is at the center of the field, computer vision algorithms may also take a multimodal approach. Algorithms may use, for example, image captions or film soundtracks to augment visual components in much the same way that humans integrate all of their sensory inputs to understand the world around them.

Our ability to construct high-quality computer vision annotations is driven by current research priorities within the field of computer vision. These directions, likewise, are influenced by the industry and government applications that fund the research. Some of the earliest computer vision algorithms were designed to identify images of numbers; this research was explicitly funded to sort mail envelopes based on detecting handwritten postal codes.33 High-accuracy algorithms exist for the detection and identification of faces within an image. The research behind these tasks has been driven in no small part by applications in surveillance, which we should engage with cautiously. Several tools provide high-quality annotations for the detection of cars, people, crosswalks, and stoplights. These annotations are the direct consequence of computer vision applications within the technology of self-driving cars. When we use computer vision algorithms to produce automatically generated annotations, it is crucial to remember the role these funding streams play in the content and structure of available algorithms.
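To make the idea of algorithmically generated annotations concrete, the following is a minimal sketch of producing object-level annotations for a folder of images. It assumes a pretrained object-detection model from the torchvision library (here, a Faster R-CNN detector trained on everyday categories such as people and cars); any comparable detector could be substituted, and the folder name and confidence threshold are illustrative choices, not part of the method described in this book.

```python
import glob

import torch
from PIL import Image
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

# Load a pretrained detector and switch it to inference mode.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

annotations = []
for path in glob.glob("images/*.jpg"):  # illustrative folder of digitized images
    image = to_tensor(Image.open(path).convert("RGB"))
    with torch.no_grad():
        pred = model([image])[0]  # dict with 'boxes', 'labels', 'scores'
    keep = pred["scores"] > 0.8   # keep only confident detections (arbitrary cutoff)
    annotations.append({
        "path": path,
        "labels": pred["labels"][keep].tolist(),
        "boxes": pred["boxes"][keep].tolist(),
    })
```

The resulting structured records, rather than the raw pixels, are what can then be aggregated, visualized, and explored; the categories the detector knows about, and how reliably it finds them, reflect the research priorities and funding streams just described.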
The availability of accurate annotations is also a function of the level to which an annotation is abstract or culturally mediated. Some tasks are well positioned to be addressed with computer vision. For example, computer vision algorithms are better than human evaluators at detecting defects in agricultural products.34 Similarly, a simple model can be used to identify the orientation of photographs with almost perfect accuracy.35 Both of these tasks have concrete "answers" and can be addressed by looking at only a small portion of the image, making them relatively easy tasks within computer vision. Other annotations present more difficulties and bring to the fore ethical and social questions. The goal of automatically identifying a person's emotion based on a still image is very difficult and shaped by cultural politics.36 Computer vision algorithms struggle to attain human-like accuracy even when classifying strong emotions within a single cultural context.37 When dealing with more subtle emotions across a range of cultures, the task becomes nearly impossible even for human annotators, much less a computational algorithm. The types of questions and analyses available through distant viewing are shaped by the relative difficulty and constructedness of building the algorithms that produce the annotations. The challenge of determining whether a task is amenable to exploration through computer vision is further complicated by determining exactly what we are decoding.

Decoding through Viewing

The process of distant viewing applies computer vision algorithms to automatically interpret a layer of meaning within images through the creation of structured annotations. As signaled in our terminology, computer vision algorithms engage in the process of viewing an image. This characterization allows for a reflexive formulation of distant viewing. Whereas we have so far used visual semiotics to argue for the necessity of computer vision, we can similarly take computer vision as an object of study itself, one that can be analyzed through the application of media theories. Theories of communication around decoding become key.

The process of annotating images with computer vision can be understood as a mode of communication that transmits a message between the materials of interest (the digitized images) and human audiences. To computationally decode the messages, people must decide which annotations to look for in an image using computer vision. The human eye is replaced by computational processes that identify the features in images through numbers. As a result, computer vision decodes the messages encoded in digital images in order to interpret and convey them. At the same time, computational methods are created by people and are therefore not outside of cultural, social, and historical ways of seeing and practices of looking.38 By theorizing computer vision as a computational process of decoding messages ("viewing"), the method of distant viewing makes explicit that computer vision is a technology of communication produced by people and therefore imbued with cultural, political, and social values.

How to interpret visual media in a digital form is a question about conveying and interpreting meaning.
In his model of communication, Stuart Hall argued that messages are encoded and decoded.39 The sender produces a message in a form such as television. The message is circulated, and audiences then interpret the message. The message encoded may not be the message decoded. The values and beliefs of the creator and receiver shape the messages that are conveyed and interpreted. The form of the medium also impacts which messages are communicated. For example, digital images make and convey meaning differently than audio. The same image, such as a meme, will often be interpreted differently by an audience in the United States than by an audience in France. Computer vision has become another powerful actor in the process of encoding and decoding digital images. But exactly how is it possible to understand this newer technology of interpreting meaning in images?

Images encode messages. Exactly how they send those messages and which parts of the message are decoded are shaped by what is recognized and how the signs and symbols that comprise an image are interpreted. Visual culture studies, informed by semiotics and rhetorical studies, explores how images signify and communicate, which differs from how other forms of knowledge, such as text, do so.40 We return again to the relationship with text. Even at the level of an individual object, meaning is encoded in images differently than in text, as theorized by semioticians across fields such as linguistics, media studies, and visual culture studies.

Computer vision has become a way for people to create annotations to decode visual messages. As a set of computational processes designed to interpret images, computer vision emerged to address the issue of how to understand images. In other words, computer vision is a computational model of communication designed to interpret information from digital images. To do so requires building annotations that replicate the features necessary to interpret the meaning of an image. Therefore, computer vision algorithms look for specific features by following processes to recognize patterns that we determine based on a task.

The process of computational viewing through computer vision algorithms produces new structures that attempt to capture layers of meaning within images. Creating structured data is often described as information "extraction" and aligns with the popular channel encoding model of communication proposed by Claude Shannon. The model provides a communication framework describing how a fixed message is passed between two parties. It focuses on the amount of intrinsic information contained in a message and the amount of redundancy needed to ensure a high probability that the resulting message will be transmitted between the two parties without any errors.41

Due to the nature of visual messages, however, Shannon's communication model does not accurately capture the process of producing structured data through computer vision.42 The decoded messages do not symmetrically represent an image's intended meaning. Instead, in the language of Stuart Hall, computer vision algorithms are active participants in the process of knowledge production through the act of decoding.
During this process of decoding, computer vision algorithms produce structured data from visual inputs. The knowledge produced by the algorithms, such as label names for detected objects or a probability that the image was taken outdoors, is not objective or intrinsic to the images themselves. Instead, the kinds of data labels that are privileged, and the internal mechanisms used to produce them, are significantly influenced by the social contexts that motivated, produced, and circulated the algorithms themselves. In other words, the algorithms produce knowledge by interpreting visual materials within the frame of their own artificially produced social context. Framing the use of computer vision as an imperfect decoding process highlights the need to consider the underlying decisions privileged by existing algorithms.

Computer vision expands the rate and scale of interpretation by becoming an intermediary between the eye and an image. Algorithms can iterate over millions of images looking for features. The rate is increasing as hardware such as high-performance computing and GPUs reduces the time needed for analysis. The ability to zoom out and view at a large scale is a powerful affordance of these recent advances. Messages that may have been difficult to interpret by looking at just a few images can be decoded through large-scale analysis. This changes not only the kinds of messages that we can decode but also what we can encode, since we now have a new mode of communication with which to interpret and convey messages.

Distance from the human eye combined with large-scale computation could lead to claims about objectivity. After all, powerful discourses have lent fields built on numeracy and quantitative evidence a claim to neutrality and objectivity.43 However, theorizing computer vision as a mode of communication inculcated in the process of sending and receiving messages challenges such claims. Instead, computer vision becomes a cultural and social process. The annotations that we adopt, create, and resist through computer vision are in conversation with existing cultural, social, and technical values shaped by visual cultures.

The concept of "viewing" becomes particularly important. Decoding images requires decisions about which annotations to view with. How and what we choose to see and look for (where we direct our gaze, how we see, and how we perceive and discern visually) are culturally and socially shaped.44 Decoding, therefore, is not independent of visual cultures. Viewing is not simply a biological process; it relies on ways of seeing and practices of looking that people learn from each other in order to interpret the world.45 Viewing, therefore, conveys that decoding through computer vision is a set of decisions about how to interpret visual messages that is shaped by cultural and social values, in addition to producing them.

The distinction, as theorized in visual culture studies, between seeing and looking becomes necessary for further expanding on the stakes of using the term viewing.46 Seeing occurs through the physical process of receiving light when one's eyes are open. This does not mean that one is looking, which can be defined as actively seeking to see through visual perception.
For example, one might be in a room and see a photograph but choose not to look at the image. One can also try to look and not see. For example, one may return to the room to look at that photograph but not be able to locate it because the lights are turned off. Seeing occurs when the eyes are open, whether we want to see or not. The act of looking is an intentional process in which we decide what we want to see. Types of looking include not only what to look for but also ways of looking, such as watching. Viewing, therefore, indicates the entanglement of seeing and looking in analyzing digital images. Scholars such as John Berger, Lisa Cartwright, Lisa Nakamura, and Marita Sturken have further theorized these distinctions as producing visual cultures that we learn, circulate, and rely on to decode the meaning of images.

Ways of seeing and practices of looking shape the encoded meaning in images. As John Berger argued in the popular 1972 BBC television series and subsequent book Ways of Seeing, "an image became a record of how X had seen Y."47 He analyzed how X, a community such as White male European oil painters, had seen Y, women as forms to depict in the nude, and argued that they produced this way of seeing for the male gaze and thereby revealed as well as produced problematic gendered power relations. Lisa Cartwright and Marita Sturken expanded on this concept, calling for visual culture studies to address "practices of looking" to emphasize the intentionality of looking in place of the passiveness of the biological process of seeing. One can see and not look. One can look and not see. Scholars such as Stuart Hall and Lisa Nakamura have further argued that these ways of seeing and practices of looking are shaped by and produce ideologies such as gendered and racialized visual cultures.48 Therefore, the ways of looking replicated through media shape which messages are encoded and decoded and are themselves imbued with beliefs, ideologies, and values.

How we view is also a culturally, socially, and historically informed decision.49 Until the 1950s, most technologies of looking still involved the eye, such as the magnifying glass, telescope, and camera. The advent of computing and computer vision has enabled a way of looking that no longer relies on the physical process of the human eye.50 Yet the term computer vision and scholarship in machine learning naturalize a computational process through the language of biological seeing and the eye.51 This research is not a biological process but rather is focused on emulating humans' ways of seeing and practices of looking through the fundamental way that computers "see," which is through computations based on pixel intensities. Ways of seeing, such as color perception, and practices of looking, such as identifying people, are computationally decoded to interpret images.
The result is an epistemological and ontological shift in what and how we view that does not easily fit into current theories.52

Therefore, we see that annotation through computer vision can be characterized as decoding messages in digital images, as a form of communication that changes how we "see," and as a new scale at which we can view images. When annotations are built through computer vision, ways of seeing and practices of looking are encoded into the computational processes used to decode information about the images. This allows us to shift how we understand computer vision. Rather than focusing on whether a computer vision algorithm can be objective, we focus on defining which ways of viewing are encoded and decoded, and on assessing the possibilities, limitations, and effects of these decisions.

Conclusions

In this chapter, we have focused on the epistemological and semiological implications of applying computer vision algorithms to the study of digital images, which we have theorized as distant viewing. The theory is based on the ways that visual objects make meaning, which is further supported by the way that digital images are stored and displayed. These features necessitate the creation of annotations that capture a way of viewing images, and applying distant viewing to larger corpora requires the use of computer vision to produce those annotations.

As a method, distant viewing also provides ways of reflexively and critically engaging in the computational analysis of images. While the next chapter will go into further detail, a discussion of a few possible avenues of inquiry using distant viewing is warranted. Ways of seeing and practices of looking shape our daily lives and are entangled in questions about power. Who gets to look, who is the subject of looking, and which practices of looking circulate are not neutral processes. Whether in efforts by US communities of color to assert full citizenship through twentieth-century African American portrait photography, in the use of the close shot in film on female bodies for the male gaze (thereby producing misogynist ways of seeing), or in the use of skin-brightening and warming Instagram filters to assert ageist and racialized standards of beauty, visual cultures are being encoded and decoded through images constantly.53 Images, therefore, are shaped by and circulate practices of looking. However, these decisions are often implicit and therefore difficult to recognize and make explicit. Distant viewing allows for decoding digital images and their visual cultures, which we demonstrate in chapters 3 through 6 by viewing a range of still and moving images.

As we go about distant viewing, there are significant reasons to be cautious about using the method to identify and challenge ways of viewing. The annotations for computer vision are largely driven by industry and government applications.
Whether they are trying to identify trucks and light posts for self-driving cars or people for surveillance technology, computer vision methods are often driven by entanglements with capitalism, state power, and militarism.54 However, computational image analysis should not be the domain of multinational corporations and governments alone. We can use distant viewing to ask different questions and to critique and question our visual cultures of computer vision. Furthermore, perhaps the most exciting part is that we can recreate and reimagine the role and possibilities of computer vision. Reconfiguring how to use and remake ways of viewing through computer vision algorithms can be a timely and laborious project. Distant viewing offers one way that we can question existing annotations and remake computer vision.

Designed with the intention to mimic the human eye and neural processes, computer vision algorithms look for certain features by following processes for calculating pixels to recognize patterns. Computer vision "sees" through numeracy and "looks" based on the assigned numerical patterns. Therefore, computer vision enables the identification of practices of looking in visual materials and algorithmically creates practices of looking. Computer vision encodes social, cultural, historical, and political values algorithmically. Distant viewing, therefore, enables a reflexive view of computer vision. We can use distant viewing of digital visual materials to interrogate the ways of viewing embedded in computer vision.

Distant viewing provides a call to action. We are slowly cracking away at the facade that algorithms are unbiased and recognizing that algorithms can do incredible harm as well as good. Yes, there are algorithms of oppression.55 Yes, there are weapons of math destruction.56 However, we have fewer capacious theories for understanding the computational processes that produce ways of viewing at a large scale. So we heed media scholar Steve Anderson's call to interrogate and theorize vision technologies through the lens provided by media and visual culture studies, and to zoom out and in through data science and digital humanities.57 As long as we are involved in the process of building annotations for seeing, and therefore viewing, then having a method and theory for analyzing, interpreting, and critiquing these computational processes matters. So, we need distant viewing.

2 Distant Viewing: Method

Images. Images everywhere. We see, look, analyze, and interpret images, often in seconds. A newspaper cover photo of a palm tree bent over in the wind and waves crashing ashore can quickly convey a storm. A historical photograph from the 1930s of a man, horse, and dog standing upright together in a rural plain sends messages of dominance, fortitude, and loyalty. To add a different kind of example, think of the feed on Instagram and how quickly one swipes up while still interpreting the images posted by the accounts that one follows. Messages are quickly decoded based on ways of seeing and practices of looking that we have learned, constructed, and resisted.
What happens when we slow down and ask: What is this an image of, what are the messages being conveyed, and how do we know this? In other words, what if we want to analyze how the messages interpreted were visually constructed? What if we want to explore whether there are messages that were missed on the first view? And what if we want to view these images through certain ways of seeing and practices of looking? A person can sit down and closely analyze an image, but this takes time and becomes a challenge when the number of images increases.

Consider the challenge of analyzing images from another angle. Libraries, archives, and museums have made significant commitments to digitizing visual media. Priorities for digitization include collections with a large audience, those that the institution wants to bring attention to, or collections in a quickly degrading format. What if we want to describe each of these images? The decisions that one might make for an exhibition may not be the same as decisions designed to facilitate access, discovery, and analysis. What, then, do we do if we decide to change how to describe the images? How does one do this at large scale?

The approach we offer to this challenge, distant viewing, uses computer vision to computationally explore digital images. The previous chapter presented a theoretical treatment of the possibilities, limitations, and implications of the distant viewing approach. Here, we focus on the practical aspects of applying computer vision to a set of digital images. Building upon the stakes outlined in the previous chapter, we situate distant viewing within the field of data science. Our analysis focuses on how the method of distant viewing engages with existing data science methodologies while accounting for the unique ways that images make meaning, and it explicates the modeling assumptions that underlie the computational analysis of visual data.

As a starting point, consider a typical workflow for computational analysis from data science, an interdisciplinary field that applies and develops methods to understand collections of data.1 The first diagram in figure 2.1 illustrates a series of high-level steps involved in the processing of data. The steps and names are adapted from Hadley Wickham and Garrett Grolemund's well-known text on the subject.2 The pipeline shown here is focused on the algorithmic aspects of working with the data and therefore does not include steps such as designing a research question and collecting data. The starting point, instead, involves loading and organizing information. This structured data is often stored in the form of a relational database or files in a well-known format. The pipeline's first step, organization, includes standardizing names and units, identifying data errors, and combining or separating data according to the format needed for subsequent analyses. After the data are organized, the pipeline moves to the process of exploration. Here, an iterative mix of visualizations, transformations, and modeling is used to understand the data and address various research questions. Finally, the third step involves communicating the results of the exploration.
The communication step can take various forms depending on the desired audience, such as short presentations, peer-reviewed papers, and digital projects. The arrows in the figure show the conceptual flow of information. However, data analyses almost always require a more flexible iteration back and forth between each part.

The ways that visual materials make meaning, combined with the mechanics of digital images and computer vision, require a modification of the standard data science pipeline. Implicit in the organization step of the pipeline is the notion that producing structured data from the available inputs requires only a reorganization of the original dataset.3 However, as shown in chapter 1, knowledge produced from computer vision algorithms works differently. Creating a structured representation of messages encoded in digital images involves a process in which information is lost as well as created.

Informed by the theory of distant viewing, our method adds a new step to the standard data science pipeline, which is incorporated into the second row of figure 2.1. This new first step encapsulates the process in which digital images are annotated with computer vision techniques. The construction of these annotations is inserted prior to the step of data organization.4 Rather than assuming the data exist or are given, we make explicit through this addition the creation of annotations from the visual data.5 Unlike the process of organizing structured data, annotations are not just a reorganization of the original inputs. Rather, annotations capture elements of the images using algorithms that only view in a certain way. Determining which annotations to create and how to create them, therefore, becomes an important step in the analysis of digital images.

Figure 2.1: Two pipelines for working with data. The top represents a typical workflow when working with structured data, adapted from work by Hadley Wickham and Garrett Grolemund. The pipeline on the bottom illustrates a modification built using distant viewing, which first annotates the visual materials. (Steps such as developing a research question or hypothesis and collecting the data are integral parts of data science but omitted here for clarity.) The arrows show the conceptual flow of information; analyses of data almost always require an iterative approach moving back and forth between each part.

Following the annotation step, distant viewing engages with the process of organizing the data. This involves aggregating the annotations from the computer vision algorithms and combining them with other structured information, such as a description of the digitization formats used or information about each digital image's creator. The tasks and goals of the exploration and communication steps mirror those of a standard data science pipeline but require some unique considerations.
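To make the modified pipeline concrete, the following is a minimal sketch of how its stages might be organized in code. The function names (annotate, organize, explore, communicate) and the annotation fields are hypothetical placeholders used for illustration; they are not an implementation provided by the book.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """A hypothetical structured record produced by viewing one image."""
    path: str
    objects: list        # e.g., detected object labels
    dominant_color: str


def annotate(image_paths):
    """Step 1 (new): apply computer vision to create structured annotations."""
    return [Annotation(path=p, objects=[], dominant_color="") for p in image_paths]


def organize(annotations, metadata):
    """Step 2: combine annotations with archival metadata (creator, format, ...)."""
    return [{**vars(a), **metadata.get(a.path, {})} for a in annotations]


def explore(records):
    """Step 3: iterate over visualizations, transformations, and models."""
    return {"n_images": len(records)}


def communicate(results):
    """Step 4: report findings (papers, presentations, digital projects)."""
    print(results)


# The conceptual flow; in practice the process loops back and forth.
results = explore(organize(annotate(["img_001.jpg"]), metadata={}))
communicate(results)
```

The placement of annotate before organize mirrors the second row of figure 2.1: the structured data does not exist until a particular way of viewing has been chosen and applied.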