Distant Viewing
Computational Exploration of Digital Images

Taylor Arnold and Lauren Tilton

The MIT Press
Cambridge, Massachusetts
London, England

© 2023 Massachusetts Institute of Technology

This work is subject to a Creative Commons CC-BY-NC-ND license. Subject to such license, all rights are reserved.

The open access edition of this book was made possible by generous funding from the National Endowment for the Humanities and the University of Richmond.

The MIT Press would like to thank the anonymous peer reviewers who provided comments on drafts of this book. The generous work of academic experts is essential for establishing the authority and quality of our publications. We acknowledge with gratitude the contributions of these otherwise uncredited readers.

This book was set in Stone Serif and Stone Sans by Westchester Publishing Services.

Library of Congress Cataloging-in-Publication Data
Names: Arnold, Taylor, author. | Tilton, Lauren, author.
Title: Distant viewing : computational exploration of digital images / Taylor Arnold and Lauren Tilton.
Description: Cambridge, Massachusetts : The MIT Press, [2023] | Includes bibliographical references and index.
Identifiers: LCCN 2022052202 (print) | LCCN 2022052203 (ebook) | ISBN 9780262546133 (paperback) | ISBN 9780262375177 (epub) | ISBN 9780262375160 (pdf)
Subjects: LCSH: Computer vision. | Image data mining. | Image processing—Digital techniques. | Visual sociology—Technique.
Classification: LCC TA1634 .A76 2023 (print) | LCC TA1634 (ebook) | DDC 006.4/2—dc23/eng/20230111
LC record available at https://lccn.loc.gov/2022052202
LC ebook record available at https://lccn.loc.gov/2022052203
Contents

Acknowledgments
Introduction
1 Distant Viewing: Theory
2 Distant Viewing: Method
3 Advertising in Color: Movie Posters and Genre
4 Seeing Is Believing: Themes in Great Depression and World War II Photography
5 Feast for the Eyes: Locating Visual Style in Network-Era American Situation Comedies
6 Opening the Archive: Visual Search and Discovery of the Met's Open Access Program
Conclusion
Glossary
Notes
Bibliography
Datasets
Index

Acknowledgments

Projects take time to percolate and refine. This one is no exception, and was easily over a decade in the making. For all our appreciation of the speed of algorithms, the time to think slowly and carefully was key. The final letters on the page are a fraction of the words typed, code processed, and ideas explored. And most importantly, this book was only possible because of the support, generosity, and patience of colleagues, friends, and family.

A significant amount of this book was written amid a global pandemic. Zoom calls, texts, and emails with Claudia Calhoun, Jordana Cox, Molly Fair, Joshua Glick, Eva Hageman, Devin McGeehen-Muchmore, Kristine Nolin, Jeri Wieringa, and Caroline Weist kept us grounded. Miriam Posner, Lauren Klein, and Jessica Marie Johnson have been constant guides, pioneering the intersection of digital humanities and data science.

Teaching and research are intimately connected. Working to understand and then explain to someone else one's theories and methods, and then grappling with the questions that follow, is a humbling and rewarding process. Thank you, Jennifer Guiliano and David Wrisley, for the opportunities to teach the concepts that became foundational to this book as well as to learn from the HILT (Humanities Intensive Learning and Teaching) and the NYU Abu Dhabi Winter Institute in Digital Humanities communities. Thank you to the students at the University of Richmond who shared their excitement to try out new methods and conduct research. Salar Ather and Aalok Sathe's work on sitcom laugh tracks as independent studies still has us giggling at some of the results.

Workshopping and presenting this work at various stages offered influential opportunities to develop this project.
Along with the great feedback at conferences, we are grateful to the colleagues who invited us to share our work, including Susan Aasman, Saskia Asser, Nicolas Ballier, Elisabeth Burr, James Connolly, Alexander Dunst, Jasmine van Gorp, Ann Hanlon, Tim van der Heijden, Vilja Hulden, Mike Kane, Ulf Otto, Nora Probst, Vincent Renner, Douglas Seefeldt, Thomas Smits, Stewart Varner, and Jun Yan. Thank you, and thanks to your colleagues, for thinking with us. A special thank you to our co-conspirator Mark Williams for his relentless efforts to amplify this project at every stage. Gratitude as well to the reviewers and editors of our first article on distant viewing in Digital Scholarship in the Humanities and our next article on sitcoms with co-author Annie Berke in the Journal of Cultural Analytics, which would become the basis for chapter 5. We appreciate Annie's support for our adapting this work for the book and her brilliant insights into post–World War II TV, which she expands on in her new book, Their Own Best Creations: Women Writers in Postwar Television. Finally, thank you to the peer reviewers, including Lev Manovich, for your insightful feedback that helped us clarify key ideas while helping us reach an interdisciplinary audience.

Research support made this project possible. The University of Richmond's institutional commitment to digital humanities and data science has been key. Deans Kathleen Skerrett and Patrice Rankine understood that hiring us together was critical to the success of our individual research, and that we would do even more together. Our departments helped us navigate being new professors, making sure we had time to pursue research and to integrate our scholarship into teaching. The Department of Rhetoric and Communication's embrace of digital humanities under the leadership of Chairs Nicole Maurantonio and Tim Barney, with the enthusiastic support of Mari Lee Mifsud and Paul Achter, has made for an environment that one could only dream of. With enthusiasm and patience, Brenda Thomas in the Foundation, Corporate, and Government Relations office navigated us through grant applications and management. Associate Provost Carol Parish's efforts to build an institution that supports interdisciplinary computational research steeped in the liberal arts have allowed us to scale up our research to the next level. Grant support from the Mellon Foundation funded the Collections as Data Initiative, which provided an invaluable opportunity to think and imagine with Carol Chiodo about how distant viewing could support museums and libraries. We are also appreciative of the visiting positions at the Université Paris-Diderot and the Collegium de Lyon made possible through Nicolas Ballier and Vincent Renner. We are grateful to Patricia Hswe and the Mellon Foundation for the opportunity to develop software to make distant viewing more accessible in the years to come.
Finally, the National Endowment for the Humanities' Office of Digital Humanities has been with us at each step over the past decade. We are incredibly grateful to Brett Bobley, Sheila Brennan, Perry Collins, and Jennifer Serventi for their support of Photogrammar (HAA-284853-22 and HD-51421-11) and the Distant Viewing Toolkit (HAA-261239-18). Along with sharing your excitement about innovation and a culture of openness, you provided us with the time, space, and resources, and, perhaps most importantly, built the interdisciplinary digital humanities community that made this open-access book possible.

As with so many exciting opportunities in our careers, Laura Wexler has been central. In addition to sharing her passion for the power of visual culture to explain our world, she connected us with a remarkable network of intersectional feminist media scholars including Elizabeth Losh. Our deepest gratitude goes to Liz, who took significant time to provide formative feedback on the first few chapters of this book before anyone else had seen a page. The chance to work with colleagues who take the time to help you make a work the best version of itself is one more reason we are also indebted to both of these pioneers of the digital humanities for connecting us with the MIT Press and Noah Springer, who has been a tireless advocate of this project. Thank you as well to Kathleen A. Caruso and Paula Woolley for their careful reading of this book. We are lucky to work with such a supportive and ambitious team.

Looking nationally and internationally, we are indebted to the individuals and institutions across the world that have advocated for open access, open data, and open source. We are grateful for the United States Library of Congress, particularly the Prints and Photographs Division and LC Labs. Colleagues such as Beverly Brannan and Meghan Ferriter have worked in the background to make our kind of research possible. Our scholarship would not have been possible without the continued work and support of the open-source software communities within data science and computer vision. Further, a future of computer vision that is explainable and committed to intersectional feminism and anti-racism continues to require relentless advocacy. Thank you to groups such as Data & Society, the AI Now Institute, and the Algorithmic Justice League for your work to hold all of us accountable while building a more just and equitable future.

Finally, we want to acknowledge our families. Along with their unconditional support and love, they kindly listened to us tease out the core ideas of this book over lobster rolls in Maine or red beans and old-fashioneds in New Orleans, for years. While the actual book project started shortly after we became an official family, the process leading to the first typed pages taught us that forging a collaboration would take a loving amount of work, kindness, and vulnerability. Only our dear dog Sargent, who passed away as this book came to completion, knows all the coffee, walks, and dinner debates that this has taken; Roux has big shoes to fill.
Introduction

In the fall of 2010, we began working on our first collaboration. In what would eventually become the public digital project Photogrammar, we set out to build an interactive space that allowed visitors to visualize the more than 170,000 digitized Farm Security Administration / Office of War Information (FSA-OWI) photographs produced by the US government between 1935 and 1944. Our collaboration served as an ideal mixture of our interests. Taylor was a graduate student in a statistics department with a research focus on exploring and visualizing complex datasets. Lauren was a graduate student in American Studies with a concentration in public humanities and a focus on twentieth-century American film and photography. Combining the FSA-OWI collection's rich metadata and meticulously digitized public-domain images to create a publicly accessible interface overlapped perfectly with both of our areas of research. A proof of concept developed in Laura Wexler's public humanities graduate seminar turned into a full-fledged digital humanities public project thanks to the National Endowment for the Humanities Office of Digital Humanities.

Our work became a popular public project that would welcome millions of visitors and encourage numerous extensions and revisions. We created interactive visualizations that allowed exploration of almost all the collection's available metadata. Visitors could view one of several interactive maps, follow the journeys of individual photographers, search by themes, and explore the photographic captions. The critical context and the contribution of these elements to visitors' understanding of the FSA-OWI collection should not be underestimated. However, something seemed missing. We were working with an extensive collection of documentary photography, and it was ultimately the photographs that drew us and others to this collection. Our work, however, facilitated an aggregative analysis of every element but the photographs. The images were only accessible by looking at them individually, with no way to search by visual themes or identify objects and people present within the frame. There was a disconnect between the main objects of interest and the affordances provided by our work.

The absence of image-based methods in our initial iterations of Photogrammar was driven by a scarcity of readily available tools, not a lack of interest. Digital images are challenging to work with computationally, for reasons that we interrogate in the following chapters. The best available methods we could find performed poorly on historic black-and-white photographs. Face detection methods missed more faces than they found, failing to find faces that were not at a particular angle and unable to detect anyone wearing a hat. Algorithms for detecting objects in images were more likely to produce comically bizarre predictions than usable information. Methods for aggregating based on dominant colors fared better but were not well suited to our collection of predominantly black-and-white photographs.
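For readers curious what such an off-the-shelf experiment looks like in practice, the short sketch below runs a classical frontal-face detector of the sort that was readily available at the time. It is only an illustration: the file name is a hypothetical stand-in for a digitized FSA-OWI photograph, and OpenCV's Haar-cascade detector is used as a representative example rather than the exact tools we tested.

import cv2

# Load a digitized photograph in grayscale; the cascade works on intensity values.
# The file name is a hypothetical stand-in for one of the FSA-OWI scans.
image = cv2.imread("fsa_owi_photograph.jpg", cv2.IMREAD_GRAYSCALE)

# Load the frontal-face cascade model that ships with OpenCV.
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
detector = cv2.CascadeClassifier(cascade_path)

# Each detection is returned as a bounding box: (x, y, width, height).
faces = detector.detectMultiScale(image, scaleFactor=1.1, minNeighbors=5)

print(f"Detected {len(faces)} face(s)")
for (x, y, w, h) in faces:
    print(f"  face at x={x}, y={y}, width={w}, height={h}")

Counting the detections on a handful of historic prints makes the limitations described above easy to see for oneself.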
Illustrations of these predictions were a consistent element of the earliest talks we gave on the work. Our favorite example came from the photograph featured in chapters 1 and 2 of a shepherd riding a horse in a field next to his sheepdog. Even though it was in vivid color and contained three distinct objects within the lexicon of popular computer vision algorithms, most failed to identify any element of the image correctly. The experience led us to ask more questions about exactly how these algorithms were built and for whom. We kept asking ourselves: what ways of seeing are these built to view, and what if we thought differently about why we should use them?

By 2016 the landscape of available tools for working computationally with images had undergone a dramatic expansion. Software libraries such as darknet (2016), TensorFlow (2015), keras (2015), PyTorch (2016), and Detectron (2017) suddenly provided out-of-the-box access to increasingly accurate and powerful computer vision algorithms.1 Scholars working with digital collections of images began to use this new set of approaches. Applications appeared in venues such as the Culture Analytics program at UCLA's Institute for Pure and Applied Mathematics, workshops held by the special interest group for audiovisual material within the Association of Digital Humanities Organizations (ADHO), and articles in the newly created International Journal for Digital Art History. Our work shifted as well. Presentations that previously ended by critiquing algorithmic results were replaced with forward-looking examples of how computer vision was helping us reimagine the FSA-OWI collection by providing approaches for a visual search and discovery interface. Rather than relying solely on existing computer vision algorithms, we began to customize and build algorithms that viewed in the ways that furthered our areas of interest.

Our excitement about the improvements in computer vision algorithms was tempered by our prior experiences that had highlighted the comparative difficulty of training computers to understand digital images. The tools seemed to be producing helpful information, but what features of the images continued to be lost through their algorithmic transformation? Many of these new tools were created or sponsored by large corporations and government entities. What are the implications of aligning our analyses with the interests of these organizations? Software for data exploration and visualization was not built around the study of digital images. How can our exploratory methods catch up with the new methods in computer vision? Numerous scholars in media and visual culture studies—such as John Berger, David Bordwell, Lisa Cartwright, Stuart Hall, Lev Manovich, Lisa Nakamura, Leigh Raiford, Marita Sturken, and Laura Wexler2—have stressed the importance of thinking carefully about how images are created, circulated, and interpreted. When applying complex computational approaches to the study of digital images, it is as vital as ever to consider these questions.
To enable the careful and critical computational exploration of digitized visual collections, we need a cohesive theory for how computer vision creates meaning and a methodological specificity that takes into account the intricacies of digital images as a form and format.

In this text, we present a theory and methodological description of what we refer to as distant viewing, the application of computer vision methods to the computational analysis of digital images. Our goal is to offer a constructive and generative critique of computer vision that focuses on enabling fruitful applications. To the best of our knowledge, this text is the first book-length treatment that approaches the application of computer vision to the study of visual messages as its own object of study. The distinction here is important because our approach allows for a critical understanding of the possibilities and limitations of existing computer vision techniques. It also provides a framework for a reflexive understanding of computer vision as a way of circulating and producing knowledge.

The focus of distant viewing on digital images is a pragmatic one, resulting from the fact that the application of computer vision requires machine-readable inputs. However, this does not limit our objects of study to born-digital materials. Distant viewing can be applied to digitized collections originally produced in almost any medium. For example, we can apply our approach to digitized collections of photographs, photographic negatives, newspapers, comics, and posters. We can also work with digital images of material culture, something we return to in chapter 6. Distant viewing is also not limited to still images; it can be used to study collections of objects from media such as television, film, and video games. An example of distant viewing applied to a pair of television series is illustrated in chapter 5. In most cases, when one is applying computer vision to a digital image, we argue that this is distant viewing.

Our terminology is motivated by the concept of distant reading from the field of computational literary studies. The specific meaning and importance of the term distant reading has been extensively discussed; it is not our goal to make specific connections or proclamations within these debates.3 Rather, our terminology signals a general interest in adapting the computational literary studies approach of applying computational and statistical techniques to large corpora in the service of humanistic research questions.4 While certainly not without their critics, these approaches have opened exciting new lines of scholarship.5 Our terminology also signals a departure from the textual focus of literary studies. The process of interpreting a visual message is semiologically and phenomenologically different from the act of reading a text, which we theorize in chapter 1. As we will explore in the following chapters, these differences lead to important changes in the way that we can apply and interpret the results of computational analyses.

In the tradition of visual culture studies and computer vision as well as our history of collaboration, we take a transdisciplinary perspective in our work.
Both of us were trained in interdisciplinary fields that taught us the power of thinking across boundaries of disciplines and fields. We primarily draw from and engage with scholarship from film and media studies, visual semiotics, digital humanities, information science, computer science, and data science. The text's structure and focus are designed to be legible and useful to audiences coming from any of these varied perspectives.

The first two chapters establish our main theoretical and methodological claims about distant viewing. Chapter 1 begins by investigating what it means to say that computer vision "understands" visual inputs. We draw from information science and semiotics to illustrate why the way that digital images convey information necessitates a different approach. Specifically, we see that this process involves creating annotations that capture some, though never all nor ever perfectly, of the information present in the images. We conclude the chapter by showing how the process of annotation can be seen as a machine-mediated way of viewing images; this leads us to understand how existing scholarship in media studies shapes the application of computer vision. In chapter 2, we engage with the methodological aspects of working with computer vision annotations. We investigate how standard approaches used in data science to explore data must be adjusted when working with computer vision, resulting in four phases of analysis. Namely, we must annotate our collection using computer vision algorithms, then organize the annotations and metadata, explore the data and our research questions, and finally communicate the results. The first two chapters engage in a close analysis of a single image, the FSA-OWI photograph of a shepherd mentioned above. We hope to model how computational analyses should also help highlight, rather than supplant, the close reading of individual images.
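To give a concrete, if deliberately simplified, sense of how the first three phases fit together in code, the sketch below annotates a pair of images with a single measurement, organizes the annotations alongside catalog metadata, and then explores the resulting table. The file names, metadata fields, and the brightness annotation itself are hypothetical stand-ins; this is not the Distant Viewing Toolkit, only an outline of the workflow's shape.

import numpy as np
import pandas as pd
from PIL import Image

# A hypothetical collection: image files paired with catalog metadata.
collection = [
    {"file": "photo_001.jpg", "photographer": "Lee", "year": 1942},
    {"file": "photo_002.jpg", "photographer": "Lange", "year": 1936},
]

def annotate(path):
    """Phase 1, annotate: compute a simple annotation for one image,
    here its mean brightness on a 0-255 scale."""
    pixels = np.asarray(Image.open(path).convert("L"), dtype=float)
    return {"brightness": pixels.mean()}

# Phase 2, organize: join the annotations with the existing metadata.
rows = [{**item, **annotate(item["file"])} for item in collection]
table = pd.DataFrame(rows)

# Phase 3, explore: aggregate the annotation by a metadata field.
print(table.groupby("photographer")["brightness"].mean())

In the chapters that follow the annotations become far richer, but the overall shape of the analysis remains the same.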
Chapters 3 through 6 present the use of distant viewing within four different application domains. As readers move from chapter to chapter, the complexity of the computer vision models builds. Each chapter is structured around the first three phases of the distant viewing method described in chapter 2: annotate, organize, and explore; the fourth phase, communication, is this book. After establishing a research question, we start by understanding one or more annotations provided by computer vision algorithms, organize other metadata attached to the collection, and finish by conducting an exploration of the organized data. Along the way, we discuss the limitations of these algorithms as we think carefully about exactly what these computer vision algorithms view, and do not view. Chapter 3 investigates the use of color in movie posters and its relationship to genre. We see how distant viewing can address complex research questions even when using relatively low-level annotations. In chapter 4, we apply a region segmentation algorithm to the photographs from the FSA-OWI archive. This chapter shows how computer vision annotations can both support and supplant the organizational logic of the archive. We illustrate in chapter 5 how distant viewing can also be used with moving images. We see how formal film elements can be applied to study issues of gender and power within a pair of network-era sitcoms. Finally, in chapter 6, we apply distant viewing to a collection of images from a large encyclopedic museum to see how computer vision can open digital collections through public interfaces.

Our excitement about the possibilities for the computational analysis of collections of digital images has been shared by many other research groups. Some of the earliest examples come from the manual annotation of film and television metrics by Barry Salt, Gunars Civjans, Yuri Tsivian, and Jeremy Butler.6 Recently, the journal Digital Humanities Quarterly (DHQ) sponsored a special issue focused on film and video analysis in 2020, with articles describing projects such as Barbara Flueckiger's FilmColors project and Masson et al.'s Sensory Moving Image Archive project (SEMIA).7 Along with Stefania Scagliola and Jasmijn Van Gorp, we edited another DHQ special issue titled "AudioVisual Data in DH" with over twenty research articles from a wide variety of disciplines and national contexts.8 Interest has also expanded from the growing field of digital art history, which has had several special issues, conferences, and a new journal that have included a significant amount of computational work. Notably, in the first issue of the International Journal for Digital Art History, K. Bender made the first use of the term "distant viewing" within his study of the iconography of Aphrodite/Venus.9 Numerous other exciting research papers have been published in other journals, such as Nanne Van Noord, Ella Hendriks, and Eric Postma's study of artistic style and Laure Thompson and David Mimno's analysis of the study of Dadaism.10 We hope that our work in this book further enables and encourages more developments in these and other areas.

The book is designed to be read and used. Along with being open access, the text is organized such that the chapters should be readable in any order. One reader might be interested in the theory and then an application. Another reader might be interested in a particular application and therefore wish to start with one of the applications before engaging with the more theoretical opening chapters. Many of the results in the following chapters are presented as tables. We chose to communicate results using numeric tables because of the limitations of other visualization types within the existing print form, such as the lack of interactivity and color. Other ways of visualizing these results are also given in the supplementary materials.

While making novel contributions to the fields of data science and digital humanities, we have avoided superfluous technical jargon.11 There is significant translation work to do when talking across fields and forging transdisciplinary scholarship. We aimed for a writing style that is inclusive yet precise, while the footnotes provide more technical descriptions.
A glossary of common terms, particularly for terms that may be used differently in different communities, is included at the end of the text to aid in this process. In addition, we have published datasets, code, and many additional visualizations under an open-source license that replicate and further explore the applications described in the text. All of these can be viewed and downloaded on the book's accompanying website, found here: https://distantviewing.org/book

Finally, we have developed the Distant Viewing Toolkit, open-source software made possible through generous funding from the National Endowment for the Humanities and the Mellon Foundation, that puts theory and method into practice. Information on how to install a current version of the Distant Viewing Toolkit can also be found at the link above.

By theorizing and offering a method, our approach of distant viewing participates in the call for a more careful use of algorithms in our society. When we understand computer vision as a way of seeing, we are then accountable to the histories of vision and the ways we train algorithms to see, look, and view. We are also accountable for what they do not see, look, and view. We have had a plethora of conversations with colleagues who attest to the neutrality of algorithms and resist ideas of algorithms as a technology of vision and mode of communication inculcated in social and cultural pasts, presents, and futures. Distant viewing challenges such claims and calls on us to ask, each time a computer vision algorithm looks at an image, what this algorithm is viewing, mislabeling, and missing, as well as why we designed it to view in this way. By doing so, we can more carefully engage with the computational analysis of digital images. Now, let's go distant viewing.

1 Distant Viewing: Theory

Whether gazing at a piece of art through the lens of an iPhone in the Louvre or tapping away at Instagram Stories among the flicker of the screens in Times Square, we are surrounded by images. Visual messages impact our daily lives and have done so for centuries. More recently, technologies that have enabled born-digital images are also facilitating mass digitization as institutions such as libraries, archives, and museums produce digital images stored on servers across the world. How does one go about analyzing the messages carried by these visual forms?

There is a plethora of different approaches for studying the messages conveyed by visual media. Often these consist of applying theories from fields such as art history, film studies, and media studies to a focused set of images.
These approaches can inform insights such as the topics depicted at a certain moment in time or contribute to an analysis of how formal elements lend the medium its claims to truth.1 Examples of powerful close analyses of visual messages include Elizabeth Abel's study of Jim Crow politics in the American South, John Berger's study of gender and western art, Herman Gray's study of race in 1980s and 1990s TV, and Laura Wexler's study of Alice Austen's photographs from Ellis Island.2 Many questions require studying a small set of images in relation to a larger whole, which can be accomplished by meticulously combing through and viewing images stored in a physical or digital archive.3 Combined with information from other archival sources, this approach captures a large portion of research methods in the study of visual messages and has frequently formed an important aspect of our own work.

Some questions regarding visual messages require identifying subtle patterns across a large collection of images. For example, we might want to understand how improving television quality during the twentieth century changed the way shot angles were used to tell a story. Or, how lighting decisions in Hollywood films from the 1970s were used to challenge or establish gendered or racial stereotypes. Similarly, we might be interested in understanding the themes in a photographic archive with hundreds of thousands of images or visualizing different themes in paintings held by a large encyclopedic art museum. Addressing these questions begins to exceed our ability to remember and to view all of the relevant images. One approach to working with these collections is to use a quantitative social science methodology, such as content analysis, in which a random subset of images is manually labeled according to the information being studied.4 However, this approach has the downside of viewing only part of a collection and, because of the labor involved in creating the labels, is only able to address a small set of predefined research questions. What we want, rather, is to view, explore, and interpret visual messages across a collection of digital images. This should be an iterative, exploratory process that mirrors the approaches that we turn to when working with a smaller collection of images. Such a process, it would seem, requires a different methodology.

Excitement has recently increased about the use of computer vision algorithms—computational methods that try to replicate elements of the human visual system—to assist in the study of visual collections. Algorithms applied to digital images through software can process large amounts of data in significantly less time than would be required to perform a similar task manually. Computers can aggregate patterns and surface connections that may be difficult to detect otherwise. These insights can lead to new ways of seeing and exploring visual data.
Melvin Wevers and Thomas Smits, for example, have called for a "turn toward the visual" within the digital humanities through the application of computer vision techniques that "open up a new world of intuitive and serendipitous exploration of the visual side of digital archives."5 Similarly, in his recent book Cultural Analytics, Lev Manovich argues for and demonstrates the use of computer vision and other computational approaches for the "exploration and analysis of contemporary culture at scale."6 As highlighted in the introduction, we have also been excited by the possibilities of using computer vision algorithms in our work on twentieth-century US documentary photography. A cornerstone of visual culture studies is that images make meaning differently than other forms of expression do; we must account for this fact when applying computational techniques to large collections of visual materials. By combining the growing body of applications of computer vision to the study of digital images with the work of visual culture studies, we will offer a cohesive theory that explores the methodological and epistemological implications of using computer vision as a tool for the study of visual messages.

In this chapter we present the theory of distant viewing as a way of understanding how computer vision works and enables the exploration of collections of digital images. Specifically, we will develop a theoretical understanding of how distant viewing—the application of computer vision methods to the computational analysis of digital images—works, and why it is needed. This work emerges from the intersection of theories from several fields, including visual semiotics, media studies, communication studies, information science, and data science. By drawing on work across the humanities, social sciences, and sciences, we offer an interdisciplinary theory that interweaves the ways of knowing and understanding that animate a range of fields to understand the relationship between computer vision and digital images.

Attention to features of image analysis becomes critical when using computational methods. Visual culture studies and visual semiotics have established that visual materials make and transmit meaning differently than other forms of communication.7 It is necessary, then, to consider how these differences affect the modes of research afforded by computational techniques and how the unique ways that images make meaning are accounted for within specific computational methods such as computer vision. At the same time, using algorithms to assist in the processing of visual materials mediates the task of understanding through a digital, computational process. The technologies of computer vision are being trusted to "view" collections of still and moving images. This raises several pressing questions about the ways of seeing that specific algorithms engage in and why. For example, applications to support the military and surveillance state have motivated many of the most studied and accessible computer vision methods, such as face detection and object tracking. Applying these methods is never culturally neutral.
Distant viewing responds to the need for a theory of how computer vision algorithms serve as a means of analysis to study visual materials, and can account for the ways that these algorithms mediate the interpretation of digital images.

Each of the terms distant and viewing signals a component of generating annotations from images. The first is the distance from the eye, for computational methods "see" by calculating images as numbers. The second is distance as scale. Through computer vision, interpretation of images can exceed a person's physical ability to view and remember. At the same time, distance is not objective. Viewing makes explicit that the computational processing of images is shaped by ways of seeing and practices of looking and, therefore, by a set of social and cultural decisions. Using a term explicitly linked to visuality also signals how images convey messages differently than forms such as text. By theorizing computer vision as a computational mode of communication for decoding messages, claims to objectivity through computational interpretation no longer hold. Instead, the terms together, distant and viewing, make explicit that computer vision is a technology of communication shaped by people and imbued with cultural and social values. Chapter 2, where we turn to a methodological understanding of distant viewing, provides a detailed analysis of how the process of annotation integrates with existing data analysis pipelines. Before addressing the practices of distant viewing, we consider the epistemological and semiological implications that come from exploring digital images computationally using algorithmically generated information.

In the following sections, we start by drawing on concepts from visual semiotics and information science to establish that the use of computer vision to create structured annotations is necessary because of the way digital images are stored and the way images convey meaning. We then establish a framework for the specific ways that distant viewing uses computer vision. We show how images are converted into structured annotations that serve as mediators throughout the process of computation. Then, we illustrate how the application of computer vision can be understood as machine-based ways of seeing. The process of seeing through a computer is subject to culturally influenced factors, often mirroring human-based ways of seeing. Recognizing how these influences affect what information is privileged and hidden by modern computer vision algorithms allows us to understand the possibilities and limitations of distant viewing.

Meaning Making through Images

The process of interpreting meanings encoded in an image is a part of our daily lives, often implicit, and occurring incredibly quickly. For example, what do we interpret when we view a photograph in an online newspaper of a sandy beach with palm trees bent over precariously in the same direction, large waves crashing onto the shore, and dark clouds on the horizon?
Many people, even before reading an accompanying caption or headline, will have a near-instantaneous realization that the news is covering a severe weather event. How is this kind of information transferred through an image, such as the watercolor or photograph in figure 1.1? Meaning is primarily interpreted based on elements that resemble a storm: the effects of dark skies and oncoming heavy winds. The same image could be meaningfully printed in newspapers around the world, regardless of the language of the newspaper. Attention to how images make meaning becomes a necessity to understand how to apply computer vision and interpret the results. Comparison with text demonstrates the challenges.

Figure 1.1
On the left, a Winslow Homer watercolor depicting a scene from Nassau, Bahamas, in 1898 (Metropolitan Museum of Art, 06.1234; Palm Tree, Nassau (1898), https://www.metmuseum.org/art/collection/search/11131). On the right, a photograph by Brigitte Werner taken at Hayman Island, Australia, in 2019 (https://pixabay.com/photos/hayman-island-australia-travel-745789/).

Textual data is described by characters, words, and syntax. Read within a particular cultural setting, these elements are interpreted as having meaning. Linguistic elements, such as words, serve as explicit signs that correspond to objects primarily by convention.8 The word pencil in English is typically used to refer to a long cylindrical object that contains inner solid marking material (such as graphite or charcoal) surrounded by an outer material (such as wood) for writing or drawing. Millions of English speakers induce the link between the six-letter word and the definition by their shared usage. That is, most words function as symbols, a socially agreed-upon connection between the word and the concept represented by the word.9 Grammatical constructs such as verb conjugation, plurality, and object-verb relationships operate similarly within a particular language to produce higher-level meanings between individual words.

Visual forms such as film and photographs convey meaning in a different way than text.10 They do not convey information primarily through agreed-upon relationships.11 A photograph, for example, in its most basic form is a measurement of light through the use of either chemically sensitive materials or a digital sensor.12 The objects represented within a photograph can typically be identified by people who speak different languages.13 As Roland Barthes argues, there is a "special status of the photographic image: it is a message without a code."14 In photography, it is not necessary to construct an explicit mapping between the visual representation and what is being represented. The relationship between the photograph and the object represented by the photograph is signified by shared features between the object and its photo.15 In other words, meaning is conveyed through the photograph's mimetic qualities.

A similar relationship holds for other visual forms.
Both paintings and photographs illustrate and circulate concepts through characteristics such as lines, color, shape, and size.16 Images serve as a link to the object being represented by sharing similar qualities. It is possible to recognize a painting of a particular person by noticing that the painted object and person in question share properties such as hairstyle, eye color, nose shape, and clothing. The representational strategies of images, therefore, differ from those of language. While often rendered meaningful in different ways through language, visual material is pre-linguistic.17 The French poet Paul Valéry eloquently described this phenomenon as "looking, in other words forgetting the name of the things that one sees."18

Interpreting images is further complicated by the amount of variation in images intended to convey the same or similar concepts. Photography offers an example again. The culturally coded elements of photographic images coexist with the raw measurements of light. The cultural elements are exposed through the productive act of photography—what Barthes refers to as the image's "connotation."19 Consider the images in figure 1.1. They were created over a century apart on nearly opposite sides of the world. On the left, the painting was created by pushing pigments suspended in water across a piece of paper. On the right, we have a digital image created by capturing the components of light observed over a small fraction of a second by an array of digital camera sensors. Yet, despite these differences, viewing each image conveys a similar scene of palm trees blowing in the wind in front of an ocean bay. Further connotations of the images can be built up from these elements. We may view either of these images as connoting scenes of idyllic luxury and relaxation, like the scenes one might post on social media from a vacation at a tropical beach. Or, possibly with some additional context, we might understand the movement of the palm trees as representing a destructive and dangerous oncoming storm. Interpreting the messages of a single image requires decoding its individual elements, an act that becomes even more important when working with a large collection.

Interpreting the connotation of visual messages is further shaped by people's beliefs and values.20 As media theorist Stuart Hall argues, the messages encoded and decoded from images are not objective but shaped by the cultural, social, and political meanings that people want to convey and are positioned to interpret. For example, the photographer may have taken the photograph shown in figure 1.2 to convey loyalty. Whether the viewer decodes that message is not necessarily a given, depending on their background. A viewer from a different position may interpret the image as being about the man's dominance of the landscape and, therefore, about ideas of masculinity. How one interprets an image is shaped by the larger cultural and social ideologies that inform how one interprets the world.21 The messages that are encoded may not be decoded, and messages that were not intentionally encoded may be decoded.
How information is encoded and decoded is shaped by cultural ideologies, embodied ways of knowing, social scripts, and grammars of everyday life from which we learn and which we rely on to interpret the world.22 How one views an image through computational processes is a part of the same process.

We can extend these considerations to the computational analysis of digital images. While a person looking at an image can decode the objects and meaning of the visual messages, the process of making these explicit decisions is what makes images so challenging to study computationally. We rely on learned semiotic and ideological systems to interpret the meaning of visual material. This interpretive process must be made explicit in computational processes. Therefore, the computational analysis of images requires an algorithmic interpretation of the meaning of digital images. Distant viewing asks that we acknowledge and take into account the interpretative, ideological work of algorithmically analyzing digital images. To further deduce how the process of computationally decoding the messages in an image works, we must understand how images are stored and analyzed by a computer.

Working with Digital Images

The unique features that humanistic inquiry has identified about working with visual materials apply to analyzing digital images. The way images are stored as pixels mirrors the semiotic differences between forms such as text and image. In lieu of the human eye, computational interpretation through computer vision is used to decode the meaning of images. Seeing through computer vision converts the act of viewing into a computational process. The speed of computational processes thus enables analysis at scale and the decoding of millions of images in a short period of time. Whether applied to a single image or at scale, the information captured by computer vision becomes a mode of communication for interpreting messages in digital images.

Figure 1.2
A color photograph digitized as part of the Farm Security Administration / Office of War Information (FSA-OWI) archive, held by the US Library of Congress. The photograph is credited to staff photographer Russell Lee and cataloged as being taken in August 1942. The item's caption reads: "Shepherd with his horse and dog on Gravelly Range, Madison County, Montana" (Library of Congress, https://www.loc.gov/pictures/item/2017878800/).

Digital images are stored in formats that make it possible to see images on a digital screen. A computer displays images as pixels, the "minute individual elements in a digital image."23 The word "pixel" itself reflects this relationship.
It is a combination of pix, the plural form of pic (which is short for picture), and el, which is an abbreviation of element.24 The term emerged alongside the terms pix and pel in the 1960s among researchers working on image processing who were trying to find ways to describe the basic elements of images. The digital image processing and artificial intelligence communities embraced the term pixel during the 1970s, followed by the television and image sensor communities in the 1980s.25 Debates and norms across research disciplines and industries over the last several decades have resulted in slight variations in definitions and uses of the term. The most common definition today is that a pixel is the smallest element of a digital image.

Computers work with digital images as a set of numbers that comprise pixels. As shown in figure 1.3, a pixel is stored as three numbers, indicating the amount of red, green, and blue light needed to represent the color of one point in the image. It is possible to create almost any color with this method.26 Adding the maximum amount of red and green and turning off the blue, for example, results in yellow. The complete digital image in figure 1.2 is represented by a computer by storing three rectangular grids corresponding to the red, green, and blue light intensities for every pixel in the image.

Figure 1.3
The upper-left figure is a cropped and lower-resolution version of the shepherd seen in figure 1.2. The other panels show the red (upper right), green (lower left), and blue (lower right) pixel intensities. The numbers indicate how bright each color channel is as an integer from 0 to 255.

Returning to figure 1.2 helps illustrate a disconnect between the computer's storage and our understanding of the image. Figure 1.4 shows a grayscale version of the same image at four different zoom levels. Each zoom level is centered on the left eye of the horse. When we look at the largest image, it seems apparent that this is a photograph of a horse and that its left eye is at the center of our cropped image. When we look at the two highest zoom levels in isolation, it seems impossible to guess that these are images of a horse's eye. However, these same pixels were identified as an eye in the lower zoom levels. How is it possible that the same numbers can be interpreted so differently?

When we look at the display of a digital image, we understand the pixels in context with one another. We can only identify the eye after putting it into perspective with other pixels that resemble the horse's ears, nose, hair, and neck. These features are similarly understood only by putting them in perspective with all the other pixels. The implication for analyzing digital images, then, is that a substantial gap exists between the numeric representation of the image by the computer and the parts of the image that one sees when viewing the image.
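A few lines of code make this numeric representation tangible. The sketch below assumes a hypothetical local copy of a digitized photograph and simply prints the grids of red, green, and blue intensities that the computer actually stores.

import numpy as np
from PIL import Image

# Load a hypothetical local copy of a digitized photograph as red, green,
# and blue channels.
pixels = np.asarray(Image.open("shepherd.jpg").convert("RGB"))

# One grid of rows and columns per color channel: (height, width, 3).
print(pixels.shape)

# Inspect a single pixel near the center of the frame: three integers
# between 0 and 255, one each for red, green, and blue.
row, col = pixels.shape[0] // 2, pixels.shape[1] // 2
red, green, blue = pixels[row, col]
print(f"pixel at ({row}, {col}): red={red}, green={green}, blue={blue}")

# Maximum red and green with no blue displays as yellow, as noted above.
swatch = np.zeros((50, 50, 3), dtype=np.uint8)
swatch[:, :, 0] = 255  # red channel at full intensity
swatch[:, :, 1] = 255  # green channel at full intensity
Image.fromarray(swatch).save("yellow_swatch.png")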
The pixels that represent figure 1.1's two images—from a digital scan of the watercolor image and the born-digital image on the right—provide another example. The specific pixels used and printed from the two images are completely different, yet we are able to understand the images as both representing scenes with elements of palm trees, an ocean, and the sky. With all of this variation, how do we tell the computer which pixels to look for and what those sets of pixels mean?

Figure 1.4
Four zoomed and grayscale versions of the image in figure 1.2. The highest level of zoom (upper left) also includes the grayscale pixel intensities, an integer from 0 (black) to 255 (white). All of the images are centered on the left eye of the horse in the image.

The challenge of interpreting pixels is further complicated by how images are stored compared to other data types. We return to the comparison between images and text, but this time focusing on how they are digitally stored. Text is written as a one-dimensional stream of characters. These characters are written in an encoding scheme that defines an agreed-upon mapping from 0s and 1s into symbols.27 Compression software can be used to store the encoded text using a smaller amount of disk space, but this compression must be reversible. Going from the compressed format back into the raw text must be possible without losing any information.

Since images are composed of pixels, they are in a different format. While displayed as an array of pixels on the screen, images can be stored in several compressed formats. Many standard compression formats, such as JPEG, use a lossy compression algorithm. These algorithms only approximately reconstruct the original image. Similarly, it is possible in any storage format to rescale an image to a lower resolution. This process saves storage space but results in a grainier version of the original. Differences in storage methods between text and image data correspond to the semiological differences argued by scholars in media studies and visual culture studies.

The fact that digital images can be scaled to a smaller size highlights the lack of explicitly coded elements within images. If an image consists of a code system, lossy compression will require losing some coded elements. However, images reproduced from compressed files have no detectable differences from the original file for a moderate amount of compression.28 An illustration of how lossy compression affects an image is given in figure 1.5. The colors, shapes, and objects within the frame remain discernable even under extreme forms of information compression.
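The lossy compression shown in figure 1.5 can be reproduced in a few lines. The sketch below follows the approach described in that figure's caption: a singular value decomposition is applied to each color channel and the image is rebuilt from a reduced number of components. The input file name is hypothetical, and keeping fewer components yields a smaller but only approximate reconstruction.

import numpy as np
from PIL import Image

def compress_channel(channel, k):
    """Rebuild one color channel from its first k singular components."""
    u, s, vt = np.linalg.svd(channel, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

def compress_image(path, k):
    """Apply the channel-wise compression and return a displayable image."""
    pixels = np.asarray(Image.open(path).convert("RGB"), dtype=float)
    channels = [compress_channel(pixels[:, :, i], k) for i in range(3)]
    stacked = np.clip(np.stack(channels, axis=2), 0, 255).astype(np.uint8)
    return Image.fromarray(stacked)

# Rebuild a hypothetical image at the five compression levels used in figure 1.5.
for k in [100, 50, 25, 15, 5]:
    compress_image("shepherd.jpg", k).save(f"shepherd_rank_{k}.jpg")

Even at the most aggressive settings, the large shapes and regions of color in the scene remain recognizable, which is the point the figure makes.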
The rectangular grid of pixels printed in different shades of gray in figure 1.4 conveys the image of a horse, for example, because looking at the resulting print shares similarities with the act of looking at a horse in real life. The complex ways that images make meaning are a large part of what makes visual forms popular kinds of cultural expression and what makes visual materials particularly exciting objects of study. These complexities, however, must be accounted for when applying a computational analysis to the study of visual materials, a task that we now turn toward to fully theorize distant viewing.

Figure 1.5: The image from figure 1.2 is shown (upper left) along with five levels of increased compression. The compression algorithm uses a singular value decomposition on the individual color channels, reducing the dimensionality of the matrix of pixel intensities to 100, 50, 25, 15, and 5, respectively. We used this approach because it does a good job of showing what happens to the image under extreme forms of compression.

Computational Exploration with Computer Vision

Returning to our primary task of working computationally with a collection of digital images, recall that standard methods for the visualization and exploration of data cannot be applied as-is to image data. Having seen the ways that visual materials convey meaning, we now have a more concrete way of understanding why applying computational methods to a collection of images presents certain difficulties. Fortunately, as is often the case, identifying the source of these obstacles will become the first step in producing a solution.

An example will help further illuminate the challenges that we need to solve. Consider the task of trying to automatically detect and characterize themes found within a large digital collection of newspaper articles. The explicit codes of written language provide a powerful tool for this task. Each of our texts can be split into individual words, the smallest linguistic unit meaningfully understood in isolation, and the number of times a word is used can serve to encode information about each document.29 Then, we could use the word counts to automatically find sets of terms that tend to co-occur within the same documents with a high probability. In other words, if one word from a detected topic is used in a document, other words within the same topic are also more likely to be used. With no explicit tagging of the dataset, this kind of model could, for example, detect a set of co-occurring words such as cloudy, cold, wind, and afternoon. Manual intervention is only needed to interpret this topic to conclude that these words all focus on the concept of "weather." Both the interpretive acts of determining the keywords to associate with each article and assigning a meaning to the co-occurring words may be delayed until after a model has been applied.
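A brief sketch may help make this delayed interpretation concrete. Assuming the scikit-learn library and a placeholder list of article texts (standing in for a much larger corpus), word counts and a topic model can be fit without any manual tagging; only the final step of deciding that a topic "means" weather is left to a human reader.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Placeholder corpus; in practice this would be thousands of articles.
articles = [
    "cloudy and cold with strong wind expected this afternoon",
    "the city council voted on the new budget last night",
]

# Encode each article by how many times each word appears.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(articles)

# Find sets of words that tend to co-occur across documents.
lda = LatentDirichletAllocation(n_components=5, random_state=0)
lda.fit(counts)

# Print the most heavily weighted words in each detected topic;
# naming a topic ("weather", "local politics") requires interpretation.
words = vectorizer.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [words[i] for i in weights.argsort()[-5:][::-1]]
    print(k, top)
```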
These are crucial features of many methods for textual analysis; the power of word counts to convey a reasonable approximation of document-level meaning has been one of the most important tools for the computational analysis of textual data.

Now, consider the parallel visual task of finding and identifying themes within a collection of digital newspaper photographs. As we have argued in the previous two sections, there is no equivalent way of grouping together and counting raw pixel values in the way that we were able to do by grouping letters into words. For this reason, before we can do a computational analysis of a collection of images, we need to interpret the messages encoded in the images. In other words, the images must first be viewed. There is no equivalent option, as there is with structured and textual data, to delay interpretation of the images' meaning(s) before applying computational methods. The first step, then, must be to interpret each image by creating a structured, or indexical, system that we can then aggregate and analyze. Because these interpretations will not perfectly capture all the information present in the image, care must be given to the ways that the images are interpreted relative to the guiding areas of inquiry.

To be more concrete, the computational analysis of a collection of digital images requires as a first step that the messages encoded within the images are interpreted through the construction of structured labels.30 We will refer to these labels as annotations and use annotation to describe the process by which these labels are created. Annotations can take a variety of different forms. They can be a single number, such as an indication of how many people are in an image. A single word can also be used as an annotation, such as an indication of the name of an object at the center of the image. Other annotations include automatically generated sentence-length captions or a large set of numbers representing the amounts of predefined colors present in the image's frame. Annotations can even take the form of images themselves, such as through the tagging of images with other images that contain the same people. Often, a collection of different kinds of annotations will need to be produced to address a specific question that one is exploring with a given collection of images.

How, then, do we go about producing these annotations? Manually creating structured annotations for a collection of digital images can be a laborious task. Even the production of a single, relatively clear label, such as a count of the number of people in the frame, can become prohibitively time-consuming when working with a large collection of images. More intricate annotations, such as outlining and describing every object in each image, are essentially impossible to construct manually for all but the smallest collections. Methods such as content analysis avoid these difficulties by labeling only a random sample of images and generally restricting the annotations to a small set of straightforward and relatively easy-to-produce categories.31 These approaches make it impossible to iteratively explore a collection of images and limit our analysis to only a subset of possible research questions.
Further, many more complex relationships between visual features and archival data can only be established by working with the entirety of a collection. Another approach is clearly needed.

The process of creating and interpreting digital images through annotations generated by computer vision is at the center of distant viewing.32 Using algorithms to generate annotations allows us to work with the entirety of a collection. It also allows us to create intricate annotations that can be visualized, modeled, remixed, and aggregated through an iterative exploratory analysis. The name of our approach to working with collections of digital images highlights the fact that the creation of annotations should be seen as a process of viewing, or interpreting, and that this viewing is done at a distance because it is mediated through the algorithmic process of computer vision. An understanding of the field of computer vision will further elucidate the possibilities and limitations of distant viewing.

The field of computer vision focuses on how computers automatically produce ways of understanding digital images. The field is interdisciplinary by design, drawing on research in areas as diverse as computer science, physics, engineering, statistics, physiology, and psychology. Most tasks in computer vision are oriented around algorithms that mimic the human visual system. Tasks include detecting and identifying objects, categorizing motion, and describing an image's context. While visual understanding is at the center of the field, computer vision algorithms may also take a multimodal approach. Algorithms may use, for example, image captions or film soundtracks to augment visual components in much the same way that humans integrate all of their sensory inputs to understand the world around them.

Our ability to construct high-quality computer vision annotations is driven by current research priorities within the field of computer vision. These directions, likewise, are influenced by the industry and government applications that fund the research. Some of the earliest computer vision algorithms were designed to identify images of numbers; this research was explicitly funded to sort mail envelopes based on detecting handwritten postal codes.33 High-accuracy algorithms exist for the detection and identification of faces within an image. The research behind these tasks has been driven in no small part by applications in surveillance, which we should engage with cautiously. Several tools provide high-quality annotations for the detection of cars, people, crosswalks, and stoplights. These annotations are the direct consequence of computer vision applications within the technology of self-driving cars. When we use computer vision algorithms to produce automatically generated annotations, it is crucial to remember the role these funding streams play in the content and structure of available algorithms.
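To make the idea of algorithmically generated annotations concrete, the following is a minimal sketch of producing object-level annotations for a folder of images. It assumes a pretrained object-detection model from the torchvision library (here, a Faster R-CNN detector trained on everyday categories such as people and cars); any comparable detector could be substituted, and the folder name and confidence threshold are illustrative choices, not part of the method described in this book.

```python
import glob

import torch
from PIL import Image
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

# Load a pretrained detector and switch it to inference mode.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

annotations = []
for path in glob.glob("images/*.jpg"):  # illustrative folder of digitized images
    image = to_tensor(Image.open(path).convert("RGB"))
    with torch.no_grad():
        pred = model([image])[0]  # dict with 'boxes', 'labels', 'scores'
    keep = pred["scores"] > 0.8   # keep only confident detections (arbitrary cutoff)
    annotations.append({
        "path": path,
        "labels": pred["labels"][keep].tolist(),
        "boxes": pred["boxes"][keep].tolist(),
    })
```

The resulting structured records, rather than the raw pixels, are what can then be aggregated, visualized, and explored; the categories the detector knows about, and how reliably it finds them, reflect the research priorities and funding streams just described.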
The availability of accurate annotations is also a function of the level to which an annotation is abstract or culturally mediated. Some tasks are well positioned to be addressed with computer vision. For example, computer vision algorithms are better than human evaluators at detecting defects in agricultural products.34 Similarly, a simple model can be used to identify the orientation of photographs with almost perfect accuracy.35 Both of these tasks have concrete "answers" and can be addressed by looking at only a small portion of the image, making them relatively easy tasks within computer vision. Other annotations present more difficulties and bring to the fore ethical and social questions. The goal of automatically identifying a person's emotion based on a still image is very difficult and shaped by cultural politics.36 Computer vision algorithms struggle to attain human-like accuracy even when classifying strong emotions within a single cultural context.37 When dealing with more subtle emotions across a range of cultures, the task becomes nearly impossible even for human annotators, much less a computational algorithm. The types of questions and analyses available through distant viewing are shaped by the relative difficulty and constructedness of building the algorithms that produce the annotations. The challenge of determining whether a task is amenable to exploration through computer vision is further complicated by determining exactly what we are decoding.

Decoding through Viewing

The process of distant viewing applies computer vision algorithms to automatically interpret a layer of meaning within images through the creation of structured annotations. As signaled in our terminology, computer vision algorithms engage in the process of viewing an image. This characterization allows for a reflexive formulation of distant viewing. Whereas we have so far used visual semiotics to argue for the necessity of computer vision, we can similarly take computer vision as an object of study itself, one that can be analyzed through the application of media theories. Theories of communication around decoding become key.

The process of annotating images with computer vision can be understood as a mode of communication that transmits a message between the materials of interest (the digitized images) and human audiences. To computationally decode the messages, people must decide which annotations to look for in an image using computer vision. The human eye is replaced by computational processes that identify the features in images through numbers. As a result, computer vision decodes the messages encoded in digital images in order to interpret and convey them. At the same time, computational methods are created by people and are therefore not outside of cultural, social, and historical ways of seeing and practices of looking.38 By theorizing computer vision as a computational process of decoding messages ("viewing"), the method of distant viewing makes explicit that computer vision is a technology of communication produced by people and therefore imbued with cultural, political, and social values.

How to interpret visual media in a digital form is a question about conveying and interpreting meaning.
In his model of communication, Stuart Hall argued that messages are encoded and decoded.39 The sender produces a message in a form such as television. The message is circulated, and audiences then interpret the message. The message encoded may not be the message decoded. The values and beliefs of the creator and receiver shape the messages that are conveyed and interpreted. The form of the medium also impacts which messages are communicated. For example, digital images make and convey meaning differently than audio. The same image, such as a meme, will often be interpreted differently by an audience in the United States than by an audience in France. Computer vision has become another powerful actor in the process of encoding and decoding digital images. But exactly how is it possible to understand this newer technology of interpreting meaning in images?

Images encode messages. Exactly how they send those messages and which parts of the message are decoded are shaped by what is recognized and how the signs and symbols that comprise an image are interpreted. Visual culture studies, informed by semiotics and rhetorical studies, explores how images signify and communicate, which differs from how other forms of knowledge, such as text, do so.40 We return again to the relationship with text. Even at the level of an individual object, meaning is encoded in images differently than in text, as theorized by semioticians across fields such as linguistics, media studies, and visual culture studies.

Computer vision has become a way for people to create annotations to decode visual messages. As a set of computational processes designed to interpret images, computer vision emerged to address the issue of how to understand images. In other words, computer vision is a computational model of communication designed to interpret information from digital images. To do so requires building annotations that replicate the features necessary to interpret the meaning of an image. Therefore, computer vision algorithms look for specific features by following processes to recognize patterns that we determine based on a task.

The process of computational viewing through computer vision algorithms produces new structures that attempt to capture layers of meaning within images. Creating structured data is often described as information "extraction" and aligns with the popular channel encoding model of communication proposed by Claude Shannon. The model provides a communication framework describing how a fixed message is passed between two parties. It focuses on the amount of intrinsic information contained in a message and the amount of redundancy needed to ensure a high probability that the resulting message will be transmitted between the two parties without any errors.41

Due to the nature of visual messages, however, Shannon's communication model does not accurately capture the process of producing structured data through computer vision.42 The decoded messages do not symmetrically represent an image's intended meaning. Instead, in the language of Stuart Hall, computer vision algorithms are active participants in the process of knowledge production through the act of decoding.
During this process of decoding, computer vision algorithms produce structured data from visual inputs. The knowledge produced by the algorithms, such as label names for detected objects or a probability that the image was taken outdoors, is not objective or intrinsic to the images themselves. Instead, the kinds of data labels that are privileged, and the internal mechanisms used to produce them, are significantly influenced by the social contexts that motivated, produced, and circulated the algorithms themselves. In other words, the algorithms produce knowledge by interpreting visual materials within the frame of their own artificially produced social context. Framing the use of computer vision as an imperfect decoding process highlights the need to consider the underlying decisions privileged by existing algorithms.

Computer vision expands the rate and scale of interpretation by becoming an intermediary between the eye and an image. Algorithms can iterate over millions of images looking for features. The rate is increasing as hardware such as high-performance computing and GPUs reduces the time needed for analysis. The ability to zoom out and view at a large scale is a powerful affordance of these recent advances. Messages that may have been difficult to interpret by looking at just a few images can be decoded through large-scale analysis. This changes not only the kinds of messages that we can decode but also what we can encode, since we now have a new mode of communication with which to interpret and convey messages.

Distance from the human eye combined with large-scale computation could lead to claims about objectivity. After all, powerful discourses have lent fields built on numeracy and quantitative evidence a claim to neutrality and objectivity.43 However, theorizing computer vision as a mode of communication inculcated in the process of sending and receiving messages challenges such claims. Instead, computer vision becomes a cultural and social process. The annotations that we adopt, create, and resist through computer vision are in conversation with existing cultural, social, and technical values shaped by visual cultures.

The concept of "viewing" becomes particularly important. Decoding images requires decisions about which annotations to view with. How and what we choose to see and look for (where we direct our gaze, how we see, and how we perceive and discern visually) are culturally and socially shaped.44 Decoding, therefore, is not independent of visual cultures. Viewing is not simply a biological process; it relies on ways of seeing and practices of looking that people learn from each other in order to interpret the world.45 Viewing, therefore, conveys that decoding through computer vision is a set of decisions about how to interpret visual messages that is shaped by cultural and social values, in addition to producing them.

The distinction, as theorized in visual culture studies, between seeing and looking becomes necessary for further expanding on the stakes of using the term viewing.46 Seeing occurs through the physical process of receiving light when one's eyes are open. This does not mean that one is looking, which can be defined as actively seeking to see through visual perception.
For example, one might be in a room and see a photograph but choose not to look at the image. One can also try to look and not see. For example, one may return to the room to look at that photograph but not be able to locate it because the lights are turned off. Seeing occurs when the eyes are open, whether we want to see or not. The act of looking is an intentional process in which we decide what we want to see. Types of looking include not only what to look for but also ways of looking, such as watching. Viewing, therefore, indicates the entanglement of seeing and looking in analyzing digital images. Scholars such as John Berger, Lisa Cartwright, Lisa Nakamura, and Marita Sturken have further theorized these distinctions as producing visual cultures that we learn, circulate, and rely on to decode the meaning of images.

Ways of seeing and practices of looking shape the encoded meaning in images. As John Berger argued in the popular 1972 BBC television series and subsequent book Ways of Seeing, "an image became a record of how X had seen Y."47 He analyzed how X, a community such as White male European oil painters, had seen Y, women as forms to depict in the nude, and argued that they produced this way of seeing for the male gaze and thereby revealed as well as produced problematic gendered power relations. Lisa Cartwright and Marita Sturken expanded on this concept, calling for visual culture studies to address "practices of looking" to emphasize the intentionality of looking in place of the passiveness of the biological process of seeing. One can see and not look. One can look and not see. Scholars such as Stuart Hall and Lisa Nakamura have further argued that these ways of seeing and practices of looking are shaped by and produce ideologies such as gendered and racialized visual cultures.48 Therefore, the ways of looking replicated through media shape which messages are encoded and decoded and are themselves imbued with beliefs, ideologies, and values.

How we view is also a culturally, socially, and historically informed decision.49 Until the 1950s, most technologies of looking still involved the eye, such as the magnifying glass, telescope, and camera. The advent of computing and computer vision has enabled a way of looking that no longer relies on the physical process of the human eye.50 Yet the term computer vision and scholarship in machine learning naturalize a computational process through the language of biological seeing and the eye.51 This research is not a biological process but rather is focused on emulating humans' ways of seeing and practices of looking through the fundamental way that computers "see," which is through computations based on pixel intensities. Ways of seeing, such as color perception, and practices of looking, such as identifying people, are computationally decoded to interpret images.
The result is an epistemological and ontological shift in what and how we view that does not easily fit into current theories.52

Therefore, we see that annotation through computer vision can be characterized as decoding messages in digital images, as a form of communication that changes how we "see," and as a new scale at which we can view images. When annotations are built through computer vision, ways of seeing and practices of looking are encoded into the computational processes used to decode information about the images. This allows us to shift how we understand computer vision. Rather than focusing on whether a computer vision algorithm can be objective, we focus on defining which ways of viewing are encoded and decoded, and on assessing the possibilities, limitations, and effects of these decisions.

Conclusions

In this chapter, we have focused on the epistemological and semiological implications of applying computer vision algorithms to the study of digital images, which we have theorized as distant viewing. The theory is based on the ways that visual objects make meaning, which is further supported by the way that digital images are stored and displayed. These features necessitate the creation of annotations that capture a way of viewing images, and applying distant viewing to larger corpora requires the use of computer vision to produce those annotations.

As a method, distant viewing also provides ways of reflexively and critically engaging in the computational analysis of images. While the next chapter will go into further detail, a discussion of a few possible avenues of inquiry using distant viewing is warranted. Ways of seeing and practices of looking shape our daily lives and are entangled in questions about power. Who gets to look, who is the subject of looking, and which practices of looking circulate are not neutral processes. Whether in efforts by US communities of color to assert full citizenship through twentieth-century African American portrait photography, in the use of the close shot in film on female bodies for the male gaze (thereby producing misogynist ways of seeing), or in the use of skin-brightening and warming Instagram filters to assert ageist and racialized standards of beauty, visual cultures are being encoded and decoded through images constantly.53 Images, therefore, are shaped by and circulate practices of looking. However, these decisions are often implicit and therefore difficult to recognize and make explicit. Distant viewing allows for decoding digital images and their visual cultures, which we demonstrate in chapters 3 through 6 by viewing a range of still and moving images.

As we go about distant viewing, there are significant reasons to be cautious about using the method to identify and challenge ways of viewing. The annotations for computer vision are largely driven by industry and government applications.
Whether they are trying to identify trucks and light posts for self-driving cars or people for surveillance technology, computer vision methods are often driven by entanglements with capitalism, state power, and militarism.54 However, computational image analysis should not be the domain of multinational corporations and governments alone. We can use distant viewing to ask different questions and to critique and question our visual cultures of computer vision. Furthermore, perhaps the most exciting part is that we can recreate and reimagine the role and possibilities of computer vision. Reconfiguring how to use and remake ways of viewing through computer vision algorithms can be a timely and laborious project. Distant viewing offers one way that we can question existing annotations and remake computer vision.

Designed with the intention to mimic the human eye and neural processes, computer vision algorithms look for certain features by following processes for calculating pixels to recognize patterns. Computer vision "sees" through numeracy and "looks" based on the assigned numerical patterns. Therefore, computer vision enables the identification of practices of looking in visual materials and algorithmically creates practices of looking. Computer vision encodes social, cultural, historical, and political values algorithmically. Distant viewing, therefore, enables a reflexive view of computer vision. We can use distant viewing of digital visual materials to interrogate the ways of viewing embedded in computer vision.

Distant viewing provides a call to action. We are slowly cracking away at the facade that algorithms are unbiased and recognizing that algorithms can do incredible harm as well as good. Yes, there are algorithms of oppression.55 Yes, there are weapons of math destruction.56 However, we have fewer capacious theories for understanding the computational processes that produce ways of viewing at a large scale. So we heed media scholar Steve Anderson's call to interrogate and theorize vision technologies through the lens provided by media and visual culture studies, and to zoom out and in through data science and digital humanities.57 As long as we are involved in the process of building annotations for seeing, and therefore viewing, then having a method and theory for analyzing, interpreting, and critiquing these computational processes matters. So, we need distant viewing.

2 Distant Viewing: Method

Images. Images everywhere. We see, look, analyze, and interpret images, often in seconds. A newspaper cover photo of a palm tree bent over in the wind and waves crashing ashore can quickly convey a storm. A historical photograph from the 1930s of a man, horse, and dog standing upright together in a rural plain sends messages of dominance, fortitude, and loyalty. To add a different kind of example, think of the feed on Instagram and how quickly one swipes up while still interpreting the images posted by the accounts that one follows. Messages are quickly decoded based on ways of seeing and practices of looking that we have learned, constructed, and resisted.
What happens when we slow down and ask: What is this an image of, what are the messages being conveyed, and how do we know this? In other words, what if we want to analyze how the messages interpreted were visually constructed? What if we want to explore whether there are messages that were missed on the first view? And what if we want to view these images through certain ways of seeing and practices of looking? A person can sit down and closely analyze an image, but this takes time and becomes a challenge when the number of images increases.

Consider the challenge of analyzing images from another angle. Libraries, archives, and museums have made significant commitments to digitizing visual media. Priorities for digitization include collections with a large audience, those that the institution wants to bring attention to, or collections in a quickly degrading format. What if we want to describe each of these images? The decisions that one might make for an exhibition may not be the same as decisions designed to facilitate access, discovery, and analysis. What, then, do we do if we decide to change how to describe the images? How does one do this at large scale?

The approach we offer to this challenge, distant viewing, uses computer vision to computationally explore digital images. The previous chapter presented a theoretical treatment of the possibilities, limitations, and implications of the distant viewing approach. Here, we focus on the practical aspects of applying computer vision to a set of digital images. Building upon the stakes outlined in the previous chapter, we situate distant viewing within the field of data science. Our analysis focuses on how the method of distant viewing engages with existing data science methodologies while accounting for the unique ways that images make meaning, and it explicates the modeling assumptions that underlie the computational analysis of visual data.

As a starting point, consider a typical workflow for computational analysis from data science, an interdisciplinary field that applies and develops methods to understand collections of data.1 The first diagram in figure 2.1 illustrates a series of high-level steps involved in the processing of data. The steps and names are adapted from Hadley Wickham and Garrett Grolemund's well-known text on the subject.2 The pipeline shown here is focused on the algorithmic aspects of working with the data and therefore does not include steps such as designing a research question and collecting data. The starting point, instead, involves loading and organizing information. This structured data is often stored in the form of a relational database or files in a well-known format. The pipeline's first step, organization, includes standardizing names and units, identifying data errors, and combining or separating data according to the format needed for subsequent analyses. After the data are organized, the pipeline moves to the process of exploration. Here, an iterative mix of visualizations, transformations, and modeling is used to understand the data and address various research questions. Finally, the third step involves communicating the results of the exploration.
The communication step can take various forms depending on the desired audience, such as short presentations, peer-reviewed papers, and digital projects. The arrows in the figure show the conceptual flow of information. However, data analyses almost always require a more flexible iteration back and forth between each part.

The ways that visual materials make meaning, combined with the mechanics of digital images and computer vision, require a modification of the standard data science pipeline. Implicit in the organization step of the pipeline is the notion that producing structured data from the available inputs requires only a reorganization of the original dataset.3 However, as shown in chapter 1, knowledge produced from computer vision algorithms works differently. Creating a structured representation of messages encoded in digital images involves a process in which information is lost as well as created.

Informed by the theory of distant viewing, our method adds a new step to the standard data science pipeline, which is incorporated into the second row of figure 2.1. This new first step encapsulates the process in which digital images are annotated with computer vision techniques. The construction of these annotations is inserted prior to the step of data organization.4 Rather than assuming the data exist or are given, we make explicit through this addition the creation of annotations from the visual data.5 Unlike the process of organizing structured data, annotations are not just a reorganization of the original inputs. Rather, annotations capture elements of the images using algorithms that only view in a certain way. Determining which annotations to create and how to create them, therefore, becomes an important step in the analysis of digital images.

Figure 2.1: Two pipelines for working with data. The top represents a typical workflow when working with structured data, adapted from work by Hadley Wickham and Garrett Grolemund. The pipeline on the bottom illustrates a modification built using distant viewing, which first annotates the visual materials. (Steps such as developing a research question or hypothesis and collecting the data are integral parts of data science but omitted here for clarity.) The arrows show the conceptual flow of information; analyses of data almost always require an iterative approach moving back and forth between each part.

Following the annotation step, distant viewing engages with the process of organizing the data. This involves aggregating the annotations from the computer vision algorithms and combining them with other structured information, such as a description of the digitization formats used or information about each digital image's creator. The tasks and goals of the exploration and communication steps mirror those of a standard data science pipeline but require some unique considerations.
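To make the modified pipeline concrete, the following is a minimal sketch of how its stages might be organized in code. The function names (annotate, organize, explore, communicate) and the annotation fields are hypothetical placeholders used for illustration; they are not an implementation provided by the book.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """A hypothetical structured record produced by viewing one image."""
    path: str
    objects: list        # e.g., detected object labels
    dominant_color: str


def annotate(image_paths):
    """Step 1 (new): apply computer vision to create structured annotations."""
    return [Annotation(path=p, objects=[], dominant_color="") for p in image_paths]


def organize(annotations, metadata):
    """Step 2: combine annotations with archival metadata (creator, format, ...)."""
    return [{**vars(a), **metadata.get(a.path, {})} for a in annotations]


def explore(records):
    """Step 3: iterate over visualizations, transformations, and models."""
    return {"n_images": len(records)}


def communicate(results):
    """Step 4: report findings (papers, presentations, digital projects)."""
    print(results)


# The conceptual flow; in practice the process loops back and forth.
results = explore(organize(annotate(["img_001.jpg"]), metadata={}))
communicate(results)
```

The placement of annotate before organize mirrors the second row of figure 2.1: the structured data does not exist until a particular way of viewing has been chosen and applied.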