From Data Analytics to Data Hermeneutics
Online Political Discussions, Digital Methods and the Continuing Relevance of Interpretive Approaches
Paolo Gerbaudo
Abstract
To advance the study of digital politics, it is urgent to complement data analytics with data hermeneutics, understood as a methodological approach that focuses on the interpretation of the deep
structures of meaning in social media conversations as they develop 
around various political phenomena, from digital protest movements 
to online election campaigns. The diffusion of Big Data techniques in 
recent scholarship on political behavior has led to a quantitative bias 
in the understanding of online political phenomena and a disregard 
for issues of content and meaning. To solve this problem it is necessary 
to adapt the hermeneutic approach to the conditions of social media 
communication, and shift its object of analysis from texts to datasets. 
On the one hand, this involves identifying procedures to select samples 
of social media posts out of datasets, so that they can be analysed 
in more depth. I describe three sampling strategies  – top sampling, 
random sampling and zoom-in sampling – to attain this goal. On the 
other hand, “close reading” procedures used in hermeneutic analysis 
need to be adapted to the different quality of digital objects vis-à-vis 
traditional texts. This can be achieved by analysing posts not only 
as data-points in a dataset, but also as interventions in a collective 
conversation, and as utterances of broader “discourses”. The task of 
interpretation of social media data also requires an understanding of 
the political and social contexts in which digital political phenomena 
unfold, as well as taking into account the subjective viewpoints and 
motivations of those involved, which can be gained through in-depth interviews and other qualitative social science methods. Data hermeneutics thus holds the promise of closing the gap between quantitative and qualitative approaches in the study of digital politics, allowing for a deeper and more holistic understanding of online political phenomena.
Keywords: Big data; digital politics; hermeneutics; data analytics; qualitative methods; random sampling; close reading.
DOI 10.14361/dcs-2016-0207
DCS | Digital Culture and Society | Vol. 2, Issue 2 | © transcript 2016
96 Paolo Gerbaudo
Introduction
The digital transformation of our societies has not just transformed the “ontology” of politics, i.e., the nature or essence of contemporary political phenomena in a variety of domains, from social movements using data leaks as protest tactics to political campaigns mastering the art of data-driven targeted advertising. It has also transformed the “epistemology” of political research, i.e., the methods used to analyse political phenomena as they unfold online. If opinion surveys, statistical studies of electoral behaviour, and content analysis of political speeches and news broadcasts have long been the tools of the trade for political analysts, we are now witnessing the development of an array of digital methods in political research: sophisticated computational techniques employed to sift through the ocean of information in which contemporary politics is immersed.
The most popular method in the emerging field of digital politics – the fledg-
ling field of scholarship that explores the variety of political phenomena enabled by 
digital technology – is undoubtedly data analytics. Data analytics can be described as a series of techniques for the statistical and computational analysis of various types of digital datasets (Kambatla et al. 2014). Social scientists have used various kinds of econometric tools to analyse different aspects of social media conversations, typically including: their network structure (see, for example, González-Bailón et al. 2013); the temporal evolution of conversations (Conover et al. 2013); the process of information diffusion (Theocharis 2013); and the correlation of various popularity metrics, such as the number of likes or retweets (Dubois et al. 2014).
Data analytics has enriched research on digital politics by providing sophisticated ways to study and visualise online political dynamics. However, it has fast become a sort of methodological orthodoxy in the field, leading some researchers to overlook its manifold limits and shortcomings (Tufekci 2014; boyd/Crawford 2014). More importantly, its quantitative bias has contributed to marginalizing questions of cultural meaning and social motivation, which are fundamental to understanding the content of social media conversations (Couldry/Stephansen 2014).
This article aims to flesh out an alternative, qualitative approach to the study of online political phenomena that addresses some of the problems with data analytics. This is what I term “data hermeneutics”: an updating of the interpretive methods originating in a number of disciplines, including phenomenological philosophy, literary criticism, qualitative sociology and cultural anthropology, to deal with the specific properties of digital communication and social media datasets. Political research cannot content itself with ever more sophisticated forms of computational analysis of political behaviour online. It also needs to answer qualitative questions about the “who”, “what”, “how” and “why” of digital political phenomena. If it is to have any real explanatory power, it has to pay attention to the meanings and subjective viewpoints inherent in social media conversations and to the contexts in which these conversations occur.
From Data Analy t ics to Data Hermeneutics 97
The tradition of hermeneutics, with its focus on deep structures of meaning and the subjective worldviews they reflect, provides an inspiring but outdated blueprint for pursuing such an endeavour. To make this approach current again, it is necessary to revise it to match the peculiar conditions of digital
communication, which has a non-linear, dialogic, extemporaneous, and interac-
tive character. Whereas traditional hermeneutics has dealt with the analysis of 
various traditional “texts” (novels, films, political speeches, news articles), data 
hermeneutics needs to find ways to analyse online content, approaching data as 
“inscriptions” (Ricoeur 1971) or recorded traces of a peculiar form of social text: 
social media conversations.
I discuss two practical aspects of this digital adaptation of hermeneutics: 1) the development of qualitative sampling procedures geared towards reducing the size of social media datasets; 2) the development of “close data reading” procedures that may help interpret posts in the context of larger conversations and in relation to a number of connected discourses, narratives, motivations and worldviews.
First, data hermeneutics requires sampling procedures aimed at reducing the 
size of the datasets to a scale amenable to qualitative analysis. I discuss three such 
procedures: top sampling, random sampling and zoom-in sampling. In the first case, the dataset is filtered for the top messages based on a number of popularity metrics, such as the number of retweets or likes. In the second case, the sample is obtained by selecting a random set of posts (tweets or Facebook posts), which can
be considered representative of a conversation. In the third case, the researcher 
zooms in on a particular point in the conversation deemed to be particularly 
significant – for example a spike in user engagement. These three procedures, 
which can be used in combination, provide researchers with a “select social media 
dataset” that can then be analysed in more depth.
Second, “close reading” and “in-depth analysis”, expressions that condense 
the modus operandi of hermeneutics, need to be revised to match the properties 
of social media data vis-à-vis traditional texts. I propose a three-step “close data reading” procedure: reading posts as rows in a dataset, as part of the conversation, and as part of a certain social discourse. First, the researcher reads posts as datapoints in a dataset, paying attention to the specific content and stylistic features of each post. Second, she approaches posts as exchanges in the conversation from which they were “scraped”, paying attention to their embeddedness in dialogic communication. Third, a full understanding of the meaning of social media posts and conversations requires interpreting their themes in light of the cultural, social and political contexts they navigate. To this end, researchers need to take into account the subjective viewpoints and motivations of participants, using more traditional qualitative methodologies such as in-depth interviews.
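The second step, reading a post as a turn in an exchange, presupposes that the researcher can recover what the post was responding to. The following is a minimal sketch of that retrieval, assuming a hypothetical schema in which each post carries an `id` and an `in_reply_to` field pointing to its parent; real platform exports name and structure these fields differently, and a parent that was never captured simply ends the chain.

```python
# Hypothetical schema (an assumption for illustration): each post is a dict
# with "id", "text" and "in_reply_to" (the parent's id, or None).

def reply_chain(post_id, posts_by_id):
    """Walk up the reply links from one post to the start of its thread.

    Returns posts oldest-first, so the selected post can be read as the
    last turn of an exchange rather than as an isolated data point.
    """
    chain = []
    seen = set()
    current = posts_by_id.get(post_id)
    while current is not None and current["id"] not in seen:
        seen.add(current["id"])  # guards against malformed, cyclic reply data
        chain.append(current)
        parent = current.get("in_reply_to")
        # Stop when there is no parent, or the parent is missing from the dataset.
        current = posts_by_id.get(parent) if parent is not None else None
    return list(reversed(chain))

# Invented example thread.
posts = [
    {"id": "p1", "text": "Who is going tomorrow?", "in_reply_to": None},
    {"id": "p2", "text": "Me, with my whole street", "in_reply_to": "p1"},
    {"id": "p3", "text": "Count us in too", "in_reply_to": "p2"},
]
posts_by_id = {p["id"]: p for p in posts}

for p in reply_chain("p3", posts_by_id):
    print(p["id"], p["text"])
```

Read in isolation, "Count us in too" is almost empty of meaning; read as the last turn of the chain, it becomes a statement of collective participation.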
The article begins by reviewing the current debate on digital methods in digital politics research, focusing on a number of critical issues, in particular the neglect of issues of meaning. The second part of the article then sets out the data hermeneutics approach in positive terms, presenting connected
sampling procedures and analytical techniques. The conclusion sums up the 
content of my methodological proposal and considers some of the questions and 
challenges ahead for digital politics research.
Beyond the quantitative bias of data analytics
Methodology in the emerging field of digital politics is dominated by “data analytics” (Kambatla et al. 2014), a set of statistical and computational techniques for the analysis of large social media datasets. The development of data analytics has been led by major internet firms such as Facebook, Amazon and Twitter, which in recent years have created data science teams to develop sophisticated market intelligence (Chen et al. 2012). Social scientists, and more specifically political researchers, have been quick to follow this trend, using data analytics to study what happens in social media conversations relevant to various political phenomena, from social movements to electoral campaigns and political debates (Theocharis 2013; González-Bailón et al. 2013; Conover et al. 2013). Data “mined” from the APIs of social network sites is then subjected to various forms of statistical and computational analysis. The issues most typically analysed via data analytics include: the structure of social media conversations, often pictured through network diagrams; their clustering into distinct groupings sharing common features; the distribution of, and correlation between, different social media metrics (followers, likes, shares, retweets, etc.); the temporal evolution of conversations (peaks, lows, etc.); and the frequency of key terms used in conversations.
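To make concrete what "correlation between different social media metrics" involves, here is a minimal sketch that computes a Pearson correlation between likes and retweets over a handful of invented posts. The field names `likes` and `retweets` and the toy data are assumptions for illustration; datasets exported from platform APIs will follow their own schemas.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a metric is constant")
    return cov / sqrt(var_x * var_y)

# Invented posts (hypothetical field names): only the popularity metrics
# enter the calculation; the text of the posts is never read.
posts = [
    {"id": 1, "likes": 10, "retweets": 2},
    {"id": 2, "likes": 40, "retweets": 9},
    {"id": 3, "likes": 25, "retweets": 6},
    {"id": 4, "likes": 5, "retweets": 1},
]
r = pearson([p["likes"] for p in posts], [p["retweets"] for p in posts])
print(f"likes/retweets correlation: {r:.3f}")
```

The calculation never touches what the posts say, which is precisely the kind of limitation this article goes on to criticize.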
Data analytics has provided innovative and sophisticated techniques to renew and advance sociological methods beyond the current crisis of empirical sociology (Burrows/Savage 2007). It can be understood as a necessary update to traditional quantitative social science methods such as surveys (Kitchin 2014; Tinati et al. 2014), facing up to the new methodological challenges posed by the abundance of social media data and making it possible to explore new social and political phenomena of evident research interest. Furthermore, data analytics offers great advances in the detail and sophistication of quantitative analysis, allowing researchers to conduct studies “at scale” (Kambatla et al. 2014), examining the entire “population” of a given phenomenon or a close approximation of it.
These positive elements notwithstanding, data analytics has demonstrated significant shortcomings, which call for the development of alternative approaches. In fact, after an initial phase of naïve enthusiasm, this methodological approach has in recent years come under criticism from a number of scholars, who have highlighted how data analytics frequently falls short of the standards expected in quantitative research (Tufekci 2014; Tinati et al. 2014; Lazer et al. 2013). Zeynep Tufekci has argued that in many circumstances there has been a “lack of clarity with regard to sampling, universe and representativeness” (Tufekci 2014: 324).
Similarly Kate Crawford and danah boyd have highlighted that “claims to objec-
tivity and accuracy are often misleading” and that it is erroneous to think that 
“bigger data is always better data” (2014: 217).
There is, however, a more fundamental critique that can be made against data analytics: its quantitative bias and its neglect of issues of meaning. By focusing on the mathematical form of social media conversations, that is, their structure and dynamics, data analytics tends to overlook their content, the deep structures of meaning expressed in them, and the motivations they betoken. Data analytics can reveal with great sophistication the mathematical properties of datasets, but it is not well equipped to answer qualitative questions about the “what”, the “how” and the “why” of social media conversations. Even when content is analysed, as in computer-assisted textual analysis (Grimmer/Stewart 2013), the logic remains one of “counting occurrences”, for example measuring the frequency of certain terms or the emotional tone of a conversation. This quantitative bias is exacerbated by the fact that politics research driven by data analytics often lacks context (Tufekci 2014; Crawford/boyd 2014). To quote Clifford Geertz, it sometimes seems as if data analytics researchers think it is possible to understand phenomena without knowing them (1957: 64).
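The "counting occurrences" logic of computer-assisted textual analysis can be illustrated with a minimal term-frequency count. This is a sketch with invented tweets and a hand-picked stopword list, not a reproduction of any particular text-analysis tool.

```python
import re
from collections import Counter

def term_frequencies(texts, stopwords=frozenset()):
    """Count how often each lower-cased word token occurs across a list of texts."""
    counts = Counter()
    for text in texts:
        # \w+ matches runs of letters, digits and underscores,
        # so "#protest" is counted as "protest".
        tokens = re.findall(r"\w+", text.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts

# Invented tweets for illustration.
tweets = [
    "Everyone to the square tomorrow #protest",
    "The square is full! #protest",
    "Police blocking the square",
]
freqs = term_frequencies(tweets, stopwords={"the", "to", "is"})
print(freqs.most_common(2))  # → [('square', 3), ('protest', 2)]
```

The counts reveal that "square" recurs, but they cannot say why the square matters to these users or what occupying it means to them; that is the interpretive question at stake here.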
Looking at social media data as texts
The shortcomings of data analytics should be seen by qualitative researchers as 
evidence for the continuing relevance of interpretive methods. Social media data 
is not inherently hostile to a qualitative research agenda, and potentially offers a 
treasure trove for qualitative research. The abundant and fine-grained information 
that can be retrieved from online conversations, including textual information, 
pictures, videos, and other digital objects, is a resource waiting to be tapped from
a qualitative angle. Indeed, some qualitative researchers have begun working in
this direction, as seen in the field of online discourse analysis (Androutsopoulos/
Beißwenger 2008; Kelsey/Bennett 2014). Yet, much still remains to be done in 
devising effective strategies and procedures for the analysis of social media data 
that can make qualitative methods relevant once again.
Data hermeneutics is a notion that proposes a radically different orientation 
from the one pursued by data analytics, by giving a new lease of life to one of the 
most old-fashioned of interpretive methods: the hermeneutic approach. Where 
data analytics operates according to a purely objectivist view of online political 
conversations, conceived of as forms of collective behaviour that can be objectively measured, data hermeneutics, like all interpretive methodologies,
operates with the idea that these conversations are first and foremost symbolic 
interactions which cannot be understood without taking into account the subjec-
tive viewpoints of those involved. Where data analytics – as in fact all forms of 
analytics – involves various forms of numerical analysis, drawn from the field of 
statistics and computation, data hermeneutics centers on the symbolic analysis 
of the meaning structures of online conversations, in light of connected social 
discourses and motivations. Finally, where – as the very etymology of the term 
analytics suggests – data analytics is mostly interested in analysing, that is, in
“breaking down” a given phenomenon into basic units and variables for statistical 
analysis, data hermeneutics’ chief concern is the synthetic aim of interpreting, 
reconstructing and explaining the overarching narratives that underpin social 
media conversations.
Data hermeneutics is a digital adaptation of hermeneutics, a term that derives from the ancient Greek word “hermeneuo”, meaning to “understand” or to “interpret”, and that can be described as a broadly developed set of methodologies for interpretation (Szondi 1975). The origins of this approach hark back to Greek antiquity and medieval philosophy, with its interest in the different literal and allegorical layers of meaning in sacred texts. In modern times hermeneutics has been associated with phenomenological philosophy, and in particular with the work of Edmund Husserl (1970), Martin Heidegger and his pupil Hans-Georg Gadamer (2004), and has influenced a wide array of social and political theorists, including Max Weber (1978; 1981), Walter Benjamin (1977), Fredric Jameson (1981) and Anthony Giddens (1984). The hermeneutic interpretive approach has also profoundly shaped research methods in the humanities and social sciences, as most evidently seen in the context of New Criticism, qualitative sociology and social anthropology (Bleicher/Bleicher 1980).
In the context of literary criticism (Bressler 1999), hermeneutic approaches 
have informed the development of “close reading” procedures (Wolfreys 2000). 
Close reading can be described as a process of deep analytical engagement with 
a text – a novel, a poem, but also a film or any other similar artefact – with the 
aim of exploring the complex network of meaning that underpins it. Thus, for 
example, in analyzing a novel, a film, or a political speech, literary scholars look 
at their content and formal characteristics, such as the language, tone, imagery, 
and rhetorical figures. While sometimes – as in the case of New Criticism and 
semiotics – this analysis can take a purely formalistic character, close reading
has also been used in the context of more sociologically minded discourse analysis 
approaches, as performed for example in the context of cultural studies and soci-
ology (Wodak/Krzyzanowski 2008), looking for the connection between specific 
texts and broader discourses.
In social science, the hermeneutic approach has been popular among both 
sociologists and anthropologists. Key in this respect has been Max Weber’s view of 
sociology as an interpretive science, different from the natural sciences because
the subjects and objects of analysis are both human beings, endowed with
consciousness and reflexivity (Tucker 1965). The priority for the social sciences, 
vis-à-vis the natural sciences, is Verstehen, the in-depth understanding of the 
subjective viewpoints, motives and worldviews that inform social action, rather 
than action as just an externally observable behaviour (Weber 1978; 1981). Anthony 
Giddens expanded on this idea by coining the notion of “double hermeneutics”, 
which highlights how social science studies not only people’s behaviour but also people’s interpretations of the social world and of their own social action, and thus revolves around an interpretation of already existing interpretations, or the scholarly interpretation of lay conceptions (1984: 20).
The hermeneutic approach has also been taken up by social anthropologists 
such as Clifford Geertz, who argued that in producing “thick descriptions” of 
communities and social practices, anthropologists should take into account “the 
interpretations to which persons of a particular denomination subject their expe-
rience” (1973: 15). He argued that interpretive methods required an effort to “find 
one’s feet” (Geertz 1973: 18) in the phenomena, or to use another metaphor “put 
oneself in the shoes” of the actors and communities analysed, looking at people as 
conscious and creative subjects rather than objects prey to social forces they cannot 
control. This interpretive orientation has been at the heart of qualitative methods 
such as in-depth interviews and focus groups and has informed the development 
of the now popular “grounded theory” approach (Strauss/Corbin 1998), in which researchers are expected to develop their interpretation of social phenomena in a bottom-up manner rather than testing a priori hypotheses.
In recent years there have been some inklings of a digital adaptation of hermeneutics, as signalled by terms such as “computational hermeneutics” (Harnad 1990; Mohr et al. 2013) and “digital hermeneutics” (Capurro 2000). Rafael Capurro, for example, has argued that hermeneutics needs to face up to the challenge of digital technology and develop an “understanding [of] the foundations of digital technology and its interplay with human existence” (ibid: 37). This article contributes to this emerging line of methodological reflection and practice by exploring concrete strategies and procedures for social media analysis and digital politics research.
A digital adaptation of hermeneutics does not simply entail saying that hermeneutics needs to “find its own feet” in the digital world, but also that to understand the digital world it is necessary to recuperate the concern with interpretation which is ultimately hermeneutics’ raison d’être. This assertion is a highly contentious one, due to the anti-interpretive character of the ideology of Big Data, or “dataism” (Van Dijck 2014), and to the idea that data is already a ready-made form of knowledge which does not require active interpretation. This conviction was put forward most explicitly in a famous article by Wired magazine editor Chris Anderson, who argued that in the present data deluge “[c]orrelation is enough. We can stop looking for models. We can analyze the data without hypotheses about what it might show” (2008). This intervention has been criticized as going too far by other Big Data experts (Bollier/Firestone 2010; Cukier/Mayer-Schönberger 2013: 72). Yet it is interesting precisely because it reveals in condensed form the overly positivist and anti-hermeneutic stance of data science.
Anderson’s prophecy about the end of theory and interpretation neglects a number of facts. First, there is no such thing as “raw data”, which is in fact an “oxymoron” (Gitelman 2013), since data is always structured in higher-order
categories that reflect various biases and assumptions, as expressed in the DIKW (Data, Information, Knowledge, Wisdom) hierarchy used in informatics (Rowley 2007). Second, datasets display a number of social and political biases that reflect and sometimes amplify social inequality (O’Neil 2016), and that can be identified only through processes of in-depth interpretation. Third, the overabundance of data makes the task of interpretation particularly important. As argued by Alyssa Wise
and David Shaffer “with larger amounts of data, theory plays an ever-more critical 
role in analysis” (2015: 5). Therefore, rather than hastily throwing interpretation 
out of the window, it is urgent to revive and revise interpretive methodologies to 
match the conditions of a digital era.
The main challenge for data hermeneutics is to shift from texts to data as the main object of analysis, or better, to find ways to read data as text, that is, as a partly coherent and discrete web of meaning. Interpretive approaches have traditionally been concerned with analysing texts (novels, poems, films, speeches, interviews, field-notes, etc.) by examining them in great depth, sentence by sentence, one might say, or even word by word, as signified by notions such as “close reading” and “in-depth analysis”, frequently used as methodological shorthands. Social media conversations may indeed be analysed in ways similar to the analysis of traditional texts, for example by exploring their language, imagery, tone, and other stylistic features. However, significant modifications are necessary due to the specific nature of social media as objects of analysis.
With their non-linear, extemporaneous, and interactive nature, social media 
conversations are radically different from a novel, a film, or an ethnographic 
field-note. Consider, for example, the way in which social media resemble oral conversations more than written texts; the way in which each tweet or
Facebook message can hardly be understood in isolation from other messages; the 
sheer quantity of social media messages and the connected risk of information 
overload; the speed and instantaneity of conversations; their fluid and networked 
character; or the way in which the various interactions available on social media 
(such as liking, retweeting, or favouriting) add another layer of meaning that 
was unknown in pre-digital texts (Van Dijck/Poell 2013). These idiosyncrasies 
of online communication pose a serious challenge to interpretive approaches and require significant adaptations. In the remainder of this article I focus on two practical issues relevant to the development of data hermeneutics: a) “small data” sampling methods; and b) “close data reading” procedures.
Sampling social media datasets for qualitative analysis
The main obstacle for data hermeneutics lies precisely in the “Big-ness” of Big Data, in the vastness of the datasets available to researchers. While this is the aspect that makes social media datasets so interesting for quantitative researchers, due to their great level of detail and the possibility of studying conversations “at scale”,
it is also the element that is most problematic for qualitative researchers, who are instead used to working with “small data” (Couldry/Stephansen 2014). Qualitative researchers are expected to engage at length with research material, exploring the fine-grained meaning structure of texts and connected discourses. This approach limits the amount of evidence that can be analysed. The use of qualitative data analysis software such as NVivo or ATLAS.ti (Friese 2014) provides only partial solace: ultimately, effective close reading continues to imply a great deal of “manual” coding by “human operators”. It follows that the main challenge for data hermeneutics is one of focus: reducing the amount of data to analyse to a selected sample that can still be considered significant and representative of a given aspect of the conversation.
Three sampling procedures can be used to perform this task: top sampling, 
random sampling, and zoom-in sampling. First, one may decide to sample from the top,
by focusing on the messages which – based on a number of popularity metrics 
(likes, retweets, shares) – can be considered as the most visible or important in 
a given conversation. Second, a different strategy involves random sampling, 
selecting by chance a subset of messages from a given conversation using appro-
priate software. Third, “zoom-in sampling” involves concentrating on a particular 
period of time in the conversation, which for whatever reason is considered partic-
ularly significant (start dates of protest waves, election days, etc.). Each of these 
sampling procedures provides a different approximation to a data sample for quali-
tative analysis, and will therefore befit different research questions and designs.
Top sampling is a strategy that has already been utilised by researchers, especially those interested in the behaviour of “power users”, users who have a disproportionate level of influence on online conversations (Cha et al. 2010) as measured by a number of popularity metrics (likes, retweets, favourites, etc.). Practically, sampling at the top is fairly easy and can be performed with standard spreadsheet software such as Microsoft Excel: selecting the column of the variable one takes as most indicative of the popularity of messages, sorting it in descending order, and then keeping the top 50, 100, 200 or 1,000 messages. The viewpoint over a conversation offered by this procedure is evidently biased. It only affords an understanding of what happens “at the top” of a conversation, among its most influential users and messages. This procedure is particularly suitable for highly public and visible conversations, which tend to have a strong power-law distribution and few communication centres (González-Bailón et al. 2011; 2013). However, it is less desirable when analysing less topical conversations, and it cannot be considered representative of the average message in a given conversation. It is good for looking at the peak of a conversation, not for exploring its “base”.
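The spreadsheet steps described above (pick a popularity column, sort it in descending order, keep the top N rows) can be sketched in a few lines of Python. The `retweets` field and the sample records are assumptions for illustration; any numeric popularity metric would work the same way.

```python
def top_sample(posts, metric, n):
    """Return the n posts with the highest value of `metric`, highest first.

    Mirrors the spreadsheet procedure: sort one column in descending order
    and keep the first n rows. Python's sort is stable, so posts with equal
    scores keep their original dataset order.
    """
    return sorted(posts, key=lambda p: p[metric], reverse=True)[:n]

# Hypothetical records: "retweets" stands in for whatever metric is chosen.
posts = [
    {"id": "a", "retweets": 3},
    {"id": "b", "retweets": 120},
    {"id": "c", "retweets": 45},
    {"id": "d", "retweets": 120},
    {"id": "e", "retweets": 0},
]
top = top_sample(posts, "retweets", 3)
print([p["id"] for p in top])  # → ['b', 'd', 'c']
```

Asking for more posts than exist simply returns the whole dataset, sorted.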
The second sampling procedure is random sampling. Random sampling is a well-established sampling strategy in the social sciences (Patton 2005), which involves selecting by chance a subset of a given population. This approach can be updated in a digital context by using a number of digital tools, such as the T-CAT tool developed by the Digital Methods Initiative (DMI) at the University of
Amsterdam. This type of sampling strategy has many advantages. It can return a sample that can be considered representative of the totality of messages contained in a given dataset. As is the case with random sampling more generally, the representativeness of the sample depends on the ratio between the size of the population analysed and the size of the sample: the greater the ratio, the greater the risk that the sample may not be truly representative (Marshall 1996). This approach can be used if one is interested in getting a general sense of the type of messages to be found at the “base” of a conversation, including messages with relatively low popularity.
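A random sample of posts can be sketched with Python's standard `random` module. This is a generic illustration, not how T-CAT implements its sampling; the fixed seed is an added assumption, included so that the same subset can be re-drawn from the same dataset and checked by other researchers.

```python
import random

def random_sample(posts, k, seed=None):
    """Draw k distinct posts uniformly at random, without replacement.

    A fixed seed makes the draw reproducible: the same dataset and seed
    always yield the same sample.
    """
    if k > len(posts):
        raise ValueError("sample size exceeds number of posts")
    rng = random.Random(seed)
    return rng.sample(posts, k)

# Hypothetical dataset of 10,000 posts, identified only by id.
dataset = [{"id": i} for i in range(10_000)]
sample = random_sample(dataset, 200, seed=42)
print(len(sample), len({p["id"] for p in sample}))  # → 200 200
```

Because every post has the same chance of selection regardless of its popularity, low-visibility messages from the "base" of the conversation enter the sample in proportion to their share of the dataset.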
The third sampling strategy is zoom-in or peak sampling, a sampling proce-
dure that focuses on a given time in the conversation that is of particular interest 
to the researcher. These may include online reflections of “real world” events 
(a protest event, or an election) or moments of high user engagement on social 
media, or any other event considered of particular significance to understand the 
dynamics of the conversation. In my own research I adopted this strategy to look 
at the online preparation of major protest events in the Arab Spring and the Indig-
nados, where one could see the build-up of “digital enthusiasm” (Gerbaudo 2016). 
The advantage of zoom-in sampling is that it concentrates on moments that can be 
particularly revealing of a number of digital political dynamics, such as the nexus 
between social media and mobilization, or the social media reflection of offline 
events. Its main disadvantage is obviously its temporally limited coverage, and the 
fact that it thus returns a selective image of the conversation.
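A zoom-in sample can be sketched in two steps: locate a spike in activity (here, naively, the busiest clock hour) and keep only the posts inside that window. The `created_at` field, the hourly granularity and the timestamps are invented assumptions; a researcher zooming in on a known election day or protest date would simply pass that window to `zoom_in_sample` directly.

```python
from collections import Counter
from datetime import datetime, timedelta

def peak_hour(posts):
    """Return the start of the clock hour containing the most posts."""
    hours = Counter(
        p["created_at"].replace(minute=0, second=0, microsecond=0) for p in posts
    )
    return hours.most_common(1)[0][0]

def zoom_in_sample(posts, start, end):
    """Keep only posts created in the half-open window [start, end)."""
    return [p for p in posts if start <= p["created_at"] < end]

# Invented timestamps (hypothetical "created_at" field).
posts = [
    {"id": 1, "created_at": datetime(2020, 1, 1, 9, 10)},
    {"id": 2, "created_at": datetime(2020, 1, 1, 18, 5)},
    {"id": 3, "created_at": datetime(2020, 1, 1, 18, 40)},
    {"id": 4, "created_at": datetime(2020, 1, 1, 18, 55)},
    {"id": 5, "created_at": datetime(2020, 1, 2, 8, 0)},
]
start = peak_hour(posts)
window = zoom_in_sample(posts, start, start + timedelta(hours=1))
print(start, [p["id"] for p in window])  # → 2020-01-01 18:00:00 [2, 3, 4]
```

Counting posts per hour is only one crude way of detecting a spike; the choice of window remains an interpretive decision about which moment is significant.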
These three sampling procedures may be used in combination in the design of concrete research projects. For example, if one is interested in the way in which the top social media accounts reacted to a certain incident, zoom-in sampling and top sampling may be used in concert. The combination of two or more sampling procedures can also make it easier to achieve the aim of “dataset reduction”, which, as we have seen, is a condition of possibility for data hermeneutics. As a rule of thumb, based on my own experience conducting various digital politics research projects, when sampling from large datasets researchers should aim for a sample of between 1,000 and 100,000 words, which roughly equates to between 40 and 4,000 tweets. A corpus of this size is comparable to those traditionally studied by qualitative researchers (novels, films, political speeches, and the like), and small enough to be analysed in depth by a “human operator” without falling prey to information overload.
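A combination of zoom-in and top sampling, checked against the rule of thumb above, might look like the following sketch. The figures above imply an average of roughly 25 words per tweet (1,000 / 40 = 100,000 / 4,000 = 25), so the synthetic posts below are built to that length. The field names, the one-day window and the sample size of 200 are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Rule-of-thumb bounds from the text; at ~25 words per tweet they
# correspond to roughly 40 and 4,000 tweets.
MIN_WORDS, MAX_WORDS = 1_000, 100_000

def word_count(posts):
    """Total number of whitespace-separated words across the posts' text."""
    return sum(len(p["text"].split()) for p in posts)

def within_budget(posts):
    """True if the sample falls inside the suggested size for close reading."""
    return MIN_WORDS <= word_count(posts) <= MAX_WORDS

def zoom_then_top(posts, start, end, metric, n):
    """Zoom in on [start, end), then keep the n most popular posts in that window."""
    window = [p for p in posts if start <= p["created_at"] < end]
    return sorted(window, key=lambda p: p[metric], reverse=True)[:n]

# Synthetic dataset (hypothetical fields): one 25-word post per minute
# over 50 hours.
t0 = datetime(2020, 1, 1)
posts = [
    {
        "id": i,
        "created_at": t0 + timedelta(minutes=i),
        "retweets": i % 97,  # arbitrary stand-in for a popularity metric
        "text": " ".join(["word"] * 25),
    }
    for i in range(3000)
]

selected = zoom_then_top(posts, t0, t0 + timedelta(days=1), "retweets", 200)
print(len(selected), word_count(selected), within_budget(selected))  # → 200 5000 True
```

Counting words rather than posts keeps the check meaningful across platforms where posts vary greatly in length.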
Data close reading
Besides reducing the size of datasets, data hermeneutics entails a rethinking of 
the procedures traditionally used to analyse texts, for the purpose of adapting 
them to social media data analysis. This is what can be described as “close reading 
of data” or “data close reading”, an adaptation of close reading procedures to the 
specific conditions of social media communication. Hermeneutic researchers in 
the humanities and social sciences have typically studied texts, where the notion
of text is not limited to written texts but extends to all compositions, artworks,
and social performances that can be understood as relatively discrete and coherent
symbolic objects. In the case of literary criticism, typical texts have included
novels, poems and films; in the social sciences, everyday behaviour, public
performances and similar phenomena can also be read as texts, often by producing textual accounts
of them, such as ethnographic field-notes. In approaching these texts qualitative 
researchers have typically aimed for an in-depth engagement with the object of 
analysis. This stance is most clearly revealed by the term “close reading” used 
in literary criticism, which highlights how researchers are expected to explore 
texts in great detail, deciphering their complex and largely invisible deep meaning 
structures, and their connection to broader narratives and discourses.
The analytical process in qualitative research typically involves various
“coding” procedures, whereby the researcher marks out certain portions of text
as belonging to certain overarching themes, which can then be organized into
broader categories and narratives. As defined by Johnny Saldaña, a code in this
context is “a word or short phrase that symbolically assigns a summative, salient,
essence-capturing, or evocative attribute for a portion of language-based or
visual data” (2015: 4). This methodology has been applied to such diverse types
of data as interviews, ethnographic observations, films, and newspaper articles.
It is easy to understand how such an approach can be used to analyse traditional
texts, or social events and rituals that are relatively circumscribed, coherent
and mostly linear in their form. But how does close reading function when studying
social media data?
Close data reading needs to approach social media data as “texts”, treating
data-points, e.g. Facebook posts or tweets, as meaningful messages. Reflecting
on my experience conducting digital politics research, I propose that close
data reading should proceed in successive steps, allowing the researcher to
progressively “close in” on the deep meaning structures of posts and conversations.
Selected social media datasets, obtained through the sampling procedures previously
described, can be analysed in three steps, each implying a different “gaze” on
research data: reading posts as rows in a dataset, as part of the conversation,
and as part of a certain social discourse.
First, Facebook posts and tweets can be read as rows in a dataset, the practical
form in which they present themselves to qualitative researchers at the start of
a project. Researchers typically view such datasets as a long table of rows, to be
browsed through via spreadsheet software or via qualitative analysis software such
as NVivo or ATLAS.ti. Here posts appear in their rawest form, as mere rows in a
table: chunks of interaction stripped of the surrounding context of a live
conversation and reduced to the dead form of data-points. A number of elements can
nevertheless already be identified at this stage. Researchers can explore the
topics discussed in each post, as well
as the form in which they are expressed, such as the use of a certain type of
language, imagery, tone, or specific rhetorical figures. In the case of my own
research on the 2011 protest movements, this step of analysis already allowed me
to identify a number of significant features of online protest discourse, including:
the use of conversational and exhortative language (2016); the popularity of
different memes (2015); the adoption of the entire gamut of typical social media
tropes, from emoticons to shortened, slogan-like sentences; and, in terms of content,
the abundance of references to unifying subjects such as “the people”, the citizenry,
or the 99% (2014).
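Coding itself is an interpretive act performed by the researcher, but the bookkeeping around it, attaching codes to rows and rolling codes up into broader categories, can be sketched in a few lines of Python. The codebook below is hypothetical and only loosely echoes the features just listed; it is not the codebook used in the research cited, and the field names `id` and `codes` are assumptions.

```python
from collections import defaultdict

# Hypothetical codebook: each code maps to a broader category.
# The entries are illustrative, not an actual project codebook.
CODEBOOK = {
    "exhortation": "language",
    "conversational tone": "language",
    "emoticon": "social media tropes",
    "slogan": "social media tropes",
    "the people": "unifying subjects",
    "the 99%": "unifying subjects",
}


def posts_by_category(coded_rows, codebook):
    """Map each category to the sorted ids of the posts coded under it.

    Codes are assigned by hand by the researcher; this only aggregates them.
    Codes missing from the codebook are grouped as "uncategorised".
    """
    by_category = defaultdict(set)
    for row in coded_rows:
        for code in row["codes"]:
            by_category[codebook.get(code, "uncategorised")].add(row["id"])
    return {cat: sorted(ids) for cat, ids in by_category.items()}


# Invented coded rows, one per post.
coded_rows = [
    {"id": 1, "codes": ["exhortation", "the people"]},
    {"id": 2, "codes": ["emoticon", "slogan"]},
    {"id": 3, "codes": ["conversational tone", "the 99%", "irony"]},
]
print(posts_by_category(coded_rows, CODEBOOK))
```

The "uncategorised" bucket flags codes that emerged during reading but have not yet been placed in a category, which is a prompt to revisit the codebook rather than an error.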
The second level of analysis involves repositioning posts in their original
environment, reading them in the “live” context of the conversations. This step
approaches social media data as traces of a specific type of social text: the
social media conversation. To better understand the meaning of posts, it is
necessary to approach these data-points not as texts in and of themselves, but
rather as traces or “inscriptions” (Ricoeur 1971), that is, the partial and largely
arbitrary recordings of live social media conversations. To get a sense of how a
message was perceived by internet users, it is necessary to read it in the context
of the conversations in which it was uttered. In practice, this can be done by
browsing the associated web address and exploring the conversations the post
contributed to. For this purpose it is useful to create a folder of screenshots of
conversations, which one can then refer back to during the analysis. In my research,
this step of analysis helped me to understand the intense emotional dialogue that
developed in such movements as the Egyptian 2011 uprising and the Spanish Indignados,
and the way users reinforced positive messages channelled by activist social media,
thus fuelling a wave of “digital enthusiasm” (2016). By reading posts in their lived
contexts, we can understand the degree to which they are in tune with the mood of
internet communities, and the content of the dialogic discourse that emerges out of
social media interactions.
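The procedure above recovers context by browsing each post's web address and keeping screenshots. A complementary technique, named here plainly as a swap-in rather than part of that procedure, is to rebuild reply chains from the dataset itself when it records which post each reply answers. The sketch assumes hypothetical `id`, `text` and `in_reply_to` fields; real exports name these differently depending on the platform.

```python
from collections import defaultdict


def build_threads(posts):
    """Split posts into thread roots and a parent-id -> replies index.

    A reply whose parent is absent from the dataset is treated as a root,
    since collected data is only a partial record of the conversation.
    """
    ids = {p["id"] for p in posts}
    children = defaultdict(list)
    roots = []
    for p in posts:
        parent = p.get("in_reply_to")
        if parent in ids:
            children[parent].append(p)
        else:
            roots.append(p)
    return roots, children


def render_thread(post, children, depth=0):
    """Return the thread as text lines, indenting each reply under its parent."""
    lines = ["  " * depth + post["text"]]
    for reply in children.get(post["id"], []):
        lines.extend(render_thread(reply, children, depth + 1))
    return lines


# Invented toy posts with hypothetical "id" and "in_reply_to" fields.
posts = [
    {"id": 1, "text": "Meet at the square at 6pm", "in_reply_to": None},
    {"id": 2, "text": "We will be there!", "in_reply_to": 1},
    {"id": 3, "text": "Bring water", "in_reply_to": 2},
    {"id": 4, "text": "Count me in", "in_reply_to": 1},
    {"id": 5, "text": "Agreed", "in_reply_to": 99},  # parent was not collected
]
roots, children = build_threads(posts)
for root in roots:
    print("\n".join(render_thread(root, children)))
```

Orphaned replies such as post 5 make visible how much of the live conversation the dataset failed to record, which is itself useful when judging how a message was received.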
The third step for the close reading of data, the one most similar to traditional
coding procedures, explores the discourses and deeper structures of meaning of a
given post. As with step two, the point is to avoid reading posts in isolation.
However, in this case the connections one needs to pay attention to are not just
those with other messages within a specific conversation, but the links between
the message and the broader discourses that act as its background and source of
meaning. To this end, it is important that researchers acquire an understanding
of the context in which digital political phenomena operate, and of the subjective
motives of participants, which can be secured by drawing on more traditional
qualitative methods. In the case of my own research, the most important traditional
method used in tandem with data hermeneutics was in-depth interviews with protest
movement participants active on social media, which allowed me to gain an in-depth
understanding of their subjective viewpoints and backgrounds. Further contextual
sources of information that can aid interpretation include ethnographic observation,
archival documents, and similar sources of background information about a given
phenomenon.
A practical example of how this cultural understanding can aid the work of
social media interpretation can be offered by considering the famous “Bring tent”
tweet by the Canadian countercultural magazine Adbusters that first launched the
Occupy Wall Street movement. In and of itself this tweet seems to have quite
limited meaning, being too short and cryptic to allow for analysis. Yet, seen in
the cultural context of contemporary social movements, of post-modern neo-anarchism
and of a nascent Occupy Wall Street movement, some interesting elements can be
inferred from it. For someone who knows the context of this tweet, and the nature,
motivations and aims of the Occupy movement, the message resonates for a number
of reasons, including the appeal to participate in the Occupy movement; the action
format of the occupation; the pragmatism of the movement; its distrust of
traditional ideologies, and its related emphasis on the political significance
of concrete and practical activities, such as the act of bringing a tent and
setting it up in a protest camp in a central public space.
This three-step analysis allows researchers to move ever deeper into the web of
meanings of social media conversations. At each step the researcher progressively
“closes in” on the interpretation of posts and social media conversations, while
at the same time broadening the perspective of analysis and paying attention to
the general context. This procedure is evidently not an exact science. To quote
Clifford Geertz, it is more of an artisanal process of “guessing at meanings,
assessing the guesses, and drawing explanatory conclusions from the better
guesses” (1973: 20). Researchers should verify the validity of their emerging
interpretations by “triangulating”, that is, comparing and contrasting the findings
emerging from different posts, and progressively refining their interpretation.
Conclusion
As I have sought to demonstrate in this article, a digital update of the hermeneutic
method, data hermeneutics, is urgently needed to overcome the limits of data
analytics. Crunching numbers in ever more powerful and sophisticated ways is
not enough if one is not able to fully explain the categories of analysis, and the
significance, ramifications, and implications of findings. While “counting”, the
core logic of data analytics, can no doubt be useful in gaining an overview of
the structure and dynamics of conversations, a real understanding of their
motivations and meanings can be achieved through the sampling and close reading
procedures proposed in this article. To make sense of online political phenomena,
we cannot approach them merely as structures of behaviour to be studied
mathematically. We also need to approach them as texts, webs of meanings with which
researchers need to slowly acquaint themselves before they can claim to know
them and comprehend them. Research based on data metrics has been weak in
depth of understanding and contextual knowledge, and it is precisely in these
areas that data hermeneutics can make a timely contribution.
The idea of data hermeneutics put forward in this article is in part a polemical
response to the current dominance of data analytics in social science, and a
reassertion of the importance of qualitative methods. However, when it comes to
designing concrete research projects, data hermeneutics should be understood
as non-exclusive. Data analytics and data hermeneutics should often be used in
tandem, as part of a “quali-quantitative” approach (Venturini/Latour 2010), with
various iterations between the two. Data analytics is particularly valuable in the
initial scoping of a research project, since it provides an overview of conversation
structures, from which researchers can then turn towards data hermeneutics,
looking in more depth at the content of specific messages and specific excerpts of
a given conversation. Data hermeneutics can improve on quantitative approaches by
providing a clearer understanding of the various categories used in such analyses.
For example, in analysing the language used in a given conversation, one can
combine qualitative and quantitative methods effectively by identifying the most
recurrent terms, then looking at how these terms are concretely used in a number
of expressions, and finally investigating the motivations underlying these
expressions. Furthermore, statistical testing procedures from data analytics can
indicate where to go deeper, for example by highlighting the peaks of activity in
a dataset and thus suggesting where a more in-depth qualitative analysis should
be conducted.
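The combination of counting and reading described above, identifying recurrent terms and then examining how they are used, can be sketched with a term-frequency count followed by a keyword-in-context (KWIC) concordance. The tokeniser, the stop-word list and the function names are simplifying assumptions for illustration; the final step, investigating the motivations behind the expressions, remains interpretive and is not something this code does.

```python
import re
from collections import Counter

# Toy stop-word list (an assumption); real projects need a fuller, language-specific one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "at", "on", "for"}


def tokenize(text):
    """Lower-case and split into simple word tokens (letters, digits, #, @, ')."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


def top_terms(texts, n=10, stopwords=STOPWORDS):
    """Counting step: the n most frequent non-stop-word terms."""
    counts = Counter(
        tok for text in texts for tok in tokenize(text) if tok not in stopwords
    )
    return counts.most_common(n)


def kwic(texts, term, width=3):
    """Reading aid: keyword-in-context lines for every occurrence of `term`."""
    lines = []
    for text in texts:
        tokens = tokenize(text)
        for i, tok in enumerate(tokens):
            if tok == term:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Invented example posts.
texts = [
    "The people are in the square tonight",
    "Power to the people and the 99%",
    "People of the city, join us",
]
print(top_terms(texts, n=1))  # [('people', 3)]
for line in kwic(texts, "people", width=2):
    print(line)
```

The frequency count only says that “people” recurs; the concordance lines are what the researcher then reads closely to see how the term is being used.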
Thus, what is required is not the wholesale substitution of data analytics with
data hermeneutics, but a methodological rebalancing that can ultimately benefit
both qualitative and quantitative research, and more generally allow for a more
holistic and better contextualized understanding of contemporary politics.
References
Anderson, Chris (2008): “The end of theory: The data deluge makes the scientific
method obsolete.” In: Wired magazine 16/7 (https://www.wired.com/2008/06/pb-theory/).
Androutsopoulos, Jannis/Beißwenger, Michael (2008): “Introduction: Data and 
methods in computer-mediated discourse analysis.” In: Language@Internet 
5/2, pp. 1–7.
Batrinca, Bogdan/Treleaven, Philip C. (2015): “Social media analytics: a survey of 
techniques, tools and platforms.” In: AI & Society, 30/1, pp. 89–116.
Benjamin, Walter (1977): The origin of German tragic drama, London: NLB.
Bleicher, Josef (1980): Contemporary hermeneutics: Hermeneutics as method, 
philosophy and critique, London: Routledge & Kegan Paul.
Bollier, David/Firestone, Charles M. (2010): The promise and peril of big data, 
Washington, DC: Aspen Institute.
boyd, danah/Crawford, Kate (2012): “Critical questions for big data: Provocations
for a cultural, technological, and scholarly phenomenon.” In: Information, 
Communication & Society 15/5, pp. 662–679.
Bressler, Charles E. (1999): Literary criticism: An introduction to theory and prac-
tice, Upper Saddle River, NJ: Prentice Hall.
Burrows, Roger/Savage, Michael (2007): “The coming crisis of empirical sociol-
ogy.” In: Sociology 41/5, pp. 885–899.
Capurro, Rafael (2010): “Digital hermeneutics: An outline.” In: AI & Society 25/1,
pp. 35–42.
Cha, Meeyoung/Haddadi, Hamed/Benevenuto, Fabricio/Gummadi, Krishna P.
(2010): “Measuring User Influence in Twitter: The Million Follower Fallacy.” 
In: ICWSM 10/30, pp. 10–17.
Chen, Hsinchun/Chiang, Roger H./Storey, Veda C. (2012): “Business Intelligence 
and Analytics: From Big Data to Big Impact.” In: MIS Quarterly 36/4, pp. 1165–
1188.
Conover, Michael D./Ferrara, Emilio/Menczer, Filippo/Flammini, Alessandro 
(2013): “The digital evolution of Occupy Wall Street.” In: PLoS ONE 8/5.
Dubois, Elizabeth/Gaffney, Devin (2014): “The multiple facets of influence: Iden-
tifying political influentials and opinion leaders on Twitter.” In: American 
Behavioral Scientist 58/10, pp. 1260–1277.
Friese, Susanne (2014): Qualitative data analysis with ATLAS.ti, London: Sage.
Gadamer, Hans-Georg (2004): Truth and Method, New York: Bloomsbury Pub-
lishing.
Giddens, Anthony (1984): The constitution of society: Outline of the theory of 
structuration, Berkeley: University of California Press.
Gerbaudo, Paolo (2012): Tweets and the streets: Social media and contemporary 
activism, London: Pluto Press.
Gerbaudo, Paolo (2014): “The ‘Movements of the Squares’ and the Contested Resurgence
of the ‘Sovereign People’ in Contemporary Protest Culture.” In: SSRN 2439359.
Gerbaudo, Paolo (2015): “Protest avatars as memetic signifiers: political profile pic-
tures and the construction of collective identity on social media in the 2011 
protest wave.” In: Information, Communication & Society 18/8, pp. 916–929.
Gerbaudo, Paolo (2016): “Rousing the Facebook Crowd: Digital Enthusiasm and 
Emotional Contagion in the 2011 Protests in Egypt and Spain.” In: Interna-
tional Journal of Communication 10/20, pp. 254–273.
Gibbs, Graham R. (2002): Qualitative data analysis: Explorations with NVivo, 
Buckingham: Open University Press.
Gitelman, Lisa (2013): “Raw data” is an Oxymoron, Cambridge, MA: MIT Press.
González-Bailón, Sandra/Borge-Holthoefer, Javier/Rivero, Alejandro/Moreno, 
Yamir (2011): “The dynamics of protest recruitment through an online 
network.” In: Scientific Reports 1/197.
González-Bailón, Sandra/Borge-Holthoefer, Javier/Moreno, Yamir (2013): “Broad-
casters and hidden influentials in online protest diffusion.” In: American 
Behavioral Scientist 57/7, pp. 920–942.
Grimmer, Justin/Stewart, Brandon M. (2013): “Text as data: The promise and 
pitfalls of automatic content analysis methods for political texts.” In: Political 
Analysis 21/3, pp. 267–297.
Harnad, Stevan (1990): “Against computational hermeneutics.” In: Social Episte-
mology 4, pp. 167–172.
Husserl, Edmund (1970): The crisis of European sciences and transcendental 
phenomenology: An introduction to phenomenological philosophy, Evanston: 
Northwestern University Press.
Jameson, Fredric (1981): The political unconscious: literature as a socially symbolic 
act, London: Methuen.
Kelsey, Darren/Bennett, Lucy (2014): “Discipline and resistance on social media: 
Discourse, power and context in the Paul Chambers ‘Twitter Joke Trial’.” In: 
Discourse, Context & Media 3, pp. 37–45.
Kitchin, Rob (2014): The data revolution: Big data, open data, data infrastructures
and their consequences, London: Sage.
Lazer, David/Kennedy, Ryan/King, Gary/Vespignani, Alessandro (2014): “The 
parable of Google Flu: traps in big data analysis.” In: Science 343/6176, 
pp. 1203–1205.
Marshall, Martin N. (1996): “Sampling for qualitative research.” In: Family prac-
tice 13/6, pp. 522–526.
O’Neil, Cathy (2016): Weapons of math destruction: How big data increases 
inequality and threatens democracy, New York: Crown.
Patton, Michael Quinn (2005): Qualitative research, Chichester: John Wiley & Sons.
Ricoeur, Paul (1971): “The model of the text: Meaningful action considered as a 
text.” In: Social research 38, pp. 529–562.
Rogers, Richard (2013): Digital methods, Cambridge, MA: MIT Press.
Rowley, Jennifer E. (2007): “The wisdom hierarchy: representations of the DIKW 
hierarchy.” In: Journal of Information Science, 33/2, pp. 163–180.
Saldaña, Johnny (2015): The coding manual for qualitative researchers, London: Sage.
Stephansen, Hilde C./Couldry, Nick (2014): “Understanding micro-processes of 
community building and mutual learning on Twitter: A ‘small data’ approach.” 
In: Information, Communication & Society 17/10, pp. 1212–1227.
Strauss, Anselm/Corbin, Juliet (1990): Basics of qualitative research, Newbury 
Park, CA: Sage.
Theocharis, Yannis (2013): “The wealth of (occupation) networks? Communica-
tion patterns and information distribution in a Twitter protest network.” In: 
Journal of Information Technology & Politics 10/1, pp. 35–56.
Tinati, Ramine/Halford, Susan/Carr, Leslie/Pope, Catherine (2014): “Big data: 
methodological challenges and approaches for sociological analysis.” In: Soci-
ology, 0038038513511561.
Tucker, William T. (1965): “Max Weber’s ‘Verstehen’.” In: The Sociological Quar-
terly 6/2, pp. 157–165.
Tufekci, Zeynep (2014): “Big questions for social media big data: Representative-
ness, validity and other methodological pitfalls.” In: arXiv, 1403.7400.
van Dijck, José (2014): “Datafication, dataism and dataveillance: Big Data between 
scientific paradigm and ideology.” In: Surveillance & Society 12/2.
Venturini, Tommaso/Latour, Bruno (2010): “The social fabric: Digital traces 
and quali-quantitative methods.” In: Proceedings of Future En Seine 2009, 
pp. 87–101.
Weber, Max (1978): Economy and society: An outline of interpretive sociology, 
Berkeley: University of California Press.
Weber, Max (1981): “Some Categories of Interpretive Sociology.” In: The Sociologi-
cal Quarterly 22/2, pp. 151–180.
Wise, Alyssa F./Shaffer, David W. (2015): “Why theory matters more than ever in 
the age of big data.” In: Journal of Learning Analytics 2/2, pp. 5–13.
Wodak, Ruth/Krzyżanowski, Michał (2008): Qualitative discourse analysis in the
Social Sciences, London: Palgrave Macmillan.
Wolfreys, Julian (2000): Readings: Acts of close reading in literary theory, Edin-
burgh: Edinburgh University Press.