Data and the Humanities: Digital Humanities and Disrupting Disciplinary Boundaries

Below is a working paper I wrote for my graduate course on the History of Data Science, led by Professor Cathryn Carson in Spring 2015 at UC Berkeley. You can see the bibliography of readings from our class in my Zotero library.


What does data mean to a humanist? What would it mean to datify humanistic inquiry? This paper examines the recent literature on data within the humanities and critical debates about ‘digital humanities’. As I demonstrate in this paper, the debates around data within the humanities fit within three interlocking frameworks: first, the tensions between relevancy and distinction within the humanities in relation to the sciences; second, the boundary work of defining and distinguishing ‘digital humanities’; and third, the shifts in methods occurring across all disciplines around data, data intensive sciences, mixed methods, and scholarly communication.

I recognize that debates around disciplinary identities and methods can transform into polemical battles around academic territory. I seek to immerse myself in these debates to understand the contours and ridges and to understand the boundary making processes currently manifesting within the humanities. Observing historical patterns within these debates, I conclude that a historiography of data science, data intensive humanities, and digital humanities is inevitably a narrative about disruption of existing disciplinary boundaries.

Humanities as the Non-Science: Digital Humanities and the Case of History

This section explores the origins and debates within Digital Humanities and Digital History drawn from two canonical digital humanities texts, A Companion to Digital Humanities edited by Susan Schreibman, Ray Siemens, and John Unsworth (2004) and Debates in the Digital Humanities edited by Matthew K. Gold (2012). Rather than a polemical narrative of digital humanities and its definitions, I highlight a few of the main debates within the history of digital humanities around the overlapping questions of method, epistemology, and institutional support. In particular, I demonstrate how these questions parallel a larger debate on the disciplinary and institutional boundary making between the humanities and sciences. By situating the history of digital humanities and digital history within a longer history of academic disciplines and methods, this analysis provides insight into larger questions such as institutional support of the humanities, the relevance of humanistic inquiry, the changing role and landscape of the academy, and the perceived intellectual imperialism from the sciences. I focus my analysis on scholars who directly engage with and draw comparisons between the sciences and the humanities in order to highlight the productive intellectual conversations that can emerge through comparative rather than combative discussions between the sciences and humanities.

Throughout the twentieth century, the epistemological and methodological waves and turns within the discipline of history revolved around questions of qualitative or quantitative methods, the verifiability and use of evidence, and concepts of truth, objectivity, and interpretation. In Mixed Methods of Social Inquiry, Jennifer Greene describes the qualitative and quantitative debate as comprising two interrelated discussions, one on philosophical epistemology and one on technical method.[1] What is the nature of the ‘social world’ and how does one accumulate ‘social knowledge’? For the post-positivist, the social world exists and is real and observable, while for the interpretivist this world is constructed and ascribed with meaning.[2] These meanings are context-specific and multiple rather than universal. Furthermore, scholars debate whether social knowledge and the understanding of humans’ lived experiences are generalizable or contingent. A post-positivist would find social knowledge generalizable and propositional, while an interpretivist or constructivist would create social knowledge that is “multiplistic, dynamic, and contingent.” These debates around the construction or understanding of the social world are core to understanding the challenges to and within digital humanities and digital history.

Digital history has a tenuous history, emerging out of the philosophical and methodological back and forth between quantitative and qualitative inquiry. In “Computing and the Historical Imagination,” William G. Thomas III describes the three phases of quantitative history and computing technologies within history: the 1940s, with the creation of large datasets and the use of mathematical functions; the 1960s, with the “new” social, economic, and political history associated with the emergence of the social sciences and cliometrics; and the post-1980s phase that followed the revolution in personal computing, the internet, and commercial database programs.[3] Thomas dedicates a significant portion of his analysis to the debates on and backlash against quantitative history and cliometrics in the 1960s and 1970s.[4]

Rather than present the story of computing within the discipline of history as a back and forth polemical critique of quantitative methods, Thomas analyzes how historians have made use of and adapted computing processes to historical inquiry. In this narrative, historians are not simply determined by technology. For example, in the 1980s and 1990s, historians such as Manfred Thaller critiqued the application of commercial databases to history, a discipline whose sources often contain “fuzzy” and incomplete data. Attempting to isolate the technological environment or software from the “knowledge environment” of the source, Thaller drew attention to the ways in which technologies structured and constrained epistemologies. Although not thoroughly explored in this book chapter, this theme is an important recurring debate in media studies and new fields of ‘critical inquiry’ in digital humanities up to the present. I will return to this topic at the end of this paper.

Historians are a multifaceted group with interdisciplinary tendencies, but they are also hesitant to adopt ‘new and fleeting trends’ within academic inquiry. Methodological turns in history such as quantitative history, the Annales school, humanities computing, and cliometrics have profoundly contributed to the integration of numbers and statistics within some historical research. Yet the cultural turn and critical challenges to positivism in the 1980s and 1990s have also undermined quantitative approaches within history. These debates on historical inquiry often revolve around a diametrically opposed binary of qualitative or quantitative methods. Additionally, history straddles the disciplinary line between the humanities and social sciences. This dual identity can make history writing inclusive, yet can also expose its internal contradictions. Is a given work of historical scholarship more literary or more social scientific? Does it stick to a narrative form and close reading, or does it pay more attention to method, case studies, and statistical evidence?

Along with the qualitative and quantitative debate, the ‘objectivity question’ is another central point of critique in the discussion of computing technology within history.[5] Interestingly, computing technology and digital humanities can fall on either side of the objectivity debate: either digital history fragments narrative for political and popular uses of history, or it purports to accumulate facts and objective realities. Thomas notes how amongst British historians, computing technology is seen as a “handmaiden of postmodernism, as a witless accomplice in the collapse of narrative, and as the silent killer of history’s obligation to truth and objectivity.”[6]

In the first part of A Companion to Digital Humanities, Susan Hockey provides a similar history of humanities computing, nowadays known as digital humanities. Hockey begins with the early uses of computing, concordances, and frequency counts in the humanities from the 1950s to 1970s. By the 1960s, international associations, journals, and centers were formed, such as the Association for Literary and Linguistic Computing, the Association for Computers and the Humanities, the journal “Computers and the Humanities”, and the Centre for Literary and Linguistic Computing at Cambridge. Hockey describes the 1970s to 1980s as a period of consolidation of methods and tools. Hockey explains that by the 1980s to 1990s, the advent of the personal computer and electronic mail shaped the development of humanities computing research methods as well as scholarly communication and debate. Out of these developments, the Text Encoding Initiative (TEI) emerged as an early “systematic attempt to categorize and define all the features within humanities texts that might interest scholars.” The development of TEI and markup standards reflects the concerns within humanities computing about how to represent, analyze, and disseminate electronic texts. Throughout the 1990s, the central debates in the humanities revolved around electronic texts, archives, and digital libraries. Hockey’s narrative of humanities computing is functionalist and determined by technological innovations and institutions rather than epistemological or philosophical trends.

In contrast, in the think piece “Where is Cultural Criticism in the Digital Humanities?” Alan Liu points to the important juncture of May 1968, the rise of cultural criticism and cultural critique within the humanities, and their influence on ‘close’ and ‘distant’ methods of reading.[7] “May 1968 marked the return of the repressed: a surge in postmodern, rather than modern, theories of discourse and culture that identified the human as ipso facto collective and systemic…it seemed clear that humanity was congenitally structural, epistemic, class based, identity-group based (gendered, racial, ethnic), and so on. Currently, distant reading is a catch-all for that.” According to Liu, digital humanities “break the détente” between close and distant, cultural criticism and critical methods and have become instrumental to the work of “mainstream humanists.” Alan Liu’s deep engagement with theoretical and philosophical turns within literary studies raises productive questions on the future of digital humanities and its relationship to mainstream humanists. Liu calls on digital humanists to enter into dialogue with science and technology studies (STS) and the work of Lorraine Daston, Peter Galison, and Bruno Latour. STS provides digital humanities with an enriching discussion about the inextricable web of man and machine, instruments and interpreters.

Another way to engage with the questions of methods, use of evidence, and objectivity is to look more explicitly at how humanists and historians compare themselves with the sciences. Inspired by the work of early twentieth century Marxist historians Marc Bloch and E.H. Carr, historian John Lewis Gaddis revisits the meaning of the craft of history in The Landscape of History: How Historians Map the Past published in 2004.[8] Gaddis questions what it means to be scientific and historical, and he argues that the two fields bear a tremendous resemblance. Gaddis argues that the historian and the scientist (particularly the natural scientist) are similar in their use of evidence, notion of causation, seeking of patterns, and propositional or theoretical understandings of reality. The Landscape of History’s narrative is framed by the extended metaphor of history as a landscape of the past: a historian paints his/her cartographic “best fit” representation of the past using the skills of articulation, context, and accessibility to their audience. This “cartographic verification” is another quintessential argument throughout the book: history is not permanently and objectively defined, but, contrary to the postmodernist rejection of truth, continuities and patterns do in fact exist throughout history.[9] Like natural scientists, historians thus provide a provisional and temporary ‘generalization’ of the world based on the existing data and observations.

Gaddis also raises an important question regarding the hierarchy of knowledge production between history (which he characterizes as part of the humanities) and the sciences. Where does the pressure for generalized particularization to be more “scientific” originate? Why do scientific data and often overgeneralized theories provide a more “valid” argument than the acceptance of variability and exceptionality characteristic of humanistic and historical inquiry? Gaddis leaves this question largely unanswered. However, his entire thesis rests upon a certain epistemological move that accomplishes precisely this: by emphasizing the ‘scientific’ similarities within history, Gaddis seems to justify and center the legitimacy of history against the measure of scientific credibility.

Trevor Owens, a digital historian at the Institute of Museum and Library Services, opens his commentary “Discovery and Justification are Different: Notes on Science-ing the Humanities” with an imaginary but familiar conversation between a computer scientist and a humanist. This conversation illustrates the seemingly insurmountable divide between science and the humanities but in fact reveals that both scientists and humanists must distinguish and clarify the idea of ‘evidence.’ “What of all this counts for what? What can you say based on the results of a given technique?”

Owens proposes an alternative to the view that the practice of science is preoccupied purely with positivism. Instead, Owens argues that both science and the humanities are also about interpretation and evidence, two aspects essential to the methodology of the humanities and history. Owens explains that both disciplines are “about generating new ideas and then exploring evidence to see to what extent we can justify our interpretations over a range of other potential interpretations.” Although brief, Owens’s mode of inquiry subverts notions of scientific imperialism upon the humanities and considers how both fields can learn from and build upon one another. This turn in thinking about the sciences and humanities is key to the values within digital humanities as a challenge to existing notions of method and disciplinary boundaries. Lisa Spiro attempts to capture the spirit and values of digital humanities as a community defined by the following: openness, collaboration, collegiality and connectedness, diversity, and experimentation. These characteristics draw from both the humanities and sciences, but also reflect the changes of this age of data and the internet.

Broadly defined, digital humanities sits at the intersection of humanities and computing. This open-ended definition results in confusion and debate but also leaves room for inclusion and experimentation. Specific manifestations of digital humanities include a combination of the following: the use of computational methods for research, the recognition and use of ‘new media’, the integration of technology within teaching and exhibition, and the changes in scholarly communication in the digital age.[10] The last aspect could expand to include the use of electronic publishing and communication of research and teaching. Furthermore, digital humanities has often been defined as an emergent field and body of inquiry that transgresses the disciplinary boundaries of the sciences and humanities. In order to understand its position, I have situated my historiography of digital humanities within specific attempts to bridge the two sides of campus.


Data, Data Intensive Sciences, and Data for Humanists and Historians

In March 2015, The International Journal of Humanities and Arts Computing called for submissions on “The Future of Digital Methods for Complex Datasets.” The journal requests submissions that discuss the cohesiveness of ‘digital methodology’ as a thing in itself with particular processes and workflows outside of a specific project. The call for proposals goes on to pose the question, “is Digital Methodology for the Humanities & Arts something distinct from data science or other computational methods? Or alternately, has the underlying reliance on data forged a common methodology across previously distinct disciplines?” This section attempts to answer these questions by engaging with the literature on data intensive sciences to conceptualize how the humanist and historian use ‘data.’ Digital humanities has been described as data science on the other side of campus. I examine specifically how data sciences have influenced digital humanities. I also highlight certain discussions on the anxieties around data in order to undermine the popular critique that data science treats data as unproblematic (where data intensive scholarship is purely positivistic and offers no critical challenge to the power structures of data curation and data stewardship). I focus on three ways that data is used or can be used by historians and humanists: discovery, reflections on methodology, and scholarly communication.

In The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences, Rob Kitchin lays out the landscape of ‘big data’ and data science.[11] In addition to the “Three V’s” that characterize big data (Volume, Velocity, and Variety), Kitchin adds that big data is exhaustive, fine grained, relational, flexible, and scalable. Kitchin provides a textbook analysis of the social, cultural, and political enablers of big data and the “data revolution” such as advances in computation, networking, automated data production and collection as well as governmental surveillance, social media, and citizen science. Kitchin thoroughly presents the pervasiveness of big data in aspects of social and political life, yet neither justifies nor elaborates on the ‘revolutionary’ aspects of this moment in time. A tone of technological determinism and data imperialism underwrites this book, presenting an image of tremendous patterns of change without a clear vision of the direction of the ‘revolution’ or the object of disruption.

In Chapter 8 on the sciences, social sciences, and humanities, Kitchin describes the recent paradigms of data intensive science as the re-emergence of empirical thinking, a shift towards theory-less exploratory analysis, and the data ‘speaking for itself.’ Kitchin’s generalizations of data intensive sciences appear overwrought and superficial, serving only to prove his point that the “deification of data can lead to data determinism.” Nevertheless, Kitchin raises the interesting question of who is the new legitimate producer of knowledge: the collector or the statistician.[12]

Kitchin turns his investigation to digital humanities and observes trends similar to those within the sciences: the turn towards methodological rigor (and the scientific method) and positivism or post-positivism. However, Kitchin hesitates to conclude whether ‘big data’ could help to decipher the depth and complexity of human social systems and ‘meaning.’ Kitchin argues, “the digital humanities approach ‘opens up the possibility for wide scale “surface” studies (across people) as well as, rather than instead of, “depth” studies (focused on a small number of individuals).’”[13] Kitchin goes on to state that “exposed data allow us to approach interesting questions from multiple and interdisciplinary points of view, in the way that citations to textual sources do not. Again, we are arguing not for wholly replacing close readings and textual analysis in historical research but, rather, for complementing them with our explorations of data.”[14] The ‘mixed method’ use of surface and depth studies and qualitative and quantitative methods is particularly important within the humanities where data is often fuzzy and fragmented. Furthermore, the attention to interpretation and contingency characteristic of history and the humanities is conducive to triangulation and mixed methods where a plurality of traditions can provide a more holistic understanding.

Furthermore, Kitchin emphasizes that computational social science provides insights for the starting point rather than the end point of a study. The stage at which to use computational or digital methods has become another issue of debate, particularly in topic modeling and exploratory data analysis.[15] According to Trevor Owens, digital humanists are more receptive to topic modeling in the ‘discovery’ stages of projects than as the bulk of an evidentiary claim or justification.[16] Trevor Owens and Fred Gibbs emphasize that the use of ‘data’ as evidence must be subjected to the same rigorous hermeneutics as a text or artifact. “Therefore, data is an artifact or a text that can hold the same potential evidentiary value as any other kind of artifact. That is, scholars can uncover information, facts, figures, perspectives, meanings, and traces of thoughts and ideas through the analysis, interpretation, exploration, and engagement with data, which in turn can be deployed as evidence to support all manner of claims and arguments. I contend that data is not a kind of evidence; it is a potential source of information that can hold evidentiary value.”[17]

Kitchin also states that developments in data preparation and pre-processing have decreased the labor and time spent on the coding phase, so that computational social scientists and humanists can now focus on the hypothesis and analysis stages. However, I disagree with this generalization: given the complexity of humanistic and social data, the data preparation stage is the most time consuming and can also be incredibly insightful. In my digital humanities project presented at the DH Faire at UC Berkeley in April 2015, I demonstrated how the time- and labor-intensive phase of data preparation forced me to ruminate on and read a source carefully and systematically. This process revealed new insight on the underlying motivations, method of production, and intended audience of my original source text. In this way, the data collection and preparation phase can help to generate and refine research questions and hypotheses.

Scholarly Communication – Methods, Collaboration, Interoperable Data & Literature

In the volume Writing History in the Digital Age, Part 4 “Writing with the Needles from Your Haystack” discusses the ways in which tools change the writing of history itself. In “The Hermeneutics of Data and Historical Writing”, Frederick W. Gibbs and Trevor J. Owens recognize the increasing role of data within historical research and call for more methodological transparency in historical writing.[18] The authors’ suggestions seem straightforward and simple, but in fact they disrupt the historical form of scholarly communication: the monograph. What would scholarship look like if readers could refer back to historians’ datasets, verify results, suggest alternative readings, and offer new insights?

This challenge to scholarship parallels the discussions within the sciences about scholarly communication and communities of collaboration. Jim Gray introduces similar concepts in his 2007 talk to the National Research Council – Computer Science and Telecommunications Board on the “fourth paradigm for scientific exploration” and its effects on scholarly communication. Peter Fox and James Hendler describe the fourth paradigm of data driven science as the integration of significant data sources into the practice of the scientific method. Gray adds that in this era of data exploration, scientists examine their data later in the pipeline of data collection and preparation.[19] Gray briefly discusses a vision of an internet world in which “scientific data and literature interoperate.” According to Gray, readers would be able to examine scientific literature and directly engage with the data. This would increase “‘information velocity’ of the sciences and will improve the scientific productivity of researchers.”[20] Gray also alludes to institutional changes where funding agencies require grantees to upload their data and share it with the public.

Source: Anthony J. G. Hey, Stewart Tansley, and Kristin Michele Tolle, The Fourth Paradigm: Data-Intensive Scientific Discovery (Redmond, Wash.: Microsoft Research, 2009), p. xxv.

This abstract vision of interoperable data and literature is explored in Part 4 “Scholarly Communication” in The Fourth Paradigm with specific case studies and methodological suggestions. Clifford Lynch of the Coalition for Networked Information examines the longer history and culture of scientific communities, scholarly communication, and the scientific record. The scientific record is used to communicate findings, build up communities and collaborations, raise and resolve disagreements, establish precedence, and most importantly to allow for the ‘reproducibility’ of scientific results.[21] Lynch recognizes the tensions between transparency and self-protection of intellectual property within the scientific community, but emphasizes that ‘reproducibility’ of results distinguishes science. Lynch effectively explains how the reproducibility of results underscores the importance of making the data on an experiment available.[22] Lynch adds that, in the data intensive sciences of the fourth paradigm, the born-digital form of both data and scientific papers provides the structure to weave data and literature together seamlessly, allowing scientists to understand results and reproduce them.[23]

The call for “interoperability between literature and data” appears within Gibbs and Owens’ work. Interoperability between literature and data ultimately requires scholars to make their methods and data sources transparent. Gibbs and Owens discuss the critical need for historians to “explicate the process of interfacing with, exploring, and then making sense of historical sources in a fundamentally digital form—that is, the hermeneutics of data.”[24] The “hermeneutics of data” includes discussion of data queries, tool workflows, and the production and interpretation of data visualizations. The authors argue that historical writing must now include a critical, explicit, and intentional discussion of and engagement with data. In other words, historical writing must be both a “product and process of understanding.”[25] Gibbs and Owens contend, “Thus our methodologies might not be as deliberate or as linear as they have been in the past. This means we need more explicit and careful (if not playful) ways of writing about them.”

History writing should not only be a narrative product but must also disclose the processes, temporary impasses, wrong directions, and alternative visions that characterize the messy exploration of historical research.[26]

The push towards transparency, communication of failure, and the life history of intellectual thinking draws from the sciences (Philip Guo’s “Ph.D. in a box”, the Journal of Negative Results, and computer science ‘cultures of play’ and ‘cultures of building’) and also from art and media concepts of sketchbooks and drawing boards.[27] Owens and Gibbs emphasize the importance and benefits of communicating methods and data: “Beyond explicit tutorials, there are several key advantages in foregrounding our work with data: (1) it allows others to verify historical claims, (2) it is instructive as part of teaching and exposing historical research practices, (3) it allows us to keep pace with changing tools and ways of using them. Besides, openness has long been part of the ethos of the humanities, and humanists continually argue that we should embrace more public modes of writing and thinking, as a way to challenge the kind of work that scholars do.”[28] Alongside these proposals to communicate methods and the productive quality of failure, there remain institutional and professional anxieties about laying bare methods. Early communication of methods and data could lead to premature exposure of flawed methods or of vulnerabilities in one’s intellectual ideas. At the same time, if digital humanities is truly to be subversive and partake in the ‘revolutionary’ aspects of data intensive scholarship, then there needs to be a shift in mentality around sharing, collaboration, and communication that is supportive and productive.

By Way of Conclusion: Lingering Questions and Directions within Digital Humanities

Data Driven Humanistic Investigations and the Quantitative v. Qualitative Debate

As in other research projects, one of the challenging facets of a quantitative and digital approach is the balance between a data driven question and a topical research question. In other words, do the detailed, minuscule data and their variations and peculiarities drive the process and ultimately shape the research question? Or does the research question dictate the focus of the data? Within my own work, it was often difficult to stay focused on the ‘so what’ question. The technical and technological costs of digital humanities projects and data intensive scholarship require a constant reevaluation of methods and underlying hypotheses. This regular check-in is an important practice, especially where shiny tools can distract from the overall mission of constructing an argument. In this way, the reminder to triangulate methods between narrative and evidence can ground a project.

Even within data science, there are case studies of provocative and effective mixed-methods approaches. In the hands-on and reflective manual Doing Data Science, authors Cathy O’Neil and Rachel Schutt detail the work of industry data scientists and present their actual projects. In doing this, the authors move beyond the vague polemics of terms and methods and demonstrate the importance of both the human and the machine within data science. Chapter 7 focuses on extracting meaning from data and presents David Huffaker and Google’s hybrid approach to social research. Described as the “effective marriages of both qualitative and quantitative research, and of big and little data”, this case demonstrates the iterative processes and mixed methods that complex datasets require.[29]

In my own attempts to triangulate methods, I wrote a history-of-the-book paper in Spring 2015 that combined qualitative and quantitative approaches, titled “Moving Towards Meaning: A Methodological Experiment of Computational Humanities of Technique du Peuple Annamite.” I took a modular approach to the text in order to consider the various propositions and conditions under which the text was produced. My method draws from Andrew Piper’s approach to computational modeling of the plot structures within the modern novel.[30] My exploration of various hypotheses on the production of Technique is “a circular process” that requires movement between hypotheses and methods. Piper describes this movement as one “gradually approaching some imaginary, yet never attainable centre, one that oscillates between both quantitative and qualitative stances (distant and close practices of reading).”[31] Piper develops a polemic against the binaries of distant and close reading and of quantitative and qualitative methods, and emphasizes the conversional and generative process of research that moves between methods and interpretations. Specifically, Piper discusses the practice of building models and applies the stages of ‘models’ to his own research. Moving from belief to measurement, to discovery (validation), and back and forth through iteration, these analytical stages require both quantitative and qualitative methods. Piper concludes that “the validation of the model (close reading) is not an end in itself. Rather, it is the beginning of further testing (distant reading).”

Critical views on data and computation: What do the humanities provide to science?

Recently, more scholars have drawn attention to the social and cultural hierarchies of knowledge produced and reproduced within computing technology. Tara McPherson analyzes how race relations and certain worldviews manifest in the computational logic of “controlling complexity” and of privileging clarity, efficiency, and standardization. McPherson argues that the writing of code reflects certain cultures of Gramscian “common sense.” For example, McPherson traces how a culture of modularity emerged in the 1960s and took hold in software design, in politics, and in the production of knowledge in the university, including the humanities.

McPherson’s approach of situating computing culture within historical and socio-political contexts offers an important and novel analytical lens on digital technologies and media, one associated with the emergent field of ‘critical code studies.’ McPherson emphatically states the importance of this new conversation in the conclusion of her article: “The lack of intellectual generosity across our fields and departments only reinforces the divide-and-conquer mentality that the most dangerous aspects of modularity underwrite. We must develop common languages that link the study of code and culture.”[32]

As demonstrated in this multifaceted reading of contingency and situatedness, a critical understanding of data can move productively beyond purely constructivist or functionalist interpretations. The strength of the humanities is its intrinsic attention to context, complexity, and multiplicity. Furthermore, the humanities provides a constructivist and interpretivist lens on knowledge production, provenance, and assemblage that disrupts “data deification.” This lens contributes a crucial perspective to current conversations around data and digital methodologies, and thus the humanities must play an active role in them. Yet doing so requires that humanists deeply engage with and understand the “logics of systems” in order to critique their structures and representations. Scholars must be digital archivists and curators of their own work: they must understand how to use “born digital” objects as evidence, organize datasets, and create databases. But scholars also need to be more than users of these systems and approaches. Responding to decades of hyperspecialization in the academy, McPherson calls for the training of a new generation of broadly and diversely trained hybrid scholars, such as the critical digital humanist who conceptually understands the technical front end and back end and who brings to them a lens of critical humanistic inquiry and a challenge to technological determinism. “Politically committed academics with humanities skill sets must engage technology and its production not simply as an object of our scorn, critique, or fascination but as a productive and generative space that is always emergent and never fully determined.”[33]



Forster, Chris. “I’m Chris. Where Am I Wrong?” Accessed May 13, 2015.

Gaddis, John Lewis. The Landscape of History: How Historians Map the Past. New York: Oxford University Press, 2004.

Gibbs, Fred, and Trevor J. Owens. “The Hermeneutics of Data and Historical Writing.” In Writing History in the Digital Age, edited by Kristen Nawrotzki and Jack Dougherty, 2013.

Gray, Jim. “Jim Gray on eScience: A Transformed Scientific Method.” edited by Kristin Michele Tolle, Anthony J. G Hey, and Stewart Tansley, xvii – xxxi. Redmond, Wash.: Microsoft Research, 2009.

Greene, Jennifer C. Mixed Methods in Social Inquiry. San Francisco, CA: Jossey-Bass, 2007.

Guo, Philip Jia. “Software Tools to Facilitate Research Programming,” 2012.

Hey, Anthony J. G, Stewart Tansley, and Kristin Michele Tolle. The Fourth Paradigm: Data-Intensive Scientific Discovery. Redmond, Wash.: Microsoft Research, 2009.

“Journal of Negative Results in BioMedicine | About.” Accessed January 29, 2015.

Kitchin, Rob. The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences. SAGE, 2014.

Liu, Alan. “Where Is Cultural Criticism in the Digital Humanities,” 2012.

Lynch, Clifford. “Jim Gray’s Fourth Paradigm and the Construction of the Scientific Record.” edited by Stewart Tansley, Kristin Michele Tolle, and Anthony J. G Hey, 177–83. Redmond, Wash.: Microsoft Research, 2009.

McPherson, Tara. “Why Are the Digital Humanities So White? Or Thinking the Histories of Race and Computation,” 2012.

Novick, Peter. That Noble Dream: The “Objectivity Question” and the American Historical Profession. Cambridge University Press, 1988.

“On Building.” Accessed May 18, 2015.

O’Neil, Cathy, and Rachel Schutt. “Doing Data Science.” O’Reilly Media, Inc., 2013.

Piper, Andrew. “Modelling Plot: On the ‘Conversional Novel.’” Accessed March 15, 2015.

Ramsay, Stephen. “The Hermeneutics of Screwing Around: Or What You Do with a Million Books,” 2010.

Thomas II, William G. “Computing and the Historical Imagination.” edited by Susan Schreibman, Ray Siemens, and John Unsworth, Hardcover. Blackwell Companions to Literature and Culture. Oxford: Blackwell Publishing Professional, 2004.

tjowens. “Discovery and Justification Are Different: Notes on Science-Ing the Humanities.” Accessed March 15, 2015.

[1] Jennifer C. Greene, Mixed Methods in Social Inquiry (San Francisco, CA: Jossey-Bass, 2007).

[2] According to Greene, positivism, or positivistic thinking, is characterized by the following: “the social world exists independent of our knowledge of it (realism), a commitment to objective methods and to methodological sophistication, and the setting of questions of value outside the perimeter of scientific question of fact, all in service of causal explanation as universal truth.”[2] Post-positivists purport similar ideas, but with more recognition of methodological error and human fallibility.

[3] William G. Thomas II, “Computing and the Historical Imagination,” ed. Susan Schreibman, Ray Siemens, and John Unsworth, Hardcover, Blackwell Companions to Literature and Culture (Oxford: Blackwell Publishing Professional, 2004).

[4] Thomas analyzes the argument around Robert Fogel and Stanley Engerman’s Time on the Cross: The Economics of American Negro Slavery (1974), which used quantitative methods to examine controversial topics in American history: the economic profitability of slavery, the economy of the South before the Civil War, and the productivity of slave and free agriculture.

[5] For a thorough examination of the objectivity question within the historical profession, see the seminal historiographical work Peter Novick, That Noble Dream: The “Objectivity Question” and the American Historical Profession (Cambridge University Press, 1988).

[6] Thomas II, “Computing and the Historical Imagination.”

[7] Alan Liu, “Where Is Cultural Criticism in the Digital Humanities,” 2012.

[8] John Lewis Gaddis, The Landscape of History: How Historians Map the Past (New York: Oxford University Press, 2004).

[9] Ibid. Pg. 34.

[10] Chris Forster’s blog post and its comments attempt to lay out the different ‘rings’ under the ‘big tent circus’ of digital humanities: Chris Forster, “I’m Chris. Where Am I Wrong?,” accessed May 13, 2015.

[11] Rob Kitchin, The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences (SAGE, 2014).

[12] Ibid. Pg. 137.

[13] Ibid. Pg. 146.

[14] Fred Gibbs and Trevor J. Owens, “The Hermeneutics of Data and Historical Writing,” in Writing History in the Digital Age, ed. Kristen Nawrotzki and Jack Dougherty, 2013.

[15] The challenge, though, is that the process and methods of historical scholarship are often not disaggregated as systematically as they are in the scientific method.

[16] tjowens, “Discovery and Justification Are Different: Notes on Science-Ing the Humanities,” accessed March 15, 2015.

[17] Gibbs and Owens, “The Hermeneutics of Data and Historical Writing.”

[18] Ibid.

[19] Anthony J. G Hey, Stewart Tansley, and Kristin Michele Tolle, The Fourth Paradigm: Data-Intensive Scientific Discovery (Redmond, Wash.: Microsoft Research, 2009).

[20] Jim Gray, “Jim Gray on eScience: A Transformed Scientific Method,” ed. Kristin Michele Tolle, Anthony J. G Hey, and Stewart Tansley (Redmond, Wash.: Microsoft Research, 2009), xvii – xxxi. Pg. xxv.

[21] Clifford Lynch, “Jim Gray’s Fourth Paradigm and the Construction of the Scientific Record,” ed. Stewart Tansley, Kristin Michele Tolle, and Anthony J. G Hey (Redmond, Wash.: Microsoft Research, 2009), 177–83. Pp. 178-179.

[22] However, the difference is that the humanities is characterized by a community of scholarship and experimentation not predicated upon reproducibility in the same way.

[23] Lynch, “Jim Gray’s Fourth Paradigm and the Construction of the Scientific Record.” Pg. 181.

[24] Gibbs and Owens, “The Hermeneutics of Data and Historical Writing.”

[25] Ibid.

[26] Stephen Ramsay, “The Hermeneutics of Screwing Around: Or What You Do with a Million Books,” 2010.

[27] Philip Jia Guo, “Software Tools to Facilitate Research Programming,” 2012; “Journal of Negative Results in BioMedicine | About,” accessed January 29, 2015; “On Building,” accessed May 18, 2015.

[28] Gibbs and Owens, “The Hermeneutics of Data and Historical Writing.”

[29] Cathy O’Neil and Rachel Schutt, “Doing Data Science” (O’Reilly Media, Inc., 2013).

[30] Andrew Piper, “Modelling Plot: On the ‘Conversional Novel,’” accessed March 15, 2015.

[31] Ibid.

[32] Tara McPherson, “Why Are the Digital Humanities So White? Or Thinking the Histories of Race and Computation,” 2012.

[33] Ibid.
