Texts as Data—Data as Texts


Rather than get lost in the semantic battle of defining disciplines (What is/are the digital humanities?), this presentation explores how we as humanists can use data to help us think through our humanities questions, evidence, and argument. Drawing on ‘digital’ and ‘data science’ methods of experimental design and operationalization, I share my data science project on the Library of Congress collection of Vietnamese materials.


Video of presentation

This talk was part of the “Texts as Data—Data as Texts” Seminar and Workshop at Yonsei University in Seoul on January 12, 2017.


A Humanist Does Data Science: ‘Deconstructing’ Libraries Project

Library of Congress Main Reading Room

In the spring of 2016, I enrolled in my first-ever graduate-level data science course at the School of Information at UC Berkeley. The course, ‘Deconstructing Data Science’, investigated quantitative methods of machine learning and data analysis. Coming from a humanist background, I found that the course challenged me to think in drastically different ways about evidence, data, and argument. In the process of learning new data science methods, we reflected on experimental design and challenged the underlying assumptions of empirical methods. These critical reflections resonated with similar debates around the ‘scientific’ character of history and the social sciences and their capacity to draw informed conclusions about the past and society.


Data and the Humanities: Digital Humanities and Disrupting Disciplinary Boundaries

Below is a working paper I wrote for my graduate course on the History of Data Science, led by Professor Cathryn Carson in Spring 2015 at UC Berkeley. You can see the bibliography of readings from our class in my Zotero library.


What does data mean to a humanist? What would it mean to datify humanistic inquiry? This paper examines the recent literature on data within the humanities and critical debates about ‘digital humanities’. As I demonstrate in this paper, the debates around data within the humanities fit within three interlocking frameworks: first, the tensions between relevancy and distinction within the humanities in relation to the sciences; second, the boundary work of defining and distinguishing ‘digital humanities’; and third, the shifts in methods occurring across all disciplines around data, data-intensive sciences, mixed methods, and scholarly communication.

I recognize that debates around disciplinary identities and methods can turn into polemical battles over academic territory. I seek to immerse myself in these debates to understand their contours and ridges and the boundary-making processes currently manifesting within the humanities. Observing historical patterns within these debates, I conclude that a historiography of data science, data-intensive humanities, and digital humanities is inevitably a narrative about the disruption of existing disciplinary boundaries.
