Historical Texts as Data?

(This post was originally published on the Cultural Heritage Informatics Fieldschool blog here)

Last summer I had the chance to travel to the colonial archives in Aix-en-provence, France (Archives nationales d’outre mer or ANOM) to get a taste of ‘primary document archival research.’ Armed with a digital camera, a macbook, and a French dictionary, I bumbled around the archives, attempting to mirror the sense of confidence and purposefulness that other scholars seemed to have. After a month of 9-5’s at the archives (and evenings of pastis and concerts in Aix), what did I have to show for my dedicated data-collecting? Over 3,000 poorly labeled digital photos, an incomprehensible excel sheet of ‘important!’ records, and the overwhelming sense of gloom that I would never get through the endless number of primary documents needed to do my research.

After a speedy bootcamp introduction to data this week, I now realize the incredible importance of creating a sensible workflow and metadata structure as I am doing archival research. Historians don’t necessary call this process ‘data-collecting,’ but looking at the process that way could be an useful way to save time and not feel overwhelmed. The topics this week didn’t necessarily address organization and workflow, but in our discussions about cleaning/scraping/visualizing data, it reminded me to think about the basic components needed to produce good data.

1. Organization is Key

For historians,’data-collecting’ is akin to semi-purposefully/randomly reading old documents with a theme in mind. With the ease of digital photography, OCR, and more and more online databases, many scholars including myself fall into the trap of ‘over-collecting.’ Although over-collecting can be helpful to make more thorough and better supported arguments, your data won’t be of any use if it’s not organized. Most simply you need 1) a place to store the metadata (data that gives information about other data, e.g. title, author, date, publisher) of each record and 2) a way to insure you can find the original file. Some scholars do this by an excel sheet and subfolders in their harddrive. I have personally used Zotero to input my metadata and Dropbox and Mac Timemachine to continually back up my data. Although it might seem to take a lot of your time, detailed recordkeeping will prove useful when you return from field research and begin your writing stages.

Other useful pre-archive tips to keep in mind : Link
A full guide to archival research including a ‘record keeping sheet’ that can be an example for your metadata schema: Link

2. How can a Historical Text translate to Data?

In my own research on Vietnamese travel stories, I deal with a lot of narratives and reports that don’t automatically translate into ‘hard’ data that can be easily visualized or manipulated. Like other disciplines that do close readings of texts and qualitative analyses, history can seem antithetical to large datasets and quantitative analyses. However, data-oriented methods such as ‘text-mining’ seem to be making their way into changing traditional humanistic inquiry and research. Essential to analyzing large and small data sets is the actual collection of metadata that describes the object. This is no simple task though because it also involves the larger question, “what do you as a researcher want these objects to say/show/prove/demonstrate?”

I have just submitted to my committee my thesis titled “Where People and Places Meet: Travel and the Spatial Identities of Indochina, France, and Hue in 1920s-1940s Vietnamese Print,” where I examine tourism advertisements, socio-cultural reports, and travel stories, or du ký, to understand how travelers ideologically ‘mapped’ places with cultural, colonial, political, personal significance through the publication of their travel experiences. As you can tell, this study was quite textual and theoretical in nature. Even though I have extensively read and analyzed these texts, I did not extract these sources for data in a consistent method. I started to reflect how I could input relevant components into a table (such as traveler name, gender, age, group size, destinations, and transportation), and in doing so, I have already begun to look at my research in a different way. I asked basic questions such as “How do these texts relate? How are they different?” to “Are there more isolated journeys or group journeys? What are the primary modes of transportation represented?”

I am currently brainstorming different ways of translating these textual representations of movement into a visualization, such as a map of the popular travel routes with a temporal component to understand global events and transportation developments.

Recent Talks

Bibliotactics Published: Available Open Access Globally, Book Talks, Podcasts

After a long journey, Bibliotactics: Libraries and the Colonial Public in Vietnam is now available! Free Open Access or Print copy through UC Press (code UCPSAVE30 for 30% off). Follow specific book related events, see video recordings and teaching resources here: https://bibliotactics.com Below are a flurry of recap of the past month: Spring in person and…

Saigon Experimental “Book Talks”: Art, Community, Women’s Reading, August 8-9, 2025

Join us in Saigon for experimental community talks around reading, feminism, and art. It’s an honor and dream to “start” my book tour in Saigon (where the intellectual/personal/political journey began) alongside feminist cothinkers Minh and Yen. August 8, 2025: “Library History as Data-Art-Life: Fabulating the Vietnamese Past” August 9, 2025: “Women’s Reading: Theory and Practice”…

On Slowness: A World Building Provocation for Teaching and Research

In the past, I was frustrated by how slow my work moved. I was impatient that my critical inquiry, confusion, curiosity forced me to constantly revisit sources, translations, historical contexts. Market pressures to publish and produce (a talk, an article, a dissertation, a book, teaching, digital resources) with the promise of professional security perpetuated a…

Teaching Resources

Feminist Decolonial Futures: Tactics of Teaching Digital Humanities [New Publication]

Link to Publication on Journal of Interactive Technology and Pedagogy: https://cuny.manifoldapp.org/read/feminist-decolonial-futures-tactics-teaching-digital-humanities/section/7ec4034b-77d8-4722-9ab1-e63b8e6e0aef Excerpt: “This piece is part of a commitment to sharing and creating human communities of care alongside the materialist structures of support that constrain the everyday, such as time and resources. Interwoven with feminist and decolonial praxis, this piece includes a collection of critical…

[TEACHING] On Digital Teaching During and After COVID-19

On November 9, 2020 I was invited to speak on the topic of digital teaching as part of the History Department, Center for Digital Scholarship, and 21st Century PhD Series at Brown University. The talk was well attended on Zoom from faculty, staff, and students from all over campus. I talk through concrete activities, tools,…

Translating Across Time and Space: Film Screening, Artist Talk, and Creative Translation Activity at Harvard

I was invited to speak at an innovative event on translation and creative expression organized by the scholar Catherine H. Nguyen from the Committee on Degrees in History and Literature and the Committee on Ethnicity, Migration, and Rights at Harvard University. Together with poet-scholar Quan Tran, we shared our scholarship and arts practice. I spoke…

1. Organization is Key

2. How can a Historical Text translate to Data?

Related

Leave a comment Cancel reply