(This post was originally published on the Cultural Heritage Informatics Fieldschool blog here)
Last summer I had the chance to travel to the colonial archives in Aix-en-provence, France (Archives nationales d’outre mer or ANOM) to get a taste of ‘primary document archival research.’ Armed with a digital camera, a macbook, and a French dictionary, I bumbled around the archives, attempting to mirror the sense of confidence and purposefulness that other scholars seemed to have. After a month of 9-5’s at the archives (and evenings of pastis and concerts in Aix), what did I have to show for my dedicated data-collecting? Over 3,000 poorly labeled digital photos, an incomprehensible excel sheet of ‘important!’ records, and the overwhelming sense of gloom that I would never get through the endless number of primary documents needed to do my research.
After a speedy bootcamp introduction to data this week, I now realize the incredible importance of creating a sensible workflow and metadata structure as I am doing archival research. Historians don’t necessary call this process ‘data-collecting,’ but looking at the process that way could be an useful way to save time and not feel overwhelmed. The topics this week didn’t necessarily address organization and workflow, but in our discussions about cleaning/scraping/visualizing data, it reminded me to think about the basic components needed to produce good data.
1. Organization is Key
For historians,’data-collecting’ is akin to semi-purposefully/randomly reading old documents with a theme in mind. With the ease of digital photography, OCR, and more and more online databases, many scholars including myself fall into the trap of ‘over-collecting.’ Although over-collecting can be helpful to make more thorough and better supported arguments, your data won’t be of any use if it’s not organized. Most simply you need 1) a place to store the metadata (data that gives information about other data, e.g. title, author, date, publisher) of each record and 2) a way to insure you can find the original file. Some scholars do this by an excel sheet and subfolders in their harddrive. I have personally used Zotero to input my metadata and Dropbox and Mac Timemachine to continually back up my data. Although it might seem to take a lot of your time, detailed recordkeeping will prove useful when you return from field research and begin your writing stages.
- Other useful pre-archive tips to keep in mind : Link
- A full guide to archival research including a ‘record keeping sheet’ that can be an example for your metadata schema: Link
2. How can a Historical Text translate to Data?
In my own research on Vietnamese travel stories, I deal with a lot of narratives and reports that don’t automatically translate into ‘hard’ data that can be easily visualized or manipulated. Like other disciplines that do close readings of texts and qualitative analyses, history can seem antithetical to large datasets and quantitative analyses. However, data-oriented methods such as ‘text-mining’ seem to be making their way into changing traditional humanistic inquiry and research. Essential to analyzing large and small data sets is the actual collection of metadata that describes the object. This is no simple task though because it also involves the larger question, “what do you as a researcher want these objects to say/show/prove/demonstrate?”
I have just submitted to my committee my thesis titled “Where People and Places Meet: Travel and the Spatial Identities of Indochina, France, and Hue in 1920s-1940s Vietnamese Print,” where I examine tourism advertisements, socio-cultural reports, and travel stories, or du ký, to understand how travelers ideologically ‘mapped’ places with cultural, colonial, political, personal significance through the publication of their travel experiences. As you can tell, this study was quite textual and theoretical in nature. Even though I have extensively read and analyzed these texts, I did not extract these sources for data in a consistent method. I started to reflect how I could input relevant components into a table (such as traveler name, gender, age, group size, destinations, and transportation), and in doing so, I have already begun to look at my research in a different way. I asked basic questions such as “How do these texts relate? How are they different?” to “Are there more isolated journeys or group journeys? What are the primary modes of transportation represented?”
I am currently brainstorming different ways of translating these textual representations of movement into a visualization, such as a map of the popular travel routes with a temporal component to understand global events and transportation developments.