Humanities Research and an XML Bridge

11 Jul

Amy Borsuk, Scripps College

I wasn’t surprised when my friend asked me, “How do you do research in the humanities? What is there to research?” I laughed; what else could I do? Humanities students could probably pay off their student loans if they were paid every time someone asked a question like this. Over time, the issue for me hasn’t been the question but how to answer it. To an English major and humanities student, there is an unending number of questions to delve into, and those questions are valuable and useful. But to those who don’t see the connection between the sciences and the humanities, the humanities seem to serve no practical function.

So for her, the real questions are: how does research in the humanities serve a practical function? What product does humanities research produce? For me, the question is: what can I learn through the methodology of this particular kind of humanities research?

My friend and I operate under very different understandings of the value of research. That is not to say that one is more valid than the other, but each serves a very different function. Yet the Counting the Dead digital archiving research project arguably answers to both views of what research is worth: we’re moving toward a product to be presented and used, but in order to get there, we have to value the process.

We are working to design a digital archive of documents (ranging in form from journal entries and diaries to newspaper articles, collections of letters, and plague bills) that record the plague outbreak in London in the 1660s. In this digital archive, we want certain kinds of information to be categorized and organized, such as the mortality rate, names of cities and countries, names of people and characters involved in the outbreak, and so on. Enter the computer sciences, stage right. We’re using XML markup, a tagging system that affects not how a page looks but how the information on it is structured, categorized, and organized. This system has required us to be very patient and flexible, because our methods don’t always work: we have an idea, we pursue it, the idea is complicated by contradictions, technological limitations, and so on, and we have to go back and revise. As I change the revised tags for various elements in our documents, I’m reminded of my friends taking chemistry who lament that they botched their titrations and have to start all over.

Even more excitingly, there are people in the sciences who recognize the value and logic of applying their tools to literature. Two friends and fellow Scripps students, one majoring in Neuroscience and the other in Anthropology, were excited to hear about the use of XML and computer science in our archiving process. Both of them had used XML in their own disciplines and were fascinated by how it translates to literary work. As we talked, we felt a growing sense of satisfaction and unity: XML was a bridge connecting all of our disciplines and projects through a shared tool.

So although some are determined to ask “So what?” about the value of humanities and literary research, others are connecting with us through familiar platforms in order to understand what’s so important about our work, and what’s exciting about it. I’m glad that this project has become a way for me to get involved in making these connections happen more frequently.

But… Do We Care About That?

6 Jul

As part of the Counting the Dead project, we’ve decided to blog about our process, which I’ve described as “exploratory encoding.” To this end, each member of the research team will be posting periodic pieces here that reflect on our work to date and where we hope we’ll be in the future. — Jacque Wernimont, Director

Beatrice Schuster, Scripps College

Many of my struggles with encoding revolve around that question, but it’s also what makes it so fun and engaging. When I first began encoding, I thought that it would be much like other computer or administrative skills I’ve learned; I thought I would get the process down, understand how to use the tags, and then most of the work would be in the actual technical process.

I’ve really enjoyed encoding plague documents because of the interesting discussions I’ve had with Professor Wernimont and my fellow researchers about the seemingly countless choices involved in encoding. There is an almost limitless number of things to tag in a document: role names, place names, personal names, countries, dates, numbers, damage to the original document, line breaks, page breaks, column breaks, symbols, figures, headers, titles, etc.

To make this process even more complex, there isn’t just an endless number of things to tag, but an endless number of interpretations to make of a document’s content. For example, what exactly requires a “place” tag: is it Barcelona, the Knifemakers’ Street, the pesthouse, or all three? To answer these kinds of questions, we have to figure out what kind of archive we’re creating: do people care about the Knifemakers’ Street as a place name that will show up when they search for all place names? These questions are also about efficiency: is it worth our time to encode every last detail of a document, its contents, and its physical appearance? Which aspects matter to our goals as archivists?

The best thing about working as a research assistant on this project is that Professor Wernimont truly cares about all of our opinions on these issues. While some might find it tedious to sit around a table discussing whether a general reference to “the pesthouse” should be tagged as a place, I’ve found these discussions invigorating and fun. They remind me how important each word is, and how many different interpretations it can produce in context. The only problem with these conversations is that we can go back and forth over the pros and cons, but we ultimately end up asking ourselves and each other the same question: “Do we care about that, or not?”

Ultimately, we’ve always been able to reach a conclusion (in the case of place names, we decided to encode all of them, both specific and general), but it’s certainly a much more complicated process than simply learning the different tags and applying them.
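To make that decision a little more concrete, here is a minimal sketch, not the project’s actual encoding. It assumes a TEI-style `<placeName>` element with a hypothetical `type` attribute marking a place as “specific” or “general,” uses a toy sentence instead of a real plague document, and relies on Python’s standard `xml.etree.ElementTree` to pull out every tagged place, which is roughly what a “search for all place names” would depend on.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a toy sentence, not an excerpt from the archive. The
# <placeName> tag and its type="specific"/"general" values are assumptions
# standing in for whatever tag set the project actually uses.
SAMPLE = """<p>A toy sentence naming
<placeName type="specific">Barcelona</placeName>,
<placeName type="specific">the Knifemakers' Street</placeName>, and
<placeName type="general">the pesthouse</placeName>.</p>"""


def place_names(xml_text, kind=None):
    """Return the text of each <placeName>, optionally filtered by its type."""
    root = ET.fromstring(xml_text)
    names = []
    # iter() walks the whole tree in document order, so every tagged place is found.
    for el in root.iter("placeName"):
        if kind is None or el.get("type") == kind:
            names.append("".join(el.itertext()).strip())
    return names


if __name__ == "__main__":
    print(place_names(SAMPLE))             # every place, specific or general
    print(place_names(SAMPLE, "general"))  # only the general references
```

Because “the pesthouse” is tagged too, a search for all place names returns it alongside Barcelona; leaving general references untagged would make them invisible to that kind of search.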

Many of our discussions about what is important and what is not also revolve around the differences and similarities between a digital archive and a traditional physical archive. In a physical archive, the person accessing a document would see it exactly as it appears on the page, with every line break, smudge, and decorative element. But in a digital setting, and more specifically in this particular digital setting, do we care about that?

This question is especially difficult to answer because the visual experience of a text is not completely separate from its content; the two work together. It would be nearly impossible to “translate” a text into digital form in its entirety, which is why we as a team have chosen to limit the level of detail we encode in XML and to provide a PDF of the source document along with the encoding. This way, those who access it get the benefits of viewing the text as they would in a traditional archive, while the content of the documents is also searchable.

Overall, the process of deciding which aspects we care about in encoding has reminded me of the complexity of language and shown me more about how we read texts. While I know that this archive will be an immensely useful resource, I think a large part of its value lies in the process as well, because it raises these questions and pushes us, as scholars, to keep questioning and evolving.

Exploratory Encoding

3 Jul

Presented at the 2012 NITLE Summit

Keeping track of our texts

23 Feb

We have a very wide range of texts in our “Counting the Dead” archival project: early newspapers, running governmental tallies (in the form of mortality and plague bills), poetic commemorations, first-person accounts and memorials, belated fictional narratives, and more. This post is a placeholder of sorts as we start to organize our work.