Monthly Archives: July 2015

Peer-to-Peer Project Development: Concatenating databases

This is the first in what is expected to be a (short) series of posts that will reflect our engagement with the community of digital humanities researchers in ways that facilitate use of BPS.

The Problem and Challenge:

One of the innovative features of BPS is its corpus-agnostic architecture: its tools for prosopographic analysis are designed to work on text corpora regardless of their specific content. Conceptually, this reflects our contention that humanities researchers engaged in prosopographical research approach the source documentation as a repository of Names, Relationships, Activities in Documents (NRAD), and that each researcher has a familiar workflow that reflects his or her engagement with the evidence and the research questions it supports.

Practically speaking, this poses a challenge for BPS, as the individual researcher’s presentation of data, be it in a database or in marked-up text, must be converted into TEI that can be ingested into BPS’s architecture. But BPS recognizes that not all, perhaps not most, digital humanities researchers have the capacity or desire to perform the transformation themselves.

Working toward a solution:

This summer, with support from the Capacity Building and Integration in the Digital Humanities project of the Digital Humanities at Berkeley Mellon grant, the BPS team is developing a protocol for the conversion of TEI/EpiDoc-Leiden into BPS-compliant TEI. This is the first report focusing on preliminary conversations between Laurie and Micaela Langellotti, a post-doctoral researcher at the Center for the Tebtunis Papyri housed in the Bancroft Library at UC Berkeley, as they consider how to prepare Micaela’s data for BPS.

The corpus contents and the presentation of its data:

Micaela is analyzing an archive of documents that consist of registers and contracts, two document types that present much the same data—names of principals in activities, dates on which transactions occur, identification of the object on which the transaction focuses, quantification of sums of money and commodities or areas of land, etc.—but in very different formats. The registers give a single line summary of the important facts, while the contracts preserve the full legal instrument that effected the transaction.  In her own research, Micaela collected the data from the register and the contract forms in an Excel spreadsheet and as a table in a Word document, respectively.  For her own purposes, this was sufficient, if not efficient; she had a single spreadsheet for each register, and the cells of the Word document contained data that addressed multiple attributes associated with each name instance.  Our first steps focused on harmonizing these formats.

E pluribus unum: integrating many spreadsheets into a single database

Micaela originally committed the data from each register to its own database. While they all adhered to the same structure, the multiplicity of files meant that she had to run each search on every file and then manually combine the results for analysis. Our first step was to concatenate the databases, and we ultimately decided to migrate them all into a single database created in FileMaker Pro.

The only adjustment that had to be made at this stage was the addition of a field for the name (museum siglum) of the record from which each line of data was drawn. Her original databases were named for each register, document-by-document. I am hardly a power-user of FM, but it was easy enough to show Micaela how to adapt the existing database structure (File > Manage > Database) by creating a new field (in this case, TextID). After populating this field, she saved the updated database as a copy of itself. This process meant that any snafus that might have arisen in successive updates to the FM database would affect only the latest version, and we could always step back one iteration. Once Micaela had a single database for her register documents, we looked at the database structure itself with an eye toward facilitating markup for each name instance.
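For readers who would rather script this step than work in FileMaker Pro, the same idea can be sketched in a few lines of Python. This is only an illustration and not the process we followed: it assumes each register has been exported as CSV text with an identical header, and the function name, the sample column names (Name, Activity) and the sample sigla are all hypothetical. Only the TextID field name comes from Micaela's database.

```python
import csv
import io


def concatenate_registers(sources, id_field="TextID"):
    """Merge per-register tables into a single list of rows.

    `sources` maps a register's siglum to a text stream of CSV data.
    All tables are assumed to share one header; each output row gains
    an `id_field` column recording the register it was drawn from.
    """
    merged = []
    expected_header = None
    for siglum, stream in sources.items():
        reader = csv.DictReader(stream)
        header = reader.fieldnames
        if expected_header is None:
            expected_header = header
        elif header != expected_header:
            # Refuse to merge tables whose structure has drifted.
            raise ValueError(f"{siglum}: header differs from the first register")
        for row in reader:
            merged.append({id_field: siglum, **row})
    return merged


# Hypothetical sample data standing in for two exported registers.
registers = {
    "Register-A": io.StringIO("Name,Activity\nPerson One,sale\n"),
    "Register-B": io.StringIO(
        "Name,Activity\nPerson Two,lease\nPerson Three,loan\n"
    ),
}

rows = concatenate_registers(registers)
for row in rows:
    print(row)
```

With real exports, the in-memory streams would be replaced by files opened with `open(path, newline="", encoding="utf-8")`. Writing the merged rows to a new file each time, rather than overwriting the previous one, would mirror the save-as-a-copy habit described above.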

(In the next installment: Refining the database structure)