Peer-to-Peer Project Development: Concatenating databases

This is the first in what is expected to be a (short) series of posts that will reflect our engagement with the community of digital humanities researchers in ways that facilitate use of BPS.

The Problem and Challenge:

One of the innovative features of BPS is its corpus-agnostic architecture: its tools for prosopographic analysis are designed to work on text corpora regardless of their specific content. Conceptually, this reflects our contention that humanities researchers engaged in prosopographical research approach the source documentation as a repository of Names, Relationships, Activities in Documents (NRAD), and that each researcher has a familiar workflow that reflects his or her engagement with the evidence and the research questions it supports.

Practically speaking, this poses a challenge for BPS, as the individual researcher’s presentation of data, be it in a database or in marked-up text, must be converted into TEI that can be ingested into BPS’s architecture. But BPS recognizes that not all, perhaps not most, digital humanities researchers have the capacity or desire to perform the transformation themselves.

Working toward a solution:

This summer, with support from the Capacity Building and Integration in the Digital Humanities project of the Digital Humanities at Berkeley Mellon grant, the BPS team is developing a protocol for the conversion of TEI/EpiDoc-Leiden into BPS-compliant TEI. This is the first report focusing on preliminary conversations between Laurie and Micaela Langellotti, a post-doctoral researcher at the Center for the Tebtunis Papyri housed in the Bancroft Library at UC Berkeley, as they consider how to prepare Micaela’s data for BPS.

The corpus contents and the presentation of its data:

Micaela is analyzing an archive of documents that consist of registers and contracts, two document types that present much the same data (names of principals in activities, dates on which transactions occur, identification of the object on which the transaction focuses, quantification of sums of money and commodities or areas of land, etc.) but in very different formats. The registers give a single-line summary of the important facts, while the contracts preserve the full legal instrument that effected the transaction. In her own research, Micaela collected the data from the register and the contract forms in an Excel spreadsheet and as a table in a Word document, respectively. For her own purposes, this was sufficient, if not efficient; she had a single spreadsheet for each register, and the cells of the Word document contained data that addressed multiple attributes associated with each name instance. Our first steps focused on harmonizing these formats.

E pluribus unum: integrating many spreadsheets into a single database

Micaela originally committed the data from each register to its own database. While they all adhered to the same structure, the multiplicity of files meant that she had to search each of them separately and then manually combine the search results for analysis. Our first step was to concatenate the databases, and we ultimately decided to migrate them all into a single database created in FileMaker Pro.

The only adjustment that had to be made at this stage was the addition of a field for the name (museum siglum) of the record from which each line of data was drawn. Her original databases were named for each register, document-by-document. I am hardly a power user of FileMaker (FM), but it was easy enough to show Micaela how to adapt the existing database structure (File > Manage > Database) by creating a new field (in this case, TextID). After populating this field, she saved the updated database as a new copy. This process meant that any snafus that might have arisen in successive updates to the FM database would affect only the latest version, and we could always step back one iteration. Once Micaela had a single database for her register documents, we looked at the database structure itself with an eye toward facilitating markup for each name instance.
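For readers who work outside FileMaker, the same two steps (concatenating the per-register tables and adding a TextID field that records where each line came from) can be sketched in a few lines of Python. This is only an illustration of the idea, not what we did with Micaela's data: it assumes each register has been exported as CSV with identical column headers, and the function name, the sigla (REG-A, REG-B), the column names, and the cell values are all placeholders invented for the example.

```python
import csv
import io


def concatenate_registers(registers):
    """Merge per-register tables into one list of rows.

    `registers` maps a siglum (hypothetical here) to CSV text.
    Every table is assumed to share the same column headers.
    """
    combined = []
    expected_fields = None
    for text_id, csv_text in registers.items():
        reader = csv.DictReader(io.StringIO(csv_text))
        fields = reader.fieldnames
        if expected_fields is None:
            expected_fields = fields
        elif fields != expected_fields:
            # Stop early rather than silently mixing different structures.
            raise ValueError(
                f"{text_id}: columns {fields} differ from {expected_fields}"
            )
        for row in reader:
            # Tag each line with the register it was drawn from (the TextID field).
            combined.append({"TextID": text_id, **row})
    return combined


# Placeholder data standing in for two exported register tables.
sample_registers = {
    "REG-A": "Name,Date\nname-1,date-1\nname-2,date-2\n",
    "REG-B": "Name,Date\nname-3,date-3\n",
}

rows = concatenate_registers(sample_registers)
for row in rows:
    print(row)
```

With all lines in one table, a single search covers every register, and the TextID value still tells you which document each result belongs to.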

(In the next installment: Refining the database structure)

BPS blogs internationally and bilingually (with help from our friends)

Visit Llibredigital to read our blog post in English or in Spanish. Thanks to Neus Rotger, Professor of Arts and Humanities, Universitat Oberta de Catalunya, and Visiting Scholar, UC Berkeley’s Department of Rhetoric, for inviting BPS to contribute to her blog. Produced in connection with the Master of Digital Edition (UOC), the blog concerns all aspects of the digital book. The impetus for this collaboration came as a result of the DH Faire.

BPS at the DH Faire

On April 8, following the keynote address by Professor Zephyr Frank (Department of History, Spatial History Project at Stanford University), Laurie Pearce will participate in a panel discussion of the Landscape of Berkeley DH. Other panelists include Elizabeth Honig (Art History, Brueghel Family Research Website) and Francesco Spagnolo (Magnes Museum). Prof. Cathryn Carson (History) will moderate what promises to be a lively discussion among these active members of the Berkeley DH community. The roundtable will take place in the striking new home of the Social Science Matrix, on the 8th floor of Barrows Hall.

BPS will also be displaying a poster in the session that concludes the DH Faire, also in the Matrix space. Get all the details, and RSVP for the reception here.

BPS and the Social Science Matrix

Over the course of the fall 2014 semester, BPS team members Laurie Pearce and Patrick Schmitz have been engaging with staff and team members from other DH projects, based both on and off campus, in the new Social Science Matrix program. Laurie has written a blog post about one of the workshops, which you can see here.

BPS awarded NEH Digital Humanities Implementation Grant

BPS is pleased to announce that it is a recipient of one of the seven just-announced NEH Digital Humanities Implementation Grants.

The two-year grant will support the efforts of the BPS team (co-PIs: Laurie Pearce, Niek Veldhuis; Technical Lead: Patrick Schmitz), as they transform BPS from a prototype to a functional out-of-the-box toolkit for prosopographical research and social network analysis (SNA). The BPS team will introduce new features including:

  • The ability to import corpora from existing databases, converting the data into the TEI format used by BPS
  • Natural language processing tools to pre-process TEI corpora and automatically mark up activities and roles
  • Extended visualization tools to support interactive viewing and filtering of the results of disambiguation and SNA
  • A generalized feature model that can capture more traits about individuals (e.g. life roles, birthplace, office/title)
  • Workspace access control to support collaboration

Workshop announcement! DH-CASE II: Collaborative Annotations in Shared Environments

BPS team members Patrick Schmitz and Laurie Pearce, and Berkeley Research IT DH specialist Quinn Dombrowski, will be chairing a workshop, DH-CASE II: Collaborative Annotations in Shared Environments, co-located with, and immediately preceding, DocEng2014 in Ft. Collins, CO. The full-day workshop will take place on Sept. 16.