Digital Humanities Congress 2018, DHI Sheffield

Last week I attended the Digital Humanities Congress 2018, organized every two years by the Digital Humanities Institute of the University of Sheffield. The conference is a major UK venue for seeing what is going on in Digital Humanities in the country and abroad (this year with quite a few delegates from Poland and the Netherlands, despite the very successful DHBenelux last June).

To me, the main topics this year were: workflows and infrastructure for digital scholarship, text processing and analysis, and digitization.

DH infrastructure

Platforms for archiving, cataloguing, preserving and accessing humanities materials are always a big concern at DH conferences, and DHC2018 was no exception. Two keynotes touched upon the issue. The Digital Panopticon is a large-scale project that integrates 50 datasets, 4M records and 250K individuals, tracing convicts between Britain and Australia (1780-1925). Record linkage was used to reconcile identities across databases, in an effort strongly reminiscent of Linked Data (and it would not be the last; Linked Data and ontologies were present in quite a number of papers). As usual, visualizations and simulations act as interfaces for interpreting the integrated database.

https://t.co/YzNF0O7ScA matches 4M records and 250K people across 50 databases with record linkage –Linked Data for History at its best! #dhc2018 pic.twitter.com/RRXaCbHkKD

— Albert Meroño (@albertmeronyo) September 6, 2018
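To give a flavour of what record linkage involves, here is a minimal sketch. It is not the Digital Panopticon's actual method: the field names (`id`, `name`, `born`), the sample records, the similarity threshold and the two-year birth window are all assumptions made up for illustration, and the name comparison simply uses Python's standard `difflib`.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Return a 0..1 similarity ratio between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_records(left, right, threshold=0.85, max_year_gap=2):
    """Pair each record in `left` with its best-scoring match in `right`.

    Candidates whose birth year differs by more than `max_year_gap` are
    skipped first (a simple "blocking" step); the rest are ranked by name
    similarity and kept only if the best score reaches `threshold`.
    """
    links = []
    for rec in left:
        best, best_score = None, 0.0
        for cand in right:
            if abs(rec["born"] - cand["born"]) > max_year_gap:
                continue
            score = name_similarity(rec["name"], cand["name"])
            if score > best_score:
                best, best_score = cand, score
        if best is not None and best_score >= threshold:
            links.append((rec["id"], best["id"], round(best_score, 2)))
    return links


# Invented records standing in for two separate historical databases.
trial_records = [
    {"id": "t1", "name": "Mary Ann Smith", "born": 1801},
    {"id": "t2", "name": "John Carter", "born": 1795},
]
transport_records = [
    {"id": "s1", "name": "Mary Anne Smith", "born": 1802},
    {"id": "s2", "name": "Jon Carter", "born": 1830},
]

links = link_records(trial_records, transport_records)
print(links)
```

Here only the two Mary Ann(e) Smith records are linked; the Carter records have similar names but birth years too far apart. Real projects typically combine many more fields and probabilistic scoring.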

The second keynote on the topic was given by DARIAH-EU, on its fundamental work providing digital research infrastructure for the Arts and Humanities in Europe. Archives and provenance are important, but central to DARIAH is a group of independent decision makers that operate on transactional cooperation to facilitate visibility and quality control. This materializes through various working groups of volunteers organized around virtual competence centers and areas of interest. For example, one is about Artificial Intelligence and Music (disclaimer: I’m co-chair of this one :-) ).

As I mentioned, many other papers discussed ontologies as fundamental parts of DH infrastructure, as in Mapping Museums and Managing Patchy Data from Birkbeck College, where the standard stack of ontologies, RDF and triplestores is used to integrate metadata from various museums. In a more upper-level exercise, Eva Seidlmayer proposed a taxonomy for digital objects in philosophy called the Internet Philosophy Ontology (InPhO), which provides a vocabulary to describe them under different fields of philosophy.

In a radically more applied talk, Jamie McLaughlin gave recipes for preserving the websites of DH projects (they apply equally well to any other website), which I really enjoyed. A cool idea is to set a standard HTTP header announcing the expected death date of a website.
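One existing mechanism along these lines is the `Sunset` HTTP header field (RFC 8594), whose value is a date in the usual HTTP-date format. Whether this is the exact header proposed in the talk is an assumption on my part. The sketch below uses only Python's standard library; the date, the port and the handler name are invented for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical end-of-life date for the project website (must be in UTC).
SUNSET_DATE = datetime(2025, 12, 31, 23, 59, 59, tzinfo=timezone.utc)


def sunset_header(when):
    """Return a (name, value) pair with `when` formatted as an HTTP-date."""
    return ("Sunset", format_datetime(when, usegmt=True))


class SunsetHandler(BaseHTTPRequestHandler):
    """Serve a placeholder page and announce the site's expected end date."""

    def do_GET(self):
        body = b"This project website is no longer actively maintained.\n"
        self.send_response(200)
        self.send_header(*sunset_header(SUNSET_DATE))
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the demo server until interrupted."""
    HTTPServer(("localhost", port), SunsetHandler).serve_forever()


name, value = sunset_header(SUNSET_DATE)
print(f"{name}: {value}")
```

Calling `serve()` and then requesting `http://localhost:8000/` with a tool that shows response headers should display the `Sunset` line. In practice the header would more likely be set in the web server configuration than in application code.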

Text processing

A great number of presentations touched upon text processing and analysis. In this respect it’s great to see the quick adoption of embeddings and distributional semantics to understand large collections of texts from various domains. For example, Susan Leavy from University College Dublin used embeddings to grasp the context in which illnesses occurred in the past, and shared interesting insights into how she had to turn to medical historians in order to understand how the meaning of medical terminology changed over time.
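The underlying distributional idea can be sketched in a few lines. Note that this is not the method used in the talk: instead of trained embeddings it uses plain co-occurrence counts, and the two tiny "corpora" are sentences I invented for illustration. The point is only that a word's context can be compared across periods.

```python
from collections import Counter
from math import sqrt


def context_vector(sentences, target, window=2):
    """Count the words appearing within `window` tokens of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            start = max(0, i - window)
            for j in range(start, min(len(tokens), i + window + 1)):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


# Invented example sentences standing in for an older and a newer corpus.
early = ["she died of consumption last winter", "consumption wasted his lungs"]
late = ["energy consumption rose last year", "household consumption of fuel fell"]

early_vec = context_vector(early, "consumption")
late_vec = context_vector(late, "consumption")
similarity = cosine(early_vec, late_vec)
print(similarity)
```

A low similarity between the two context vectors hints that the word is used differently in the two periods, which is exactly where domain experts such as medical historians are needed to interpret the shift.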

Another interesting piece of work in this area was presented by Cristina Vertan from the University of Hamburg, reflecting on the annotation of vagueness and uncertainty in sources. It dealt with a really complex network of imprecise translations in multi-language texts (a quite challenging mix of English, German, Latin, Romanian, and Turkish) by Dimitrie Cantemir, proving that quality data does not just come out of the blue.

Digitization

A fundamental paper here was given by Stephen H. Gregg (Bath Spa University), who reflected deeply on how OCR impacts access to the contents of 18th-century books (he was holding an exemplar during his talk), in particular on what is left out in the digitization process. In general these are the same technological concerns as when microfilm was developed (although microfilm lacked the massive information retrieval features), and he insisted on the fundamental importance of recording high-quality provenance information. I had interesting post-talk discussions about standard ways of doing this (in the Semantic Web we have PROV), but common terminologies in this field are hard to agree upon.

Miscellaneous

A great thing about DH conferences is the unclassifiable talks; my two favorites fall into this category. The first was by Sebastian Zimmer (Cologne Center for eHumanities), who presented a mathematical and physical modeling of various modalities of time travel in science fiction. The talk featured a demo of a 3D Web application to visualize the models, in which users can save and upload models of their favorite shows and movies (Star Trek tops the list!).

Amazing mathematical modelling by @szimr of non-linear timelines in fiction. DH can merge science and humanities in unexpected ways :-)#dhc2018 pic.twitter.com/J8fCjPNeTT

— Albert Meroño (@albertmeronyo) September 6, 2018

Tied with this fantastic work was the paper by Leah Henrickson (Loughborough University) on Natural Language Generation, how it breaks the hermeneutic contract, and how it affects the way readers perceive authorship. I found this work to also have deep implications for the way we design AI systems, since intent is fundamental to such a contract: what intentions do machines have when they generate texts? The discussion pointed towards symbolic formalizations of intent and explainable AI.

Fascinating talk by @leahhenrickson on evaluating outputs of NLG reg authorship and questioning hermeneutic contracts — cognitive science deep implications #dhc2018 pic.twitter.com/gMCbYWjzHU

— Albert Meroño (@albertmeronyo) September 7, 2018

Overall, DHC2018 was a really enjoyable event, and interactions with participants are always insightful and productive. Looking forward to repeating at DHC2020!

Random notes:

linguisticdna.org: a really interesting model for tracing change of meaning over time, with impressive applications to old manuscripts
English ale rocks! It does!
The illusion of completeness: beware of technical solutions that seem to cover every corner of a general workflow