Trip Report: DHBenelux 2019

I spent a few days last week attending the Digital Humanities Benelux (DHBenelux) 2019 Conference in Liège, Belgium. The conference has become a classic in the European DH sphere: now in its 6th edition, it offers a distinctive, mature, and cohesive view of DH that I felt was especially strong this year.

We had 42 paper presentations (8 long papers, 34 short), 3 panels, 2 keynotes, and a lively poster & demo session. The organizers were also proud to present the new DH Benelux Journal, which invites the community to submit full articles based on their conference presentations. The first volume came out just a few hours before the start of the conference, covering last year’s topic, “Integrating Digital Humanities”.

This edition’s common theme was “Digital Humanities in Society”. To me, the core topics of the conference were: “we shape our buildings; thereafter they shape us”; AI and Deep Learning on Humanities datasets; and network and text analysis.

“We shape our buildings; thereafter they shape us”

This is a famous quote by Winston Churchill after WW2, which I learned about while reading Andrew Keen’s The Internet Is Not the Answer. Back then it had a very specific meaning regarding the reconstruction of the House of Commons, but I like Keen’s more general interpretation: buildings have a great influence on our social behavior and culture, and at the same time, we are the ones in charge of designing and building them in concrete ways.

This is to me a good metaphor for the influence and impact that digital tools have on the practice of digital scholarship, the main topic of the two excellent keynotes by Tim Hitchcock (University of Sussex) and Helle Strandgaard Jensen (Aarhus University). In computer science we tend to think of keyword-based search and its simple interface as an effective way of retrieving information. However, keyword search has deep implications for the workflows of digital scholarship and historical research: it often misses contextual information and hides the costs and provenance of the underlying archival work.

Contextual information was well covered in Hitchcock’s talk, which I can’t do justice to here in all its rich detail, so I refer readers to his own blog. It revolved around the concept of the “infinite archive” and the differences between keyword search interfaces and the “old school” systems of libraries and archives, where searching meant confronting their top-down cataloging and categorization of the world. This contextual information (the knowledge fields governing the document you look for; its neighbors; their sizes; their density or sparseness; etc.) is typically not shown by general search engines, but it is fundamental for the historian to perceive some “vision of the whole”.

He then switched from tool criticism to actual examples where this is addressed, some inspired by the “atlas of knowledge” approach of Katy Börner (whom I had the pleasure of meeting in 2012, as a fresh PhD student, at a Sci2 Tool tutorial), showing interfaces like OldBaileyVoices where entries are visualized and made accessible in terms of their knowledge category (e.g. the Library of Congress Classification), their word count, and other catalog metadata. I found this very inspiring, since organizing knowledge at Web scale (as hard as it is) is one of the missions of the Semantic Web, so I asked whether there are fundamental differences in the nature of archive-curated versus Web-born knowledge (the short answer is no; the main difference is apparently institutional). Contextual text (i.e. the text before and after a keyword match) is also key in these processes. Overall, my takeaway was a call for less algorithmic search and more interfaces suited to humanities workflows.

Jensen’s talk had similar foundations on the value of libraries and archives, but with an emphasis on costs and on how hard it is to catalog and archive appropriately. Her punchline was that digital access has become “too easy”, and this ease tends to hide from the user the enormous work of cataloging and archiving. This used to be explicit: the moment you crossed the archive’s door, or asked the librarian for something hard to find, the labor behind the collection became visible. In a quick and efficient keyword search this value is often not perceived, but there were ideas on how to make it more visible (e.g. through provenance standards; I also thought of more explicit data citations and altmetrics).
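As one concrete direction, here is a minimal sketch, using Python’s rdflib and the W3C PROV-O vocabulary, of how archival labor could be attached to a digitized item as machine-readable provenance. This is my own illustration of the idea, not something presented at the conference; all URIs and labels are hypothetical.

```python
# Minimal sketch: making archival labor visible with W3C PROV-O,
# one of the "provenance standards" ideas mentioned above.
# All URIs and labels below are illustrative.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import PROV, RDFS

EX = Namespace("http://example.org/archive/")
g = Graph()
g.bind("prov", PROV)

scan = EX.letter42_scan          # hypothetical digitized item
cataloging = EX.cataloging2018   # the (usually invisible) work
archivist = EX.archivistJanssen  # hypothetical agent

# The scan exists because of a cataloging activity by a named agent.
g.add((scan, PROV.wasGeneratedBy, cataloging))
g.add((cataloging, PROV.wasAssociatedWith, archivist))
g.add((cataloging, RDFS.label,
       Literal("Cataloged and digitized by hand, 120 hours")))

print(g.serialize(format="turtle"))
```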

These are critical topics that are clearly under active research, but we should think more about them in our digital research infrastructures like CLARIAH (which received high praise at the conference, even from colleagues outside the Benelux) and Parthenos.

AI and Deep Learning on Humanities data

The Deep Learning fever has also reached DH, and there were various examples at the conference. AI was explicitly mentioned and encouraged in the call for papers, both from the ethically concerned perspective on digitization and data access, and more practically in applying DL to typical DH tasks and datasets.

The latter is exactly what the INSIGHT project on History and Art, presented by Sally Chambers (Ghent University) and Matthia Sabatelli et al. (University of Liège), did. They developed a transfer learning approach for recognizing musical instruments in paintings, using a model originally trained on pictures of modern instruments. I found this really interesting: instrument pictures are widely available and transfer learning is relatively cheap, and it offers many options for multimodal entity linking with e.g. symbolic music notations (disclaimer: my own work), something to look forward to.
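For readers unfamiliar with the technique, here is a minimal transfer learning sketch in PyTorch illustrating the general recipe (not the INSIGHT team’s actual pipeline): an ImageNet-pretrained backbone is frozen and only a new classification head is trained on the painting data. The class count, data path, and hyperparameters are all hypothetical.

```python
# Generic transfer-learning recipe for recognizing musical instruments
# in paintings. NOT the INSIGHT project's actual pipeline.
import torch
import torch.nn as nn
from torchvision import models, transforms, datasets

NUM_CLASSES = 5  # hypothetical: lute, violin, flute, harp, none

# Standard ImageNet preprocessing, reused as-is for paintings.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Backbone pretrained on modern photographs (ImageNet).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained feature extractor...
for param in model.parameters():
    param.requires_grad = False

# ...and replace the classification head for our painting labels.
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Only the new head is trained on the (scarce) painting dataset.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# "paintings/train" is a hypothetical ImageFolder-style directory.
train_set = datasets.ImageFolder("paintings/train", transform=preprocess)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

model.train()
for images, labels in loader:
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```

The reason this is “relatively cheap” is that only the final linear layer’s weights are updated; the expensive visual features come for free from the pretrained network.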

Network and text analysis

Network and text analysis is a classic at DH conferences, covering the processing of both unstructured and structured humanities data. I really liked the network analysis work of Ingeborg van Vugt (Utrecht) on disclosing the social network of the librarian Magliabechi through his letters in the early Dutch Republic, finding that social capital differs depending on one’s position in the network.

This was also present in Julie Birkholz’s (Ghent University) paper, which had a really cool punchline: questioning visualizations, unjustified network metrics, and data incompleteness, and calling for an explicit connection between research questions and specific metrics. This is a path we definitely want to explore in the next iteration of CLARIAH in the Netherlands. Julie and I also presented a joint demo: a Jupyter Notebook for doing network analysis on RDF graphs without tears, which was very well received by the community (see the sketch below). My own paper was more of a call for looking into the FAIR principles when we share and reuse objects in musicological research, with practical implications in e.g. multimodal entity linking (journal article on this coming soon).
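For a flavor of what such a notebook does, here is a minimal sketch along the same lines: load an RDF graph with rdflib, project it into a networkx graph, and tie one concrete metric to one concrete question. The file name and the ex:wroteTo predicate are illustrative assumptions, not our actual data model.

```python
# Minimal sketch of network analysis on an RDF graph, in the spirit
# of our "without tears" notebook demo (the real notebook may differ).
import rdflib
import networkx as nx

# Load an RDF graph, e.g. a correspondence dataset in Turtle.
rdf_graph = rdflib.Graph()
rdf_graph.parse("letters.ttl", format="turtle")  # hypothetical file

# Project a social network: one directed edge per (sender, recipient)
# pair, here via a hypothetical ex:wroteTo predicate.
EX = rdflib.Namespace("http://example.org/")
g = nx.DiGraph()
for sender, _, recipient in rdf_graph.triples((None, EX.wroteTo, None)):
    g.add_edge(str(sender), str(recipient))

# Tie a metric to a question, per Birkholz's punchline: "who brokers
# between otherwise disconnected correspondents?" -> betweenness.
betweenness = nx.betweenness_centrality(g)
for node, score in sorted(betweenness.items(), key=lambda kv: -kv[1])[:5]:
    print(node, round(score, 3))
```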

Text analysis is always well represented, but I especially liked the work of James Baker et al. (University of Sussex) (slides here) on using word frequency, word lists, collocation, keyness measures, and archival work to identify the style, voice patterns, and deliberate language choices of Mary Dorothy George. It is an excellent example of putting to work the concepts of keywords versus archival context discussed in the keynotes. Very much related to this, Marijn Koolen et al. (KNAW Humanities Cluster) scaled this kind of analysis up to a number of historical text collections and aim at reusing their structural contexts for a more complete and rigorous analysis. So in general: machine-readable text is not enough.
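To make “keyness” concrete, here is a minimal sketch of one common keyness measure, Dunning’s log-likelihood (G2), which scores how surprisingly often a word appears in a target corpus compared to a reference corpus. The toy corpora are mine, and I’m not claiming this is the exact measure Baker et al. used.

```python
# Minimal sketch of a keyness measure: Dunning's log-likelihood (G2).
# High G2 = the word's frequency in the target corpus departs strongly
# from what the reference corpus would predict.
import math
from collections import Counter

def keyness_g2(word, target_tokens, reference_tokens):
    """G2 keyness of `word` in a target vs. a reference corpus."""
    a = Counter(target_tokens)[word]      # word freq in target
    b = Counter(reference_tokens)[word]   # word freq in reference
    c = len(target_tokens)                # target corpus size
    d = len(reference_tokens)             # reference corpus size
    # Expected frequencies under the null hypothesis (no difference).
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2

# Toy example; real work would load e.g. catalog descriptions.
target = "the satirical print mocks the minister".split()
reference = "the report describes the annual budget".split()
print(keyness_g2("satirical", target, reference))
```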

This is to say that access to textual context is not just a requirement of DH scholars, but an integral part of collections that must be considered in databases, tools, and workflows.

In conclusion, I found that this conference has matured to the point of having a distinctive approach to DH compared to its larger international sister DH2019 (which I wrote a report about here), with more history, archives, and explorative tools. I look forward to attending again next year in Leiden!

Random notes