Trip report: ISWC 2020

Three weeks ago I attended the first online International Semantic Web Conference, originally planned to take place in Athens, Greece. ISWC is the prime venue for Semantic Web and Knowledge Graph research. Despite the ongoing pandemic (or thanks to it?), ISWC’s attendance grew considerably this year: we had 534 delegates, and workshops and tutorials typically peaked at 40-70 attendees. A huge thanks to the organising committee, who visibly put incredible effort into making this the best online conference of the year, by far! This was also the first conference I attended wearing my new Lecturer in Computer Science at King’s College London hat :-)

The research track had an acceptance rate of 22%, in line with previous ISWC editions; 685 reviews were provided for 170 submissions (around 4 reviews per paper, which is great).

Workshops and Tutorials

Satellite events generally worked very well in the online setting; I particularly enjoyed the high availability of recorded presentations and slide decks throughout the whole conference. I co-organised the SPARQL Endpoints and Web API (SWApi) tutorial with the great Pasquale Lisena. I would very much encourage you to have a look at our materials if you work in the Knowledge Graph access/API area. Big thanks to all 39 participants who dropped by. A lot is happening here and we are tidying up a compilation of materials through another medium for next year, so stay tuned!

I also attended the Wikidata Workshop and the tutorial on Common Sense Knowledge Graphs, which showed many interesting connections around how humans do knowledge engineering in practice and at scale. Wikidata has become the prime lab to study this, as shown by Lydia Pintscher’s and Kat Thornton’s great keynotes: Wikidata has more edits than any other Wikimedia project and has become a central data hub for them (54% of Wikimedia articles use data from Wikidata through 6.5M queries a day – I found this a great argument for query management for knowledge graphs). In general, content growth is outpacing community growth, so automated techniques (e.g. machine learning for data quality) to help keep up are much needed. It was nice to see this community reflecting on knowledge engineering issues from a practical perspective (I found this video from BBC Scotland on “what is soup?” hilarious). Kat spoke about the use of ShEx “E namespace” schemas for modelling domains, which can be linked to each other and reminded me a lot of Ontology Design Patterns. Many good papers were presented on NER for Wikidata, similarity metrics, engineering around syncing ontology edits, and suggesting citations for Wikidata’s claims based on Wikipedia’s external references. I think these were all very interesting approaches to practical, large-scale knowledge engineering and I’m looking forward to seeing what we can learn from such a large social lab.
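
To give a flavour of the kind of curation these approaches target, here is a minimal SPARQL sketch of my own (not from any of the papers; P69, “educated at”, is just an arbitrary example property) that surfaces unreferenced statements on the Wikidata Query Service – exactly the kind of claims a citation-suggestion tool would pick up:

    PREFIX wd:   <http://www.wikidata.org/entity/>
    PREFIX wdt:  <http://www.wikidata.org/prop/direct/>
    PREFIX p:    <http://www.wikidata.org/prop/>
    PREFIX prov: <http://www.w3.org/ns/prov#>

    # Humans whose 'educated at' (P69) statements carry no reference:
    # candidate targets for automated citation suggestion.
    SELECT ?item ?statement WHERE {
      ?item wdt:P31 wd:Q5 ;        # instance of: human (Q5)
            p:P69   ?statement .   # statement node, so we can inspect its references
      FILTER NOT EXISTS { ?statement prov:wasDerivedFrom ?ref . }
    }
    LIMIT 10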

I’m using Filip Ilievski‘s excellent paper on commonsense knowledge in Wikidata (interestingly small in size, as happens with T-Boxes) as a smooth transition to the tutorial on Common Sense Knowledge Graphs (CSKG), from which I learned a great deal. CSKGs represent “shared conceptions” among humans, and of course the goal is to make them available to computers. So this is some sort of “world knowledge” that ranges from basic physics to social behaviour (I look at this kind of knowledge a bit more from a Cultural AI perspective).

Many thanks to the organisers, who made a great archaeological effort to reflect on the history of CSKGs, putting together a great resource for semantic web researchers interested in the area. CSKGs have actually been around for quite some time in a variety of forms, from Cyc (a large symbolic common sense knowledge base, but with limited top-down reasoning) to COMET (a tensor-based common sense knowledge generator based on query answering). So an obvious goal here is integration and interoperability across modalities and CSKGs, for which some work (and more) has already been proposed (e.g. VL-BERT); although many challenges remain regarding the granularity of relations, the variety of representations (e.g. symbolic vs language models), and above all the tiny overlap of concepts/entities between different CSKGs. Hyper-relational graphs, like property graphs that qualify relations, are proving very useful here. Overall I thought these are vibrant communities addressing key issues around knowledge, communities and the Web.

Main conference

I thought the big topics for this year at ISWC were: keeping reality in check, semantic programming, and hybrid KG ecosystems.

Keeping reality in check

To me this was one of the big takeaways of the conference, as it was present in both vision sessions (a fantastic initiative that I’d love to continue seeing at ISWC) and many papers. It revolves around how the big questions of the semantic web (sharing common conceptualisations, the usefulness of systems and the Web, knowledge engineering, reasoning) relate to real-world scenarios and observable social behaviour. Carole Goble brilliantly summarised this as “reality”, and I pretty much agree. I thought Miriam Fernandez‘s idea of human-centric evaluation metrics – e.g. some sort of “semantic web clinical trials” evaluating the actual impact of systems on humans – is a great way of advancing towards this. Similarly, Elena Simperl made a call to arms to ask again the important questions about knowledge engineering in the light of new apps and requirements: for example, what is the Jupyter Notebook for knowledge engineering? She also argued that this wave of AI will not succeed until we understand how knowledge engineering works in the 21st century (e.g. in Wikidata, where knowledge engineering happens quite tacitly at large scale). On a more ethical note, Jeni Tennison reminded us that knowledge is power, and that the visions we have for data dictate how our societies will work in the future (I thought this was a very lucid call to concentrate on data institutions and on ensuring they empower society); and Helena Deus focused on FAIR, how it can (literally) save lives, the importance of generalisations, and the explicit declaration of the intent for which data are collected as a means to address data bias. In a way these were all examples of “back to basics” in the semantic web, but thinking more specifically about the realities of today and tomorrow.

A lot of papers touched upon this “keeping reality in check” from two angles: ontology engineering, and empirical user behaviour. On the ontology engineering side, work on Cultural Heritage and Digital Humanities that extends CIDOC CRM for archaeology and represents knowledge about excavation sites sets a great example; and the explanation ontology offers an interesting model of explanations for user-centred AI. On the more empirical side, it was interesting to see Google Dataset Search by the Numbers, which showed an impressive index of 31M datasets from 4.8K domains, in which structured data accounts for 2/3 of all dataset downloads (it would be nice to see the most frequent metadata properties among the datasets most often downloaded or appearing in top search results); and I loved the excellent motivation and results of Revealing Secrets in SPARQL Sessions, which looked into query intention prediction and recommendation, as I think query management for knowledge graphs will keep rising in importance.

Semantic Programming

This was a surprisingly popular topic that I think is historically more connected to ESWC, but it was nice to see it so well covered at ISWC this year. It certainly touches on aspects of semantic web programming, but this time it was mainly about how knowledge graphs can support and empower developers, and bring intelligence to various coding activities. I thought Kavitha Srinivas’s keynote, describing the Graph4Code knowledge graph, was clearly spot on and revealed many interesting applications, like helping developers understand what code is trying to do at the semantic level by integrating different sources (code, documentation, class hierarchies, programming fora on the Web); so not just an arbitrary syntactic RDF conversion, but a KG construction pursuing a clear research question. I thought the use of transitivity in SPARQL was a clever way of doing model recommendation through reasoning; and RDF*/SPARQL* found yet another application, since direct edge annotations are needed here. There was also a call for standardising function calling in SPARQL (we have worked on a creative workaround for this with Scry).
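
As an illustration of both ideas – my own sketch with made-up ex: predicates, not Graph4Code’s actual schema – a transitive property path can stand in for class-hierarchy reasoning when recommending models, and RDF* can annotate the resulting edge directly:

    PREFIX ex: <http://example.org/code#>

    # Recommend any model whose class is (transitively) a kind of classifier:
    # the * property path walks the hierarchy, standing in for a reasoner.
    SELECT ?model WHERE {
      ?model ex:instanceOf ?class .
      ?class ex:subClassOf* ex:Classifier .
    }

    # With RDF*, a confidence score can sit directly on the recommendation edge:
    #   << ex:model42 ex:recommendedFor ex:taskX >> ex:confidence 0.87 .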

Of course, this is not to say that programming for the Semantic Web and providing tools for developers to interact with knowledge graphs was not a topic in itself. For example, LDflex is a read/write Linked Data abstraction for front-end Web developers that lets coders work with RDF through simple JS expressions over a single JS object that can be read, written and “awaited” for, all through one single JSON-LD mapping object. With more of a CLI flavour, HDTCat makes HDT (the Header-Dictionary-Triples compression format for RDF) scalable by introducing a linear operator that leverages sorted dictionaries to enable fast joining of HDT files; the interface is as simple as $ hdtCat rdf1.hdt rdf2.hdt > hdtJoin.hdt. SPARQAL is a recursive analytics extension for SPARQL, similar to TigerGraph’s GSQL and LD-Script. More on the knowledge graph access side, Daniel Garijo et al. presented OBA, a really neat way of producing more RESTful OpenAPI specifications by mapping ontologies to object schemas. It was really cool to see this using components of our very own grlc. I thought these were all really interesting approaches that showed a community that cares for its tools and users.

Hybrid Knowledge Graph ecosystems

The third main trend of the conference was, to me, the blooming of hybrid knowledge graph ecosystems. We have seen many examples by now of hybrid symbolic-neural models, where e.g. ontologies are used to improve neural models, or neural models are used to improve ontologies (with interesting challenges like few-/zero-shot learning, trust, or bias). So I thought this hybrid trend has grown from occasional techniques into a common, key component of large knowledge graph infrastructures, which are now basically hybrid in essence (similarly to how modern processors have dedicated hardware for neural tasks).

I thought this was quite noticeable in the keynotes of Larry Hunter and Guotong Xie, both focused on biomedical applications of large ecosystems of blended symbolic/subsymbolic representations. The limitation that hybrid KGs want to overcome is that machine learning alone suffers from data bias, while knowledge representation alone struggles with nuanced exceptions in local data. Both projects build large hybrid KG infrastructures to integrate diseases, products, treatments, etc., automating linking where possible (with ML) but also leveraging reasoning (with KR) for answering questions about the “why” of phenomena. An interesting point here is that at this scale hypothesis management is really needed, and KGs can truly help (if you’re interested in this, we’re recruiting). I thought the techniques displayed in both projects were truly impressive: computing the transitive closure of the graph and then computing node embeddings for compound similarity; using GPT-3 for biomedical texts; or daisy-chaining BERT, LSTM and CRF for NER and linkage. It was mind-blowing to see some of these match physician performance at diagnosis – so much has changed since the days of expert systems. It was also nice to see a good set of papers on scientific knowledge graphs as an example of these blooming ecosystems – AI-KG: an Automatically Generated Knowledge Graph of Artificial Intelligence being a good example.
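
The “transitive closure, then embeddings” step, for instance, could start from something as simple as the following CONSTRUCT – a hedged sketch with a hypothetical ex:interactsWith predicate, not the actual pipelines from the keynotes – where the materialised, denser graph is then handed to a node-embedding model to compute compound similarity:

    PREFIX ex: <http://example.org/bio#>

    # Materialise the transitive closure of a (hypothetical) interaction relation;
    # the expanded graph is what the node-embedding model consumes.
    CONSTRUCT { ?a ex:interactsWith ?b }
    WHERE     { ?a ex:interactsWith+ ?b }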

[Photo: The Acropolis of Athens]

All in all, I thought this was an excellent ISWC edition, with top-notch papers and high attendance and engagement despite the ongoing pandemic. Huge thanks for the marvellous work of the organisers, who undoubtedly put together the best online event of the year (at least in my own experience). I can’t wait to see you all again, hopefully face to face, next year in Albany!

Misc notes