My last trip of the year was to remote Auckland, New Zealand, for the International Semantic Web Conference (ISWC 2019), the main venue for Knowledge Graphs, Web-based Knowledge Bases and AI. This report is somewhat delayed: you don’t often travel to your antipodes, so I spent a couple of extra days exploring these unique and extraordinary lands (thanks Ingrid for the beautiful pictures!).
So, having said what an awesome place New Zealand is, let’s get down to business. This year ISWC had 308 delegates from 40 countries, 50% of them from Europe. The research track had 42 accepted papers out of 194 submissions (22% acceptance rate), which is in line with previous editions. Paper topics varied widely (we say this every year, but this was the case perhaps more than ever); databases, data management, querying and knowledge graphs were the most prominent.
I’m basing this report on the extensive live notes (38 pages!) that Michael Cochez and I wrote during the event. Feel free to dive into the document for specific paper details, as it will be hard to fit everything in this post. I also attended the workshops QuWeDa and SemStats (which was great to come back to, and to see that important topics such as getting more statistical offices on board or the use of provenance have been taken up by the community); and the GraphQL tutorial (kudos to the great Olaf Hartig and Ruben Taelman for the fantastic crash course).
As for the main conference, to me the main topics of this year (always subject to my own bias) were: For Knowledge; Dataset Work; and Enterprise Knowledge Graphs.
For Knowledge
This point was perfectly crystallized in the keynote by Jérôme Euzenat (from which I happily reused the title for this subsection). His main point is that in the Semantic Web community we have been putting the spotlight on data for quite some time, and we have reaped great benefits from it; but data can only take us so far, and by contrast knowledge truly lies at the core of every human civilization. I also found his observations on knowledge discovery (based on data, but needs re-training) and knowledge transmission (which requires articulation and is at the core of various intelligent tasks) interesting. Basically he called for sharing knowledge (“who wants to stand on the shoulders of data?”) and for revisiting the fields of eScience (with a more central role for experiment representation in scientific reporting) and knowledge evolution (e.g. natural and cultural selection are heavily based on various forms of knowledge).
Since representing scholarly workflows is becoming more important in my own research and project context, and I worked intensively on concept drift for SW/DH datasets during my PhD, I found these to be fundamental topics to think about, as cases with a clear human-infused knowledge side.
I liked the contrast of this with many papers that were on the edge between symbolic and subsymbolic representation and their blending, like the work of Kristiadi et al. on Incorporating Literals into Knowledge Graph Embeddings and in general the role of representation learning as a proxy between linguistic and structured knowledge.
Dataset Work
With this section I wanted to group together many papers that in one way or another tried to answer the question “what is this dataset/endpoint all about?”. Automating this task is clearly necessary to lower the costs of publishing and finding Knowledge Graphs. Wang et al. propose a mechanism for generating dataset snippets, a kind of dataset sample that is useful for dataset search with respect to keywords and queries. Instead of computing these dataset descriptors, Hasnain et al. propose a central repository of VoID-like SPARQL endpoint descriptors (SPORTAL) in order to find relevant data. Other relevant works looked at validating SHACL constraints over SPARQL endpoints (which got the best research paper award, congrats!), or monitored and assessed the quality of data in public SPARQL endpoints.
In many of these approaches scalability is a central question, and various other papers tried to deal with it. I really enjoyed the use of SANSA for assessing Linked Data quality at scale (with 200GB of semantic data analysed in barely 3 minutes), and of Sparklify for efficient evaluation of SPARQL queries. From closer colleagues, the LOD-a-lot analysis of class equivalence and subproperty relations shows evidence that knowledge engineers put more effort into class hierarchies than into property hierarchies. We probably had an intuition about this, but it’s just cool to be able to compute it in 4 hours on an affordable laptop. Also on scalability, I presented work on benchmarking efficient querying of RDF Lists.
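To make the idea of a VoID-like descriptor a bit more concrete, here is a minimal sketch in Python of the kind of summary statistics such a descriptor might hold. The toy triples, the `describe` function and the layout of the resulting dictionary are my own assumptions for illustration; only the `void:` property names come from the VoID vocabulary. This is not the method of SPORTAL or of Wang et al., which deal with real endpoints and datasets rather than an in-memory list.

```python
from collections import Counter

RDF_TYPE = "rdf:type"

# Hypothetical toy dataset: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:iswc2019", "rdf:type", "ex:Conference"),
    ("ex:iswc2019", "ex:heldIn", "ex:Auckland"),
    ("ex:eswc2019", "rdf:type", "ex:Conference"),
    ("ex:Auckland", "rdf:type", "ex:City"),
    ("ex:Auckland", "ex:locatedIn", "ex:NewZealand"),
]


def describe(triples):
    """Return a small VoID-style summary of a collection of triples."""
    triples = list(triples)
    # Instances per class: count the objects of rdf:type statements.
    class_counts = Counter(o for _, p, o in triples if p == RDF_TYPE)
    # Triples per property: count how often each predicate is used.
    property_counts = Counter(p for _, p, _ in triples)
    return {
        "void:triples": len(triples),
        "void:distinctSubjects": len({s for s, _, _ in triples}),
        "void:properties": len(property_counts),
        "void:classes": len(class_counts),
        "void:classPartition": dict(class_counts),
        "void:propertyPartition": dict(property_counts),
    }


if __name__ == "__main__":
    for key, value in describe(TRIPLES).items():
        print(key, value)
```

Even on a handful of triples, the class and property partitions already give a rough answer to “what is this dataset all about?”: which kinds of things it describes and which relations it uses most.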
Enterprise Knowledge Graphs
If the success of knowledge transfer for a scientific field can be at least partly measured by the penetration of its technologies in industry, I think the Semantic Web community can be quite happy. Besides the 16 industry track papers (thanks Christophe Guéret and colleagues), many research, in-use and resource track papers had big industry names on them. The Microsoft Academic Knowledge Graph promises to be a core resource in scholarly research, with 8B triples, links to Wikidata, OpenCitations, GRID, etc., and embeddings provided as a representation. The use of OWL and SW technology at Pinterest highlights many lessons regarding knowledge engineering in large organisations that I found incredibly valuable. The deployment of the Smart Topic Miner at Springer Nature sets a landmark success story on using semantics for automating and enriching scholarly publishing workflows. Even the first keynote of the conference, by Dougal Watt, was a recognition of industry as a (if not the) key innovator in Semantic Web technologies. His talk was a call to finally move to a data/knowledge-centric (as opposed to application-centric) ecosystem, and left a must-read wish list of technological feats to accomplish (we really need to get to those standard SPARQL transactions).
In summary, I thought this was a really rich and heterogeneous edition of the conference, with many traditional topics like querying and scalability being well represented; and many new ones like blending representations solidifying quickly. I look forward to seeing all of you again next year in Athens!
- SemanGit: A Linked Dataset from git, a really cool resource for both developers and software scholars
- 11 papers were submitted to the reproducibility track, of which only 3 could be directly reproduced and the others required extra work. This shows what an incredibly hard task this is, and as a community we should be grateful to Michael Cochez and the rest of the initiative. We should make this grow!
- Last keynote, by the astrophysicist Melanie Johnston-Hollitt, showing true big data problems and offering a role model for thinking about requirements when designing measurement instruments: a lesson to be learned
- Auckland, where awesome jazz clubs sit in basements and cover songs from The Legend of Zelda games
- Huge thanks to the conference organizers, especially Fabien Gandon
- Get the bigger picture and don’t miss other great ISWC 2019 trip reports by Juan Sequeda and Armin Haller