Trip Report: ISMIR 2018

Last week I attended ISMIR for the first time; its 19th edition happened to return to Paris, where the conference started. ISMIR is the major venue for researchers in Music Information Retrieval (MIR), covering a broad set of communities and backgrounds including computer science, musicology, AI, psychology, ethnography, etc. The conference had 235 paper submissions, of which 104 were accepted (~44%).

For the first time, the presentation format was a 4-minute talk plus a 1-hour poster session for every paper. Conference sessions were 1.5-hour slots of around 18 short talks, each followed right after by a 1-hour poster session, which was nice for getting the key insights of a paper and then discussing it directly with the authors. I heard mixed responses to this format: while a few people missed the in-depth discussions of related work, experimental results, etc. of more classic presentations, in general presenters were very comfortable, and personally I enjoyed the continuous engagement of the poster sessions (although it is pretty intense and multiplies the usual end-of-conference-day energy drop by ten).

To me, the big topics of the conference were: Deep Learning everywhere, the role of symbolic representations, and datasets and interactive tools.

Deep learning everywhere

Unsurprisingly, MIR researchers massively use deep learning to learn cognitive functions related to music, typically the well-defined tasks of the MIREX competition. For example, Takumi Takahashi et al. use CNNs and CRNNs to recognize instrumentation with low latency, generating nice instrumentation visualizations on top of audio. A general message in papers doing this kind of feature separation is that CRNNs outperform MLPs in pretty much any specific task. In the case of instrument recognition, the general use case can be considered solved, although more specific tasks, like recognizing the accompanying instrumentation in jazz solos, remain challenging.
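To make the CNN-vs-CRNN idea a bit more concrete, here is a minimal sketch of a CRNN for multi-label instrument recognition over mel-spectrogram input. It is purely illustrative: the layer sizes, the 11-class output and everything else are my own assumptions, not the architecture from Takahashi et al.

```python
# Toy CRNN for multi-label instrument recognition on mel-spectrograms.
# NOT the paper's architecture; just the CNN -> RNN -> classifier pattern.
import torch
import torch.nn as nn

class ToyCRNN(nn.Module):
    def __init__(self, n_mels=96, n_instruments=11):
        super().__init__()
        # Convolutional front end: learns local time-frequency patterns.
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 2)),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 2)),
        )
        # Recurrent back end: aggregates the pooled frames over time.
        self.gru = nn.GRU(input_size=64 * (n_mels // 4),
                          hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, n_instruments)

    def forward(self, spec):                    # spec: (batch, 1, n_mels, frames)
        x = self.conv(spec)                     # (batch, 64, n_mels/4, frames/4)
        x = x.permute(0, 3, 1, 2).flatten(2)    # (batch, frames/4, 64 * n_mels/4)
        _, h = self.gru(x)                      # h: (1, batch, 64)
        return torch.sigmoid(self.head(h[-1]))  # per-instrument probabilities

model = ToyCRNN()
dummy = torch.randn(2, 1, 96, 128)              # two fake 96-mel spectrogram excerpts
print(model(dummy).shape)                       # torch.Size([2, 11])
```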

On this topic, a really interesting finding, and perhaps my favourite paper of the conference, was Jordi Pons et al.'s work on large-scale end-to-end learning for music audio, which won the best student paper award (congrats Jordi!). This work is about scaling up models that are typically trained on spectrograms, the standard audio representation in MIR. What they found is that beyond a threshold of roughly 1M songs, feeding the raw audio directly into the network (instead of spectrograms) starts to pay off and leads to better-performing models. The larger datasets are unfortunately private. An interesting discussion point in the poster session was why this shift in the most convenient representation happens, which raises more questions (e.g., what specific knowledge in the raw representation is the network leveraging?) and I think is well aligned with the call for more explainable AI (a recurrent topic at the conference).
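As a toy illustration of the two input strategies at play, the sketch below contrasts a fixed mel-spectrogram front end with a learned raw-waveform front end. It is not the authors' actual model; the sample rate, layer sizes and shapes are assumptions of mine.

```python
# Contrast of the two input strategies: a hand-crafted mel-spectrogram
# front end vs. a raw-waveform front end whose filterbank is learned.
# Sizes are illustrative assumptions, not the paper's settings.
import torch
import torch.nn as nn
import torchaudio

waveform = torch.randn(1, 1, 16000)   # one second of fake mono audio at 16 kHz

# (a) Spectrogram input: fixed time-frequency representation,
#     typically consumed by a 2-D CNN.
to_mel = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=96)
spec = to_mel(waveform)               # shape: (1, 1, 96, frames)

# (b) Raw-waveform input: a strided 1-D convolution learns its own
#     filterbank directly from the samples; per the paper, this starts
#     to pay off once training data grows beyond roughly a million songs.
learned_frontend = nn.Conv1d(in_channels=1, out_channels=128,
                             kernel_size=512, stride=256)
feats = learned_frontend(waveform)    # shape: (1, 128, frames)

print(spec.shape, feats.shape)
```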

The role of symbolic representations

Symbolic music representations (MIDI, MusicXML, etc.) were also a big topic at the conference, more on the musicologists' side. I found the work of Christof Weiss et al. really interesting, in the sense that their studies relying on symbolic representations generally perform worse than those using sub-symbolic (audio-based) methods, although they display the same trends.

Along the same lines, but from a different perspective, the work of Daphne Odekerken et al. on chord estimation makes an excellent point about using symbolic music background knowledge (in the form of MIDI and tab files) to improve the performance of models otherwise fed only with audio data. I found this to be a great spot to introduce our work on Symbolic Music Knowledge Graphs as supporting knowledge bases for this kind of workflow.
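For context, here is a minimal sketch of a purely audio-driven chord estimation baseline (chroma features plus major/minor triad template matching). This is not Odekerken et al.'s method; it only illustrates the kind of audio pipeline that symbolic knowledge such as MIDI or tab files could help disambiguate, and the input path is hypothetical.

```python
# Generic audio-only chord estimation baseline: chroma + template matching.
import numpy as np
import librosa

def estimate_chords(audio_path, sr=22050):
    y, sr = librosa.load(audio_path, sr=sr)
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr)        # (12, frames)

    # Binary templates for the 24 major/minor triads.
    names, templates = [], []
    pitches = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']
    for root in range(12):
        for quality, third in (('maj', 4), ('min', 3)):
            t = np.zeros(12)
            t[[root, (root + third) % 12, (root + 7) % 12]] = 1.0
            names.append(pitches[root] + quality)
            templates.append(t / np.linalg.norm(t))
    templates = np.array(templates)                         # (24, 12)

    # Pick, per frame, the template that best matches the chroma vector.
    scores = templates @ chroma                             # (24, frames)
    return [names[i] for i in scores.argmax(axis=0)]

# chords = estimate_chords('some_track.wav')   # hypothetical file path
```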

Linked Open Data was also very present, and I enjoyed the work of Pasquale Lisena and his colleagues in DOREMUS on integrating vocabularies for musical datasets. There is a great opportunity here to bring further integration into symbolic music representations, mixing scores, MIDI files, MusicXML, MEI and metadata databases. The work on OpenMIC and that of Julie Cumming et al. also made fundamental points to me (especially given my Semantic Web background) on methodologies for encoding and integrating symbolic corpora: this is way more challenging than one might expect from the outside, as different encodings cover different parts of notation, there is plenty of ambiguity, and systematic methods and tools need further work.

Interestingly, some outcomes of papers in this area are ready to integrate into well-known Semantic Web projects; for example, this work on extracting features from crowd-sourced recordings could well enrich this ISWC 2017 dataset describing the Live Music Archive in RDF :-)

I also attended Digital Libraries for Musicology (DLfM) on the last day of ISMIR (devoted to satellite events). This is the forum where musicologists and music data publishers come together and share ideas. Unsurprisingly, there were a lot of datasets presented (many based on IRCAM's data), plus OMR (Optical Music Recognition), tree-like grammar generation (GTTM), and quite a bit of Linked Data, including the fantastic MELD, JazzCats, and our very own MIDI Linked Data. My favourite paper, though, was Nestor Napoles et al.'s Encoding Matters, on experiments in translating between different encodings (MusicXML, MEI, etc.) and finding that software bugs, ambiguities in the encodings, or human errors make the encodings quite incompatible in practice. Overall this was a greatly inspiring event, with some takeaways for me on encoding interoperability and the importance of providing good data provenance in digital libraries.

Datasets and interactive tools

Plenty of datasets also took the spotlight at ISMIR, which I think is great for reproducibility and Open Science. This collection tries to gather the ones used for MIR tasks. Being a huge videogame geek, one of my favourites was the NES-MDB work by Chris Donahue et al. at UC San Diego, which collects, models, and renders videogame music from the legendary NES system with remarkable fidelity.

In this area, though, the spotlight was on Rebecca Fiebrink, who in a fantastic keynote argued in favor of poorly performing, small-data machine learning models for awesome live musical instrument creation (spectacularly demoed on stage). The idea is a strong focus on users, who can easily train models with their webcams and bodies, quickly deploying musical instruments based on body movement, synth distortion, object manipulation, etc. Her work on the Wekinator is famously used by lots of artists and is the perfect example of how non-general models attract little scientific interest but are incredibly useful to musicians (the users). The big takeaway was thus again about explainable AI (why does the model produce this output from this input?), but also about UI/UX design, breaking the assumption that users always know exactly what they want (especially in music generation, most users don't want full control over every decision).

Overall, ISMIR 2018 was my first ever experience with the MIR community and I learned a great deal, not just about Music Information Retrieval, but about how quickly AI is being deployed in applied domains and the consequences of that (an active topic of discussion in our recently approved DARIAH working group on AI and Music), and about how combining music and computing is an extraordinarily passionate way of doing science. I hope to repeat this for years to come!

Random notes: