Cover Art Classification: Doom Metal vs. Classical
Continuing my journey through Chapter 2 of the fast.ai course (first mentioned here), I decided to follow the bear classifier example with a slightly different classifier of my own: one trained to tell apart album covers from two very distinct musical genres, doom metal and classical.
If you’d like to try it out for yourself, you can click this button to spin up a Binder container that hosts the classifier as a Voila app:
The accuracy is actually pretty decent, reaching a success rate of about 93% after training on a dataset of about 1100 images pulled via the MusicBrainz API and its friend the Cover Art Archive.
The full GitHub repo, including the notebook I used to download the album covers and to train the network, is here.
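For a rough idea of what pulling covers from those two services involves, here is a minimal sketch that only builds the request URLs. It is not the code from the notebook: the helper names `release_search_url` and `front_cover_url` are made up for illustration, and the all-zero release ID below is a placeholder rather than a real release.

```python
from urllib.parse import urlencode

MUSICBRAINZ_SEARCH = "https://musicbrainz.org/ws/2/release"
COVER_ART_ARCHIVE = "https://coverartarchive.org/release"


def release_search_url(tag, limit=100, offset=0):
    """Build a MusicBrainz search URL for releases carrying a given tag.

    Hypothetical helper for illustration; not taken from the notebook.
    """
    params = {
        "query": f'tag:"{tag}"',  # search on the release "tag" field
        "fmt": "json",            # ask for JSON instead of the default XML
        "limit": limit,           # search results are capped at 100 per page
        "offset": offset,         # page through results in steps of `limit`
    }
    return f"{MUSICBRAINZ_SEARCH}?{urlencode(params)}"


def front_cover_url(release_mbid, size=500):
    """Build the Cover Art Archive URL for a release's front cover thumbnail.

    Hypothetical helper for illustration; `size` is the thumbnail width in pixels.
    """
    return f"{COVER_ART_ARCHIVE}/{release_mbid}/front-{size}"


# Placeholder release ID, used only to show the URL shape.
print(release_search_url("doom metal"))
print(front_cover_url("00000000-0000-0000-0000-000000000000", size=250))
```

Anything that actually fetches these URLs should send a descriptive User-Agent and keep to roughly one request per second, since MusicBrainz rate-limits its web service. Not every release has a front cover, so a real downloader would also need to handle missing images.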
Some interesting (?) things I’ve noticed while testing out the classifier:
When I think of classical albums that my dad had around when I was a kid, I think of either (1) a painting of some old composer’s face in profile, or (2) a bright yellow rectangle at the top of the cover containing the title of the work and the composer’s name. And sure enough, it seems like the network recognizes those features pretty strongly as well:
And when I think of doom metal albums, I think of (1) nearly-unreadable band names written in that cool creepy spidery text and (2) scary satanic monsters. The network seems to understand this too:
Here’s the confusion matrix generated during training:
Notice that the network rarely misidentifies doom metal covers as classical, but classical covers are about three times more likely to be misclassified as doom metal. I suppose this means that doom metal albums have more readily identifiable and consistent stylistic features? It would be interesting to dig into the model and see what features it’s pulling out at each stage of the network for a few representative examples… maybe next week?
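To make the "three times more likely" comparison concrete, it helps to turn the confusion matrix into per-class error rates, so that the two genres are compared fairly even if the validation set has more covers of one than the other. Here is a small sketch, assuming the usual layout where rows are the actual class and columns are the predicted class. The counts are invented for illustration and are not the numbers from the matrix above.

```python
def misclassification_rates(matrix, labels):
    """Return, for each actual class, the fraction of its items predicted as another class.

    matrix[i][j] counts items whose actual class is labels[i] and whose
    predicted class is labels[j] (rows = actual, columns = predicted).
    """
    rates = {}
    for i, label in enumerate(labels):
        total = sum(matrix[i])          # all items that really belong to this class
        wrong = total - matrix[i][i]    # everything off the diagonal in this row
        rates[label] = wrong / total if total else 0.0
    return rates


# Hypothetical counts for illustration only, NOT the post's actual results.
labels = ["classical", "doom_metal"]
matrix = [
    [100, 12],  # actual classical: 100 correct, 12 called doom metal
    [4, 104],   # actual doom metal: 4 called classical, 104 correct
]

rates = misclassification_rates(matrix, labels)
for label, rate in rates.items():
    print(f"{label}: {rate:.1%} misclassified")
```

Comparing rates rather than raw counts matters most when the classes are unbalanced: a genre with twice as many validation images would rack up more errors even if the network handled it equally well.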
Another good follow-up project might be to train a network to identify the actual music from these two genres based on spectrogram images generated from representative songs. It would take a bunch more work to generate those images, but it would be pretty cool (and I think it would work well, unless the doom metal happens to include a ton of orchestration!)
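For anyone unfamiliar with spectrograms, the idea is to slice the audio into short overlapping frames and measure how much energy each frame has at each frequency, which gives a 2D grid that can be drawn as an image. Below is a toy sketch using only the standard library, run on a synthetic sine wave instead of a real song. The function name, frame size, and hop size are arbitrary choices for illustration, and a real pipeline would almost certainly use an FFT-based audio library instead of this slow naive transform.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames, each a list of magnitudes for bins 0..frame_size // 2."""
    # A Hann window tapers each frame's edges to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            # Naive DFT for frequency bin k: fine for a demo, far too slow for real audio.
            total = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n in range(frame_size))
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames


# Synthetic test tone: a sine wave with exactly 8 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = spectrogram(tone)

# Each frame's strongest bin shows where the tone's energy sits.
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
print(len(spec), "frames,", len(spec[0]), "bins each; peak bins:", peak_bins)
```

Rendering each magnitude as a pixel brightness, with time on one axis and frequency on the other, would give the kind of image an image classifier could be trained on.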