Research has shown that when we listen to music, it impacts the way we perceive the world. We can encapsulate this phenomenon as musical mood: an alignment of the stylistic elements of music with human emotion. This inherent connection between music and mood provides a natural framework for organizing and discovering music. Musical genre is a useful tool for the same task, but it’s a patchwork of descriptors that can make exploring music difficult for the uninitiated. Someone searching for a song may not connect to a genre labeled “Alternative Pop Rock” without prior exposure. A mood labeled “Soft Tender / Sincere”, however, is easy to understand on a fundamental level. As a company responsible for processing, storing, and distributing much of the world’s music metadata, Gracenote faces a difficult task in determining the mood of hundreds of millions of songs, and a unique opportunity. With the ever-larger music catalogues of customers such as Apple and Spotify, the only feasible way to take this bull by the horns is with computation. But how can a computer begin to comprehend the complex harmonies, melodies, and rhythms that construct a musical mood?
Enter AI and Deep Learning
As a forward-thinking data company, one of the ways Gracenote has invested in innovation is through its Applied Research group, which develops new technologies that fuel future products. This article describes one recent return on this investment: a mood classification system called Mood 2.0, powered by the latest breakthroughs in Artificial Intelligence and Machine Learning. Mood 2.0 is an update to Mood 1.2, our existing mood classification system that currently enables mood-based music applications around the world. The new iteration features a significantly more refined mood space (436 mood labels in Mood 2.0 vs. 101 in Mood 1.2) and delivers a 33% increase in performance over Mood 1.2. And where Mood 1.2 uses Gaussian Mixture Models, Mood 2.0 classification uses Deep Learning, a method that employs multi-layered neural networks to train computers to recognize structure in data.
In just the last few years, Deep Learning has been breaking barriers in benchmark tasks for machine perception like image and speech recognition, making it an ideal tool for computing musical mood. To train a model to compute the mood of a song, several hundred mathematical values (called features in a machine learning context) are calculated from the audio signal. Each feature captures some aspect of the mood of the song, such as its rhythm or harmony. These features are then passed through the neural network, which outputs a probability distribution over the 436 moods. This output is compared to what is known to be the correct mood of the song (called the ground truth), and the neural network parameters are updated to improve its performance. After many iterations of this training process, the network learns to identify moods correctly and can be used to compute the mood of an unknown song.
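To make this concrete, here is a minimal sketch of that forward pass and comparison in Python. The feature dimension, network shape, and weights are hypothetical placeholders rather than the actual Mood 2.0 architecture; it only illustrates how a feature vector becomes a probability distribution over the 436 moods and how that prediction is scored against the ground truth.

```python
import numpy as np

NUM_FEATURES = 300   # hypothetical size of the per-song audio feature vector
NUM_MOODS = 436      # size of the Mood 2.0 label space

rng = np.random.RandomState(0)
# Hypothetical weights of a small two-layer network; in practice these are learned.
W1, b1 = rng.normal(scale=0.01, size=(NUM_FEATURES, 512)), np.zeros(512)
W2, b2 = rng.normal(scale=0.01, size=(512, NUM_MOODS)), np.zeros(NUM_MOODS)

def softmax(z):
    e = np.exp(z - z.max())            # subtract max for numerical stability
    return e / e.sum()

def forward(features):
    """Map one song's feature vector to a probability distribution over moods."""
    hidden = np.maximum(0.0, features.dot(W1) + b1)   # ReLU hidden layer
    return softmax(hidden.dot(W2) + b2)

def cross_entropy(probs, true_mood_index):
    """Score the prediction against the ground-truth mood label."""
    return -np.log(probs[true_mood_index] + 1e-12)

features = rng.normal(size=NUM_FEATURES)   # stand-in for real audio features
probs = forward(features)                  # distribution over the 436 moods
loss = cross_entropy(probs, true_mood_index=42)
# During training, gradients of this loss update W1, b1, W2, b2 over many iterations.
```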
Some Tech
To train a neural network you generally need a lot of data and computing power. The training computation is done on multiple GPUs (Nvidia® GeForce® GTX TitanX, 6GB RAM), which significantly speeds up the iteration process. The code for training the mood classifier, like many scientific computing applications, is written in Python. The SciPy/NumPy libraries (scientific computing) and Theano/Lasagne (GPU computation and neural networks) enable quick prototyping. For the production classification system (Figure 1), the trained classifier is hosted on Amazon Web Services (AWS) for scalability, allowing quick parallel processing of thousands or millions of songs.
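As a rough illustration of prototyping with this stack, the sketch below defines a small fully connected network in Lasagne and compiles Theano functions for training and inference. The layer sizes, dropout rate, and optimizer are assumptions made for illustration; the actual Mood 2.0 architecture is not described in this post.

```python
import theano
import theano.tensor as T
import lasagne

NUM_FEATURES, NUM_MOODS = 300, 436          # hypothetical dimensions

feature_var = T.matrix('features')          # batch of per-song feature vectors
target_var = T.ivector('targets')           # ground-truth mood indices

# A small fully connected network for illustration only.
net = lasagne.layers.InputLayer((None, NUM_FEATURES), input_var=feature_var)
net = lasagne.layers.DenseLayer(net, 512, nonlinearity=lasagne.nonlinearities.rectify)
net = lasagne.layers.DropoutLayer(net, p=0.5)
net = lasagne.layers.DenseLayer(net, NUM_MOODS, nonlinearity=lasagne.nonlinearities.softmax)

prediction = lasagne.layers.get_output(net)
loss = lasagne.objectives.categorical_crossentropy(prediction, target_var).mean()
params = lasagne.layers.get_all_params(net, trainable=True)
updates = lasagne.updates.adam(loss, params, learning_rate=1e-3)

# Compiled Theano functions run on the GPU when one is configured.
train_fn = theano.function([feature_var, target_var], loss, updates=updates)
test_prediction = lasagne.layers.get_output(net, deterministic=True)  # dropout off
predict_fn = theano.function([feature_var], test_prediction)
```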
Figure 1. Mood 2.0 Classification System Parallel Architecture
Form Factor
So what does it look like? A song can be composed of multiple moods, perhaps necessarily so, and we capture this in the Mood 2.0 profile. The profile is a vector computed by post-processing the probability distribution from the neural network, and it represents the presence of each mood in the song. Table 1 shows the computed Mood 2.0 profile for “Give it Away” by the Red Hot Chili Peppers, where you can see the presence of different moods detected by the classifier. In listening to the track, I think you’d agree these labels and their scores do a nice job of describing its mood.
Table 1. Example Mood Profile for “Give it Away” by Red Hot Chili Peppers
| Mood Label | Score |
| --- | --- |
| Loud n’ Scrappy | 38% |
| Wild Loud Dark Groove | 24% |
| Urgent / Frustrated Pop | 15% |
| Tightly Wound Excitement / Positive Frustration | 9% |
| Anger / Hatred | 6% |
| Teenage Loud Fast Positive Anthemic / Melodic | 3% |
| Alienated Anxious Groove | 2% |
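The exact post-processing isn’t detailed here, but a minimal sketch of how a profile like Table 1 could be derived from the network’s output might look like the following, assuming we simply drop negligible moods and report the rest as rounded percentages:

```python
def mood_profile(probs, mood_names, threshold=0.02):
    """Turn a 436-way probability distribution into a Mood 2.0-style profile.

    Hypothetical post-processing: drop negligible moods and report the rest
    as rounded percentages, strongest first.
    """
    kept = [(name, round(100 * p)) for name, p in zip(mood_names, probs) if p >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)
```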
Let’s look at a few more. Table 2 shows the top three computed moods for some popular tracks. I’ve purposely chosen tracks from a wide variety of genres to give readers of different musical tastes something that may be familiar. Click on a track name to listen to the song on YouTube, or on Spotify where I didn’t find an official video uploaded by the artist. Again, these labels and scores do a nice job of describing the mood of the audio, but it starts to get interesting when we see relationships between tracks from different artists, genres, and so on. For example, the secondary computed mood is the same for “Basket Case” by Green Day and “Stupify” by Disturbed, indicating a common mood undertone. It’s this type of rich interaction that paves the way for better music systems.
Table 2. Top 3 computed moods for some popular songs
| Track | Artist | Primary Mood Label | Primary Mood Score | Secondary Mood Label | Secondary Mood Score | Tertiary Mood Label | Tertiary Mood Score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| | Bruno Mars | Carefree Soaring Bliss Party People Groove | 0.439 | Edgy Dark Fiery Intense Pop Beat | 0.329 | Latin Boom Boom Sexy Party Trance Beat | 0.165 |
| | John Coltrane | Dark Energetic Abstract Groove | 0.425 | Lively “Cool” Subdued / Indirect Positive | 0.303 | Happy Energetic Abstract Groove | 0.149 |
| | Aretha Franklin | Slow Strong Serious Soulful Ballad | 0.402 | Sad Soulful Jaunty Ballad | 0.234 | Bare Emotion | 0.177 |
| | Eric Clapton | Tender Lite Melancholy | 0.406 | Sober / Resigned / Weary | 0.221 | Soft Tender / Sincere | 0.16 |
| | Green Day | Loud Fast Dark Anthemic | 0.277 | Gothic Haunted Overdrive Beast | 0.167 | Aggressive Crunching Power | 0.167 |
| | Disturbed | Aggressive Evil | 0.319 | Gothic Haunted Overdrive Beast | 0.206 | Anger / Hatred | 0.179 |
| | Cher | Power Boogie Dreamy Trippy Beat | 0.286 | Passionate Dark Dramatic Fiery Groove | 0.221 | Dark Gritty Sexy Groove | 0.182 |
| | Goo Goo Dolls | Positive Flowing Strumming Serious Strength | 0.358 | Dark Loud Strumming Ramshackle Ballad | 0.196 | Loud Overwrought Heartfelt Earnest Bittersweet Ballad | 0.184 |
| | Kid Rock | Ramshackle Jaunty Rock | 0.387 | Whatever Kick-Back Loud Party Times | 0.255 | Sassy | 0.226 |
| | Wu-Tang Clan | Dark Cool Calm Serious Truthful Beats | 0.479 | Kick-Back Dreamy Words & Beats | 0.202 | Flat / Speech Only | 0.119 |
| | Tower Of Power | Soulful Solid Strength & Glory | 0.367 | Slow Strong Serious Soulful Ballad | 0.3 | Poseur Earnest Uplifting Ballad | 0.143 |
| | Ray Charles | Sweet & Tender Warm Mellow Reverent Peace | 0.633 | Tender Sad | 0.197 | Dreamy Romantic Lush | 0.089 |
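One simple way to surface relationships like the Green Day/Disturbed overlap is to compare full mood profiles directly. The sketch below, which assumes each profile is available as a length-436 vector, scores two tracks by cosine similarity; this is an illustration of the idea, not the actual Gracenote similarity method.

```python
import numpy as np

def mood_similarity(profile_a, profile_b):
    """Cosine similarity between two Mood 2.0 profile vectors (length 436)."""
    a = np.asarray(profile_a, dtype=float)
    b = np.asarray(profile_b, dtype=float)
    return float(a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Tracks with overlapping moods (e.g., a shared secondary mood) score closer to 1,
# which can be used to rank candidates for mood-based discovery.
```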
Bird’s Eye
Now what do we do with all of this data? Applications abound, from music discovery to music therapy to mixing cocktails. Let’s start by seeing how musical mood is distributed in the wild. In our first efforts to transition Mood 2.0 into a Gracenote product, we’ve generated Mood 2.0 profiles for our first million tracks, and below are the most common musical moods.
- Dramatic – Strong Emotional Vocal
- Dramatic – Strong Positive Emotional Vocal
- Bitter
- Power Dreamy Beat
- Serious Measured Powerful Emotive Tenderness
- Dismay / Awfulness / Bad Scene
- Flat Dance Groove – Mechanical
- Lyrical Romantic Bittersweet
- Romantic Dark Energetic Complex
- Tender Lite Melancholy
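For the curious, a tally like the one above can be computed in a few lines. This sketch assumes a hypothetical mapping from track ID to its Mood 2.0 profile (a list of (label, score) pairs sorted by score) and counts each track’s primary mood:

```python
from collections import Counter

def most_common_moods(catalogue_profiles, top_n=10):
    """Count the primary (highest-scoring) mood of every track in the catalogue.

    `catalogue_profiles` is a hypothetical dict mapping track ID to a Mood 2.0
    profile, i.e. a list of (mood_label, score) pairs sorted by score.
    """
    counts = Counter(profile[0][0] for profile in catalogue_profiles.values())
    return counts.most_common(top_n)
```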
You may or may not find these surprising, but to me they reflect a mix of emotions I might experience on any given day. Here we see yet another reflection of the connection between music and the human experience.
Where Next?
In future posts we’ll discuss more of the technology behind Mood at Gracenote and its applications, because here we have only scratched the surface. For example, musical mood often changes throughout a song, so we are currently extending Mood 2.0 classification to capture this. Having mood information on a timeline opens new possibilities for a one-to-one musical experience. Automated mood lighting synced to music for your home theatre, anyone?
We’ll also explore the relationship between mood and other music metadata at Gracenote. Each metadata attribute that we catalogue (e.g., Mood, Genre, Language, Origin, Era) is useful on its own for organizing music, but their value compounds when they are used together. Answers to questions such as how moods vary within a genre, or how languages vary within a mood, further empower users of music listening services and allow Gracenote to push the boundaries of musical understanding at a global level.
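As a hint of what such cross-attribute questions might look like in practice, here is a minimal sketch assuming a hypothetical table of tracks with `genre`, `primary_mood`, and `language` columns; the column names and data source are illustrative, not an actual Gracenote schema:

```python
import pandas as pd

def moods_per_genre(tracks: pd.DataFrame) -> pd.Series:
    """Number of distinct primary moods observed within each genre."""
    return tracks.groupby('genre')['primary_mood'].nunique().sort_values(ascending=False)

def languages_per_mood(tracks: pd.DataFrame) -> pd.Series:
    """Number of distinct languages observed within each primary mood."""
    return tracks.groupby('primary_mood')['language'].nunique().sort_values(ascending=False)
```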
by Cameron Summers | August 9, 2016