
Research has shown that when we listen to music, it shapes the way we perceive the world. We can encapsulate this phenomenon as musical mood, an alignment of the stylistic elements of music with human emotion. This inherent connection between music and mood provides a natural framework for organizing and discovering music. While musical genre is a useful tool for the same task, it’s a patchwork of descriptors that can make exploring music difficult for the uninitiated. Someone searching for a song may not connect with a genre labeled “Alternative Pop Rock” without prior exposure, but a mood labeled “Soft Tender / Sincere” is fundamentally easier to understand. As a company responsible for processing, storing, and distributing much of the world’s music metadata, Gracenote faces a difficult task in determining the mood of hundreds of millions of songs, and a unique opportunity. With the ever-larger music catalogues of customers such as Apple and Spotify, the only feasible way to take this bull by the horns is with computation. But how can a computer begin to comprehend the complex harmonies, melodies, and rhythms that construct a musical mood?

Enter AI and Deep Learning

As a forward-thinking data company, Gracenote has invested in innovation through its Applied Research group, which develops new technologies that fuel future products. This article describes one recent return on this investment: a mood classification system called Mood 2.0, powered by the latest breakthroughs in Artificial Intelligence and Machine Learning. Mood 2.0 is an update to Mood 1.2, our existing mood classification system that currently enables mood-based music applications around the world. The new iteration features a significantly more refined mood space (436 mood labels in Mood 2.0 vs. 101 in Mood 1.2) and delivers a 33% increase in performance over Mood 1.2. And where Mood 1.2 uses Gaussian Mixture Models, Mood 2.0 classification utilizes Deep Learning, a method that employs multi-layered neural networks to teach computers to recognize structure in data.

In just the last few years, Deep Learning has been breaking barriers in benchmark tasks for machine perception like image and speech recognition, making it an ideal tool for computing musical mood. To train a model to compute the mood of a song, several hundred mathematical values, called features in a machine learning context, are calculated from the audio signal. Each feature captures some aspect of the mood of the song, such as its rhythm or harmony. These features are passed through the neural network, which outputs a probability distribution over the 436 moods. This output is compared to what is known to be the correct mood of the song, called the ground truth, and the neural network parameters are updated to improve its performance. After many iterations of this training process, the network learns to properly identify mood and can be used to calculate the mood of an unknown song.
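The training loop just described can be sketched end to end. The real system is built with Theano/Lasagne on GPUs over several hundred features and 436 moods; the toy NumPy version below shrinks everything down (all sizes and data are illustrative, not Gracenote’s) and uses a single softmax layer where a deep network would stack many.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 8  # stands in for the several hundred audio features
N_MOODS = 4     # stands in for the 436 mood labels

# Synthetic dataset: feature vectors with known ("ground truth") moods.
X = rng.normal(size=(200, N_FEATURES))
true_W = rng.normal(size=(N_FEATURES, N_MOODS))
y = np.argmax(X @ true_W, axis=1)  # ground-truth mood index per song

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

W = np.zeros((N_FEATURES, N_MOODS))  # one softmax layer; a deep net stacks many
for step in range(300):
    probs = softmax(X @ W)                  # distribution over moods per song
    onehot = np.eye(N_MOODS)[y]             # ground truth as one-hot vectors
    grad = X.T @ (probs - onehot) / len(X)  # cross-entropy gradient
    W -= 0.5 * grad                         # update parameters to improve fit

accuracy = np.mean(np.argmax(X @ W, axis=1) == y)
print(round(float(accuracy), 2))
```

After enough iterations the classifier recovers the mood boundaries in this toy data, which is the same compare-to-ground-truth-and-update cycle the production network runs at scale.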

Some Tech

To train a neural network you generally need a lot of data and computing power. The training computation is done using multiple GPUs (Nvidia® GeForce® GTX Titan X, 6GB RAM), which speeds up the iteration process significantly. The code for training the mood classifier, like many scientific computing applications, is written in Python, with the SciPy/NumPy libraries (scientific computing) and Theano/Lasagne (GPU computation and neural networks) enabling quick prototyping. For the production classification system (Figure 1), the trained classifier is housed in Amazon Web Services (AWS) for scalability, allowing quick parallel processing of thousands or millions of songs.

Figure 1. Mood 2.0 Classification System Parallel Architecture
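As a rough local analogue of the parallel architecture in Figure 1, classification can be fanned out over a worker pool. `classify_track` below is a hypothetical placeholder, not Gracenote’s API; in production each worker would load the trained network and tracks would be distributed across AWS instances rather than local threads.

```python
from concurrent.futures import ThreadPoolExecutor

def classify_track(track_id):
    # Hypothetical placeholder: in production this would run the trained
    # neural network on the track's audio features.
    return (track_id, "Tender Lite Melancholy")

def classify_catalogue(track_ids, workers=4):
    # Fan tracks out across a pool of workers, mirroring how the production
    # system fans them out across AWS for parallel processing.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(classify_track, track_ids))

moods = classify_catalogue(range(8))
print(len(moods))
```

The shape is the same either way: an embarrassingly parallel map of one classifier over many independent tracks, which is what makes million-song catalogues tractable.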

Form Factor

So what does it look like? A song can be composed of multiple moods, perhaps necessarily so, and we capture this in the Mood 2.0 profile. The profile is a vector computed by post-processing the probability distribution from the neural network, and it represents the presence of each mood in the song. Table 1 shows the computed Mood 2.0 profile for “Give it Away” by the Red Hot Chili Peppers, where you can see the different moods detected by the classifier. In listening to the track, I think you’d agree these mood labels and their scores do a nice job of describing its mood.

Table 1. Example Mood Profile for “Give it Away” by Red Hot Chili Peppers

| Mood Label | Score |
| --- | --- |
| Loud n’ Scrappy | 38% |
| Wild Loud Dark Groove | 24% |
| Urgent / Frustrated Pop | 15% |
| Tightly Wound Excitement / Positive Frustration | 9% |
| Anger / Hatred | 6% |
| Teenage Loud Fast Positive Anthemic / Melodic | 3% |
| Alienated Anxious Groove | 2% |
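The exact post-processing behind the profile isn’t spelled out here, but a plausible minimal sketch is to drop negligible probabilities and express the surviving moods as percentages. The threshold, labels, and distribution below are illustrative only:

```python
def mood_profile(probs, labels, threshold=0.02):
    # Keep moods whose probability clears a small threshold, sort by
    # strength, and express each as a rounded percentage.
    kept = [(lbl, p) for lbl, p in zip(labels, probs) if p >= threshold]
    kept.sort(key=lambda lp: lp[1], reverse=True)
    return [(lbl, round(100 * p)) for lbl, p in kept]

# Toy four-mood distribution; the real network outputs 436 probabilities.
labels = ["Loud n' Scrappy", "Wild Loud Dark Groove", "Anger / Hatred", "Sassy"]
probs = [0.38, 0.24, 0.06, 0.01]
profile = mood_profile(probs, labels)
print(profile)
```

One design consequence of thresholding like this: the percentages in a profile need not sum to 100, since mass assigned to negligible moods is simply dropped.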

Let’s look at a few more. Table 2 shows the top 3 computed moods for some popular tracks. I’ve purposely chosen tracks from a wide variety of genres so that readers of different musical tastes will find something familiar. Click on a track name to listen to the song on YouTube, or on Spotify when I couldn’t find an official video uploaded by the artist. Again, these labels and scores do a nice job describing the mood of the audio, but it starts to get interesting when we see relationships between tracks across artists, genres, and so on. For example, the secondary computed mood matches for “Basket Case” by Green Day and “Stupify” by Disturbed, indicating a common mood undertone. It’s this type of rich interaction that paves the way for better music systems.

Table 2. Top 3 computed moods for some popular songs

| Track | Artist | Primary Mood Label | Primary Mood Score | Secondary Mood Label | Secondary Mood Score | Tertiary Mood Label | Tertiary Mood Score |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Locked Out Of Heaven | Bruno Mars | Carefree Soaring Bliss Party People Groove | 0.439 | Edgy Dark Fiery Intense Pop Beat | 0.329 | Latin Boom Boom Sexy Party Trance Beat | 0.165 |
| Giant Steps | John Coltrane | Dark Energetic Abstract Groove | 0.425 | Lively “Cool” Subdued / Indirect Positive | 0.303 | Happy Energetic Abstract Groove | 0.149 |
| (You Make Me Feel Like) A Natural Woman | Aretha Franklin | Slow Strong Serious Soulful Ballad | 0.402 | Sad Soulful Jaunty Ballad | 0.234 | Bare Emotion | 0.177 |
| Tears In Heaven | Eric Clapton | Tender Lite Melancholy | 0.406 | Sober / Resigned / Weary | 0.221 | Soft Tender / Sincere | 0.16 |
| Basket Case | Green Day | Loud Fast Dark Anthemic | 0.277 | Gothic Haunted Overdrive Beast | 0.167 | Aggressive Crunching Power | 0.167 |
| Stupify | Disturbed | Aggressive Evil | 0.319 | Gothic Haunted Overdrive Beast | 0.206 | Anger / Hatred | 0.179 |
| Believe | Cher | Power Boogie Dreamy Trippy Beat | 0.286 | Passionate Dark Dramatic Fiery Groove | 0.221 | Dark Gritty Sexy Groove | 0.182 |
| Slide | Goo Goo Dolls | Positive Flowing Strumming Serious Strength | 0.358 | Dark Loud Strumming Ramshackle Ballad | 0.196 | Loud Overwrought Heartfelt Earnest Bittersweet Ballad | 0.184 |
| All Summer Long | Kid Rock | Ramshackle Jaunty Rock | 0.387 | Whatever Kick-Back Loud Party Times | 0.255 | Sassy | 0.226 |
| Can It Be All So Simple / Intermission | Wu-Tang Clan | Dark Cool Calm Serious Truthful Beats | 0.479 | Kick-Back Dreamy Words & Beats | 0.202 | Flat / Speech Only | 0.119 |
| You’re Still A Young Man | Tower Of Power | Soulful Solid Strength & Glory | 0.367 | Slow Strong Serious Soulful Ballad | 0.3 | Poseur Earnest Uplifting Ballad | 0.143 |
| Georgia On My Mind | Ray Charles | Sweet & Tender Warm Mellow Reverent Peace | 0.633 | Tender Sad | 0.197 | Dreamy Romantic Lush | 0.089 |
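Surfacing relationships like the “Basket Case” / “Stupify” match is, at its simplest, an inverted-index lookup from mood label to tracks. The sketch below uses a hand-copied slice of Table 2 and is illustrative, not Gracenote’s production code:

```python
from collections import defaultdict

# Top computed moods for a few of the tracks in Table 2.
track_moods = {
    "Basket Case": ["Loud Fast Dark Anthemic", "Gothic Haunted Overdrive Beast"],
    "Stupify": ["Aggressive Evil", "Gothic Haunted Overdrive Beast"],
    "Tears In Heaven": ["Tender Lite Melancholy", "Sober / Resigned / Weary"],
}

# Invert the mapping: mood label -> tracks that exhibit it.
tracks_by_mood = defaultdict(list)
for track, moods in track_moods.items():
    for mood in moods:
        tracks_by_mood[mood].append(track)

# Moods shared by multiple tracks are raw material for mood-based discovery.
shared = {m: ts for m, ts in tracks_by_mood.items() if len(ts) > 1}
print(shared)
```

Scaled up to a full catalogue, this kind of index is what lets a listening service jump from a song the user loves to an unfamiliar track with the same mood undertone.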

Bird’s Eye

Now what do we do with all of this data? Applications abound, from music discovery to music therapy to mixing cocktails. Let’s start by seeing how musical mood is distributed in the wild. In our first efforts to transition Mood 2.0 into a Gracenote product, we’ve generated Mood 2.0 profiles for our first million tracks; below are the most common musical moods.

  1. Dramatic – Strong Emotional Vocal
  2. Dramatic – Strong Positive Emotional Vocal
  3. Bitter
  4. Power Dreamy Beat
  5. Serious Measured Powerful Emotive Tenderness
  6. Dismay / Awfulness / Bad Scene
  7. Flat Dance Groove – Mechanical
  8. Lyrical Romantic Bittersweet
  9. Romantic Dark Energetic Complex
  10. Tender Lite Melancholy

You may or may not find these surprising, but to me these reflect a mix of emotions I might experience on any given day. So we see here another reflection of the connection between music and the human experience.
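At its core, the ranking above is a frequency count over each track’s strongest mood. A minimal sketch on toy data (the real computation runs over the million-track set):

```python
from collections import Counter

# Illustrative catalogue: the primary computed mood of each track.
primary_moods = [
    "Bitter", "Tender Lite Melancholy", "Bitter",
    "Power Dreamy Beat", "Bitter", "Tender Lite Melancholy",
]

# Tally the moods and take the most frequent ones.
top_moods = Counter(primary_moods).most_common(2)
print(top_moods)
```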

Where Next?

In future posts we’ll discuss more of the technology behind Mood at Gracenote and its applications, because here we have only scratched the surface. For example, musical mood often changes throughout a song, so we are currently extending Mood 2.0 classification to capture this. Having mood information on a timeline opens new possibilities for a one-to-one musical experience. Automated mood lighting synced to music for your home theatre, anyone?

We’ll also explore the relationship between mood and other music metadata at Gracenote. Each attribute of metadata that we catalogue (e.g. Mood, Genre, Language, Origin, Era) is useful on its own for organizing music, but the attributes compound when used together. Answers to questions such as how moods vary within a genre, or how languages vary within a mood, further empower users of music listening services and allow Gracenote to push the boundaries of musical understanding at a global level.
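One hedged way to make “the variance of moods within a genre” concrete is the Shannon entropy of each genre’s mood distribution: a genre whose tracks span many moods scores high, while a single-mood genre scores zero. The genre/mood pairs below are illustrative only:

```python
import math
from collections import Counter, defaultdict

# Illustrative metadata: (genre, primary mood) for a handful of tracks.
tracks = [
    ("Rock", "Loud Fast Dark Anthemic"),
    ("Rock", "Aggressive Crunching Power"),
    ("Rock", "Loud Fast Dark Anthemic"),
    ("Jazz", "Dark Energetic Abstract Groove"),
    ("Jazz", "Dark Energetic Abstract Groove"),
]

def mood_entropy(moods):
    # Shannon entropy (in bits) of the mood distribution: one way to
    # quantify how varied the moods within a group of tracks are.
    counts = Counter(moods)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

by_genre = defaultdict(list)
for genre, mood in tracks:
    by_genre[genre].append(mood)

for genre, moods in sorted(by_genre.items()):
    print(genre, round(mood_entropy(moods), 2))
```

The same computation works for any pairing of attributes, such as the entropy of languages within a mood, which is what makes combined metadata more powerful than any single attribute.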

by Cameron Summers | August 9, 2016