TL;DR: I did a very simple analysis on Spotify data to try to find the saddest and happiest songs out there. I created playlists with the results: saddest songs [Spotify, YouTube], happiest songs [Spotify, YouTube]. All the code used for this project is available here.


Earlier this year I attended McHacks with a friend. We had never participated in a hackathon before and were curious to see what the whole thing was about. Though we went there with our own project (a sentiment analysis system for tweets about refugees that never worked), we ended up partnering with a team that had another idea in mind that involved a lot of different fields: retrieve a copy of several sad and happy songs, extract their features using signal processing techniques and build a classifier able to predict which songs belong to which mood.

We never completed it, but we had made good progress by the time we had to stop (it didn't help that we decided to go home to sleep and that I, and this was all my fault, decided to go get a proper brunch the following morning before returning to coding). We were using Spotify's API through spotipy to look for playlists that contained our key terms (sad and happy) and then, using the wonderful youtube-dl library for Python, downloading the most tagged songs from YouTube in MP3 format. That much we managed to accomplish. We ran out of time while trying to figure out how to use librosa properly for the feature extraction step, and we never started coding the classifier.

This gave me an idea for a small side-project aimed at answering the following question: what are the saddest and happiest songs out there? Surely there is enough data in Spotify to give it a try. So this is the idea: get the songs included in all playlists whose names match the relevant search terms and related synonyms (for happy, for sad) and then see which songs appear most often.

For that, as during the hackathon, I used Python with the spotipy library and stored the results in SQLite databases. I took care not to include data from any playlist twice. For the final data analysis I went to R, as I normally do. All the code used for this project can be found in this GitHub repository.
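To illustrate the "no playlist twice" step, here is a minimal Python sketch using the standard sqlite3 module. The table layout, the column names and the helpers `open_db` and `store_playlist` are assumptions made for this example; they are not taken from the project's repository.

```python
import sqlite3

def open_db(path=":memory:"):
    """Create the (hypothetical) tables used in this sketch."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS playlists (playlist_id TEXT PRIMARY KEY)"
    )
    db.execute(
        "CREATE TABLE IF NOT EXISTS songs "
        "(playlist_id TEXT, term TEXT, uri TEXT, artist TEXT, title TEXT)"
    )
    return db

def store_playlist(db, playlist_id, term, tracks):
    """Store a playlist's tracks once; return False if it was already stored."""
    cur = db.execute(
        "INSERT OR IGNORE INTO playlists (playlist_id) VALUES (?)",
        (playlist_id,),
    )
    if cur.rowcount == 0:
        # The primary key already exists, so this playlist was seen before
        # (for example, because it matched two different search terms).
        return False
    db.executemany(
        "INSERT INTO songs (playlist_id, term, uri, artist, title) "
        "VALUES (?, ?, ?, ?, ?)",
        [(playlist_id, term, uri, artist, title) for uri, artist, title in tracks],
    )
    db.commit()
    return True
```

A playlist that matches both "sad" and "sorrow" is then counted only under the first term that found it, which keeps a single playlist from inflating a song's count.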

For the sad songs, I used the following terms:

sqlite> SELECT DISTINCT(term) FROM songs;
term      
----------
sad       
low       
down      
bitter    
dismal    
heartbroke
sorrow    
somber    
sorry     
gloomy    
glum      
grieve    
hurt      
troubled  
weep      

I retrieved a total of 1410998 entries (327986 different songs) from 11330 playlists. I deliberately left out the sad term blue because it is too closely associated with a music genre.

Same thing for the happy songs:

term      
----------
happy     
cheerful  
delighted 
ecstatic  
elated    
glad      
joy       
jubilant  
thrilled  
upbeat    
sunny     
blissful  
content   

In this case, I got 1483833 entries (358937 different songs) from 8931 playlists.

One idea that comes directly to mind is to compute how many times a given song appears in each set of playlists and use that as an indicator. Let's see how that would work. I loaded the data into R with the RSQLite library, using the following query:

SELECT (SELECT COUNT(DISTINCT(uri)) FROM songs) AS total,
       COUNT(*) AS count, artist, title 
FROM songs 
GROUP BY artist, title
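For readers who would rather stay in Python, the same query can be tried against a toy table with the standard sqlite3 module. This is only a sketch: the column list of the songs table and the sample rows are made up for the example, and an ORDER BY clause has been added so the most frequent songs come first.

```python
import sqlite3

QUERY = """
SELECT (SELECT COUNT(DISTINCT uri) FROM songs) AS total,
       COUNT(*) AS count, artist, title
FROM songs
GROUP BY artist, title
ORDER BY COUNT(*) DESC, artist, title
"""

def song_counts(db):
    """Return (total, count, artist, title) rows, most frequent song first."""
    return db.execute(QUERY).fetchall()

# Toy data: the schema below is an assumption, not the project's real one.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE songs (term TEXT, uri TEXT, artist TEXT, title TEXT)")
db.executemany(
    "INSERT INTO songs VALUES (?, ?, ?, ?)",
    [
        ("sad", "uri:1", "Artist A", "Song A"),
        ("low", "uri:1", "Artist A", "Song A"),
        ("sad", "uri:2", "Artist B", "Song B"),
    ],
)
rows = song_counts(db)
```

Note that total is the number of distinct songs in the table, while count is the number of playlist entries for each artist and title pair.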

So, let's see the top 5 sad songs:

head(sad[order(-sad$count), c("artist", "title", "count")], 5)
                  artist         title   count
7382               Adele         Hello    2236
147256     Justin Bieber Love Yourself    1969
147296     Justin Bieber         Sorry    1609
4336   A Great Big World Say Something    1479
85452         Ed Sheeran    Photograph    1395

What about the happy ones?

              artist              title   count
161801 Justin Bieber              Sorry    1815
161761 Justin Bieber      Love Yourself    1537
161820 Justin Bieber  What Do You Mean?    1435
323564    The Weeknd Can't Feel My Face    1392
343933 WALK THE MOON  Shut Up and Dance    1334

You can immediately see there is something very fishy here. Justin Bieber's Love Yourself and Sorry are, simultaneously, among the happiest and saddest songs out there. There are two issues: 1) people don't always tag consistently; 2) more importantly, very popular songs may show up in our datasets simply because they are popular. We cannot correct for the first problem, but the second can be addressed with another dataset: a control one, built from playlists found with neutral terms that have nothing to do with the moods we are trying to isolate. I downloaded such a dataset using the following terms:

term      
----------
music     
song      
top       
hits      
favourite 
like      
best     

With that I downloaded 12311154 entries (1602312 different songs) from 50102 playlists with terms as dull as the ones you just read; my script crashed before retrieving all available playlists, but I chose not to restart it, as I had already gathered more than enough data. I will use this database to control for the popularity of a given song.

What I did next was the following, and I know this is a very rough approximation, but I couldn't think of a better way of doing it: I modeled the data as a binomial distribution, with the number of different playlists as the number of trials and the number of times each song is included as the number of successes. I used R's prop.test function for this, as binom.test was too slow and my sample was big. I computed the upper bound of the 95% confidence interval for the control proportion and the lower bound for each mood proportion, and then divided the mood's lower bound by the control's upper bound. With that, I obtain a conservative estimate of how much more often a given song is tagged with a mood than it appears in the general population provided by the control playlists.
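The ratio described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a port of the R analysis: it uses the plain Wilson score interval, whereas prop.test applies a continuity correction by default, so the numbers will be close but not identical. The function names and the example counts are hypothetical.

```python
from statistics import NormalDist

def wilson_interval(successes, trials, confidence=0.95):
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * (p * (1 - p) / trials + z * z / (4 * trials * trials)) ** 0.5 / denom
    return max(0.0, centre - half), min(1.0, centre + half)

def conservative_ratio(mood_count, mood_playlists, control_count, control_playlists):
    """Lower bound of the mood proportion over upper bound of the control proportion."""
    mood_low, _ = wilson_interval(mood_count, mood_playlists)
    _, control_high = wilson_interval(control_count, control_playlists)
    return mood_low / control_high

# Hypothetical song: in 100 of the 11330 sad playlists and 50 of the 50102 control ones.
ratio = conservative_ratio(100, 11330, 50, 50102)
naive = (100 / 11330) / (50 / 50102)
```

Because the numerator is pushed down and the denominator pushed up, the conservative ratio is always smaller than the naive ratio of the two raw proportions, and the gap widens for songs with few appearances.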

As a precaution, and to avoid very spurious results, I removed the songs that were tagged fewer than 10 times. If this threshold is modified, the final classification changes: raising it lets more popular songs show up. So I've generated two different rankings: one with the previous threshold, and an extra-popular one with a threshold of 200. Here's the first one, from sad to happy:

                                                                                   artitle orig_mood label
                                                                 Kim Taylor - Build You Up 11.980122   sad
                                                                       Seahaven - Honeybee 10.572491   sad
                          The Cinematic Orchestra - To Build A Home (feat. Patrick Watson)  9.122422   sad
                                                              The Story So Far - Navy Blue  8.523226   sad
                                         Michael Schulte - You Said You'd Grow Old With Me  8.318033   sad
                                          Andrew Belle - In My Veins - Feat. Erin Mccarley  8.186986   sad
                                                              Ingrid Michaelson - Over You  8.035577   sad
                                                                Mikelwj - Please Don't Cut  7.478537   sad
                                                                  Birdy - Not About Angels  7.063241   sad
                                                       Jamestown Story - Goodbye I'm Sorry  6.991294   sad
                                                        Andrew Belle - Make It Without You  6.862819   sad
                                                                       Daughter - Medicine  6.614898   sad
                                                          Flatsound - Don't Call Me At All  6.482429   sad
                                          Keaton Henson - You Don't Know How Lucky You Are  6.168237   sad
                                                                     Real Friends - Hebron  6.114325   sad
                                                    Katy McAllister - Another Empty Bottle  6.103101   sad
                                                       Real Friends - I've Given Up on You  5.930843   sad
                                                                      Hotel Books - Nicole  5.927413   sad
                                                                          Sia - Breathe Me  5.876277   sad
                                                               Sew Intricate - If You Knew  5.841148   sad
                                                                  All Time Low - Lullabies  5.833872   sad
                                                      Kodaline - Love Like This - Acoustic  5.828360   sad
                                                            Keaton Henson - Flesh And Bone  5.693119   sad
                                                             Britt Nicole - When She Cries  5.564704   sad
                                                         Citizen - The Night I Drove Alone  5.491908   sad
                                                                 Ingrid Michaelson - Be OK  2.848528 happy
                                                                        CHAPPO - Come Home  2.861424 happy
                                                                             Yuna - Rescue  2.885926 happy
                                                            The Griswolds - Beware the Dog  2.888610 happy
                                                           Rusted Root - Send Me On My Way  2.921322 happy
                                                                              JR JR - Gone  2.926596 happy
                                                                        Oh Honey - Be Okay  2.946616 happy
                                                                      BØRNS - Seeing Stars  2.961350 happy
                                                                  Perrin Lamb - Little Bit  2.967613 happy
                                                    Colbie Caillat - Brighter Than The Sun  2.972332 happy
                                                   Ray LaMontagne - You Are the Best Thing  3.061297 happy
                          Jimmy Cliff - Wonderful World, Beautiful People - Single Version  3.067851 happy
                                                                 Hunter Hunted - Lucky Day  3.129721 happy
                                                                Wild Cub - Thunder Clatter  3.142382 happy
                                                                   Morningsiders - Empress  3.173426 happy
                                                                  Vinyl Pinups - Gold Rays  3.184869 happy
 Sharon Jones & The Dap-Kings - I Just Dropped In To See What Condition My Condition Is In  3.248935 happy
                                                                Jamie Lidell - Another Day  3.257592 happy
                                                                  Twin Forks - Back To You  3.268261 happy
                                               Brett Dennen - Comeback Kid (That's My Dog)  3.540849 happy
                                            Oh, Hush! - Happy Place (feat. Hanna Ashbrook)  3.681280 happy
                                                                  The Well Pennies - Drive  3.900415 happy
                                        Michael Franti & Spearhead - The Sound Of Sunshine  4.233915 happy
                                 Michael Franti & Spearhead - I’m Alive (Life Sounds Like)  4.411475 happy
                                                                       Ezra Vine - Celeste  4.664254 happy

The orig_mood column tells how many times more often the song was tagged with the given mood than with the control keywords. For instance, Kim Taylor's Build You Up is tagged as sad almost 12 times more often than it is tagged with the control keywords. The more popular list looks like this:

                                                   artitle orig_mood label
                                  Birdy - Not About Angels  7.063241   sad
                                          Sia - Breathe Me  5.876277   sad
                                    All Time Low - Therapy  5.317348   sad
                                          Tom Odell - Heal  5.287393   sad
                            Ron Pope - A Drop In The Ocean  4.640795   sad
                         All Time Low - Remembering Sunday  4.460468   sad
                              A Fine Frenzy - Almost Lover  4.407335   sad
                 The Cinematic Orchestra - To Build A Home  4.395410   sad
                                     Kodaline - All I Want  4.284281   sad
                         Mayday Parade - Miserable At Best  4.254062   sad
                                         Birdy - Tee Shirt  4.247338   sad
                              The Lumineers - Slow It Down  4.232987   sad
                        Alex & Sierra - Little Do You Know  4.193428   sad
                           Mayday Parade - Terrible Things  4.065268   sad
                                    Damien Rice - 9 Crimes  3.877873   sad
                                          Daughter - Youth  3.858343   sad
             Justin Bieber - Nothing Like Us - Bonus Track  3.850651   sad
                                            Maroon 5 - Sad  3.740168   sad
                               Sam Smith - Not In That Way  3.680314   sad
             Iron & Wine - Flightless Bird, American Mouth  3.578837   sad
                                       Amber Run - I Found  3.570509   sad
               Haley Reinhart - Can't Help Falling in Love  3.514237   sad
                                       Bastille - Oblivion  3.504239   sad
                               Jaymes Young - I'll Be Good  3.497401   sad
                                       Birdy - Skinny Love  3.437574   sad
     Stevie Wonder - Signed, Sealed, Delivered (I'm Yours)  2.040985 happy
                            Jack Johnson - Banana Pancakes  2.135602 happy
                     Bobby McFerrin - Don't Worry Be Happy  2.178093 happy
                             Vampire Weekend - Unbelievers  2.207791 happy
                                Passion Pit - Carried Away  2.212609 happy
              Daryl Hall & John Oates - You Make My Dreams  2.224245 happy
                            Bleachers - I Wanna Get Better  2.260334 happy
                        Fitz and The Tantrums - The Walker  2.270052 happy
 Daryl Hall & John Oates - You Make My Dreams - Remastered  2.283607 happy
                         Noah And The Whale - 5 Years Time  2.292561 happy
                                   Grouplove - Tongue Tied  2.338586 happy
                               MisterWives - Our Own House  2.359122 happy
                                   Matt and Kim - Daylight  2.386265 happy
                                     Saint Motel - My Type  2.391831 happy
                                    Grouplove - Ways To Go  2.433496 happy
                 Edward Sharpe & The Magnetic Zeros - Home  2.493877 happy
                                 MisterWives - Reflections  2.530433 happy
                                     NONONO - Pumpin Blood  2.587748 happy
                              The Mowgli's - San Francisco  2.685225 happy
                                     BØRNS - Electric Love  2.718853 happy
                  Corinne Bailey Rae - Put Your Records On  2.732802 happy
                                   The Mowgli's - I'm Good  2.747544 happy
          Paul Simon - Me and Julio Down by the Schoolyard  2.796451 happy
                           Rusted Root - Send Me On My Way  2.921322 happy
                    Colbie Caillat - Brighter Than The Sun  2.972332 happy
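The two rankings above differ only in the minimum-count threshold. A small Python sketch of that filtering step, assuming each song already carries its mood tag count and its ratio (the field names, the `rank_songs` helper and the sample values are invented for illustration):

```python
def rank_songs(songs, min_count=10, top=25):
    """Keep songs tagged at least `min_count` times, sorted by ratio, highest first."""
    kept = [s for s in songs if s["count"] >= min_count]
    return sorted(kept, key=lambda s: s["ratio"], reverse=True)[:top]

# Made-up entries: "count" is assumed to be the number of mood playlists
# containing the song, and "ratio" the mood-versus-control ratio.
songs = [
    {"artitle": "Artist A - Rare Song", "count": 12, "ratio": 11.9},
    {"artitle": "Artist B - Popular Song", "count": 450, "ratio": 7.1},
    {"artitle": "Artist C - Barely Tagged", "count": 3, "ratio": 20.0},
]

default_ranking = rank_songs(songs, min_count=10)
popular_ranking = rank_songs(songs, min_count=200)
```

With the threshold at 10 the rarely tagged song drops out, and at 200 only the popular one survives, which mirrors how the second list favors better-known tracks.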

I've experimented with ways of visualizing these results, but I couldn't come up with anything that looked good; I've left some of my experiments in the R analysis file stored in the code repository, if you want to check what I tried.

In any case, we are talking about music here, so instead I created playlists with the results: saddest songs [Spotify, YouTube], happiest songs [Spotify, YouTube]. Enjoy!


If you create a playlist with these songs using a different service (like Google Music or Apple Music), please drop me a line and I will update the article with a link to it.