Audio Features and Track Popularity

Jessica DeFreese
Dec 7, 2021 · 10 min read

Can a track’s audio features predict its popularity on Spotify?

Project Overview and Problem Statement

The music industry has undergone many trends over the ages and developed with the times. What was once considered mainstream is now classic rock. Subgenres are constantly branching from others as artists experiment with popular trends, push the boundaries of genres, and create a new normal. How, then, do record labels discern which artists are likely to be worth the investment? Spotify has become a major distributor of music and, along the way, has gathered data and written algorithms to assign popularity to songs based on streams, shares, and likes on its platform. From this, Spotify develops weekly Top 50 playlists globally and for many individual countries or regions. Spotify has also developed a list of “audio features” for each song. The audio features include attributes such as acousticness, danceability, time signature, and valence. Each audio feature has a set range, allowing different songs’ features to be compared. Could these features also shed light on which songs are more likely to be popular? While these features are not directly used in the calculation Spotify uses to determine a song’s popularity, is there a correlation? Could record companies use a song’s audio features to determine which artists are likely worth the investment?

In order to address this question, I used Spotify’s API to download the track lists of the Global Top 50 playlist and the Top 50 playlists for the United States, Canada, Australia, Ireland, and the UK. The songs, their popularity scores, and their audio features were compiled into a data frame to explore the relationship between a song’s audio features and its predicted popularity. I will explore this relationship by stacking various regression models, combining standard regression models such as Lasso Regression with ensemble learning models such as Random Forest Regression. Using these regression models, I will predict the popularity score for the test data and calculate the mean squared error of the results.

You can view the code for this analysis here.

Metrics for Success

To determine whether the method used to predict popularity scores is effective, I utilized the mean squared error (MSE) of the predictions for each method. This metric describes how far, on average, the predictions in the test set are from the actual values. The goal is to be able to predict a popularity score within 5 points (on the 0–100 scale set by Spotify) of the actual score. On the scaled 0–1 data, that is an error of 0.05 per track, which, once squared, corresponds to an MSE of 0.0025 or lower. It would not be reasonable to expect this level of error if the audio features were not reasonably correlated with the popularity score. Therefore, if this error cannot be attained, I can conclude that there is not a strong, direct correlation between the audio features of a track and the popularity of the track.
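As a quick sanity check on that target, here is a minimal sketch; the score values are made up for illustration only. When every prediction is off by exactly 0.05 on the scaled range, the MSE works out to 0.05² = 0.0025:

```python
from sklearn.metrics import mean_squared_error

# Toy values for illustration only: every prediction is off by 0.05
# on the scaled 0-1 popularity range (5 points on Spotify's 0-100 scale).
actual = [0.80, 0.90, 0.85]
predicted = [0.85, 0.85, 0.90]

print(mean_squared_error(actual, predicted))  # 0.05**2 = 0.0025
```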

Part One: Data Exploration and Visualization

The six “Top 50” playlists combined resulted in a total of 106 unique tracks. The descriptive data for each track includes the following 16 variables:
- Track Name
- Album the Track was Released On
- Artist
- Release Date
- Length of the Track
- Popularity Score assigned by Spotify
- Acousticness
- Danceability
- Energy
- Instrumentalness
- Liveness
- Loudness
- Speechiness
- Tempo
- Time Signature
- Valence
This results in a data set with dimensions 106 × 16.

As seen in Figure 1 below, the majority of tracks in the Top 50 playlists were scored with a popularity of 80–95 on a scale of 0 to 100.

Figure 1. Distribution Density of Popularity

However, there is a density of scores around 45 that keeps the distribution from being truly normal. This could be accounted for by a song being popular in a particular region (appearing on only one country’s Top 50 playlist) while not being globally popular, and therefore not receiving the same volume of streams, shares, and likes that Spotify uses to calculate popularity.

After scaling all the audio features for a track to the same range, I created a star plot, seen in Figure 2, to determine the average magnitudes of the various audio features. It should be noted that Time Signature was removed from the analysis because it is a value predicted by Spotify rather than a descriptive measure.

Figure 2. Star Graph of Average Audio Features

As seen in Figure 2 above, three features averaged very low values: Instrumentalness, Speechiness, and Acousticness. This is directly contrasted by the high average values of Energy, Danceability, and Loudness. From this, it is reasonable to conclude that, on average, widely popular songs tend to be high energy and loud rather than soft and instrumental.
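For readers who want to reproduce the plot, here is a minimal sketch of one way to draw such a star (radar) graph with matplotlib. The data frame name `scaled` and the exact feature list are my assumptions, not the article’s actual code:

```python
import matplotlib.pyplot as plt
import numpy as np

# `scaled` is an assumed data frame of min-max scaled audio features.
features = ["acousticness", "danceability", "energy", "instrumentalness",
            "liveness", "loudness", "speechiness", "tempo", "valence"]
means = scaled[features].mean().to_numpy()

# One spoke per feature; repeat the first point to close the polygon.
angles = np.linspace(0, 2 * np.pi, len(features), endpoint=False)
values = np.concatenate([means, means[:1]])
angles = np.concatenate([angles, angles[:1]])

ax = plt.subplot(polar=True)
ax.plot(angles, values)
ax.fill(angles, values, alpha=0.25)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(features)
plt.show()
```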

To continue the initial analysis of popularity across the compiled tracks, I looked at the release dates of tracks to see whether the amount of time a track had been available correlated with its popularity.

Figure 3. Count by Release Date

As seen in Figure 3 above, 9 of the 106 tracks were released on June 25, 2021. However, the next 3 highest values fall in October and September, with the remaining top 10 release dates spread across a wide variety of months. From this, there is no clear correlation between the time of release alone and the popularity of a song.

Figure 4. Count of Top 10 Artists in Compiled Track List

Next, I wanted to see whether certain artists appeared in the track list a high number of times. As seen in Figure 4 above, Adele was the artist with the most tracks in the list, at 9. However, this accounts for less than 10% of all tracks and is not high enough to be significant. It is also worth noting that, a few weeks after I downloaded this data, Red (Taylor’s Version) by Taylor Swift was released, and almost all tracks from the album appeared on the Global Top 50 playlist the following week. From the data in Figure 4 alone, it is hard to say that an artist’s existing popularity greatly impacts a song’s popularity, but it is worth considering how the charts change when a major artist such as Adele or Taylor Swift promotes and releases a new album.

Overall, from the initial analysis of the combined popular tracks list, there is not a clear common thread among the Top 50 tracks. In the next section, I will use regression modeling on the audio features to see if an analysis of the existing tracks can help predict the popularity score of a future track.

Part Two: Methodology — Data Preprocessing

Spotify’s Web API provides an easy way to access and analyze data from Spotify’s libraries. In order to access this information, one only needs a Spotify Developer account. From there, just provide the client ID and client secret to log in and request the desired information.

Note: For security purposes, I have redacted my personal information from the Spotify Download file in the GitHub repository, but I have indicated where a user may substitute their own information when running the file.
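For illustration, here is a minimal authentication sketch using the spotipy client library; the repository may authenticate differently, and the credential values below are placeholders:

```python
import spotipy
from spotipy.oauth2 import SpotifyClientCredentials

# Placeholder credentials -- substitute your own from the
# Spotify Developer dashboard.
CLIENT_ID = "YOUR_CLIENT_ID"
CLIENT_SECRET = "YOUR_CLIENT_SECRET"

# The client-credentials flow is enough for reading public playlist data.
sp = spotipy.Spotify(
    auth_manager=SpotifyClientCredentials(
        client_id=CLIENT_ID,
        client_secret=CLIENT_SECRET,
    )
)
```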

Using the playlist IDs for each of the Top 50 playlists I selected, I downloaded the track IDs for each song in the playlists, downloaded the audio features for each track, and created a data frame to store the track and its audio features. A total of six data frames were saved for the analysis discussed in the following sections. In addition to the audio features, I also downloaded basic track information such as Name, Album, Artist, Release Date, and Popularity. The playlist tracks and their information were downloaded on November 3rd, 2021.
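A sketch of that download step might look like the following. The helper `playlist_features` and its column choices are my own, `sp` is the authenticated client from the snippet above, and the playlist ID shown is illustrative:

```python
import pandas as pd

def playlist_features(sp, playlist_id):
    """Collect track info and audio features for one playlist.

    A hypothetical helper, not the repository's actual code.
    """
    rows = []
    for item in sp.playlist_items(playlist_id)["items"]:
        track = item["track"]
        features = sp.audio_features(track["id"])[0]
        rows.append({
            "name": track["name"],
            "album": track["album"]["name"],
            "artist": track["artists"][0]["name"],
            "release_date": track["album"]["release_date"],
            "popularity": track["popularity"],
            **features,  # acousticness, danceability, etc.
        })
    return pd.DataFrame(rows)

# Example: one Top 50 playlist (the ID here is illustrative).
global_top_50 = playlist_features(sp, "37i9dQZEVXbMDoHDwVN2tF")
```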

The audio features used by Spotify to describe a song are acousticness, danceability, duration, energy, instrumentalness, key, liveness, loudness, mode, speechiness, tempo, time signature, and valence. All features have been assigned a numeric value by Spotify. A more detailed look into the Spotify audio features can be viewed here.

In order to fairly assess the impact of audio features on popularity, all features were scaled using the Min-Max Scaler. This ensures that all features are valued in the same range. Once all features were scaled, the data was split into test and train data sets where X is the audio features and y is the popularity score assigned by Spotify.
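A minimal sketch of that preprocessing, assuming the combined data frame is named `df` and using scikit-learn’s MinMaxScaler and train_test_split (the feature column names and split parameters are assumptions):

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# `df` is the assumed combined 106-track data frame; the column names
# below follow Spotify's documented audio features.
feature_cols = ["acousticness", "danceability", "energy", "instrumentalness",
                "liveness", "loudness", "speechiness", "tempo", "valence"]

# Scale the features and the popularity target to the 0-1 range.
# Scaling before splitting mirrors the order described above, though
# fitting the scaler on training data only would avoid leakage.
X = MinMaxScaler().fit_transform(df[feature_cols])
y = MinMaxScaler().fit_transform(df[["popularity"]]).ravel()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```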

Part Three: Methodology — Implementation

I evaluated a series of regression models with repeated k-fold cross-validation, scoring each with the mean squared error. I tested two groups: Lasso, Ridge, and XGBoost (XGB) on one side, and Random Forest Regressor, Decision Tree Regressor, and Support Vector Regression (SVR) on the other.

Figure 5. Code Snippet of Regression Model Implementation

The code shown in Figure 5 above lists the implementation used for evaluating each regression model. In each instance, the default parameters were used. Additionally, all models utilized the same RepeatedKFold() parameters and mean squared error scoring.
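Figure 5 is an image in the original post; a hedged reconstruction of the evaluation loop might look like the following. The fold and repeat counts and the random state are assumptions (the article only states that all models shared the same RepeatedKFold() parameters), and `X_train`/`y_train` come from the preprocessing sketch above:

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import RepeatedKFold, cross_val_score
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor

# Default parameters throughout, as described above.
models = {
    "Lasso": Lasso(),
    "Ridge": Ridge(),
    "XGB": XGBRegressor(),
    "Random Forest": RandomForestRegressor(),
    "Decision Tree": DecisionTreeRegressor(),
    "SVR": SVR(),
}

cv = RepeatedKFold(n_splits=10, n_repeats=3, random_state=1)

for name, model in models.items():
    # cross_val_score returns negated MSE, so flip the sign for reporting.
    scores = -cross_val_score(model, X_train, y_train,
                              scoring="neg_mean_squared_error", cv=cv)
    print(f"{name}: MSE {scores.mean():.3f} (std {scores.std():.3f})")
```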

During this process, it was difficult to determine a clear “winner” among the regression models. Many of the MSE scores were so close to each other that changing the train-test split might have changed the ranking.

From this initial list, Lasso Regression scored best with a mean squared error of 0.146 and a standard deviation of 0.054. To put that in perspective, the popularity scores had been scaled to a minimum of 0.0 and a maximum of 1.0, so an MSE of 0.146 corresponds to a typical error of roughly 0.38, far above the ideal MSE of 0.0025 set out earlier. It is not necessarily a bad model, but I wanted to see if I could improve the accuracy of the predictions and better represent the correlation between audio features and popularity scores.

Part Four: Methodology — Refinement

Next, I decided to evaluate a variety of stacking options. I used the second group of regression models, together with a K-Neighbors Regressor, as the base estimators and one of the first group as the final estimator. To do this, I used Python’s itertools library to run through every combination of base estimators (a total of 24 options). For each combination, I tried Lasso, Ridge, and XGB as the final estimator. The lowest mean squared error and its combination of estimators are printed once the loop has evaluated every combination. The lowest mean squared error was produced by Random Forest, K-Neighbors, and Decision Tree regressors as the base estimators and Ridge Regression as the final estimator, with a score of 0.145.
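A hedged sketch of that search: assuming four candidate base models and ordered triples (which yields the stated 24 combinations), and reusing `X_train`, `y_train`, and `cv` from the earlier sketches:

```python
from itertools import permutations

from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor

base_models = {
    "rf": RandomForestRegressor(),
    "knn": KNeighborsRegressor(),
    "dt": DecisionTreeRegressor(),
    "svr": SVR(),
}
final_models = {"lasso": Lasso(), "ridge": Ridge(), "xgb": XGBRegressor()}

best_mse, best_combo = float("inf"), None
# Ordered triples of the four base models: 4P3 = 24 combinations.
for combo in permutations(base_models, 3):
    estimators = [(name, base_models[name]) for name in combo]
    for final_name, final in final_models.items():
        stack = StackingRegressor(estimators=estimators, final_estimator=final)
        scores = -cross_val_score(stack, X_train, y_train,
                                  scoring="neg_mean_squared_error", cv=cv)
        if scores.mean() < best_mse:
            best_mse, best_combo = scores.mean(), (combo, final_name)

print(best_mse, best_combo)
```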

Part Five: Results

While the MSE was improved by only one thousandth of a point, this was the model I used to complete my analysis. With this selection, I ran the test data through the model to find the predicted popularity scores and output a table with the predicted, actual, and difference values for each track in the test data (see Figure 6 below).
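That final step might look like the following sketch, reusing names from the stacking example above (the result table in Figure 6 was produced by the repository’s own code, not this snippet):

```python
import pandas as pd

from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Fit the winning stack on the training data, then score the held-out tracks.
best_stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor()),
                ("knn", KNeighborsRegressor()),
                ("dt", DecisionTreeRegressor())],
    final_estimator=Ridge(),
)
best_stack.fit(X_train, y_train)
predicted = best_stack.predict(X_test)

results = pd.DataFrame({
    "predicted": predicted,
    "actual": y_test,
    "difference": y_test - predicted,
})
print(results.round(3))
```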

Figure 6. Predicted vs Actual Popularity Scores

There are important observations to make about this chart. First, we can see that all the predicted values fall in a range of roughly 0.725 to 0.737, a spread of just over a hundredth of a point. This is only slightly higher than the mean popularity taken from the entire data set (see Figure 7) and lower than the median.

Figure 7. Descriptive Statistics of Audio Features Data Frame

Additionally, all but two of the predicted values are lower than the actual popularity score, and, in many instances, the popularity is undervalued by more than 0.15. The small variability in predictions and the consistent undervaluing of popularity suggest that audio features themselves are not the driving force in popularity scoring. For example, all the audio features other than Instrumentalness show variability across the tracks: most features have a standard deviation around 0.2 and quantile values that show a reasonable spread. If any of these factors were highly correlated with popularity, we would expect a similar spread, or at least more of a spread than we observed, in the predicted values. The actual popularity values in the test data set ranged from 0.72 to 1.0, but no predicted value reaches even 0.75. From this, it is reasonable to conclude that audio features alone do not necessarily correlate to the popularity of a track.

Part Six: Conclusion

This article examined whether the audio features Spotify assigns to a track can predict that track’s popularity score.

While the predicted scores did not vary much given the set of audio features, the predictions did fall close to the mean popularity score of the sample data. Furthermore, the initial analysis of the data showed that popular songs tended to have higher energy and danceability scores along with low instrumentalness and liveness.

Part Seven: Improvements

A few changes could be made to this study to improve understanding of the effect audio features have on a track’s popularity. My study primarily focused on songs that Spotify had already identified as “popular.” These tracks, in general, likely score higher than the average track on Spotify. It might be useful to select a sample of tracks from multiple genres and individual users’ playlists to work with a wider range of inputs. It is also important to understand that Spotify does not take audio features into consideration when calculating popularity; likes, shares, and streams are the driving force of this calculation. In other words, popularity is a whim of the masses, and that is something very hard to quantify for predictions. In order to improve predictions, it would be important to understand the audience. Top 50 playlists, like those used in this study, are created for a general audience. However, if one were to analyze audio features’ impact within genres, or even subgenres, such as indie or country, a closer correlation might be found between audio features and a track’s success in that market. By breaking down the audience into more closely related groups, predictions could be improved for those groups.

The conclusions drawn in each section are not part of a formal study. A more in-depth review of my analysis is available here.
