Methodology
What questions guided the research?
To investigate the possible relation between recommendation algorithms and algorithmic bias in music apps, we chose two of the most widely used platforms in Brazil: Deezer [1] and Spotify [2]. Both let users listen to content “for free”, with advertisements inserted between songs. The exploratory data analysis was carried out in Python with the Pandas library, and the visualizations were produced in Tableau.
We thus established three guiding questions:
- i) Do the Spotify and Deezer recommendation algorithms recommend music differently depending on the gender of the listener?
- ii) Do Spotify and Deezer favor artists of one gender (male/female/non-binary) in their recommendations?
- iii) Do these recommendations differ depending on the musical genre being listened to? For example, will the recommendations for someone who plays an MPB song and for someone who plays a funk song show, proportionally, the same distribution with respect to the gender of the artist?
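As an illustration of how the second question could be quantified, the sketch below computes, for each listener gender, the share of recommended tracks by artist gender. It is a minimal example in plain Python rather than the Pandas workflow used in the research. The records are invented toy data, and it assumes every recommended track has already been annotated with the gender of its artist.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (listener_gender, artist_gender), one per recommended track.
# These values are invented for illustration and are not results from the study.
RECORDS = [
    ("female", "male"), ("female", "male"), ("female", "male"), ("female", "female"),
    ("male", "male"), ("male", "male"), ("male", "female"), ("male", "female"),
]

def gender_shares(records):
    """Return {listener_gender: {artist_gender: share of recommendations}}."""
    counts = defaultdict(Counter)
    for listener_gender, artist_gender in records:
        counts[listener_gender][artist_gender] += 1
    shares = {}
    for listener_gender, counter in counts.items():
        total = sum(counter.values())
        shares[listener_gender] = {g: n / total for g, n in counter.items()}
    return shares

shares = gender_shares(RECORDS)
for listener_gender, distribution in shares.items():
    print(listener_gender, distribution)
```

Comparing these distributions across listener genders speaks to the first question, and repeating the computation per musical genre speaks to the third.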
It is important to highlight that our data collection did not enable us to identify the ethnic-racial identity of artists, nor to investigate whether black and white listeners received different recommendations. The impossibility of collecting these data limited our ability to measure how ethnic-racial belonging relates to genre and musical rhythms. For this reason, we invite readers to observe, in their own searches, the presence or absence of non-white people in the recommendations they receive when playing songs and performing searches in our simulator.
It is also crucial to recognize that, in the complex dynamics of the production, implementation and use of recommendation systems, much must be attributed to social factors that are “external” to the streaming platforms we investigated. It is evident, for example, that black artists are underrepresented in Sertanejo music, due to factors related to how this musical genre was formed within Brazilian culture. Therefore, the invitation to observe the presence or absence of non-white people in the recommendations is not limited to thinking about what the algorithm recommends; it also aims at understanding the interconnections between the inequalities that structure Brazilian society and how these can be reflected in algorithmic song recommendations.
“Sertanejo is one of the genres majorly composed by white people. Since the popularization of the rhythm, few black artists have stood out in defending the musical style. Among them, we find João Paulo, who had a duo with Daniel, and died in 1997; the singers Pena Branca & Xavantinho, who performed together until 1999; and Rick, who was Renner’s partner until 2015 and is now on a solo career.” [3]
The difficulty of collecting this kind of data brings us to the ongoing discussions about the regulation of platforms and artificial intelligence, and to the growing demand for transparency in the collection and use of user data, which highlights the importance of giving researchers access to these data. Such openness would make it possible, for example, to understand the dynamics of the music market and the formation of consumer preferences and habits, providing fundamental insights into how our society is organized.
Although not everything can be attributed to the streaming platforms as their responsibility or fault, given that inequalities have been present in the music industry since long before the creation of the internet, we cannot ignore that, once these issues appear in the platforms’ universe and are not addressed by them, we face an omission in combating and mitigating a pressing social problem.
How were the artists and musical genres selected?
We decided to collect and analyze data from five musical genres widely present in Brazil: Rap, Gospel, Brazilian Popular Music (MPB), Sertanejo and Funk. During the process, the idea emerged of running the same collection for the most played artists on both platforms, in order to verify whether the patterns identified in Rap, Gospel, MPB, Sertanejo and Funk would also appear in the analysis of the most played artists. In 2021, when the data collection took place, Spotify released a list of its five most played artists, and Deezer released a list of its ten most played. Since the five most played artists on Spotify were also present on Deezer’s list, we decided to work with this Top 5 group in our research. However, when we realized that four of the Top 5 artists belonged to sertanejo, we modified our initial approach: we removed sertanejo from the group of musical genres initially proposed and included it only through the Top 5 artists.
We thus established a scope of four musical genres (Rap, Gospel, MPB and Funk) plus the Top 5 artists. We then selected two artists (one male and one female) from each musical genre with similar numbers of followers/listeners. These artists were chosen as representatives of their respective genres for the analysis of results. The artists selected by the researchers, here referred to as Not Top 5, were:
- Rap:
- Gospel:
- MPB:
- Funk:
- Sertanejo (later excluded):
The artists present in the Top 5 on both Deezer and Spotify were: Os Barões da Pisadinha, Gusttavo Lima, Marília Mendonça, Jorge & Mateus and Henrique & Juliano.
The musical genres above were chosen to include a diversity of artists, considering both the main rhythms consumed nationally, as is the case of Sertanejo and Funk, and forms of consumption that may arise from different social understandings and attributed cultural values.
The values attributed to musical genres in Brazil are shaped by readings related to social class, gender, race and ethnicity. These values are, therefore, historically changeable, and they are closely tied to the country’s history and its constitution as a nation. Samba, for example, played a fundamental part in constructing the ideal of the Brazilian nation [4]. Pagode, at a certain historical moment, was created as a redefinition of samba, a division which, according to Leci Brandão, is more industrial than cultural. In addition, while samba was criminalized for a long period in the past, as also happened to Rap, nowadays we can draw comparisons between these movements and Funk, which is seen as a musical genre linked to illegal practices, feeding discrimination against those who listen to and enjoy the genre [5]. MPB, in turn, was built as a broad social front that sought to make public the artists’ opposition to the political and social decisions that marked the country at the time. Today, it is read as a music style more often consumed by higher social classes [6].
Definition of profiles and simulation of listeners: step by step
Having defined the list of artists, the next step was to collect the songs recommended by Spotify and Deezer. To do so, a set of bots was created to simulate different listeners of the playlists recommended for each of the defined artists. At the time of collection, both platforms allowed users to identify as male, female or non-binary. Thus, on each platform, six bots were created for each artist from each genre and from the Top 5 group: two male, two female and two non-binary bots for each of the 13 artists, totaling 78 bots per platform.
Having created the accounts, the next step was to run the simulation of a playlist listener. A playlist was assigned to each group of six bots. For example, a bot would start by listening to Rico Dalasam and, from there, would listen to the next 50 recommended songs. To enable the collection of the recommended songs, the web versions of Spotify and Deezer were used.
Due to hardware limitations, the simulation could not be run for all artists on the list simultaneously, as this would entail running 156 playlists in parallel (78 bots for Deezer + 78 bots for Spotify). In addition, each execution would happen twice, because we intended to verify the recommendations made in the morning and in the evening. To deal with this limitation, we spread the simulation across different weeks, with the bots “listening to” one artist from the list at a time. This way, the execution dropped to 12 playlists in parallel (6 bots for Deezer + 6 bots for Spotify), run twice a day. Since we did not have a computer dedicated exclusively to the experiment, the 12 playlists ran between April and December 2021.
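The bot counts above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the constant names and the `bot_profiles` helper are illustrative assumptions and do not come from the original scripts.

```python
from itertools import product

GENDERS = ("male", "female", "non-binary")
BOTS_PER_GENDER = 2
PLATFORMS = ("Deezer", "Spotify")
N_ARTISTS = 4 * 2 + 5  # four genres x two artists each, plus the Top 5 artists

def bot_profiles():
    """One (platform, gender, replica) tuple per bot assigned to a single artist."""
    return list(product(PLATFORMS, GENDERS, range(1, BOTS_PER_GENDER + 1)))

bots_per_artist_per_platform = len(GENDERS) * BOTS_PER_GENDER
bots_per_platform = bots_per_artist_per_platform * N_ARTISTS
playlists_if_simultaneous = bots_per_platform * len(PLATFORMS)
playlists_one_artist_at_a_time = len(bot_profiles())

print(bots_per_artist_per_platform, bots_per_platform,
      playlists_if_simultaneous, playlists_one_artist_at_a_time)
```

Processing one artist at a time keeps the number of parallel browser tabs at the size of a single artist's bot group across both platforms.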
As the web versions of the platforms were used, the parallel simulation took place in web browser tabs. To ensure that each bot would run in an independent browser instance, the container feature was used. In this research, Firefox was used with the Multi-Account Containers extension <https://addons.mozilla.org/en-us/firefox/addon/multi-account-containers/>.
Data Collection
The collection of the songs recommended to each bot was performed through scraping. To do so, a Python script was implemented using Selenium <https://www.selenium.dev/>, an extension for opening URLs in Firefox containers <https://addons.mozilla.org/en-us/firefox/addon/open-url-in-container/> and Beautiful Soup <https://www.crummy.com/software/beautifulsoup/bs4/doc/>. Selenium is a tool for automating browsers programmatically. Given that the simulation took place in containers, the extension allowed the Selenium-driven code to access each of the container tabs. Finally, Beautiful Soup is a tool for extracting data from web pages.
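To give a sense of the parsing step, the sketch below extracts track entries from a fragment of HTML. It is not the original script. It uses Python's standard-library `html.parser` in place of Beautiful Soup so that it runs without extra dependencies, and it omits the Selenium and container steps entirely. The markup, the class names and the `data-id` attribute are hypothetical, since the real Spotify and Deezer pages are structured differently and change over time.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for a rendered playlist page.
# In a Selenium session, the HTML would typically come from driver.page_source.
SAMPLE_HTML = """
<div class="track" data-id="t1"><span class="title">Song A</span><span class="artist">Artist X</span></div>
<div class="track" data-id="t2"><span class="title">Song B</span><span class="artist">Artist Y</span></div>
"""

class TrackParser(HTMLParser):
    """Collects one dict per <div class="track"> element, in page order."""

    def __init__(self):
        super().__init__()
        self.tracks = []
        self._field = None  # name of the span currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        css_class = attrs.get("class")
        if tag == "div" and css_class == "track":
            self.tracks.append({"song_id": attrs.get("data-id"),
                                "position": len(self.tracks) + 1})
        elif tag == "span" and css_class in ("title", "artist") and self.tracks:
            self._field = css_class

    def handle_data(self, data):
        if self._field and data.strip():
            self.tracks[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

def parse_tracks(html):
    parser = TrackParser()
    parser.feed(html)
    return parser.tracks

tracks = parse_tracks(SAMPLE_HTML)
print(tracks)
```

With Beautiful Soup, the same extraction would usually be written with element selectors instead of a hand-written parser class, but the output would have the same shape: one record per recommended song, in playlist order.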
The simulation and collection occurred weekly. After the simulation was completed, the code collected the data related to the 50 songs recommended by the platforms and stored them in a local database. These data were: song ID, song title, artist’s name, song length and song order in the playlist.
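The storage step can be sketched as follows. The text does not say which database was used, so SQLite, the table name, the column names and the `bot_id` and `platform` fields are all assumptions made for illustration. The song columns mirror the five fields listed above.

```python
import sqlite3

def store_recommendations(conn, bot_id, platform, songs):
    """Insert one row per recommended song into a local table (hypothetical schema)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS recommendations (
               bot_id TEXT, platform TEXT,
               song_id TEXT, title TEXT, artist TEXT,
               length_seconds INTEGER, position INTEGER)"""
    )
    conn.executemany(
        "INSERT INTO recommendations VALUES (?, ?, ?, ?, ?, ?, ?)",
        [(bot_id, platform, s["song_id"], s["title"], s["artist"],
          s["length_seconds"], s["position"]) for s in songs],
    )
    conn.commit()

# Invented example rows; an in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
example_songs = [
    {"song_id": "t1", "title": "Song A", "artist": "Artist X", "length_seconds": 180, "position": 1},
    {"song_id": "t2", "title": "Song B", "artist": "Artist Y", "length_seconds": 201, "position": 2},
]
store_recommendations(conn, "bot-01", "spotify", example_songs)
rows = conn.execute(
    "SELECT position, title FROM recommendations ORDER BY position"
).fetchall()
print(rows)
```

Keeping the playlist position as its own column preserves the order in which the platform served the recommendations, which later analysis may need.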