AI vs. music industry: The rise of AI voice cloning

Martina
12 October 2023, Thursday

AI voice cloning technology has been on the rise over the past couple of months, proving powerful in diverse sectors including the music industry. While we’ve touched upon the topic in our previous piece on AI and music, we feel it deserves its own article, diving deeper into its implications and related concerns for independent artists. Let’s get right into it.

The technology of AI voice cloning

Regardless of the controversy surrounding voice cloning, the technology represents a remarkable achievement in the advancement of artificial intelligence. While we will not cover every detail, we’ve decided to outline the basics of how the technology works. In essence, voice cloning uses sophisticated machine learning algorithms to replicate specific human voices.

At the heart of this process lies the training of neural networks, which is driven by vast volumes of recorded voice data. AI models are trained to capture countless vocal nuances, intonations, pitches, accents and speaking speeds in order to produce synthesised speech that imitates the source speaker’s voice as closely as possible.

An essential role is also played by deep learning architectures called generative adversarial networks (GANs), which pit two networks against each other: a generator and a discriminator. As the names suggest, the generator aims to produce synthetic voices, while the discriminator evaluates their authenticity against real human speech. The interaction between the two networks creates a repeated cycle of creation, evaluation and refinement, through which the synthetic voice becomes progressively harder to tell apart from the real one.
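To make the cycle of creation, evaluation and refinement a little more concrete, here is a minimal sketch of one adversarial training step. It is a toy illustration on single numbers rather than audio, and it is not how any particular voice cloning product is built. The names `generator`, `discriminator` and `train_step`, the one-parameter models and the learning rate are all our own assumptions, chosen for readability.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def discriminator(x, w, c):
    """Toy discriminator (assumed logistic model): probability that x is real."""
    return sigmoid(w * x + c)


def generator(z, b):
    """Toy generator (assumed): shifts a noise value z by a learnable offset b."""
    return z + b


def train_step(params, real_x, z, lr=0.1):
    """One round of the adversarial cycle: evaluate, then refine both networks."""
    w, c, b = params
    fake_x = generator(z, b)

    # Discriminator step: minimise -log D(real) - log(1 - D(fake)),
    # i.e. learn to score real samples high and generated samples low.
    d_real = discriminator(real_x, w, c)
    d_fake = discriminator(fake_x, w, c)
    grad_w = (d_real - 1.0) * real_x + d_fake * fake_x
    grad_c = (d_real - 1.0) + d_fake
    w -= lr * grad_w
    c -= lr * grad_c

    # Generator step: minimise -log D(fake),
    # i.e. adjust b so the updated discriminator is more easily fooled.
    d_fake = discriminator(fake_x, w, c)
    grad_b = (d_fake - 1.0) * w
    b -= lr * grad_b

    return (w, c, b)


# Hypothetical starting point: "real" samples sit near 4.0, fakes start near 0.0.
params = train_step((0.1, 0.0, 0.0), real_x=4.0, z=0.0)
```

In a real system, the single numbers would be replaced by audio features and the one-parameter models by deep networks, but the back-and-forth between the two updates follows the same idea.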

A robot that imitates someone else’s voice

AI voice cloning in music

At the very beginning of this article, we mentioned that voice cloning services are being used in a variety of areas and industries. Examples include developing personalised and more human-like virtual assistants, creating viral social media posts, and enabling high-quality communication for those who have lost their ability to speak.

What sets the use of voice cloning in the music industry apart from its use in other areas is that, for musicians, their vocals are a sacred ingredient of both their art and their profession. In fact, we can argue that a singer’s voice is a communication tool, a musical instrument and a monetisation tool all in one.

A few particular examples brought AI voice cloning into the spotlight. Back in 2022, Capitol Records made the news when it announced its decision to sign FN Meka, a rapper presented as entirely AI-voiced. Shortly afterwards, fans discovered that FN Meka’s voice belonged to a human voice actor who was neither credited nor compensated for their work and talent.

This sparked a controversy that was further fuelled by the virtual artist’s public persona and mannerisms, which were seen as perpetuating racial stereotypes and appropriating Black culture. It is no wonder, therefore, that the artist was dropped from the label only 10 days after being signed.

You probably also remember the song ‘Heart on My Sleeve’, which went viral earlier this year for featuring realistic-sounding, AI-generated vocals of Drake and The Weeknd. The song was written and produced by a TikTok user known as ghostwriter977, who self-released it on streaming platforms including Spotify, Apple Music and YouTube.

The outcome? The track generated millions of views on TikTok and thousands of streams across platforms. And while Universal Music Group eventually had it taken down, it became a pivotal case in today’s discussions about the legality of AI-generated music on streaming platforms and the adoption of relevant copyright laws.

Most recently, a new break-up song featuring AI-generated Taylor Swift vocals surfaced on the internet, imagining what her song ‘will’ sound like once her new alleged relationship falls apart. The lyrics were reportedly written by a human, TikTok user Will King, and another user then used audio-editing software and AI tools to produce the final AI-generated version of the song.

After only a day on YouTube, the song garnered almost 60,000 views and received mostly favourable reviews with viewers suggesting that the song be released on other streaming platforms, too.

Influence of AI voice cloning on musicians’ lives

1. Decreased value of human-generated work

No matter how talented, skilled and experienced she is, Swift would most likely need a fair amount of time to write and finish such a song herself, drawing on her own life and without anyone else’s input. Let’s also not forget the financial investment needed to record and produce the song. Ultimately, this applies to every musician out there: they all need substantial time, money and appropriate tools to have their music ready for release.

An ideal AI tool would only need a request, a click and a few minutes at most to write the song, produce it and generate vocals for it. Realistically, you would currently need more than one AI tool to create a song from scratch, but even then the creative process would be cheaper, less time-consuming and more accessible than the one followed by humans.

This is not to condemn human-generated work and champion AI at all costs but, ultimately, this is what the thought process might become for certain individuals in the music industry if no regulations and restrictions are put in place. As a result, both music production and consumption may become dominated by AI-generated tracks, pushing human-made music aside.

Additionally, with AI voice cloning technology continuously improving and slowly mastering the art of impersonation, fans, too, may become less opposed to purchasing AI-generated music instead of the artist’s original work. We can see that happening in the cases we mentioned earlier.

Both the Drake and The Weeknd track and the Taylor Swift track have garnered immense popularity, with fans asking for the songs to be completed and released on streaming platforms. All of this could eventually lead to a decrease in music sales, a loss of streams and thus a substantial reduction in revenue for the artists themselves.

Some claim that such changes would have an impact on live performances, too, while others, on the contrary, emphasise that AI can never replace or even imitate a human connection that is an essential part of every concert.

The ongoing ABBA Voyage virtual concert residency in London, which is expected to run at least until the end of 2024, may suggest that the concern is justified. For now, however, this can only be speculation, as the virtual residency is taking place only because the original band members have decided not to tour again.

A photo taken at one of the ABBA Voyage concerts (Source: IC Travel)

2. Identity theft

Voice cloning may not only be used by others to earn money from music that features your cloned vocals (and that showcases your talent and performance qualities as well). It may also be used to steal your identity and put your voice into situations you would never want to be part of, which may extend well beyond music creation and include various fraudulent activities.

One may argue that this could only happen to artists who enjoy worldwide fame and commercial success. However, with the technology being so easily accessible, truly anyone with a voice can be affected. The consequences may be brutal, especially in today’s era of social media, rapid technological advancement (including deepfakes) and excessive information sharing, where authenticity and truth are more difficult to detect and where any small wrongdoing (whether real or not) gets severely punished.

Eventually, this could result in both physical and psychological harm to the musicians, the fans and, in the case of fraud and other crimes, the public, too. People can be easily manipulated into throwing shade at and ‘cancelling’ others for things they may never have done.

3. New creative and innovative opportunities

It’s important to note that voice cloning and other AI technologies are not necessarily all that bad. For one, they can be seen as sources of new opportunities for artists and a means of enhancing their art. By using AI tools, musicians may find new ways and forms to express themselves, opting for previously undiscovered sounds, instruments and synergies.

At concerts and other live performances, AI tools may serve as a powerful source of entertainment, helping create unique experiences without necessarily replacing the main performer. Putting on a show like that is likely to attract great attention, which may lead to further audience growth and an increase in ticket sales.

On top of that, AI technology can open the door for more people to produce and put out their music independently, reducing financial dependence on record labels and gradually dismantling and reshaping how the music business operates (and the way individuals turn their passion for music into a professional career).

The ultimate truth about technological developments is that they are (almost always) initially intended to enhance and complement existing human capabilities and creativity, not replace them. That is, however, only possible with the adoption of relevant copyright laws and regulations. And what is the current situation with copyright laws and the use of AI, you might ask? Let’s have a look.

AI cloning and copyright protection of human work

We’ll begin by saying that while AI technology has been in development for years, it feels as if individual tools and platforms became an integral part of our everyday lives overnight. One day, we had no clue what ChatGPT was and the next day, we were actively using it at work, at school and in our private lives, too. It’s therefore no wonder that, from a legislative point of view, we have yet to catch up with these rapid changes.

Current copyright laws are almost exclusively devoted to protecting ‘fixed’, tangible creative expressions, like melodies, lyrics, song recordings and artwork. Intangible elements, including one’s voice, are left largely unprotected. In the USA in particular, ownership of one’s voice is not covered by federal copyright law, since the sound of a voice is not considered ‘fixed’, as the statute requires.

Additionally, while laws designed to protect one’s privacy, prevent fraud and regulate consent may theoretically apply to voice cloning, there are currently no such laws or regulations designed solely to address challenges unique to the technology.

On top of that, the ‘fair use’ doctrine in the US allows limited use of copyrighted material without permission from the copyright holders. However, what constitutes ‘fair use’, particularly in the context of AI technology, has not been clearly defined.

The EU, by contrast, has taken steps to update its legislation to address these issues in the virtual world. In 2021, the European Commission proposed the AI Act, which is meant to regulate both the use and the development of AI by setting out rules for developers and users. However, although the current draft of the AI Act mandates transparency and data governance, it has been criticised for not sufficiently addressing generative AI applications in the fields of video and audio content.

This criticism was raised by United Voice Artists (UVA), a coalition of 35 European voice acting guilds, associations and unions that has since joined forces with organisations from Switzerland, the USA and Asia to ensure that the use of AI in the dubbing and voice-over industry causes no harm to artistic heritage and human creativity.

This came after reports that several mods for the video game Skyrim used AI to create pornographic content with the characters’ voices without the voice actors’ consent. As later emerged, the practice is allegedly not prohibited under the relevant copyright laws.

Are likeness laws the solution?

While most of the attention around protection is directed toward copyright laws, many music industry insiders have emphasised the importance of so-called likeness laws. As they claim, an artist’s likeness includes their voice, and so such laws would protect musicians against unauthorised use of it.

These insiders point out that cloned content, or content that sounds like a particular artist’s output, is tricky because such work is not a direct copy of the artist’s actual creation.

One may claim that a cloned song (or a song with AI-generated voices) is merely derivative of the original, an argument that is very hard for copyright owners to counter. After all, there have been multiple cases of artists borrowing someone else’s ideas when composing music and writing songs.

However, in the copyright lawsuits the music industry has seen so far, infringement claims were fundamentally directed at the melody of the songs, not at someone imitating another individual’s voice or overall style.

The underlying idea of likeness laws is that the owner of a creation, whether that’s a song, a video or a movie, not only holds the respective rights to their work but also has the right to control their reputation and monetise their identity.

In the past, the best-known lawsuits connected to likeness and the right of publicity centred on the unauthorised use of individuals’ images and other content in video games. This was also the case with the American band No Doubt, which filed a lawsuit against the video game publisher Activision, claiming that their likeness was used to perform music in the game Band Hero without their consent.

Whether likeness laws are the key to musicians’ legal protection against unauthorised AI-generated content remains an open question, as further discussion on the topic needs to take place. What’s already clear, however, is that a modernised legal approach designed to tackle the ever-evolving AI landscape in music is an absolute necessity to protect artists’ works, talents and identities.