MusicLM: Google’s AI generates music in different genres at 24 kHz

An AI-generated image of an exploding musical sphere.

Ars Technica

On Thursday, researchers from Google announced a new artificial intelligence model called MusicLM. It can generate 24 kHz musical audio from text descriptions, such as “a soothing violin melody backed by a distorted guitar riff.” It can also transform a hummed melody into a different musical style and output music several minutes long.

MusicLM uses an AI model trained on what Google calls “a large dataset of unlabeled music,” along with captions from MusicCaps, a new dataset of 5,521 music-text pairs. MusicCaps gets its text descriptions from human experts and its matching audio clips from Google’s AudioSet, a collection of over 2 million labeled 10-second audio clips pulled from YouTube videos.

Broadly speaking, MusicLM works in two main stages: first, it takes a sequence of audio tokens (pieces of sound) and maps them to semantic tokens (words that represent meaning) in the captions for training. The second stage takes user captions and/or input audio and generates acoustic tokens (pieces of sound that make up the resulting song output). The system builds on an earlier AI model called AudioLM (introduced by Google in September) along with other components such as SoundStream and MuLan.
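The caption → semantic tokens → acoustic tokens → waveform flow described above can be sketched in toy code. To be clear, this is a minimal illustrative sketch, not Google’s implementation: the vocabulary sizes, token rates, and every function body below are invented stand-ins for the learned models (MuLan, the autoregressive Transformer stages, and the SoundStream decoder).

```python
import hashlib
import random

# Illustrative assumptions, not MusicLM's real values.
SEMANTIC_VOCAB = 1024     # size of the "meaning" token codebook
ACOUSTIC_VOCAB = 4096     # size of the audio token codebook
TOKENS_PER_SECOND = 50    # acoustic token rate
SAMPLES_PER_TOKEN = 480   # 24,000 Hz / 50 tokens per second

def text_to_semantic(caption: str) -> list[int]:
    """Stage 1 (toy): map each caption word to a semantic token.
    The real system learns this mapping (via MuLan embeddings); here we
    simply hash words so the result is deterministic."""
    return [
        int.from_bytes(hashlib.sha256(w.encode()).digest()[:2], "big") % SEMANTIC_VOCAB
        for w in caption.lower().split()
    ]

def semantic_to_acoustic(semantic: list[int], seconds: float) -> list[int]:
    """Stage 2 (toy): expand semantic tokens into acoustic tokens.
    MusicLM predicts these autoregressively with a Transformer; here a
    PRNG seeded by the semantic sequence stands in for the model, so the
    "music" still depends on the caption."""
    seed = sum((i + 1) * tok for i, tok in enumerate(semantic))
    rng = random.Random(seed)
    n_tokens = int(seconds * TOKENS_PER_SECOND)
    return [rng.randrange(ACOUSTIC_VOCAB) for _ in range(n_tokens)]

def acoustic_to_waveform(acoustic: list[int]) -> list[float]:
    """Decode (toy): each acoustic token becomes a chunk of 24 kHz samples.
    In MusicLM this decoding is done by the SoundStream neural codec."""
    wave: list[float] = []
    for tok in acoustic:
        level = tok / ACOUSTIC_VOCAB * 2.0 - 1.0  # map token to [-1, 1)
        wave.extend([level] * SAMPLES_PER_TOKEN)
    return wave

if __name__ == "__main__":
    caption = "a soothing violin melody backed by a distorted guitar riff"
    samples = acoustic_to_waveform(semantic_to_acoustic(text_to_semantic(caption), 2.0))
    print(f"{len(samples)} samples = {len(samples) / 24_000:.1f} s of 24 kHz audio")
```

The point of the hierarchy is that the semantic stage captures long-range structure (melody, rhythm) cheaply, while the acoustic stage fills in fine waveform detail, which is why the system can stay coherent over minutes of output.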

Google claims MusicLM outperforms previous AI music generators in both audio quality and adherence to text descriptions. On the MusicLM demo page, Google offers several examples of the AI model in action, generating audio from “rich captions” that describe the feel of the music and even vocals (which so far remain unintelligible). Here is an example of a rich caption they provide:

Slow tempo, bass-and-drums-led reggae song. Sustained electric guitar. High-pitched, ringing tones. The vocals are relaxed with a laid-back feel, very expressive.

Google also shows off MusicLM’s “long generation” (creating five-minute music clips from a simple prompt), “story mode” (which takes a series of text prompts and turns them into a morphing sequence of musical tunes), “text and melody conditioning” (which takes a hummed or whistled audio input and transforms it to match the style described in a text prompt), and generating music that matches the mood of image captions.

Diagram of the MusicLM AI music-generation model, taken from its academic paper.

Google Research

What’s more, Google’s research paper delves into MusicLM’s ability to recreate particular instruments (like flute, cello, and guitar), different music genres, different levels of musician experience, places (“escape from prison,” “gym”), time periods (“a club in the 1950s”), and more.

AI-generated music is by no means a new idea, but AI music-creation methods of previous decades often generated musical notation that was subsequently played by hand or through a synthesizer, whereas MusicLM generates the raw audio frequencies of the music itself. Also, in December we covered Riffusion, a hobbyist AI project that can similarly generate music from text descriptions, though not with high fidelity. Google references Riffusion in its MusicLM academic paper, claiming that MusicLM surpasses it in quality.

In the MusicLM paper, its creators outline the model’s potential impacts, including “potential misappropriation of creative content” (i.e., copyright issues), potential biases against cultures underrepresented in the training data, and potential cultural appropriation issues. As a result, Google stresses that more work is needed to address these risks, and it is holding back the code: “We have no plans to release models at this point.”

Google’s researchers are already looking toward future improvements: “Future work may focus on lyrics generation, along with improvement of text conditioning and vocal quality. Another aspect is the modeling of high-level song structure like introduction, verse, and chorus. Modeling the music at a higher sample rate is an additional goal.”

It’s probably not an exaggeration to suggest that AI researchers will continue improving music-generation technology until anyone can create studio-quality music in any style just by describing it — though no one can predict exactly when that goal will be reached or how, exactly, it will affect the music industry. Stay tuned for further developments.
