AI is succeeding in art and music as researchers figure out how to break the work of creativity into AI-solvable tasks

One of Ray Kurzweil’s predictions on the path to the Technological Singularity is that artificial intelligence will be making art and music.

Ray Kurzweil’s “2019” predictions, which are actually predictions for 2011 to 2029 on the Technological Singularity timeline, are below.

Using AI is one of the geekiest ways to make tunes, and the approach has been around since the 80s. It’s a thriving area of research with dedicated academic conferences. And with the recent boom in machine learning, the quality of music created by AI seems to be getting better too.

AI researchers are achieving more success by figuring out how to break up the work of creativity into AI-solvable tasks.

Success at tasks previously considered human-only involves understanding how the task is performed and then converting it into problems and approaches that AI can address.

How many human tasks will be found to be “not as difficult as previously believed”?
How many human tasks will be found to be beyond scaled-up, advanced versions of deep learning and big data?

Researchers from the University of Toronto in Canada have trained recurrent neural networks to make an all-singing and dancing AI. A paper submitted to ICLR 2017, an academic conference for machine learning, shows that artificially-intelligent software can not only process data, it can create art, too.

Deep learning looks at pop songs and then generates music based on the patterns it finds in them

Machines don’t have a wild, unlimited creative streak like humans, however. The process of making music is analytical rather than emotional. They don’t understand music, but they can do maths.

Although the neural network has been encoded with the rules of musical theory, it can’t make something out of nothing. It has to learn by examples, so the team analysed the chords in 100 hours of pop music to learn about common patterns of notes and melodies.

So the neural network generates music by weighing up the probabilities of what note should go next, according to the scale it’s working in. Although there are 12 notes, only six or seven of those notes are usually part of a scale.
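The idea of weighing up probabilities within a scale can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the `scores` list is made up and stands in for whatever a trained network would output, and the helper names (`scale_notes`, `next_note_distribution`, `sample_next_note`) are hypothetical. The sketch assumes a major scale, which uses 7 of the 12 notes.

```python
import random

# The 12 pitch classes, indexed by semitone offset from C.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Semitone offsets of a major scale: 7 of the 12 notes (an assumed choice of scale).
MAJOR_SCALE_STEPS = [0, 2, 4, 5, 7, 9, 11]


def scale_notes(root, steps=MAJOR_SCALE_STEPS):
    """Return the pitch classes (0-11) in the scale that starts at `root`."""
    return [(root + step) % 12 for step in steps]


def next_note_distribution(scores, allowed):
    """Zero out notes outside the scale, then renormalise so the rest sum to 1."""
    masked = [scores[n] if n in allowed else 0.0 for n in range(12)]
    total = sum(masked)
    return [m / total for m in masked]


def sample_next_note(scores, allowed, rng):
    """Draw one pitch class at random according to the in-scale probabilities."""
    probs = next_note_distribution(scores, allowed)
    return rng.choices(range(12), weights=probs, k=1)[0]


# Made-up scores standing in for a trained network's output (hypothetical values).
scores = [0.20, 0.05, 0.15, 0.05, 0.15, 0.10, 0.05, 0.15, 0.02, 0.05, 0.02, 0.01]
allowed = scale_notes(root=0)  # C major
rng = random.Random(0)
melody = [NOTE_NAMES[sample_next_note(scores, allowed, rng)] for _ in range(8)]
print(melody)
```

Notes outside the chosen scale get probability zero, so every sampled note belongs to the scale. A real system would recompute the scores at each step from the notes already played.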

First, the input scale is chosen, and the initial layer of the neural network decides which key (meaning a note, as in a piano key) should be played. The possible range of notes is already known from the scale, and the system chooses the combination by learning the patterns in pop music.

The second layer decides how long the key should be played for. Unlike jazz, which is trickier to play and more unpredictable, pop music has a repetitive structure that is easier to analyse and produce.

A third layer picks the chords to go along with the melody, and the fourth is for drums. All layers work simultaneously to give an output combination of notes at specific timings to create a song that sounds pretty convincing.
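The four-layer structure described above can be sketched as a toy generator in Python. This is a loose illustration under stated assumptions, not the paper's model: random choices stand in for the trained recurrent layers, the layers run one after another at each step instead of simultaneously, and every name and table here (`key_layer`, `press_layer`, `chord_layer`, `drum_layer`, `CHORDS`, `DURATIONS`, `DRUMS`) is hypothetical.

```python
import random

SCALE = [0, 2, 4, 5, 7, 9, 11]   # C major pitch classes (assumed input scale)
DURATIONS = [1, 2, 4]            # note lengths in time steps (assumed values)
CHORDS = {                       # a few chords as sets of pitch classes
    "C": {0, 4, 7},
    "F": {5, 9, 0},
    "G": {7, 11, 2},
    "Am": {9, 0, 4},
}
DRUMS = ["kick", "snare", "hihat"]


def key_layer(rng, previous):
    """Layer 1: pick the next melody note, moving at most one scale step."""
    if previous is None:
        return rng.choice(SCALE)
    idx = SCALE.index(previous)
    return SCALE[(idx + rng.choice([-1, 0, 1])) % len(SCALE)]


def press_layer(rng):
    """Layer 2: decide how long the note is held."""
    return rng.choice(DURATIONS)


def chord_layer(note):
    """Layer 3: choose the first listed chord that contains the melody note."""
    for name, tones in CHORDS.items():
        if note in tones:
            return name
    return "C"  # fallback, not reached for notes of the C major scale


def drum_layer(rng):
    """Layer 4: pick a drum hit to accompany the note."""
    return rng.choice(DRUMS)


def generate(num_notes, seed=0):
    """Combine the four layers into a list of timed events."""
    rng = random.Random(seed)
    song, previous, time = [], None, 0
    for _ in range(num_notes):
        note = key_layer(rng, previous)
        length = press_layer(rng)
        song.append({"time": time, "note": note, "length": length,
                     "chord": chord_layer(note), "drum": drum_layer(rng)})
        time += length
        previous = note
    return song


for event in generate(4):
    print(event)
```

Each event carries a note, its duration, a chord and a drum hit at a specific time, which mirrors the output combination the article describes. In the real system each of these choices would come from a learned layer conditioned on what has been played so far.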

Hang Chu (a PhD student), Raquel Urtasun and Sanja Fidler (both associate professors) all work at the University of Toronto as researchers in computer vision, but became intrigued to see if the underlying principles of good pop music could be captured in algorithms.

Conclusion and Future Work

They have presented a hierarchical approach to pop song generation that exploits music theory in the model design. In contrast to past work, their approach is able to generate multi-track music. Their human studies show the strength of their framework compared to an existing strong baseline. They additionally proposed two new applications: neural dancing and karaoke, and neural story singing. They next discuss the limitations and avenues for future work. As with most existing approaches, their method’s objective is to learn to produce music at the note level. This can be unsuitable for music, as music is flexible and intentionally made to be unpredictable when it is composed. This calls for a deeper study of music theory, as the paper only scratches the surface.

Arxiv - Song From PI: A Musically Plausible Network for Pop Music Generation


Researchers present a novel framework for generating pop music. Their model is a hierarchical Recurrent Neural Network, where the layers and the structure of the hierarchy encode prior knowledge about how pop music is composed. In particular, the bottom layers generate the melody, while the higher levels produce the drums and chords. They conduct several human studies that show a strong preference for their generated music over that produced by the recent method by Google. They additionally show two applications of their framework: neural dancing and karaoke, as well as neural story singing.