July 4, 2023

Making a neural network that makes music

For the past several months I’ve been trying to make music with a neural network. After trying many different concepts, I settled on something very similar to Music Transformer. Here are some somewhat curated samples:

They’re not great, but that’s fine. It’s a lot better than nothing, which is what I was getting when I first started.

How it works, briefly

If you've read the Music Transformer paper, you'll understand most of it. Things I do differently:

The training data

All of the training data is in MIDI format. This is nice because MIDI is simple and easy to work with. On the other hand, high-quality MIDI files are scarce, so it's hard to build up a big dataset. I've experimented with automatic MP3-to-MIDI conversion, but the results aren't good. Basically I've just had to seek out MIDIs wherever they can be found. Shout out to Makinporing for saving the old MSPAF MIDI/sheet music stuff.
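For the curious, the Music Transformer paper represents a MIDI performance as a flat token sequence of NOTE_ON, NOTE_OFF, and TIME_SHIFT events. Here's a minimal sketch of that style of encoding; the token names, the 10 ms shift granularity, and the `encode_notes` helper are my own illustrative choices, not the paper's exact vocabulary:

```python
def encode_notes(notes, shift_ms=10, max_steps=100):
    """Encode notes as a Music Transformer-style event sequence.

    notes: list of (pitch, start_ms, end_ms) tuples.
    Times are quantized to shift_ms; long gaps are split into
    chunks of at most max_steps * shift_ms.
    """
    # Flatten notes to timestamped events; at equal times,
    # note-offs (kind 0) sort before note-ons (kind 1).
    events = []
    for pitch, start, end in notes:
        events.append((start, 1, pitch))
        events.append((end, 0, pitch))
    events.sort()

    tokens, clock = [], 0
    for t, kind, pitch in events:
        steps = round((t - clock) / shift_ms)
        while steps > 0:  # emit quantized time shifts
            n = min(steps, max_steps)
            tokens.append(f"TIME_SHIFT_{n * shift_ms}")
            steps -= n
        clock = t
        tokens.append(("NOTE_ON_" if kind else "NOTE_OFF_") + str(pitch))
    return tokens
```

Two notes half a second apart, for example, come out as a NOTE_ON/TIME_SHIFT/NOTE_OFF pattern per note, which is exactly the kind of sequence a vanilla language-model objective can be trained on.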

I don’t discriminate on the length or style of a song, or the number of instruments, so there’s a decent variety in my dataset. Of course, everything gets converted to a single-track piano arrangement, but almost every song still sounds decent in piano-only form.
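The multi-track-to-piano step is conceptually just merging every track's notes into one part. A minimal sketch, assuming the same (pitch, start_ms, end_ms) note representation as above (`flatten_tracks` is a hypothetical helper name, and real conversion would also drop percussion channels and handle program changes):

```python
def flatten_tracks(tracks):
    """Merge per-instrument note lists into one piano part.

    tracks: list of tracks, each a list of (pitch, start_ms, end_ms).
    Returns all notes in a single list, sorted by onset then pitch.
    """
    merged = [note for track in tracks for note in track]
    merged.sort(key=lambda n: (n[1], n[0]))
    return merged
```

Since every note keeps its original timing and pitch, chords and melodies from different instruments simply stack on top of each other, which is why most songs survive the piano-only treatment reasonably well.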

[Figure: an illustration of the variety in the dataset.]

Future plans

I’d like to add another transformer stack to handle volume, but I haven’t decided how to discretize that yet. I also want to come up with ways to control the model’s output. I experimented a bit with temperature, but it just didn’t seem useful to me. Lower temperatures made the outputs monotonous, and higher temperatures resulted in gibberish. Eventually I’m thinking of running a server in the Colab notebook so I can control it with a GUI that I run locally.
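For reference, temperature sampling just rescales the model's logits before the softmax, so the trade-off I ran into falls out of the math: T < 1 sharpens the distribution toward the top token (monotonous output), T > 1 flattens it toward uniform (gibberish). A minimal sketch in plain Python (the function name is my own):

```python
import math

def apply_temperature(logits, temperature):
    """Softmax over logits scaled by 1/temperature.

    temperature < 1 sharpens the distribution (repetitive output);
    temperature > 1 flattens it (increasingly random output).
    """
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

Because temperature only interpolates between "always pick the argmax" and "pick uniformly at random", it's a pretty blunt knob, which matches my experience that no setting of it gave output that was both coherent and interesting.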