demo.mp4
- Abstract
- Key Musical Concepts
- How It Works
- Dataset used
- Models
- Project structure
- Download project
- Results
"Recent advances in generative models have made automated music production an important area of deep learning research. This paper presents a simplified Generative Adversarial Network (GAN), inspired by the MidiNet paper, for symbolic music generation using the MAESTRO dataset. Its significance lies in demonstrating that a minimal, interpretable model can achieve stable and musically coherent results by addressing practical training challenges such as mode collapse and non-convergence with techniques like minibatch discrimination and hyperparameter tuning. The main result is a stable training process, achieved by adjusting learning rates and update steps, which enables the generator to produce piano-roll melodies without collapsing. This work provides a reproducible baseline that serves as a practical starting point for further experimental research in music generation."
More details can be found in Symbolic-Domain_Music_Generation_with_GANs.pdf.
This section briefly defines the principal musical terms present in this project.
Piano Roll: A binary matrix representing musical data, mapping the 128 MIDI notes across 16 time steps per bar.
Melody: A monophonic sequence where only one note is active at each time-step. It is extracted by selecting the highest-velocity note from the full piano roll in each frame.
Chord: A 13-dimensional vector that encodes the chord itself, specifying its root key and whether the chord is major or minor. It is derived from the most frequent chord in the previous bar of music.
Octave: All musical notes are normalized into a fixed two-octave range (MIDI notes 60-83, i.e. C4-B5). This helps in detecting training issues such as mode collapse.
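As an illustrative sketch (not the project's exact implementation), melody extraction can be done by keeping only the highest-velocity note in each frame of the piano roll; the helper name `extract_melody` below is hypothetical:

```python
import numpy as np

def extract_melody(piano_roll):
    """Keep only the highest-velocity note per time step (monophonic melody)."""
    melody = np.zeros_like(piano_roll)
    for t in range(piano_roll.shape[0]):
        if piano_roll[t].any():
            pitch = piano_roll[t].argmax()  # index of the loudest active note
            melody[t, pitch] = 1
    return melody

# 4 time steps x 128 MIDI pitches, cells hold velocities (0-127).
roll = np.zeros((4, 128), dtype=np.int32)
roll[0, 60] = 80   # C4
roll[0, 64] = 100  # E4, louder, so it is kept
roll[1, 62] = 90   # D4
melody = extract_melody(roll)
```

The result is a binary matrix with at most one active pitch per time step, matching the monophonic melody definition above.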
The main part of this project is a GAN architecture designed for creating melodies. The entire process, from raw data to generated music, follows a specific pipeline:
- Input Data: The process starts with raw MIDI files from the MAESTRO Dataset.
- Preprocessing: Each MIDI file is converted into a piano roll representation, a binary matrix where notes are mapped over time steps. This stage includes extracting the main melody from chords and removing pauses.
- Data Augmentation: To increase the dataset size, each melody bar is circularly shifted up by one semitone, 11 times in succession, yielding 12 versions of each bar.
- GAN Training: The augmented dataset is used to train the GAN models. The training treats common GAN issues like mode collapse and instability through fine-tuned hyperparameters and techniques like mini-batch discrimination.
- Music Generation: Once trained, the generator model can create new piano roll sequences, which are then converted back into MIDI files that can be stored and replayed.
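The semitone augmentation step above can be sketched as a circular shift of the pitch axis (a hypothetical helper, not the repo's actual code):

```python
import numpy as np

def augment_bar(bar):
    """Return the 12 circular semitone transpositions of a one-bar piano roll."""
    # bar shape: (time_steps, 128). Rolling axis 1 by one position shifts every
    # note up a semitone, wrapping the top pitch back around (circular shift).
    return [np.roll(bar, shift, axis=1) for shift in range(12)]

bar = np.zeros((16, 128), dtype=np.int8)
bar[0, 60] = 1                 # a single C4 on the first time step
versions = augment_bar(bar)    # 12 transposed copies, shifts 0..11
```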
- MAESTRO Dataset: 200+ hours of piano performances.
We also used the Lakh MIDI Dataset to perform some tests, but we do not include a snapshot of that dataset here.
Suppose we want to create a dataset for model_v2.
Open the file create_dataset2.py in the data folder and modify the global variables:
# Define the directory of the input midi file.
# The specified directory must contains '.midi' and/or '.mid' files.
INPUT_DIR_PATH = "./raw/maestro-v3.0.0/all/"
# Define where the dataset will be saved.
OUT_DIR_PATH = "./preprocessed/maestro-v3.0.0/dataset2/"
# Define the name of the dataset (must end with '.h5').
OUT_FILE_NAME = "all.h5"
Note that by default the input directory path is ./raw/maestro-v3.0.0/all/, but this directory is not provided.
During our study, we manually copied the entire MAESTRO dataset into this directory.
This setup allowed us to randomly select files from the dataset to create a testing dataset.
Finally, you can create the dataset with:
python create_dataset2.py
A compressed file will be created in the specified path.
By default, we perform data augmentation when creating the dataset. In general this is not good practice, but it allowed us to pre-load the entire dataset onto the GPU before starting training. The dataset contains binary matrices, so its size should be relatively small and should fit entirely in GPU memory.
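Preloading the whole dataset onto the GPU can look roughly like this (a sketch; the key name inside the `.h5` file is an assumption, not necessarily what create_dataset2.py writes):

```python
import h5py
import torch

def load_dataset_to_gpu(path, key="dataset"):
    """Load a binary piano-roll dataset from HDF5 and move it to the GPU in one go."""
    with h5py.File(path, "r") as f:
        data = f[key][:]  # read the whole dataset into host memory
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Binary matrices stored as int8 stay small on disk; cast to float for training.
    return torch.from_numpy(data).float().to(device)
```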
This project implements three distinct GAN models with increasing complexity:
- model_v1: A baseline DCGAN that generates single, one-bar-long melodies from a random noise vector.
- model_v2: A Conditional DCGAN that generates a melody bar conditioned on the preceding bar, encouraging more harmonically coherent sequences.
- model_v3: An extension of model_v2 that is also conditioned on the chord associated with the previous bar, adding another layer of musical context to the generation process.
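Conditioning on the previous bar, as in model_v2, can be sketched by concatenating an encoding of that bar to the noise vector. All layer sizes and names below are illustrative assumptions, not the repo's actual architecture:

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Toy conditional generator: noise + encoded previous bar -> next bar."""
    def __init__(self, noise_dim=100, bar_shape=(16, 128), cond_dim=32):
        super().__init__()
        bar_size = bar_shape[0] * bar_shape[1]
        self.bar_shape = bar_shape
        self.encoder = nn.Linear(bar_size, cond_dim)  # encode the previous bar
        self.net = nn.Sequential(                     # map [noise | condition] to a bar
            nn.Linear(noise_dim + cond_dim, 256),
            nn.ReLU(),
            nn.Linear(256, bar_size),
            nn.Sigmoid(),                             # per-cell note probabilities
        )

    def forward(self, z, prev_bar):
        cond = self.encoder(prev_bar.flatten(1))
        out = self.net(torch.cat([z, cond], dim=1))
        return out.view(-1, *self.bar_shape)

g = ConditionalGenerator()
z = torch.randn(2, 100)
prev = torch.zeros(2, 16, 128)
bar = g(z, prev)
```

model_v3 extends the same idea by also concatenating the 13-dimensional chord vector to the condition.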
Consider that we want to train model_v2 using the dataset stored in /preprocessed/maestro-v3.0.0/dataset2/dataset_name.h5.
We suggest using model_v2 or model_v3, since the first model generates very short melodies that are not particularly interesting to listen to.
Open notebook train_model_v2.ipynb, go to the "Setup" section and change the value of the DATASET_PATH variable.
# Dataset path.
DATASET_PATH = "data/preprocessed/maestro-v3.0.0/dataset2/dataset_name.h5"
Then just run all the cells to train the model.
To test a model, open the notebook tester_model_v2.ipynb and change the line that loads the checkpoint.
# Load model from checkpoint.
ckp_path = "checkpoint_path.ckpt"
model = GAN.load_from_checkpoint(ckp_path)
Then you can just run the notebook; 10 MIDI files will be created in ./outputs/songs/.
If instead you want to generate melodies using model_v3 you need to open notebook tester_model_v3.ipynb.
You need to perform the same steps described above.
At the end of this file you can see that, for a single melody, 6 MIDI files are created, each differing in the chord pattern applied.
.
├── data
│ ├── preprocessed
│ │ └── maestro-v3.0.0
│ │ ├── dataset1 # Datasets for model_v1
│ │ ├── dataset2 # Datasets for model_v2
│ │ └── dataset3 # Datasets for model_v3
│ ├── raw
│ │ └── maestro-v3.0.0 # The Maestro Dataset
│ │
│ ├── create_dataset1.py # Used to create a dataset for model_v1
│ ├── create_dataset2.py # Used to create a dataset for model_v2
│ ├── create_dataset3.py # Used to create a dataset for model_v3
│ │
│ └── midi_preprocessing.py # Contains all the functions used to create the dataset.
│
├── models # Contains the source code of model_v1, model_v2, model_v3
│
├── outputs
│ ├── checkpoints # Checkpoints of the three models, divided by dataset
│ └── songs # Some good songs output by the three models
│
├── utils # Contains utils functions used in different parts of the project
│
├── tester_model_v2.ipynb # Notebook used to test a trained model_v2
├── tester_model_v3.ipynb # Notebook used to test a trained model_v3
├── train_model_v1.ipynb # Notebook used to train a model_v1
├── train_model_v2.ipynb # Notebook used to train a model_v2
└── train_model_v3.ipynb # Notebook used to train a model_v3
- Clone the repo:
git clone https://github.com/Ultimi-Sumiti/DL_project/GAN_Music_Generator.git
- Install dependencies:
pip install -r requirements.txt
(Note that not all of the reported songs were generated with the models that you can load from the provided checkpoints.)
Below we report the outputs inside the songs directory so you can listen to them directly here without downloading the files.
Also, in the video, you can visualize the piano roll of the melody.
We do not provide samples generated by model_v1 since they are not particularly interesting to listen to.
Melodies generated after training on the MAESTRO dataset have filenames that start with "maestro". Some melodies were generated after training on the full ABBA directory from the Lakh MIDI Dataset; these can be recognized by filenames starting with "abba".
Note that some melodies generated by model_v2 are 9 bars long; this was due to a mistake in the code.
Only melody, no chords.
mestro_11.mp4
maestro_13.mp4
abba_4.mp4
abba_3.mp4
maestro_12.mp4
maestro_18.mp4
abba_5.mp4
maestro_20.mp4
Melody and chords.
Melodies generated with this model have lower quality than those produced by the previous version, partly because we dedicated significantly more time to tuning and experimenting with Model v2.
We observed that training Model v3 was more challenging: finding suitable hyperparameter values to stabilize the
