Presentation, code snippets, and scripts for the *Can machines play the piano?* talk at PyData London 2024.

To start the presentation, run:

```sh
streamlit run --server.port 4001 presentation.py
```
To load a MIDI file into a dataframe, we use:

```python
import fortepyan as ff

piece = ff.MidiPiece.from_file(path="data/midi/piano.mid")
```

```sh
python midi_basics/midi_to_dataframe.py
```
You can find the Maestro dataset, with "notes" and "source" columns, on Hugging Face:

```python
from datasets import load_dataset

dataset = load_dataset("roszcz/maestro-sustain-v2")
```
| Split | Records | Duration (hours) | Number of notes (millions) |
|---|---|---|---|
| Train | 962 | 159.4174 | 5.6593 |
| Validation | 137 | 19.4627 | 0.6394 |
| Test | 177 | 20.0267 | 0.7414 |
| Total | 1,276 | 198.9068 | 7.0402 |
We can calculate the average number of notes per second in the Maestro dataset by counting the rows of the dataframes built from each record and dividing by the total time:

```python
total_notes = 0
total_time = 0
for split in dataset.values():
    for record in split:
        total_notes += len(record["notes"]["pitch"])
        total_time += max(record["notes"]["start"]) - min(record["notes"]["start"])

notes_per_second = total_notes / total_time
```

By using the "start" times, we are measuring how many notes are pressed per second on average.

```sh
python midi_basics/notes_per_second.py
```
To visualize and listen to a MIDI file, we can use the streamlit-pianoroll component:

```python
import fortepyan as ff
import streamlit_pianoroll
from datasets import load_dataset

dataset = load_dataset("roszcz/maestro-sustain-v2", split="test")
record = dataset[77]
piece = ff.MidiPiece.from_huggingface(record=record)

streamlit_pianoroll.from_fortepyan(piece=piece)
```

```sh
streamlit run midi_basics/streamlit_piece.py
```
We can compare the distribution of note pitches between two MIDI pieces using matplotlib histograms. This can provide insights into the pitch range and distribution within each piece.

```sh
python -m streamlit run --server.port 4014 midi_basics/compare_pieces.py
```
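The underlying comparison can be sketched with plain numpy and pandas. The note dataframes below are synthetic stand-ins for `piece.df`, and the `pitch` column name is an assumption about fortepyan's schema:

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for two note dataframes (real ones come from piece.df;
# the "pitch" column name is an assumption about that schema)
rng = np.random.default_rng(0)
piece_a = pd.DataFrame({"pitch": rng.integers(40, 90, size=500)})
piece_b = pd.DataFrame({"pitch": rng.integers(30, 80, size=500)})

# Shared bin edges spanning the full piano range (MIDI pitches 21-108),
# so the two histograms are directly comparable
bins = np.arange(21, 110)
hist_a, _ = np.histogram(piece_a["pitch"], bins=bins)
hist_b, _ = np.histogram(piece_b["pitch"], bins=bins)

# hist_a and hist_b can then be drawn side by side, e.g. with
# matplotlib's plt.bar(bins[:-1], hist_a, alpha=0.5)
```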
You can compare the distribution of note durations between MIDI pieces composed by different composers using histograms generated with matplotlib. This comparison helps in understanding the temporal characteristics of musical compositions and may reveal stylistic differences or compositional preferences.

```sh
python -m streamlit run --server.port 4015 midi_basics/compare_composers.py
```
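A sketch of the duration computation on synthetic data; the "start" and "end" column names (in seconds) are assumptions about the note dataframe schema:

```python
import numpy as np
import pandas as pd

# Synthetic note dataframes standing in for two composers' pieces
composer_a = pd.DataFrame({"start": [0.0, 0.5, 1.0], "end": [0.4, 1.4, 1.2]})
composer_b = pd.DataFrame({"start": [0.0, 0.3], "end": [1.5, 0.5]})

# Note durations in seconds
dur_a = composer_a["end"] - composer_a["start"]
dur_b = composer_b["end"] - composer_b["start"]

# Shared bins make the two duration histograms directly comparable
bins = np.linspace(0.0, 2.0, 21)
hist_a, _ = np.histogram(dur_a, bins=bins)
hist_b, _ = np.histogram(dur_b, bins=bins)
```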
Augmentation review:

```sh
python -m streamlit run --server.port 4016 modelling/augmentation.py
```
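For context, two common MIDI augmentations are pitch transposition and time stretching. Here is a minimal sketch on a note dataframe; the column names mirror an assumed fortepyan-style schema, and this is not necessarily what modelling/augmentation.py does:

```python
import pandas as pd

def pitch_shift(df: pd.DataFrame, shift: int) -> pd.DataFrame:
    # Transpose every note by `shift` semitones, clipped to the piano range
    out = df.copy()
    out["pitch"] = (out["pitch"] + shift).clip(21, 108)
    return out

def time_stretch(df: pd.DataFrame, factor: float) -> pd.DataFrame:
    # Scale note timings; factor > 1 slows the piece down
    out = df.copy()
    out["start"] = out["start"] * factor
    out["end"] = out["end"] * factor
    return out

# A C-major triad as a toy example
notes = pd.DataFrame({
    "pitch": [60, 64, 67],
    "start": [0.0, 0.5, 1.0],
    "end": [0.4, 0.9, 1.4],
    "velocity": [80, 80, 80],
})
shifted = pitch_shift(notes, shift=5)      # transpose up a perfect fourth
stretched = time_stretch(notes, factor=2.0)
```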
Predicting a sub-sequence of notes within a defined pitch range, given the sequence with that range removed, is an interesting downstream task and a possible benchmark. The generated results can be revealing, as they show the model's understanding of musical structure and harmony.

Here is a review of the sub-sequence extraction:

```sh
python -m streamlit run --server.port 4017 modelling/extract_notes.py
```
| Voice | Range (MIDI pitch values) |
|---|---|
| BASS | 21-48 |
| TENOR | 43-81 |
| ALTO | 53-84 |
| SOPRANO | 60-96 |
| TREBLE | 60-108 |
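The ranges above can be applied directly to a note dataframe. A minimal sketch of range-based extraction (the dataframe schema is an assumption; the actual logic lives in modelling/extract_notes.py):

```python
import pandas as pd

# Voice ranges in MIDI pitch values, copied from the table above
VOICE_RANGES = {
    "BASS": (21, 48),
    "TENOR": (43, 81),
    "ALTO": (53, 84),
    "SOPRANO": (60, 96),
    "TREBLE": (60, 108),
}

def extract_voice(df: pd.DataFrame, voice: str):
    # Split notes into those inside the voice's pitch range and the rest
    low, high = VOICE_RANGES[voice]
    mask = df["pitch"].between(low, high)
    return df[mask].reset_index(drop=True), df[~mask].reset_index(drop=True)

notes = pd.DataFrame({"pitch": [30, 60, 100], "start": [0.0, 0.1, 0.2]})
bass, rest = extract_voice(notes, "BASS")
```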
The Maestro dataset used in the experiments can be found here:
https://huggingface.co/datasets/roszcz/maestro-sustain-v2
You can also check out our organization GitHub with tools and experiments:
https://github.com/your-organization-link
For questions, reach out to Wojtek Matejuk at:
[email protected]
Explore and play with MIDI and share your compositions on:
https://pianoroll.io
If you play the piano and want to help source training data, you can track your practice sessions there.
Run with Docker:

```sh
docker build -t pydata-london-24 .
docker run -p 4334:4334 pydata-london-24
```
This repository uses pre-commit hooks to enforce Python formatting (black, flake8, and isort):

```sh
pip install pre-commit
pre-commit install
```

Whenever you execute `git commit`, the files altered or added in the commit will be checked and corrected. `black` and `isort` can modify files locally; if that happens, you have to `git add` them again. You might also be prompted to introduce some fixes manually.

To run the hooks against all files without running `git commit`:

```sh
pre-commit run --all-files
```