Presentation, code snippets, and scripts for the *Can machines play the piano?* talk at PyData London 2024.

To start the presentation, run:

```sh
streamlit run --server.port 4001 presentation.py
```
To load a MIDI file into a dataframe, we use:

```python
import fortepyan as ff

piece = ff.MidiPiece.from_file(path="data/midi/piano.mid")
```

```sh
python midi_basics/midi_to_dataframe.py
```
You can find the Maestro dataset, with "notes" and "source" columns, on Hugging Face:

```python
from datasets import load_dataset

dataset = load_dataset("roszcz/maestro-sustain-v2")
```
| Split | Records | Duration (hours) | Number of notes (millions) |
|---|---|---|---|
| Train | 962 | 159.4174 | 5.6593 |
| Validation | 137 | 19.4627 | 0.6394 |
| Test | 177 | 20.0267 | 0.7414 |
| Total | 1,276 | 198.9068 | 7.0402 |
We can calculate the average number of notes per second in the Maestro dataset by counting the rows of the dataframes built from each record and dividing by the total time:

```python
total_notes = 0
total_time = 0
for split in dataset.values():
    for record in split:
        total_notes += len(record["notes"]["pitch"])
        total_time += max(record["notes"]["start"]) - min(record["notes"]["start"])

notes_per_second = total_notes / total_time
```

By using the "start" times, we are measuring how many notes are pressed per second on average.

```sh
python midi_basics/notes_per_second.py
```
To visualize and listen to a MIDI file, we can use the streamlit-pianoroll component:

```python
import fortepyan as ff
import streamlit_pianoroll
from datasets import load_dataset

dataset = load_dataset("roszcz/maestro-sustain-v2", split="test")
record = dataset[77]
piece = ff.MidiPiece.from_huggingface(record=record)

streamlit_pianoroll.from_fortepyan(piece=piece)
```

```sh
streamlit run midi_basics/streamlit_piece.py
```
We can compare the distribution of note pitches between two MIDI pieces using matplotlib histograms. This can provide insights into the pitch range and distribution within each piece.

```sh
python -m streamlit run --server.port 4014 midi_basics/compare_pieces.py
```
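The underlying comparison can be sketched with plain numpy and pandas. The note dataframes below are synthetic stand-ins for `piece.df`, and the `pitch` column name is an assumption about fortepyan's schema:

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for two note dataframes (real ones come from piece.df;
# the "pitch" column name is an assumption about that schema)
rng = np.random.default_rng(0)
piece_a = pd.DataFrame({"pitch": rng.integers(40, 90, size=500)})
piece_b = pd.DataFrame({"pitch": rng.integers(30, 80, size=500)})

# Shared bin edges spanning the full piano range (MIDI pitches 21-108),
# so the two histograms are directly comparable
bins = np.arange(21, 110)
hist_a, _ = np.histogram(piece_a["pitch"], bins=bins)
hist_b, _ = np.histogram(piece_b["pitch"], bins=bins)

# hist_a and hist_b can then be drawn side by side, e.g. with
# matplotlib's plt.bar(bins[:-1], hist_a, alpha=0.5)
```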
You can compare the distribution of note durations between MIDI pieces composed by different composers using histograms generated with matplotlib. This comparison helps in understanding the temporal characteristics of musical compositions and may reveal stylistic differences or compositional preferences.

```sh
python -m streamlit run --server.port 4015 midi_basics/compare_composers.py
```
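A sketch of the duration computation on synthetic data; the "start" and "end" column names (in seconds) are assumptions about the note dataframe schema:

```python
import numpy as np
import pandas as pd

# Synthetic note dataframes standing in for two composers' pieces
composer_a = pd.DataFrame({"start": [0.0, 0.5, 1.0], "end": [0.4, 1.4, 1.2]})
composer_b = pd.DataFrame({"start": [0.0, 0.3], "end": [1.5, 0.5]})

# Note durations in seconds
dur_a = composer_a["end"] - composer_a["start"]
dur_b = composer_b["end"] - composer_b["start"]

# Shared bins make the two duration histograms directly comparable
bins = np.linspace(0.0, 2.0, 21)
hist_a, _ = np.histogram(dur_a, bins=bins)
hist_b, _ = np.histogram(dur_b, bins=bins)
```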
Augmentation review:

```sh
python -m streamlit run --server.port 4016 modelling/augmentation.py
```
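For context, two common MIDI augmentations are pitch transposition and time stretching. Here is a minimal sketch on a note dataframe; the column names mirror an assumed fortepyan-style schema, and this is not necessarily what modelling/augmentation.py does:

```python
import pandas as pd

def pitch_shift(df: pd.DataFrame, shift: int) -> pd.DataFrame:
    # Transpose every note by `shift` semitones, clipped to the piano range
    out = df.copy()
    out["pitch"] = (out["pitch"] + shift).clip(21, 108)
    return out

def time_stretch(df: pd.DataFrame, factor: float) -> pd.DataFrame:
    # Scale note timings; factor > 1 slows the piece down
    out = df.copy()
    out["start"] = out["start"] * factor
    out["end"] = out["end"] * factor
    return out

# A C-major triad as a toy example
notes = pd.DataFrame({
    "pitch": [60, 64, 67],
    "start": [0.0, 0.5, 1.0],
    "end": [0.4, 0.9, 1.4],
    "velocity": [80, 80, 80],
})
shifted = pitch_shift(notes, shift=5)      # transpose up a perfect fourth
stretched = time_stretch(notes, factor=2.0)
```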
Predicting a sub-sequence of notes within a defined pitch range, given the sequence with that range removed, is an interesting downstream task and a possible benchmark. The generated results can be revealing, as they show the model's understanding of musical structure and harmony.

Here is a review of the sub-sequence extraction:

```sh
python -m streamlit run --server.port 4017 modelling/extract_notes.py
```
| Voice | Range (MIDI pitch values) |
|---|---|
| BASS | 21-48 |
| TENOR | 43-81 |
| ALTO | 53-84 |
| SOPRANO | 60-96 |
| TREBLE | 60-108 |
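The ranges above can be applied directly to a note dataframe. A minimal sketch of range-based extraction (the dataframe schema is an assumption; the actual logic lives in modelling/extract_notes.py):

```python
import pandas as pd

# Voice ranges in MIDI pitch values, copied from the table above
VOICE_RANGES = {
    "BASS": (21, 48),
    "TENOR": (43, 81),
    "ALTO": (53, 84),
    "SOPRANO": (60, 96),
    "TREBLE": (60, 108),
}

def extract_voice(df: pd.DataFrame, voice: str):
    # Split notes into those inside the voice's pitch range and the rest
    low, high = VOICE_RANGES[voice]
    mask = df["pitch"].between(low, high)
    return df[mask].reset_index(drop=True), df[~mask].reset_index(drop=True)

notes = pd.DataFrame({"pitch": [30, 60, 100], "start": [0.0, 0.1, 0.2]})
bass, rest = extract_voice(notes, "BASS")
```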
The Maestro dataset used in the experiments can be found here:
https://huggingface.co/datasets/roszcz/maestro-sustain-v2
You can also check out our organization GitHub with tools and experiments:
https://github.com/your-organization-link
For questions, reach out to Wojtek Matejuk at:
[email protected]
Explore and play with MIDI and share your compositions on:
https://pianoroll.io
If you play the piano and want to help source training data, you can track your practice sessions there.
Run with Docker:

```sh
docker build -t pydata-london-24 .
docker run -p 4334:4334 pydata-london-24
```
This repository uses pre-commit hooks to enforce Python formatting (black, flake8, and isort):

```sh
pip install pre-commit
pre-commit install
```

Whenever you execute `git commit`, the files altered or added in the commit will be checked and corrected. `black` and `isort` can modify files locally; if that happens, you have to `git add` them again. You might also be prompted to introduce some fixes manually.

To run the hooks against all files without running `git commit`:

```sh
pre-commit run --all-files
```