musicbert-large

Model Description

MusicBERT large is a 24-layer BERT-style masked language model trained on REMI+BPE symbolic music sequences extracted from the GigaMIDI corpus. It is tailored for symbolic music understanding, fill-mask style infilling, and as a backbone for downstream generative tasks.

  • Checkpoint: 120000 steps
  • Hidden size: 1024
  • Parameters: ~330M
  • Validation loss: ~1.199

Training Configuration

  • Objective: Masked language modeling with span-aware masking, i.e. contiguous spans of tokens are masked jointly (see the sketch after this list)
  • Dataset: GigaMIDI (REMI tokens → BPE, vocab size 50000)
  • Sequence length: 1024
  • Max events per MIDI: 2048
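
The exact masking procedure is not published with this card; the following is a minimal sketch of span-aware masking over token ids. The helper span_mask and its parameters (start_prob, max_span) are illustrative assumptions, not the training code.

import random

def span_mask(ids, mask_id, start_prob=0.05, max_span=5):
    """Mask contiguous spans of token ids (illustrative sketch, not the training code)."""
    out = ids.copy()
    i = 1  # keep the first and last positions unmasked
    while i < len(out) - 1:
        if random.random() < start_prob:
            span = random.randint(1, max_span)  # span length drawn per masking event
            for j in range(i, min(i + span, len(out) - 1)):
                out[j] = mask_id
            i += span
        else:
            i += 1
    return out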

Inference Example

Using with MIDI files

import random

import torch
from transformers import BertForMaskedLM
from miditok import MusicTokenizer

# Load model and tokenizer
model = BertForMaskedLM.from_pretrained("manoskary/musicbert-large")
model.eval()
tokenizer = MusicTokenizer.from_pretrained("manoskary/miditok-REMI")

# Convert MIDI to BPE tokens (MIDI → REMI → BPE pipeline)
midi_path = "path/to/your/file.mid"
tok_seq = tokenizer(midi_path)
if isinstance(tok_seq, list):  # some tokenizer configs return one sequence per track
    tok_seq = tok_seq[0]
bpe_ids = tok_seq.ids

# Truncate to the model's maximum sequence length
input_ids = bpe_ids[:1024]

# Mask some tokens for prediction; verify this id against the tokenizer's vocabulary
mask_token_id = 3  # MASK_None token
mask_positions = random.sample(range(1, len(input_ids) - 1), k=5)
for pos in mask_positions:
    input_ids[pos] = mask_token_id

# Run inference
input_tensor = torch.tensor([input_ids])
with torch.no_grad():
    outputs = model(input_tensor)
    predictions = outputs.logits[0, mask_positions, :].argmax(dim=-1)

print("Predicted token IDs:", predictions.tolist())

Limitations and Risks

  • The model is trained purely on symbolic data; it does not produce audio directly.
  • The GigaMIDI dataset is biased towards Western tonal music.
  • Long-form structure beyond 1024 tokens requires chunking or iterative decoding (a chunking sketch follows this list).
  • Generated continuations may need post-processing to ensure musical coherence.
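
For long pieces, one simple approach is to split the token sequence into overlapping windows and run the model on each window separately. A minimal sketch; the helper chunk_ids and the window/stride values are illustrative choices, not part of the released code.

def chunk_ids(ids, window=1024, stride=512):
    """Split a long id sequence into overlapping windows (hypothetical helper)."""
    if len(ids) <= window:
        return [ids]
    return [ids[start:start + window] for start in range(0, len(ids) - stride, stride)]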

Citation

If you use this checkpoint, please cite the original MusicBERT introduction and the GigaMIDI dataset.
