---
license: bsd-3-clause
tags:
- meg
- brain-signals
- phoneme-classification
- conformer
- libribrain
- speech-recognition
datasets:
- pnpl/LibriBrain
metrics:
- f1
library_name: pytorch
model-index:
- name: megconformer-phoneme-classification
results:
- task:
type: audio-classification
name: Phoneme classification
dataset:
name: LibriBrain 2025 PNPL (Standard track, phoneme task)
type: pnpl/LibriBrain
split: holdout
metrics:
- name: F1-macro
type: f1
value: 0.6583 # 65.83 %
args:
average: macro
---
# MEGConformer for Phoneme Classification
A Conformer-based MEG decoder for 39-class phoneme classification (ARPAbet phoneme set), trained with 5 different random seeds.
## Model Performance
| Seed | Val F1-Macro | Checkpoint |
|------|--------------|------------|
| 7 (best) | **63.92%** | `seed-7/pytorch_model.ckpt` |
| 18 | 63.86% | `seed-18/pytorch_model.ckpt` |
| 17 | 58.74% | `seed-17/pytorch_model.ckpt` |
| 1 | 58.64% | `seed-1/pytorch_model.ckpt` |
| 2 | 58.10% | `seed-2/pytorch_model.ckpt` |
**Note:** Individual seeds were not evaluated on the holdout set. The ensemble of all 5 seeds achieved **65.83% F1-macro** on the competition holdout.
## Quick Start
### Single Model Inference
```python
import torch
from huggingface_hub import hf_hub_download
from libribrain_experiments.models.configurable_modules.classification_module import (
    ClassificationModule,
)

# Download best checkpoint (seed-7)
checkpoint_path = hf_hub_download(
    repo_id="zuazo/megconformer-phoneme-classification",
    filename="seed-7/pytorch_model.ckpt",
)

# Choose device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model
model = ClassificationModule.load_from_checkpoint(checkpoint_path, map_location=device)
model.eval()

# Inference
meg_signal = torch.randn(1, 306, 125, device=device)  # (batch, channels, time)
with torch.no_grad():
    logits = model(meg_signal)
    probabilities = torch.softmax(logits, dim=1)
    prediction = torch.argmax(logits, dim=1)

print(f"Predicted phoneme class: {prediction.item()}")
print(f"Confidence: {probabilities[0, prediction].item():.2%}")
```
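The model returns a class index, not a phoneme symbol. Mapping indices to symbols requires the label ordering used by the LibriBrain training labels; the alphabetical ordering below is a hypothetical placeholder used only to illustrate the lookup:

```python
# Hypothetical mapping from class index to ARPAbet symbol.
# NOTE: the true index-to-phoneme order is defined by the LibriBrain
# dataset labels; the alphabetical order below is only a placeholder.
ARPABET_39 = [
    "AA", "AE", "AH", "AO", "AW", "AY", "B", "CH", "D", "DH",
    "EH", "ER", "EY", "F", "G", "HH", "IH", "IY", "JH", "K",
    "L", "M", "N", "NG", "OW", "OY", "P", "R", "S", "SH",
    "T", "TH", "UH", "UW", "V", "W", "Y", "Z", "ZH",
]

def index_to_phoneme(class_index: int) -> str:
    """Map a predicted class index to its ARPAbet symbol."""
    return ARPABET_39[class_index]

print(index_to_phoneme(0))  # -> "AA" under this placeholder ordering
```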
### Ensemble Inference (Recommended)
The ensemble combines the predictions of all 5 seeds by majority vote and achieves the best performance:
```python
import torch
from huggingface_hub import hf_hub_download
from libribrain_experiments.models.configurable_modules.classification_module import (
    ClassificationModule,
)

# Choose device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load all available seeds (as in the paper)
seeds = [7, 18, 17, 1, 2]
models = []
for seed in seeds:
    checkpoint_path = hf_hub_download(
        repo_id="zuazo/megconformer-phoneme-classification",
        filename=f"seed-{seed}/pytorch_model.ckpt",
    )
    model = ClassificationModule.load_from_checkpoint(
        checkpoint_path, map_location=device
    )
    model.eval().to(device)
    models.append(model)

# Example MEG input: (batch=1, channels=306, time=125)
meg_signal = torch.randn(1, 306, 125, device=device)

with torch.no_grad():
    probs_list = []
    preds_list = []
    for model in models:
        logits = model(meg_signal)              # (1, C)
        probs = torch.softmax(logits, dim=1)    # (1, C)
        probs_list.append(probs)
        preds_list.append(probs.argmax(dim=1))  # (1,)

# Stack predictions from all models: shape (num_models, batch_size)
preds = torch.stack(preds_list, dim=0)  # (M, 1)

# We have a single example in the batch, so index 0
per_model_preds = preds[:, 0]  # (M,)
num_classes = probs_list[0].size(1)

# Count votes per class
votes = torch.bincount(per_model_preds, minlength=num_classes).float()

# Majority-vote class (ties resolved by smallest index)
majority_class = int(votes.argmax().item())

# "Confidence" = fraction of models voting for the chosen class
confidence = (votes[majority_class] / votes.sum()).item()

print(f"Ensemble (majority vote) predicted phoneme class: {majority_class}")
print(f"Vote share for that class: {confidence:.2%}")
```
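An alternative way to combine the seeds is soft voting: average the per-seed softmax probabilities and take the argmax. Below is a minimal sketch of that combination step, using small stand-in probability tensors instead of real checkpoints (the `soft_vote` helper is illustrative, not part of the released code):

```python
import torch

def soft_vote(probs_list):
    """Average per-model class probabilities and pick the argmax class.

    probs_list: list of (batch, C) probability tensors, one per model.
    Returns (predicted_class, averaged_probs) for batch index 0.
    """
    avg_probs = torch.stack(probs_list, dim=0).mean(dim=0)  # (batch, C)
    predicted = int(avg_probs.argmax(dim=1)[0].item())
    return predicted, avg_probs

# Stand-in probabilities for 3 models over 4 classes (batch size 1)
fake_probs = [
    torch.tensor([[0.10, 0.60, 0.20, 0.10]]),
    torch.tensor([[0.15, 0.55, 0.20, 0.10]]),
    torch.tensor([[0.40, 0.30, 0.20, 0.10]]),
]
pred, avg = soft_vote(fake_probs)
print(pred)  # -> 1 (class 1 has the highest average probability)
```

Soft voting uses the full probability mass of each model rather than only its top choice, which can help when individual seeds are confidently wrong on different examples.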
## Model Details
- **Architecture**: Conformer (custom size)
- Hidden size: 256
- FFN dim: 2048
- Layers: 7
- Attention heads: 12
- Depthwise conv kernel: 31
- **Input**: 306-channel MEG signals
- **Window size**: 0.5 seconds (125 samples at 250 Hz)
- **Output**: 39-class phoneme classification (ARPAbet phoneme set)
- **Training**: [LibriBrain](https://huggingface.co/datasets/pnpl/LibriBrain) 2025 Standard track
- **Grouping**: 100 single-trial examples averaged per training sample
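The grouping step above can be sketched with random stand-in data: 100 single-trial windows of shape (306 channels, 125 samples) are averaged into one training sample. This is only an illustration of the shapes involved; the actual preprocessing lives in the linked code repository.

```python
import torch

# 100 single-trial MEG windows: (trials, channels, time) = (100, 306, 125)
# 0.5 s windows at 250 Hz -> 125 samples per window.
trials = torch.randn(100, 306, 125)

# Average across the trial dimension to form one training sample
averaged_sample = trials.mean(dim=0)  # (306, 125)

# Add a batch dimension before feeding the model: (1, 306, 125)
model_input = averaged_sample.unsqueeze(0)
print(model_input.shape)  # -> torch.Size([1, 306, 125])
```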
## Reproducibility
All 5 random seeds are provided. For best results on new data, we recommend using the ensemble approach, which achieved **65.83% F1-macro** on the competition holdout set.
## Citation
```bibtex
@misc{dezuazo2025megconformerconformerbasedmegdecoder,
title={MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification},
author={Xabier de Zuazo and Ibon Saratxaga and Eva Navas},
year={2025},
eprint={2512.01443},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2512.01443},
}
```
## License
This model is released under the BSD 3-Clause License.
## Links
- **Paper**: [arXiv:2512.01443](https://arxiv.org/abs/2512.01443)
- **Code**: [GitHub](https://github.com/neural2speech/libribrain-experiments)
- **Competition**: [LibriBrain 2025](https://neural-processing-lab.github.io/2025-libribrain-competition/)