---
license: bsd-3-clause
tags:
- meg
- brain-signals
- phoneme-classification
- conformer
- libribrain
- speech-recognition
datasets:
- pnpl/LibriBrain
metrics:
- f1
library_name: pytorch

model-index:
- name: megconformer-phoneme-classification
  results:
  - task:
      type: audio-classification
      name: Phoneme classification
    dataset:
      name: LibriBrain 2025 PNPL (Standard track, phoneme task)
      type: pnpl/LibriBrain
      split: holdout
    metrics:
    - name: F1-macro
      type: f1
      value: 0.6583   # 65.83 %
      args:
        average: macro
---

# MEGConformer for Phoneme Classification

Conformer-based MEG decoder for 39-class phoneme classification over the ARPAbet phoneme set. Five checkpoints are provided, each trained with a different random seed.

## Model Performance

| Seed | Val F1-Macro | Checkpoint |
|------|--------------|------------|
| 7 (best) | **63.92%** | `seed-7/pytorch_model.ckpt` |
| 18 | 63.86% | `seed-18/pytorch_model.ckpt` |
| 17 | 58.74% | `seed-17/pytorch_model.ckpt` |
| 1 | 58.64% | `seed-1/pytorch_model.ckpt` |
| 2 | 58.10% | `seed-2/pytorch_model.ckpt` |

**Note:** Individual seeds were not evaluated on the holdout set. The ensemble of all 5 seeds achieved **65.8% F1-macro** on the competition holdout.
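
For reference, F1-macro is the unweighted mean of the per-class F1 scores, so rare phonemes count as much as frequent ones. The pure-Python sketch below illustrates the computation on toy labels. The function name `macro_f1` is illustrative and not part of the competition tooling, and scoring an empty class as 0 is an assumption that may differ from the official scorer.

```python
def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    f1_scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true or predicted examples scores 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_classes


# Toy example with 3 classes (not real phoneme labels)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = macro_f1(y_true, y_pred, num_classes=3)
print(f"F1-macro: {score:.4f}")
```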

## Quick Start

### Single Model Inference
```python
import torch
from huggingface_hub import hf_hub_download

from libribrain_experiments.models.configurable_modules.classification_module import (
    ClassificationModule,
)

# Download best checkpoint (seed-7)
checkpoint_path = hf_hub_download(
    repo_id="zuazo/megconformer-phoneme-classification",
    filename="seed-7/pytorch_model.ckpt",
)

# Choose device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model
model = ClassificationModule.load_from_checkpoint(checkpoint_path, map_location=device)
model.eval()

# Inference
meg_signal = torch.randn(1, 306, 125, device=device)  # (batch, channels, time)

with torch.no_grad():
    logits = model(meg_signal)
    probabilities = torch.softmax(logits, dim=1)
    prediction = torch.argmax(logits, dim=1)

print(f"Predicted phoneme class: {prediction.item()}")
print(f"Confidence: {probabilities[0, prediction].item():.2%}")
```

### Ensemble Inference (Recommended)

The ensemble combines predictions from all 5 seeds and is the recommended setup. The example below takes a majority vote over each model's predicted class:
```python
import torch
from huggingface_hub import hf_hub_download

from libribrain_experiments.models.configurable_modules.classification_module import (
    ClassificationModule,
)

# Choose device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load all available seeds (as in the paper)
seeds = [7, 18, 17, 1, 2]
models = []

for seed in seeds:
    checkpoint_path = hf_hub_download(
        repo_id="zuazo/megconformer-phoneme-classification",
        filename=f"seed-{seed}/pytorch_model.ckpt",
    )
    model = ClassificationModule.load_from_checkpoint(
        checkpoint_path, map_location=device
    )
    model.eval().to(device)
    models.append(model)

# Example MEG input: (batch=1, channels=306, time=125)
meg_signal = torch.randn(1, 306, 125, device=device)

with torch.no_grad():
    probs_list = []
    preds_list = []

    for model in models:
        logits = model(meg_signal)  # (1, C)
        probs = torch.softmax(logits, dim=1)  # (1, C)
        probs_list.append(probs)
        preds_list.append(probs.argmax(dim=1))  # (1,)

    # Stack predictions from all models: shape (num_models, batch_size)
    preds = torch.stack(preds_list, dim=0)  # (M, 1)

    # We have a single example in the batch, so index 0
    per_model_preds = preds[:, 0]  # (M,)

    num_classes = probs_list[0].size(1)
    # Count votes per class
    votes = torch.bincount(per_model_preds, minlength=num_classes).float()

    # Majority-vote class (ties resolved by smallest index)
    majority_class = int(votes.argmax().item())

    # "Confidence" = fraction of models voting for the chosen class
    confidence = (votes[majority_class] / votes.sum()).item()

print(f"Ensemble (majority vote) predicted phoneme class: {majority_class}")
print(f"Vote share for that class: {confidence:.2%}")
```
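
Majority voting discards each model's confidence. One alternative is soft voting, which averages the per-model class probabilities before taking the argmax. The sketch below is only an illustration and is not necessarily the procedure used for the competition submission. The helper name `soft_vote` is hypothetical, and the example runs on random stand-in probabilities so that it does not need the checkpoints.

```python
import torch


def soft_vote(probs_list):
    """Average class probabilities across models (hypothetical helper).

    probs_list: list of M tensors, each of shape (batch, num_classes).
    Returns (predictions, confidence), each of shape (batch,).
    """
    mean_probs = torch.stack(probs_list, dim=0).mean(dim=0)  # (batch, C)
    confidence, predictions = mean_probs.max(dim=1)
    return predictions, confidence


# Stand-in for the outputs of 5 models on a batch of 2 windows, 39 classes
torch.manual_seed(0)
probs_list = [torch.softmax(torch.randn(2, 39), dim=1) for _ in range(5)]

predictions, confidence = soft_vote(probs_list)
print(predictions.shape, confidence.shape)
```

The `probs_list` built inside the ensemble loop above has the same layout, so it can be passed to `soft_vote` directly.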

## Model Details

- **Architecture**: Conformer (custom size)
  - Hidden size: 256
  - FFN dim: 2048
  - Layers: 7
  - Attention heads: 12
  - Depthwise conv kernel: 31
- **Input**: 306-channel MEG signals
- **Window size**: 0.5 seconds (125 samples at 250 Hz)
- **Output**: 39-class phoneme classification (ARPAbet phoneme set)
- **Training**: [LibriBrain](https://huggingface.co/datasets/pnpl/LibriBrain) 2025 Standard track
- **Grouping**: 100 single-trial examples averaged per training sample
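
The window length and grouping listed above can be sketched with NumPy. This is an illustration under assumptions, not the training pipeline: the helper `average_groups` is hypothetical, the data are random, all trials passed in are assumed to share the same phoneme label, and leftover trials that do not fill a group are dropped.

```python
import numpy as np

SFREQ_HZ = 250      # sampling rate
WINDOW_S = 0.5      # window length in seconds
N_CHANNELS = 306    # MEG channels
GROUP_SIZE = 100    # single trials averaged per sample

window_samples = int(WINDOW_S * SFREQ_HZ)  # 125


def average_groups(trials, group_size=GROUP_SIZE):
    """Average consecutive groups of single-trial windows (hypothetical helper).

    trials: array of shape (n_trials, channels, time), all with the same label.
    Returns an array of shape (n_trials // group_size, channels, time).
    """
    n_groups = trials.shape[0] // group_size
    usable = trials[: n_groups * group_size]  # drop the incomplete last group
    grouped = usable.reshape(n_groups, group_size, *trials.shape[1:])
    return grouped.mean(axis=1)


# Random stand-in for 250 single-trial windows of one phoneme
rng = np.random.default_rng(0)
trials = rng.standard_normal((250, N_CHANNELS, window_samples), dtype=np.float32)
averaged = average_groups(trials)
print(averaged.shape)  # (2, 306, 125)
```

If the noise is independent across trials, averaging 100 of them reduces its standard deviation by roughly a factor of 10, which is the usual motivation for this kind of grouping.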

## Reproducibility

Checkpoints for all 5 random seeds are provided. For best results on new data, we recommend the ensemble, which achieved **65.8% F1-macro** on the competition holdout set.

## Citation
```bibtex
@misc{dezuazo2025megconformerconformerbasedmegdecoder,
      title={MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification}, 
      author={Xabier de Zuazo and Ibon Saratxaga and Eva Navas},
      year={2025},
      eprint={2512.01443},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.01443}, 
}
```

## License

Released under the 3-Clause BSD License.

## Links

- **Paper**: [arXiv:2512.01443](https://arxiv.org/abs/2512.01443)
- **Code**: [GitHub](https://github.com/neural2speech/libribrain-experiments)
- **Competition**: [LibriBrain 2025](https://neural-processing-lab.github.io/2025-libribrain-competition/)