---
license: bsd-3-clause
tags:
- meg
- brain-signals
- phoneme-classification
- conformer
- libribrain
- speech-recognition
datasets:
- pnpl/LibriBrain
metrics:
- f1
library_name: pytorch

model-index:
- name: megconformer-phoneme-classification
  results:
  - task:
      type: audio-classification
      name: Phoneme classification
    dataset:
      name: LibriBrain 2025 PNPL (Standard track, phoneme task)
      type: pnpl/LibriBrain
      split: holdout
    metrics:
    - name: F1-macro
      type: f1
      value: 0.6583   # 65.83 %
      args:
        average: macro
---

# MEGConformer for Phoneme Classification

A Conformer-based MEG decoder for 39-class phoneme classification over the ARPAbet phoneme set, trained with 5 different random seeds.

## Model Performance

| Seed | Val F1-Macro | Checkpoint |
|------|--------------|------------|
| 7 (best) | **63.92%** | `seed-7/pytorch_model.ckpt` |
| 18 | 63.86% | `seed-18/pytorch_model.ckpt` |
| 17 | 58.74% | `seed-17/pytorch_model.ckpt` |
| 1 | 58.64% | `seed-1/pytorch_model.ckpt` |
| 2 | 58.10% | `seed-2/pytorch_model.ckpt` |

**Note:** Individual seeds were not evaluated on the holdout set. The ensemble of all 5 seeds achieved **65.8% F1-macro** on the competition holdout.
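
For reference, macro-averaged F1 is the unweighted mean of the per-class F1 scores, so rare phonemes weigh as much as frequent ones. The following is a minimal pure-Python sketch of that definition. The function name `macro_f1` and the toy labels are hypothetical, classes with no true or predicted examples are scored as 0 by assumption, and this is not the competition's evaluation code:

```python
from collections import Counter


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 (empty classes score 0 by assumption)."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for c in range(num_classes):
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / num_classes


# Toy example with 3 classes (hypothetical labels, not LibriBrain data)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = macro_f1(y_true, y_pred, num_classes=3)
print(f"F1-macro: {score:.4f}")
```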

## Quick Start

### Single Model Inference
```python
import torch
from huggingface_hub import hf_hub_download

from libribrain_experiments.models.configurable_modules.classification_module import (
    ClassificationModule,
)

# Download best checkpoint (seed-7)
checkpoint_path = hf_hub_download(
    repo_id="zuazo/megconformer-phoneme-classification",
    filename="seed-7/pytorch_model.ckpt",
)

# Choose device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model
model = ClassificationModule.load_from_checkpoint(checkpoint_path, map_location=device)
model.eval()

# Inference
meg_signal = torch.randn(1, 306, 125, device=device)  # (batch, channels, time)

with torch.no_grad():
    logits = model(meg_signal)
    probabilities = torch.softmax(logits, dim=1)
    prediction = torch.argmax(logits, dim=1)

print(f"Predicted phoneme class: {prediction.item()}")
print(f"Confidence: {probabilities[0, prediction].item():.2%}")
```

### Ensemble Inference (Recommended)

The ensemble combines the predictions of all 5 seeds by majority vote and achieves the best performance:
```python
import torch
from huggingface_hub import hf_hub_download

from libribrain_experiments.models.configurable_modules.classification_module import (
    ClassificationModule,
)

# Choose device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load all available seeds (as in the paper)
seeds = [7, 18, 17, 1, 2]
models = []

for seed in seeds:
    checkpoint_path = hf_hub_download(
        repo_id="zuazo/megconformer-phoneme-classification",
        filename=f"seed-{seed}/pytorch_model.ckpt",
    )
    model = ClassificationModule.load_from_checkpoint(
        checkpoint_path, map_location=device
    )
    model.eval().to(device)
    models.append(model)

# Example MEG input: (batch=1, channels=306, time=125)
meg_signal = torch.randn(1, 306, 125, device=device)

with torch.no_grad():
    probs_list = []
    preds_list = []

    for model in models:
        logits = model(meg_signal)  # (1, C)
        probs = torch.softmax(logits, dim=1)  # (1, C)
        probs_list.append(probs)
        preds_list.append(probs.argmax(dim=1))  # (1,)

    # Stack predictions from all models: shape (num_models, batch_size)
    preds = torch.stack(preds_list, dim=0)  # (M, 1)

    # We have a single example in the batch, so index 0
    per_model_preds = preds[:, 0]  # (M,)

    num_classes = probs_list[0].size(1)
    # Count votes per class
    votes = torch.bincount(per_model_preds, minlength=num_classes).float()

    # Majority-vote class (ties resolved by smallest index)
    majority_class = int(votes.argmax().item())

    # "Confidence" = fraction of models voting for the chosen class
    confidence = (votes[majority_class] / votes.sum()).item()

print(f"Ensemble (majority vote) predicted phoneme class: {majority_class}")
print(f"Vote share for that class: {confidence:.2%}")
```
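
The loop above also collects per-model probabilities in `probs_list` but only uses the hard votes. An alternative to majority voting is soft voting, which averages the class probabilities across models before taking the argmax. The sketch below illustrates the idea with NumPy and random stand-in logits so that it runs without downloading any checkpoint. The names `softmax` and `soft_vote` are hypothetical helpers, and this is not necessarily the ensembling behind the reported holdout score:

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def soft_vote(logits_per_model):
    """Average class probabilities over models, then pick the argmax.

    logits_per_model: array-like of shape (num_models, batch, num_classes).
    Returns (predicted classes of shape (batch,), mean probabilities).
    """
    probs = softmax(np.asarray(logits_per_model, dtype=float), axis=-1)
    mean_probs = probs.mean(axis=0)  # (batch, num_classes)
    return mean_probs.argmax(axis=-1), mean_probs


# Stand-in logits for 5 models, 1 example, 39 classes (random, not real outputs)
rng = np.random.default_rng(0)
fake_logits = rng.normal(size=(5, 1, 39))
classes, mean_probs = soft_vote(fake_logits)
print(f"Soft-vote class: {classes[0]}")
print(f"Mean probability for that class: {mean_probs[0, classes[0]]:.2%}")
```

With real models, the stacked logits would come from calling each loaded checkpoint on the same `meg_signal` and converting the outputs to NumPy arrays.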

## Model Details

- **Architecture**: Conformer (custom size)
  - Hidden size: 256
  - FFN dim: 2048
  - Layers: 7
  - Attention heads: 12
  - Depthwise conv kernel: 31
- **Input**: 306-channel MEG signals
- **Window size**: 0.5 seconds (125 samples at 250 Hz)
- **Output**: 39-class phoneme classification (ARPAbet phoneme set)
- **Training**: [LibriBrain](https://huggingface.co/datasets/pnpl/LibriBrain) 2025 Standard track
- **Grouping**: 100 single-trial examples averaged per training sample
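
The window and grouping figures above can be made concrete with a short NumPy sketch. It assumes that grouping means taking the element-wise mean of 100 single-trial windows that share the same phoneme label. The function name `average_group` and the random data are hypothetical, and the actual training pipeline may select and combine trials differently:

```python
import numpy as np

SAMPLING_RATE_HZ = 250
WINDOW_SECONDS = 0.5
NUM_CHANNELS = 306
GROUP_SIZE = 100

# 0.5 s at 250 Hz gives 125 samples per window
window_samples = int(SAMPLING_RATE_HZ * WINDOW_SECONDS)


def average_group(trials):
    """Average a stack of single-trial windows into one sample.

    trials: array-like of shape (group_size, channels, time).
    Returns an array of shape (channels, time).
    """
    return np.asarray(trials, dtype=float).mean(axis=0)


# Random stand-in for 100 single trials of the same phoneme (not real MEG data)
rng = np.random.default_rng(0)
trials = rng.normal(size=(GROUP_SIZE, NUM_CHANNELS, window_samples))
sample = average_group(trials)

print(sample.shape)  # (306, 125)
# Averaging N independent noise trials reduces the noise std by roughly sqrt(N)
print(f"single-trial std: {trials.std():.3f}, averaged std: {sample.std():.3f}")
```

The resulting `(306, 125)` array matches the `(channels, time)` layout used in the Quick Start examples once a batch dimension is added.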

## Reproducibility

Checkpoints for all 5 random seeds are provided. For best results on new data, we recommend the ensemble approach, which achieved **65.8% F1-macro** on the competition holdout set.

## Citation
```bibtex
@misc{dezuazo2025megconformerconformerbasedmegdecoder,
      title={MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification}, 
      author={Xabier de Zuazo and Ibon Saratxaga and Eva Navas},
      year={2025},
      eprint={2512.01443},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.01443}, 
}
```

## License

Released under the 3-Clause BSD License.

## Links

- **Paper**: [arXiv:2512.01443](https://arxiv.org/abs/2512.01443)
- **Code**: [GitHub](https://github.com/neural2speech/libribrain-experiments)
- **Competition**: [LibriBrain 2025](https://neural-processing-lab.github.io/2025-libribrain-competition/)