---
license: bsd-3-clause
tags:
- meg
- brain-signals
- phoneme-classification
- conformer
- libribrain
- speech-recognition
datasets:
- pnpl/LibriBrain
metrics:
- f1
library_name: pytorch
model-index:
- name: megconformer-phoneme-classification
  results:
  - task:
      type: audio-classification
      name: Phoneme classification
    dataset:
      name: LibriBrain 2025 PNPL (Standard track, phoneme task)
      type: pnpl/LibriBrain
      split: holdout
    metrics:
    - name: F1-macro
      type: f1
      value: 0.6583  # 65.83 %
      args:
        average: macro
---

# MEGConformer for Phoneme Classification

Conformer-based MEG decoder for 39-class phoneme classification over the ARPAbet phoneme set, trained with 5 different random seeds.

## Model Performance

| Seed | Val F1-Macro | Checkpoint |
|------|--------------|------------|
| 7 (best) | **63.92%** | `seed-7/pytorch_model.ckpt` |
| 18 | 63.86% | `seed-18/pytorch_model.ckpt` |
| 17 | 58.74% | `seed-17/pytorch_model.ckpt` |
| 1 | 58.64% | `seed-1/pytorch_model.ckpt` |
| 2 | 58.10% | `seed-2/pytorch_model.ckpt` |

**Note:** Individual seeds were not evaluated on the holdout set. The ensemble of all 5 seeds achieved **65.8% F1-macro** on the competition holdout.

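The scores above are macro-averaged F1: the F1 score is computed per class and the per-class scores are averaged with equal weight, so rare phonemes count as much as frequent ones. The following is a minimal plain-Python sketch of that metric. The function name `macro_f1`, the toy labels, and the convention of scoring an absent class as 0 are illustrative assumptions, and the competition's scoring code may handle edge cases differently.

```python
def macro_f1(y_true, y_pred, num_classes):
    """Return the unweighted mean of per-class F1 scores."""
    f1_scores = []
    for c in range(num_classes):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        denom = 2 * tp + fp + fn
        # Assumed convention: a class that is never true nor predicted scores 0
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_classes


# Toy example with 3 classes (the model itself predicts 39)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
score = macro_f1(y_true, y_pred, num_classes=3)
print(f"F1-macro: {score:.2%}")
```

In the toy example the per-class F1 scores are 0.5, 0.8 and 2/3, so the macro average is their plain mean.
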
## Quick Start

### Single Model Inference

```python
import torch
from huggingface_hub import hf_hub_download

from libribrain_experiments.models.configurable_modules.classification_module import (
    ClassificationModule,
)

# Download best checkpoint (seed-7)
checkpoint_path = hf_hub_download(
    repo_id="zuazo/megconformer-phoneme-classification",
    filename="seed-7/pytorch_model.ckpt",
)

# Choose device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model
model = ClassificationModule.load_from_checkpoint(checkpoint_path, map_location=device)
model.eval()

# Inference on a random placeholder input
meg_signal = torch.randn(1, 306, 125, device=device)  # (batch, channels, time)

with torch.no_grad():
    logits = model(meg_signal)
    probabilities = torch.softmax(logits, dim=1)
    prediction = torch.argmax(logits, dim=1)

print(f"Predicted phoneme class: {prediction.item()}")
print(f"Confidence: {probabilities[0, prediction].item():.2%}")
```

### Ensemble Inference (Recommended)

The ensemble approach combines the predictions of all 5 seeds and achieves the best performance. The example below combines them by majority vote:

```python
import torch
from huggingface_hub import hf_hub_download

from libribrain_experiments.models.configurable_modules.classification_module import (
    ClassificationModule,
)

# Choose device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load all available seeds (as in the paper)
seeds = [7, 18, 17, 1, 2]
models = []
for seed in seeds:
    checkpoint_path = hf_hub_download(
        repo_id="zuazo/megconformer-phoneme-classification",
        filename=f"seed-{seed}/pytorch_model.ckpt",
    )
    model = ClassificationModule.load_from_checkpoint(
        checkpoint_path, map_location=device
    )
    model.eval()
    models.append(model)

# Example MEG input: (batch=1, channels=306, time=125)
meg_signal = torch.randn(1, 306, 125, device=device)

with torch.no_grad():
    probs_list = []
    preds_list = []
    for model in models:
        logits = model(meg_signal)  # (1, C)
        probs = torch.softmax(logits, dim=1)  # (1, C)
        probs_list.append(probs)
        preds_list.append(probs.argmax(dim=1))  # (1,)

# Stack predictions from all models: shape (num_models, batch_size)
preds = torch.stack(preds_list, dim=0)  # (M, 1)

# We have a single example in the batch, so index 0
per_model_preds = preds[:, 0]  # (M,)
num_classes = probs_list[0].size(1)

# Count votes per class
votes = torch.bincount(per_model_preds, minlength=num_classes).float()

# Majority-vote class (ties resolved by smallest index)
majority_class = int(votes.argmax().item())

# "Confidence" = fraction of models voting for the chosen class
confidence = (votes[majority_class] / votes.sum()).item()

print(f"Ensemble (majority vote) predicted phoneme class: {majority_class}")
print(f"Vote share for that class: {confidence:.2%}")
```

## Model Details

- **Architecture**: Conformer (custom size)
  - Hidden size: 256
  - FFN dim: 2048
  - Layers: 7
  - Attention heads: 12
  - Depthwise conv kernel: 31
- **Input**: 306-channel MEG signals
- **Window size**: 0.5 seconds (125 samples at 250 Hz)
- **Output**: 39-class phoneme classification (ARPAbet phoneme set)
- **Training**: [LibriBrain](https://huggingface.co/datasets/pnpl/LibriBrain) 2025 Standard track
- **Grouping**: 100 single-trial examples averaged per training sample

## Reproducibility

All 5 random seeds are provided. For best results on new data, we recommend using the ensemble approach, which achieved **65.8% F1-macro** on the competition holdout set.

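The ensemble code in Quick Start combines seeds by majority vote. A common alternative is soft voting, where the per-model class probabilities are averaged before taking the argmax. The sketch below shows the arithmetic in plain Python on toy logits. The helper names `softmax` and `soft_vote` and the numbers are illustrative assumptions, and this is not necessarily the procedure behind the reported holdout score.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(per_model_logits):
    """Average per-model class probabilities; return (class, mean_probs)."""
    per_model_probs = [softmax(logits) for logits in per_model_logits]
    num_models = len(per_model_probs)
    num_classes = len(per_model_probs[0])
    mean_probs = [
        sum(probs[c] for probs in per_model_probs) / num_models
        for c in range(num_classes)
    ]
    best_class = max(range(num_classes), key=mean_probs.__getitem__)
    return best_class, mean_probs


# Toy logits from 3 hypothetical models over 3 classes
toy_logits = [
    [2.0, 1.9, 0.0],  # narrowly prefers class 0
    [2.0, 1.9, 0.0],  # narrowly prefers class 0
    [0.0, 5.0, 0.0],  # strongly prefers class 1
]
predicted_class, mean_probs = soft_vote(toy_logits)
print(f"Soft-vote class: {predicted_class}")
print(f"Mean probability: {mean_probs[predicted_class]:.2%}")
```

In this toy case a majority vote would pick class 0 (two of three models), while soft voting picks class 1 because the third model is far more confident. With the `probs_list` built in the Quick Start example, the same averaging can be written as `torch.stack(probs_list).mean(dim=0)`.
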
## Citation

```bibtex
@misc{dezuazo2025megconformerconformerbasedmegdecoder,
  title={MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification},
  author={Xabier de Zuazo and Ibon Saratxaga and Eva Navas},
  year={2025},
  eprint={2512.01443},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2512.01443},
}
```

## License

The 3-Clause BSD License

## Links

- **Paper**: [arXiv:2512.01443](https://arxiv.org/abs/2512.01443)
- **Code**: [GitHub](https://github.com/neural2speech/libribrain-experiments)
- **Competition**: [LibriBrain 2025](https://neural-processing-lab.github.io/2025-libribrain-competition/)