---
base_model: unsloth/gemma-3n-E4B-it
tags:
- text-generation-inference
- transformers
- unsloth
- gemma3n
- egyptian
- codeswitch
- arabic
- masry
license: apache-2.0
language:
- en
- ar
datasets:
- MohamedRashad/arabic-english-code-switching
pipeline_tag: automatic-speech-recognition
---

# 🇪🇬🎙 MasriSwitch-Gemma3n-Transcriber-v1

**MasriSwitch-Gemma3n-Transcriber** is an automatic speech transcription model specialized for **Egyptian Arabic** with strong **English code-switching** capabilities.

This model is **one of the very few publicly available systems** explicitly optimized for:

- Egyptian Arabic dialect transcription
- Natural Arabic ↔ English code-switching
- Short and medium-length real-world audio

The model is trained using:

- **MohamedRashad/arabic-english-code-switching** dataset
- **A private Egyptian speech dataset** containing real conversational audio, voice notes, and mixed Arabic/English speech recordings

---

## 🔍 Overview

**MasriSwitch-Gemma3n-Transcriber** is built on the Gemma3n conditional generation architecture and fine-tuned to understand natural Egyptian speech patterns, including mixed Arabic/English utterances commonly used in daily life, workplaces, and online content.

It is suitable for:

- Social media content transcription
- Customer support calls
- Meetings, voice notes, and interviews
- Research in dialectal ASR
- Multilingual speech processing

---

## ✨ Features

- 🗣 Egyptian Arabic dialect-aware transcription
- 🔀 Accurate English code-switching support
- 🎧 Strong performance on informal, real-world speech
- ⚡ Optimized for short (10–30s) audio segments
- 🤖 Built using the Gemma3n generation-based ASR pipeline

---

## 🎯 Intended Use

Use this model for:

- Speech-to-text systems
- Captioning and subtitling
- Chat or voice assistant pipelines
- Indexing/searching Arabic audio content
- Research and experimentation

---

## ⚠️ Limitations

- Best results with clean audio and single speakers
- Not optimized for Gulf, Levantine, or MSA-only speech
- Struggles with:
  - Heavy noise
  - Overlapping speakers
  - Fast speech
- Long recordings should be segmented (20–30s recommended)

---

## 🛡 Safety & Privacy

- Transcriptions may include sensitive user data; handle them with care.
- The model should not be used for high-stakes decisions without human review.
- Biases in the training data may affect accuracy.

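
The Limitations section recommends segmenting long recordings into 20–30s pieces. As a minimal sketch of that step, the helper below computes sample-index windows over a recording. The function name `segment_bounds`, its parameters, the optional overlap, and the 16 kHz sample rate used in the example are illustrative assumptions, not part of this model's released code.

```python
def segment_bounds(num_samples, sample_rate, max_seconds=30.0, overlap_seconds=0.0):
    """Return (start, end) sample indices that cover a recording in
    windows of at most `max_seconds`.

    Hypothetical helper: the name, defaults, and overlap option are
    assumptions made for illustration only.
    """
    if num_samples < 0 or sample_rate <= 0:
        raise ValueError("num_samples must be >= 0 and sample_rate must be > 0")
    if max_seconds <= 0 or not (0 <= overlap_seconds < max_seconds):
        raise ValueError("need max_seconds > 0 and 0 <= overlap_seconds < max_seconds")

    window = int(round(max_seconds * sample_rate))
    step = window - int(round(overlap_seconds * sample_rate))
    if step <= 0:
        raise ValueError("overlap leaves no room to advance between windows")

    bounds = []
    start = 0
    while start < num_samples:
        end = min(start + window, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break  # the last window reached the end of the recording
        start += step
    return bounds


# Example: a 70-second recording at an assumed 16 kHz sample rate.
for start, end in segment_bounds(70 * 16000, 16000, max_seconds=30.0):
    print(f"segment from sample {start} to {end}")
```

Each `(start, end)` pair can be used to slice a decoded waveform. One option is to write each slice to its own audio file and pass that path to the `transcribe_file` function from the inference example, then join the resulting texts in order. A small overlap may help avoid cutting words at segment boundaries, at the cost of possible duplicated words in the joined transcript.
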
---

# 🧪 Inference Example (Python)

```python
import torch
from transformers import AutoProcessor, Gemma3nForConditionalGeneration

MODEL_ID = "oddadmix/egyptian-code-switching-b4-g2-merged"


def load_model_and_processor(model_id=MODEL_ID, device=None):
    """Load the model and its processor, preferring a GPU when one is available."""
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"

    print(f"Loading model {model_id} to device {device}...")

    # Use bfloat16 and automatic device placement on GPU; library defaults on CPU.
    model = Gemma3nForConditionalGeneration.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16 if device == "cuda" else None,
        device_map="auto" if device == "cuda" else None,
    ).eval()

    # Fallback: move the model to the GPU if no parameter was placed there.
    if not any(p.device.type == "cuda" for p in model.parameters()) and device == "cuda":
        model.to("cuda")

    processor = AutoProcessor.from_pretrained(model_id)
    return model, processor, device


def transcribe_file(model, processor, audio_path, max_new_tokens=128):
    """Transcribe a single audio file and return the generated text."""
    if not audio_path:
        raise ValueError("audio_path must point to an audio file")

    # Chat-style prompt: a system instruction plus the audio and a text request.
    messages = [
        {
            "role": "system",
            "content": [
                {"type": "text", "text": "You are an assistant that transcribes speech accurately."}
            ],
        },
        {
            "role": "user",
            "content": [
                {"type": "audio", "url": audio_path},
                {"type": "text", "text": "Please transcribe this audio."}
            ],
        },
    ]

    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    )

    # Move every input tensor to the device that holds the model.
    device = next(model.parameters()).device
    inputs = {k: v.to(device) for k, v in inputs.items()}
    input_len = inputs["input_ids"].shape[-1]

    # Greedy decoding, without gradient tracking.
    with torch.inference_mode():
        generated = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
        )

    # Drop the prompt tokens and decode only the newly generated ones.
    gen_tokens = generated[0][input_len:]
    text = processor.decode(gen_tokens, skip_special_tokens=True)
    return text


if __name__ == "__main__":
    audio_path = "path/to/audio.wav"
    model, processor, device = load_model_and_processor()
    transcription = transcribe_file(model, processor, audio_path, max_new_tokens=256)
    print("Transcription:", transcription)
```