# nolitai-2b: Meeting Intelligence Model (MLX 4-bit)
A fine-tuned Qwen3-1.7B model specialized for extracting structured meeting intelligence from transcripts. Optimized for Apple Silicon inference via MLX.
## Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3-1.7B |
| Parameters | 1.7B (4-bit quantized, ~948 MB) |
| Training | QLoRA (rank=16, alpha=640, scale=40x) on q/k/v/o attention projections |
| Framework | MLX (Apple Silicon optimized) |
| Languages | English, Portuguese, Spanish, French, German |
## Capabilities
Given a meeting transcript, nolitai-2b extracts:
- **Action Items**: Tasks with owners, deadlines, and priority
- **Decisions**: Key decisions made during the meeting
- **Key Points**: Important discussion topics
- **Questions**: Open questions raised but not resolved
- **Summaries**: Concise, specific meeting summaries (no filler phrases)
## Example

**Input:**
```
Extract insights from this meeting transcript:
[10:00] Sarah: We need to finalize the Q4 budget by Friday.
[10:02] Mike: I'll prepare the marketing numbers today.
[10:05] Sarah: Great. Let's also decide on the conference. I vote for Web Summit.
[10:07] Mike: Agreed. Web Summit it is.
```
**Output:**
```json
{
  "actionItems": [
    {"task": "Prepare marketing numbers for Q4 budget", "owner": "Mike", "deadline": "today", "priority": "high"}
  ],
  "decisions": [
    {"content": "Attending Web Summit conference", "madeBy": "Sarah, Mike"}
  ],
  "keyPoints": [
    {"content": "Q4 budget finalization deadline is Friday"}
  ],
  "questions": []
}
```
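For downstream processing, the output shape can be written down as Python typed dicts. This is a sketch inferred from the example above, not an official schema; in particular the shape of `questions` entries is an assumption, since the sample list is empty.

```python
from typing import TypedDict

class ActionItem(TypedDict):
    task: str
    owner: str
    deadline: str
    priority: str

class Decision(TypedDict):
    content: str
    madeBy: str

class KeyPoint(TypedDict):
    content: str

class Question(TypedDict):
    content: str  # assumed shape; the example's "questions" list is empty

class MeetingInsights(TypedDict):
    actionItems: list[ActionItem]
    decisions: list[Decision]
    keyPoints: list[KeyPoint]
    questions: list[Question]
```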
## Performance
Evaluated on a held-out validation set:
| Task | Score |
|---|---|
| Insight Extraction (action items, decisions, questions) | 100% |
| Meeting Summaries | 94.1% |
| Overall | 97.4% |
## Usage with MLX
```python
from mlx_lm import load, generate

# Load the 4-bit model and tokenizer from the Hugging Face Hub
model, tokenizer = load("SearchingBinary/nolitai-2b")

prompt = """Extract insights from this meeting transcript:
[10:00] Alice: The new API is ready for testing.
[10:02] Bob: I'll write the integration tests by Wednesday.
[10:05] Alice: Should we use the staging or production environment?
"""

# Apply the chat template so the prompt matches the fine-tuning format
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=text, max_tokens=500)
print(response)
```
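Because the model emits JSON, the response can be parsed directly. A minimal sketch continuing from the snippet above (it assumes the generation is a bare JSON object, which a truncated or malformed output would violate):

```python
import json

# Parse the generated insights; fall back to raw text if the output
# is not valid JSON (e.g., cut off by max_tokens)
try:
    insights = json.loads(response)
    for item in insights.get("actionItems", []):
        print(f"- {item['task']} (owner: {item['owner']}, due: {item['deadline']})")
except json.JSONDecodeError:
    print("Model output was not valid JSON:", response)
```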
## Usage with Swift (MLX Swift)
```swift
import MLXLLM
import MLXLMCommon

// Load the model container from the Hugging Face Hub
let modelContainer = try await LLMModelFactory.shared.loadContainer(
    configuration: ModelConfiguration(id: "SearchingBinary/nolitai-2b")
)
```
## Training Details
- Method: QLoRA (4-bit NF4 quantization + LoRA adapters)
- LoRA Config: rank=16, alpha=640 (scale=40x), dropout=0.05
- Target Modules: q_proj, k_proj, v_proj, o_proj
- Dataset: ~10K examples across 5 languages (en, pt, es, fr, de)
- Epochs: 2
- Learning Rate: 1e-5 (cosine scheduler, 5% warmup)
- Hardware: NVIDIA A40 48GB (RunPod)
- Training Time: ~85 minutes
- Final Eval Loss: 0.0178 (98.2% token accuracy)
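The training script isn't included here, but the configuration above maps onto a standard Hugging Face PEFT + bitsandbytes QLoRA setup. A hedged sketch (the library choice is an assumption; the hyperparameters come from the list above):

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapters on the attention projections: rank=16, alpha=640,
# giving the 40x scale (alpha / rank) noted in the card
lora_config = LoraConfig(
    r=16,
    lora_alpha=640,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```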
## Intended Use
This model is designed for:
- On-device meeting intelligence extraction
- Real-time meeting summarization on Apple Silicon Macs
- Multilingual meeting support (5 languages)
## Limitations

- Optimized for meeting transcripts; may not generalize well to other text formats
- Best results with structured transcript input (timestamps + speaker labels)
- 4-bit quantization may slightly reduce quality vs full precision
- Requires Apple Silicon (M1/M2/M3/M4) for MLX inference
## Part of nolit.ai

This model powers nolit.ai, a native macOS meeting copilot that processes everything locally on your Mac. Not Lost in Translation: lit up by AI.
## License
Apache 2.0