# nolitai-2b: Meeting Intelligence Model (MLX 4-bit)

A fine-tuned Qwen3-1.7B model specialized for extracting structured meeting intelligence from transcripts. Optimized for Apple Silicon inference via MLX.

## Model Details

| Property | Value |
|----------|-------|
| Base Model | Qwen/Qwen3-1.7B |
| Parameters | 1.7B (4-bit quantized, ~948 MB) |
| Training | QLoRA (rank=16, alpha=640, scale=40x) on q/k/v/o attention projections |
| Framework | MLX (Apple Silicon optimized) |
| Languages | English, Portuguese, Spanish, French, German |

## Capabilities

Given a meeting transcript, nolitai-2b extracts:

- **Action Items**: tasks with owners, deadlines, and priority
- **Decisions**: key decisions made during the meeting
- **Key Points**: important discussion topics
- **Questions**: open questions raised but not resolved
- **Summaries**: concise, specific meeting summaries (no filler phrases)

## Example

**Input:**

```
Extract insights from this meeting transcript:

[10:00] Sarah: We need to finalize the Q4 budget by Friday.
[10:02] Mike: I'll prepare the marketing numbers today.
[10:05] Sarah: Great. Let's also decide on the conference - I vote for Web Summit.
[10:07] Mike: Agreed. Web Summit it is.
```

**Output:**

```json
{
  "actionItems": [
    {"task": "Prepare marketing numbers for Q4 budget", "owner": "Mike", "deadline": "today", "priority": "high"}
  ],
  "decisions": [
    {"content": "Attending Web Summit conference", "madeBy": "Sarah, Mike"}
  ],
  "keyPoints": [
    {"content": "Q4 budget finalization deadline is Friday"}
  ],
  "questions": []
}
```
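For downstream processing, this output maps naturally onto a typed schema. Below is a minimal sketch in Python; the `TypedDict` names are illustrative rather than part of the model's API, and the shape of `questions` entries (empty in the example above) is assumed to mirror `keyPoints`:

```python
from typing import TypedDict

class ActionItem(TypedDict):
    task: str
    owner: str
    deadline: str
    priority: str

class Decision(TypedDict):
    content: str
    madeBy: str

class KeyPoint(TypedDict):
    content: str

class Question(TypedDict):  # assumed shape; the example above has no entries
    content: str

class MeetingInsights(TypedDict):
    actionItems: list[ActionItem]
    decisions: list[Decision]
    keyPoints: list[KeyPoint]
    questions: list[Question]
```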

## Performance

Evaluated on a held-out validation set (97.4% overall):

| Task | Score |
|------|-------|
| Insight Extraction (action items, decisions, questions) | 100% |
| Meeting Summaries | 94.1% |
| Overall | 97.4% |

## Usage with MLX

```python
from mlx_lm import load, generate

# Download the 4-bit MLX weights from the Hugging Face Hub and load them
model, tokenizer = load("SearchingBinary/nolitai-2b")

prompt = """Extract insights from this meeting transcript:

[10:00] Alice: The new API is ready for testing.
[10:02] Bob: I'll write the integration tests by Wednesday.
[10:05] Alice: Should we use the staging or production environment?
"""

# Wrap the prompt in the chat template the model was fine-tuned with
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=text, max_tokens=500)
print(response)
```
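The model returns the insights object as plain text, so the usual next step is to parse it. A minimal sketch; the brace-slicing below is an illustrative heuristic that assumes the response contains a single JSON object:

```python
import json

# Isolate the JSON object in the generated text (the model may emit
# leading or trailing whitespace around it).
start, end = response.find("{"), response.rfind("}") + 1
insights = json.loads(response[start:end])

for item in insights.get("actionItems", []):
    print(f"{item['owner']}: {item['task']} (due {item['deadline']}, {item['priority']} priority)")
```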

## Usage with Swift (MLX Swift)

```swift
import MLXLLM

let model = try await LLMModelFactory.shared.load(
    hub: .init(id: "SearchingBinary/nolitai-2b")
)
```

## Training Details

- **Method**: QLoRA (4-bit NF4 quantization + LoRA adapters)
- **LoRA Config**: rank=16, alpha=640 (scale=40x), dropout=0.05 (see the sketch after this list)
- **Target Modules**: q_proj, k_proj, v_proj, o_proj
- **Dataset**: ~10K examples across 5 languages (en, pt, es, fr, de)
- **Epochs**: 2
- **Learning Rate**: 1e-5 (cosine scheduler, 5% warmup)
- **Hardware**: NVIDIA A40 48GB (RunPod)
- **Training Time**: ~85 minutes
- **Final Eval Loss**: 0.0178 (98.2% token accuracy)
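For reference, the adapter settings above correspond to roughly the following Hugging Face `peft` configuration. This is a reconstruction, not the actual training script; the card does not name the training stack, so the use of `peft` and the `task_type` value are assumptions:

```python
from peft import LoraConfig

# Assumed reconstruction of the adapter config described above.
# Effective scale = lora_alpha / r = 640 / 16 = 40x.
lora_config = LoraConfig(
    r=16,
    lora_alpha=640,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections only
    task_type="CAUSAL_LM",
)
```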

## Intended Use

This model is designed for:

- On-device meeting intelligence extraction
- Real-time meeting summarization on Apple Silicon Macs
- Multilingual meeting support (5 languages)

## Limitations

- Optimized for meeting transcripts; may not generalize well to other text formats
- Best results with structured transcript input (timestamps + speaker labels)
- 4-bit quantization may slightly reduce quality vs. full precision
- Requires Apple Silicon (M1/M2/M3/M4) for MLX inference

## Part of nolit.ai

This model powers nolit.ai, a native macOS meeting copilot that processes everything locally on your Mac. *Not Lost in Translation, lit up by AI.*

## License

Apache 2.0
