JiRack Ternary 1.5B (Alpha v1.3 + RoPE fix)
High-Performance Ternary-Quantized Transformer | [PATENT PENDING]
Inventor: Konstantin Vladimirovich Grabko
Organization: CMS Manhattan JiRack Technology
Official Site: www.cmsmanhattan.com
⚠️ Intellectual Property Notice
The architecture, weights, and methods (BRE, SWA Fusion, and HD-FFN) contained herein are the proprietary intellectual property of Konstantin Vladimirovich Grabko.
- Status: Patent Pending (U.S. & International Claims Filed).
- Usage: Commercial use requires a signed CMS Manhattan JiRack License V.1.2.
Project Overview: The 1.5B "Wide-FFN" Advantage
JiRack-1.5B is a redesigned architecture powered by the JiRack BitNet v2.0 Specification. It is specifically optimized for high-throughput inference on non-NVIDIA hardware (AMD ROCm/HIP) and cloud-native environments (AWS Lightsail, GCP, Azure) without heavy CPU reliance.
Key Innovations:
- High-Density FFN (HD-FFN): Unlike standard 1B models, JiRack-1.5B uses an 8192-dimension intermediate layer within ternary constraints, offering 3B-class semantic quality (a minimal sketch follows this list).
- SWA Fusion (SwiGLU-Attention): A novel compute kernel that fuses the FFN and attention passes, maintaining thermal stability below 80°C even under heavy load.
- Buffered Routing Embedding (BRE): Minimizes data movement between HBM and compute units, specifically optimized for ROCm environments.
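The HD-FFN idea can be illustrated with a short PyTorch sketch: a SwiGLU feed-forward block with the 2048 → 8192 → 2048 shape from the spec table, built on ternary-constrained linear layers. This is a minimal sketch, not the JiRack kernels; `TernaryLinear` and `HDFFN` are illustrative names, and the absmean quantizer follows the publicly documented BitNet b1.58 recipe.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def ternary_quantize(w: torch.Tensor):
    """Absmean ternary quantization (BitNet b1.58 style): scale by the
    mean absolute value, then round each weight to {-1, 0, +1}."""
    scale = w.abs().mean().clamp(min=1e-5)
    q = (w / scale).round().clamp(-1, 1)
    return q, scale

class TernaryLinear(nn.Module):
    """Linear layer whose weights are ternarized on the forward pass;
    a straight-through estimator keeps training differentiable."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.normal_(self.weight, std=0.02)

    def forward(self, x):
        q, scale = ternary_quantize(self.weight)
        w = self.weight + (q * scale - self.weight).detach()  # straight-through
        return F.linear(x, w)

class HDFFN(nn.Module):
    """SwiGLU feed-forward with the wide 2048 -> 8192 -> 2048 shape."""
    def __init__(self, hidden: int = 2048, intermediate: int = 8192):
        super().__init__()
        self.gate = TernaryLinear(hidden, intermediate)
        self.up = TernaryLinear(hidden, intermediate)
        self.down = TernaryLinear(intermediate, hidden)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))
```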
⚠️ Changes
- Alpha v1.3 includes a RoPE fix, which markedly improves response quality (a reference RoPE sketch follows this list).
- The previous model has been renamed to model_tag.safetensors; it is intended for RAG use with a counter-question interaction style.
- The model was trained with supervised fine-tuning (SFT) techniques.
- It is easy to fine-tune with LoRA, in the same way as a Llama-3.2-1B LoRA adapter.
- Triple compression of the model for production is planned soon.
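For reference, a minimal sketch of standard rotary position embeddings (RoPE) in the interleaved-pair form, to illustrate the mechanism the v1.3 fix concerns; this is generic Llama-family RoPE, not the JiRack implementation itself.

```python
import torch

def rope_angles(head_dim: int, seq_len: int, base: float = 10000.0):
    """Rotation angles for each (position, channel-pair) combination."""
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    positions = torch.arange(seq_len).float()
    angles = torch.outer(positions, inv_freq)  # (seq_len, head_dim // 2)
    return angles.cos(), angles.sin()

def apply_rope(x: torch.Tensor, cos: torch.Tensor, sin: torch.Tensor):
    """Rotate query/key channel pairs; x has shape (..., seq_len, head_dim)."""
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = torch.stack(
        (x_even * cos - x_odd * sin,   # rotated even channels
         x_even * sin + x_odd * cos),  # rotated odd channels
        dim=-1,
    )
    return out.flatten(-2)  # re-interleave back to (..., seq_len, head_dim)
```

A subtle bug anywhere in this rotation (wrong frequency layout, pairing, or position offset) degrades long-range coherence, which is consistent with the quality jump reported above.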
Technical Specifications
| Feature | JiRack-1.5B Specification |
|---|---|
| Base Model | Meta-Llama-3.2-1B (Redesigned) |
| Hidden Size | 2048 |
| Intermediate Size | 8192 (Ultra-Wide for 1B class) |
| Layer Count | 16 Decoder Layers |
| Quantization | 1.58-bit (Ternary {-1, 0, 1}) |
| VRAM Requirement | ~2.5 GB (including KV-Cache & LoRA) |
| Size in RAM | Will shrink 2x-3x via compression for the production build |
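The 1.58-bit row above means each weight takes one of three values {-1, 0, +1}. Below is a minimal sketch of one plausible storage route, packing ternary weights at 2 bits each (4 weights per byte); `pack_ternary`/`unpack_ternary` are illustrative helpers, not the actual JiRack compression pipeline.

```python
import torch

def pack_ternary(q: torch.Tensor) -> torch.Tensor:
    """Pack a tensor of {-1, 0, +1} values at 2 bits per weight:
    map {-1, 0, +1} -> {0, 1, 2}, then store 4 values per byte."""
    u = (q.flatten() + 1).to(torch.uint8)
    pad = (-u.numel()) % 4                 # pad to a multiple of 4
    u = torch.cat([u, u.new_zeros(pad)]).view(-1, 4)
    return u[:, 0] | (u[:, 1] << 2) | (u[:, 2] << 4) | (u[:, 3] << 6)

def unpack_ternary(packed: torch.Tensor, numel: int) -> torch.Tensor:
    """Inverse of pack_ternary: recover the first numel ternary values."""
    u = torch.stack([(packed >> s) & 3 for s in (0, 2, 4, 6)], dim=1)
    return u.flatten()[:numel].to(torch.int8) - 1

# Round-trip check on random ternary data:
q = torch.randint(-1, 2, (1000,))
assert torch.equal(unpack_ternary(pack_ternary(q), q.numel()), q.to(torch.int8))
```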
Training & Fine-Tuning Efficiency
✅ Consumer GPU Ready: Fine-tuning via LoRA (r=8) requires only ~8 MB of additional weights.
✅ 70% VRAM Reduction: Fits on RTX 3060, GTX 1660 Ti, or high-end integrated GPUs.
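A minimal sketch of the LoRA setup described above, using the Hugging Face peft library; the checkpoint path is a placeholder, and the target modules are the usual Llama-style attention projections (an assumption here, not a documented JiRack requirement).

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Placeholder path; substitute the actual checkpoint location.
model = AutoModelForCausalLM.from_pretrained("path/to/jirack-1.5b")

config = LoraConfig(
    r=8,                                  # rank quoted above (~8 MB of adapters)
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # assumed Llama-style projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()        # only the LoRA adapters are trainable
```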
💬 Chat Demo (Alpha Phase)
Current Checkpoint: Feb 26, 2026 (Logic Verification Mode)
User: Hello
JiRack: What are you're talking to? Can I help with that?
(Gen Time: 54.10s on CPU)
User: Tell me a joke
JiRack: What is it? Do you want to hear jokes about how the punch line works, and why I'm called "a dickhead"?
(Gen Time: 86.79s on CPU)
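To reproduce a session like the one above, here is a minimal generation loop, assuming the checkpoint loads through the standard transformers API (the path is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/jirack-1.5b")
model = AutoModelForCausalLM.from_pretrained("path/to/jirack-1.5b")
model.eval()

prompt = "User: Tell me a joke\nJiRack:"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=64,
                         do_sample=True, temperature=0.7)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```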
Note: The current .pt file is ~6 GB due to heavy embedded metadata for research tracking. The final production version will be cleaned and compressed to ~1.5-2.0 GB.
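One plausible shape for that cleanup: load the research checkpoint, keep only the weight tensors, and re-save in safetensors. The file names, and the assumption that metadata lives alongside the state dict, are illustrative.

```python
import torch
from safetensors.torch import save_file

# Research checkpoint with embedded tracking metadata (name illustrative).
ckpt = torch.load("jirack_1.5b_research.pt", map_location="cpu",
                  weights_only=False)

# Unwrap if the weights sit under a "state_dict" key, then keep tensors only.
state = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt
tensors = {k: v.contiguous() for k, v in state.items()
           if isinstance(v, torch.Tensor)}

save_file(tensors, "jirack_1.5b_clean.safetensors")
```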
Repository Contents
- JiRackTernaryPyTorch_1b.py: Core architecture file.
- invention_description.md: Detailed technical breakdown for patent examiners.
- performance_data.md: Benchmarks on ROCm/AMD hardware.
- NDA.md: Confidentiality agreement for commercial evaluators.
📧 Contact & Licensing
For joint venture opportunities, hardware integration, or licensing inquiries:
- Email: grabko@cmsmanhattan.com
- Phone: +1 (516) 777-0945
- Location: New York, USA