πŸ’Ž JiRack Ternary 1.5B (Alpha v1.3 + RoPE fix)

High-Performance Ternary-Quantized Transformer | [PATENT PENDING]

Inventor: Konstantin Vladimirovich Grabko
Organization: CMS Manhattan JiRack Technology
Official Site: www.cmsmanhattan.com


⚠️ Intellectual Property Notice

The architecture, weights, and methods (BRE, SWA Fusion, and HD-FFN) contained herein are the proprietary intellectual property of Konstantin Vladimirovich Grabko.

  • Status: Patent Pending (U.S. & International Claims Filed).
  • Usage: Commercial use requires execution of the CMS Manhattan JiRack License V.1.2.

πŸš€ Project Overview: The 1.5B "Wide-FFN" Advantage

JiRack-1.5B is a redesigned architecture powered by the JiRack BitNet v2.0 Specification. It is specifically optimized for high-throughput inference on non-NVIDIA hardware (AMD ROCm/HIP) and cloud-native environments (AWS Lightsail, GCP, Azure) without heavy CPU reliance.

Key Innovations:

  1. High-Density FFN (HD-FFN): Unlike standard 1B models, JiRack-1.5B uses an 8192-dimension intermediate layer within ternary constraints, offering 3B-class semantic quality (a minimal sketch follows this list).
  2. SWA Fusion (SwiGLU-Attention): A novel compute kernel that fuses the FFN and attention passes, maintaining thermal stability below 80Β°C even under heavy load.
  3. Buffered Routing Embedding (BRE): Minimizes data movement between HBM and compute units, specifically optimized for ROCm environments.
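
For illustration, the block below is a minimal PyTorch sketch of the HD-FFN shape described in item 1: a SwiGLU-style feed-forward layer with the 2048 hidden size and 8192-dimension intermediate size listed in the specifications. The class and parameter names are assumptions made for this example; the proprietary ternary kernels, SWA Fusion, and BRE are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HDFFN(nn.Module):
    """Illustrative SwiGLU feed-forward block using the HD-FFN dimensions
    from the spec table (hidden=2048, intermediate=8192). Names are hypothetical."""

    def __init__(self, hidden_size: int = 2048, intermediate_size: int = 8192):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: SiLU(gate(x)) * up(x), projected back down to hidden_size.
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

# Example: a batch of 4 sequences of 128 tokens, hidden size 2048.
x = torch.randn(4, 128, 2048)
y = HDFFN()(x)   # y.shape == (4, 128, 2048)
```

The 8192-dimension intermediate layer is where the "ultra-wide" claim comes from; the ternary weight constraint is what keeps its memory footprint small.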

⚠️ Changes

Alpha v1.3 includes a RoPE fix, which significantly improves response quality. The previous model has been renamed to model_tag.safetensors; it is intended for RAG use with a counter-question prompting style. The model was trained with supervised fine-tuning (SFT) techniques, and it is easy to fine-tune with LoRA, using the same adapter setup as a Llama 3.2 1B LoRA adapter.
Triple compression of the model for production is planned.


πŸ“Š Technical Specifications

| Feature | JiRack-1.5B Specification |
| --- | --- |
| Base Model | Meta-Llama-3.2-1B (Redesigned) |
| Hidden Size | 2048 |
| Intermediate Size | 8192 (Ultra-Wide for the 1B class) |
| Layer Count | 16 Decoder Layers |
| Quantization | 1.58-bit (Ternary {-1, 0, 1}) |
| VRAM Requirement | ~2.5 GB (including KV-Cache & LoRA) |
| Size in RAM | To be reduced via double/triple compression for production |
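
As a reference for the 1.58-bit ternary format in the table above, the sketch below shows an absmean-style ternarization to {-1, 0, +1} with a per-tensor scale, in the spirit of BitNet b1.58. It is an illustrative reconstruction under that assumption, not the patented JiRack quantizer or its ROCm kernels.

```python
import torch

def ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a weight tensor to {-1, 0, +1} with a per-tensor absmean scale.
    Illustrative only; the actual JiRack quantization path is proprietary."""
    scale = w.abs().mean().clamp(min=eps)           # per-tensor absmean scale
    w_ternary = (w / scale).round().clamp(-1, 1)    # values restricted to {-1, 0, +1}
    return w_ternary, scale

def ternary_dequantize(w_ternary: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate full-precision tensor for reference computation."""
    return w_ternary * scale

# Example: ternarize a 2048 x 8192 projection and verify the value set.
w = torch.randn(2048, 8192)
wq, s = ternary_quantize(w)
assert set(wq.unique().tolist()) <= {-1.0, 0.0, 1.0}
```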

Training & Fine-Tuning Efficiency

βœ… Consumer GPU Ready: Fine-tuning via LoRA (r=8) requires only ~8 MB of additional weights.
βœ… 70% VRAM Reduction: Fits on an RTX 3060, GTX 1660 Ti, or high-end integrated GPUs.
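
A minimal sketch of how such an r=8 LoRA adapter could be attached with the Hugging Face peft library is shown below. For illustration it loads the Meta-Llama-3.2-1B base named in the spec table, and the target module names assume Llama-3.2-style attention projections; neither detail is taken from the JiRack source.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative base model; attaching the adapter to the JiRack checkpoint
# itself would follow the same pattern with its own module names.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

lora_config = LoraConfig(
    r=8,                    # low-rank dimension quoted above
    lora_alpha=16,          # illustrative scaling choice
    lora_dropout=0.05,      # illustrative regularization choice
    bias="none",
    task_type="CAUSAL_LM",
    # Llama-style attention projections (assumption; adjust for JiRack modules).
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # only the MB-scale LoRA weights are trainable
```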


πŸ’¬ Chat Demo (Alpha Phase)

Current Checkpoint: Feb 26, 2026 (Logic Verification Mode)

User: Hello
JiRack: What are you’re talking to? Can I help with that?
(Gen Time: 54.10s on CPU)

User: Tell me a joke
JiRack: What is it? Do you want to hear jokes about how the punch line works, and why I’m called β€œa dickhead”?
(Gen Time: 86.79s on CPU)

Note: The current .pt file is ~6 GB due to heavy embedded metadata for research tracking. The final production version will be cleaned and compressed to ~1.5 GB - 2.0 GB.
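
One minimal way such a cleanup could be done is sketched below: load the research checkpoint, keep only the weight tensors, and re-save them in the compact safetensors format. The filenames and state_dict key layout are assumptions for illustration.

```python
import torch
from safetensors.torch import save_file

# Hypothetical filenames; the real checkpoint name may differ.
ckpt = torch.load("jirack_ternary_1b.pt", map_location="cpu", weights_only=False)
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt.state_dict()

# Keep only weight tensors, dropping any embedded research-tracking metadata.
tensors = {k: v.contiguous() for k, v in state_dict.items() if torch.is_tensor(v)}
save_file(tensors, "jirack_ternary_1b.safetensors")
```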


πŸ“‚ Repository Contents

  • JiRackTernaryPyTorch_1b.py: Core architecture file.
  • invention_description.md: Detailed technical breakdown for patent examiners.
  • performance_data.md: Benchmarks on ROCm/AMD hardware.
  • NDA.md: Confidentiality agreement for commercial evaluators.

πŸ“§ Contact & Licensing

For joint venture opportunities, hardware integration, or licensing inquiries:
