πŸ’Ž JiRack Ternary 1.5B (Alpha v1.3 + RoPE fix)

High-Performance Ternary-Quantized Transformer | [PATENT PENDING]

Inventor: Konstantin Vladimirovich Grabko
Organization: CMS Manhattan JiRack Technology
Official Site: www.cmsmanhattan.com


⚠️ Intellectual Property Notice

The architecture, weights, and methods (BRE, SWA Fusion, and HD-FFN) contained herein are the proprietary intellectual property of Konstantin Vladimirovich Grabko.

  • Status: Patent Pending (U.S. & International Claims Filed).
  • Usage: Commercial use requires execution of the CMS Manhattan JiRack License V.1.2.

πŸš€ Project Overview: The 1.5B "Wide-FFN" Advantage

JiRack-1.5B is a redesigned architecture powered by the JiRack BitNet v2.0 Specification. It is specifically optimized for high-throughput inference on non-NVIDIA hardware (AMD ROCm/HIP) and cloud-native environments (AWS Lightsail, GCP, Azure) without heavy CPU reliance.

Key Innovations:

  1. High-Density FFN (HD-FFN): Unlike standard 1B models, JiRack-1.5B uses an 8192-dimension intermediate layer within ternary constraints, offering 3B-class semantic quality (a minimal sketch follows this list).
  2. SWA Fusion (SwiGLU-Attention): A novel compute kernel that fuses the FFN and attention passes, maintaining thermal stability below 80Β°C even under heavy load.
  3. Buffered Routing Embedding (BRE): Minimizes data movement between HBM and compute units, specifically optimized for ROCm environments.
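
For illustration, the block below is a minimal PyTorch sketch of the HD-FFN shape described in item 1: a SwiGLU-style feed-forward layer with the 2048 hidden size and 8192-dimension intermediate size listed in the specifications. The class and parameter names are assumptions made for this example; the proprietary ternary kernels, SWA Fusion, and BRE are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HDFFN(nn.Module):
    """Illustrative SwiGLU feed-forward block using the HD-FFN dimensions
    from the spec table (hidden=2048, intermediate=8192). Names are hypothetical."""

    def __init__(self, hidden_size: int = 2048, intermediate_size: int = 8192):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: SiLU(gate(x)) * up(x), projected back down to hidden_size.
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))

# Example: a batch of 4 sequences of 128 tokens, hidden size 2048.
x = torch.randn(4, 128, 2048)
y = HDFFN()(x)   # y.shape == (4, 128, 2048)
```

The 8192-dimension intermediate layer is where the "ultra-wide" claim comes from; the ternary weight constraint is what keeps its memory footprint small.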

⚠️ Changes

Alpha v1.3 includes a RoPE fix, which significantly improves response quality. The previous model has been renamed to model_tag.safetensors; it is intended for RAG use with a counter-question prompting style. The model was trained with supervised fine-tuning (SFT) techniques, and it is easy to fine-tune with LoRA, using the same adapter setup as a Llama 3.2 1B LoRA adapter.
Triple compression of the model for production is planned.


πŸ“Š Technical Specifications

| Feature | JiRack-1.5B Specification |
| --- | --- |
| Base Model | Meta-Llama-3.2-1B (Redesigned) |
| Hidden Size | 2048 |
| Intermediate Size | 8192 (Ultra-Wide for the 1B class) |
| Layer Count | 16 Decoder Layers |
| Quantization | 1.58-bit (Ternary {-1, 0, 1}) |
| VRAM Requirement | ~2.5 GB (including KV-Cache & LoRA) |
| Size in RAM | To be reduced via double/triple compression for production |
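
As a reference for the 1.58-bit ternary format in the table above, the sketch below shows an absmean-style ternarization to {-1, 0, +1} with a per-tensor scale, in the spirit of BitNet b1.58. It is an illustrative reconstruction under that assumption, not the patented JiRack quantizer or its ROCm kernels.

```python
import torch

def ternary_quantize(w: torch.Tensor, eps: float = 1e-5):
    """Quantize a weight tensor to {-1, 0, +1} with a per-tensor absmean scale.
    Illustrative only; the actual JiRack quantization path is proprietary."""
    scale = w.abs().mean().clamp(min=eps)           # per-tensor absmean scale
    w_ternary = (w / scale).round().clamp(-1, 1)    # values restricted to {-1, 0, +1}
    return w_ternary, scale

def ternary_dequantize(w_ternary: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate full-precision tensor for reference computation."""
    return w_ternary * scale

# Example: ternarize a 2048 x 8192 projection and verify the value set.
w = torch.randn(2048, 8192)
wq, s = ternary_quantize(w)
assert set(wq.unique().tolist()) <= {-1.0, 0.0, 1.0}
```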

Training & Fine-Tuning Efficiency

βœ… Consumer GPU Ready: Fine-tuning via LoRA (r=8) requires only ~8 MB of additional weights.
βœ… 70% VRAM Reduction: Fits on an RTX 3060, GTX 1660 Ti, or high-end integrated GPUs.
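
A minimal sketch of how such an r=8 LoRA adapter could be attached with the Hugging Face peft library is shown below. For illustration it loads the Meta-Llama-3.2-1B base named in the spec table, and the target module names assume Llama-3.2-style attention projections; neither detail is taken from the JiRack source.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative base model; attaching the adapter to the JiRack checkpoint
# itself would follow the same pattern with its own module names.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-1B")

lora_config = LoraConfig(
    r=8,                    # low-rank dimension quoted above
    lora_alpha=16,          # illustrative scaling choice
    lora_dropout=0.05,      # illustrative regularization choice
    bias="none",
    task_type="CAUSAL_LM",
    # Llama-style attention projections (assumption; adjust for JiRack modules).
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()  # only the MB-scale LoRA weights are trainable
```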


πŸ’¬ Chat Demo (Alpha Phase)

Current Checkpoint: Feb 26, 2026 (Logic Verification Mode)

User: Hello
JiRack: What are you’re talking to? Can I help with that?
(Gen Time: 54.10s on CPU)

User: Tell me a joke
JiRack: What is it? Do you want to hear jokes about how the punch line works, and why I’m called β€œa dickhead”?
(Gen Time: 86.79s on CPU)

Note: The current .pt file is ~6 GB due to heavy embedded metadata for research tracking. The final production version will be cleaned and compressed to ~1.5 GB - 2.0 GB.
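
One minimal way such a cleanup could be done is sketched below: load the research checkpoint, keep only the weight tensors, and re-save them in the compact safetensors format. The filenames and state_dict key layout are assumptions for illustration.

```python
import torch
from safetensors.torch import save_file

# Hypothetical filenames; the real checkpoint name may differ.
ckpt = torch.load("jirack_ternary_1b.pt", map_location="cpu", weights_only=False)
state_dict = ckpt.get("state_dict", ckpt) if isinstance(ckpt, dict) else ckpt.state_dict()

# Keep only weight tensors, dropping any embedded research-tracking metadata.
tensors = {k: v.contiguous() for k, v in state_dict.items() if torch.is_tensor(v)}
save_file(tensors, "jirack_ternary_1b.safetensors")
```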


πŸ“‚ Repository Contents

  • JiRackTernaryPyTorch_1b.py: Core architecture file.
  • invention_description.md: Detailed technical breakdown for patent examiners.
  • performance_data.md: Benchmarks on ROCm/AMD hardware.
  • NDA.md: Confidentiality agreement for commercial evaluators.

πŸ“§ Contact & Licensing

For joint venture opportunities, hardware integration, or licensing inquiries:
