# granite-4.0-h-1b-DISTILL-glm-4.7
This model is a fine-tuned version of ibm-granite/granite-4.0-h-1b trained on conversational data.
## Model Details
- Base Model: ibm-granite/granite-4.0-h-1b
- Fine-tuning Dataset: TeichAI/glm-4.7-2000x
- Training Loss: 0.6364
- Context Length: 1,048,576 tokens
## Quantized Versions (GGUF)
🔗 GGUF versions are available here: [granite-4.0-h-1b-DISTILL-glm-4.7-GGUF](https://huggingface.co/glogwa68/granite-4.0-h-1b-DISTILL-glm-4.7-GGUF). A short download sketch follows the table below.
| Format | Relative Size | Use Case |
|---|---|---|
| Q2_K | Smallest | Low memory, reduced quality |
| Q4_K_M | Medium | Recommended; best balance of size and quality |
| Q5_K_M | Larger | Higher quality |
| Q8_0 | Large | Near lossless |
| F16 | Largest | Original precision |
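
To fetch a single quantized file programmatically, `huggingface_hub` can download it straight from the GGUF repo. A minimal sketch, assuming the Q4_K_M filename used in the llama.cpp example further down (check the repo file list for the exact names):

```python
from huggingface_hub import hf_hub_download

# Download one quantized file from the GGUF repo into the local HF cache.
# The filename matches the Q4_K_M file referenced in the llama.cpp example below.
gguf_path = hf_hub_download(
    repo_id="glogwa68/granite-4.0-h-1b-DISTILL-glm-4.7-GGUF",
    filename="granite-4.0-h-1b-distill-glm-4.7-q4_k_m.gguf",
)
print(gguf_path)  # local path to the .gguf file
```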
## Usage

### Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("glogwa68/granite-4.0-h-1b-DISTILL-glm-4.7")
tokenizer = AutoTokenizer.from_pretrained("glogwa68/granite-4.0-h-1b-DISTILL-glm-4.7")

# Format the conversation with the model's chat template
messages = [{"role": "user", "content": "Hello, how are you?"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)

# Generate a reply and decode it (the output includes the prompt)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Ollama (GGUF)

```bash
ollama run hf.co/glogwa68/granite-4.0-h-1b-DISTILL-glm-4.7-GGUF:Q4_K_M
```
### llama.cpp

```bash
llama-cli --hf-repo glogwa68/granite-4.0-h-1b-DISTILL-glm-4.7-GGUF --hf-file granite-4.0-h-1b-distill-glm-4.7-q4_k_m.gguf -p "Hello"
```
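
The same GGUF file can also be run from Python via the `llama-cpp-python` bindings instead of the CLI. A minimal sketch, assuming the same Q4_K_M filename; the context-window setting is only an example value:

```python
from llama_cpp import Llama

# Pull the GGUF file from the Hub and load it (pip install llama-cpp-python)
llm = Llama.from_pretrained(
    repo_id="glogwa68/granite-4.0-h-1b-DISTILL-glm-4.7-GGUF",
    filename="granite-4.0-h-1b-distill-glm-4.7-q4_k_m.gguf",
    n_ctx=4096,  # context window for this session (example value)
)

# Chat-style generation
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(out["choices"][0]["message"]["content"])
```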
## Training Details
- Epochs: 3
- Learning Rate: 2e-5
- Batch Size: 1 (with gradient accumulation)
- Precision: FP16
- Hardware: Multi-GPU with DeepSpeed ZeRO-3 (see the configuration sketch below)
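
For reference, a hedged sketch of how these hyperparameters would map onto Hugging Face `TrainingArguments`. The actual training script and DeepSpeed config were not published with this card, so the gradient-accumulation steps and the `ds_zero3.json` path are placeholders:

```python
from transformers import TrainingArguments

# Illustrative only: mirrors the hyperparameters listed above.
args = TrainingArguments(
    output_dir="granite-4.0-h-1b-DISTILL-glm-4.7",
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # assumption; card only says "with gradient accumulation"
    fp16=True,
    deepspeed="ds_zero3.json",      # DeepSpeed ZeRO-3 config file (path assumed)
)
```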
## License
Apache 2.0