granite-4.0-h-1b-DISTILL-glm-4.7

This model is a fine-tuned version of ibm-granite/granite-4.0-h-1b, trained on conversational data from the TeichAI/glm-4.7-2000x dataset.

Model Details

  • Base Model: ibm-granite/granite-4.0-h-1b
  • Fine-tuning Dataset: TeichAI/glm-4.7-2000x
  • Training Loss: 0.6364
  • Context Length: 1,048,576 tokens

Quantized Versions (GGUF)

🔗 GGUF versions are available in the companion repo: glogwa68/granite-4.0-h-1b-DISTILL-glm-4.7-GGUF

Format    Notes
Q2_K      Smallest; low memory, reduced quality
Q4_K_M    Recommended; best balance of size and quality
Q5_K_M    Higher quality
Q8_0      Large; near lossless
F16       Largest; original precision
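
To fetch a single quantized file programmatically, here is a minimal sketch using huggingface_hub; the Q4_K_M filename matches the one used in the llama.cpp example below:

from huggingface_hub import hf_hub_download

# Download one quantized file from the GGUF repo to the local HF cache
path = hf_hub_download(
    repo_id="glogwa68/granite-4.0-h-1b-DISTILL-glm-4.7-GGUF",
    filename="granite-4.0-h-1b-distill-glm-4.7-q4_k_m.gguf",
)
print(path)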

Usage

Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and its tokenizer from the Hub
model = AutoModelForCausalLM.from_pretrained("glogwa68/granite-4.0-h-1b-DISTILL-glm-4.7")
tokenizer = AutoTokenizer.from_pretrained("glogwa68/granite-4.0-h-1b-DISTILL-glm-4.7")

# Apply the chat template and generate a response
messages = [{"role": "user", "content": "Hello, how are you?"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
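
For interactive use, generation can also be streamed token by token. A minimal sketch with transformers' TextStreamer, reusing model, tokenizer, and inputs from the snippet above:

from transformers import TextStreamer

# Print decoded tokens to stdout as they are generated
# (reuses `model`, `tokenizer`, and `inputs` from the example above)
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(inputs, max_new_tokens=256, streamer=streamer)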

Ollama (GGUF)

ollama run hf.co/glogwa68/granite-4.0-h-1b-DISTILL-glm-4.7-GGUF:Q4_K_M
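
Once the model is running, it can also be queried through Ollama's local HTTP API (default port 11434); a minimal sketch, assuming a default local install:

import requests

# Send a chat request to a locally running Ollama server
resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "hf.co/glogwa68/granite-4.0-h-1b-DISTILL-glm-4.7-GGUF:Q4_K_M",
        "messages": [{"role": "user", "content": "Hello, how are you?"}],
        "stream": False,
    },
)
print(resp.json()["message"]["content"])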

llama.cpp

llama-cli --hf-repo glogwa68/granite-4.0-h-1b-DISTILL-glm-4.7-GGUF --hf-file granite-4.0-h-1b-distill-glm-4.7-q4_k_m.gguf -p "Hello"
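
llama.cpp also ships llama-server, which can serve the same GGUF file behind an OpenAI-compatible endpoint. The sketch below assumes a server already running locally on the default port 8080:

import requests

# Query a locally running llama-server instance via its
# OpenAI-compatible chat completions endpoint
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Hello, how are you?"}],
        "max_tokens": 256,
    },
)
print(resp.json()["choices"][0]["message"]["content"])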

Training Details

  • Epochs: 3
  • Learning Rate: 2e-5
  • Batch Size: 1 (with gradient accumulation)
  • Precision: FP16
  • Hardware: Multi-GPU with DeepSpeed ZeRO-3
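
For reference, a hypothetical transformers TrainingArguments configuration mirroring the hyperparameters above; the actual training script and DeepSpeed config are not published here, and the gradient accumulation steps value is an assumption:

from transformers import TrainingArguments

# Illustrative sketch only, not the actual training configuration
args = TrainingArguments(
    output_dir="granite-4.0-h-1b-DISTILL-glm-4.7",
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,      # assumed; only "gradient accumulation" is stated
    fp16=True,
    deepspeed="ds_zero3_config.json",   # hypothetical path to a ZeRO-3 config
)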

License

Apache 2.0
