Model Card for STiFLeR7/Phi-3-mini-4k-gguf
Model Details
Model Description
This is the model card of a 🤗 Transformers-compatible model pushed on the Hub. The model is a Phi-3 Mini 4K Instruct variant quantized to 4-bit GGUF (~2.3 GB) for efficient inference with llama.cpp, llama-cpp-python, or Ollama.
- Developed by: Microsoft (base model), quantized and shared by STiFLeR7
- Funded by [optional]: [More Information Needed]
- Shared by [optional]: STiFLeR7
- Model type: Causal Language Model (decoder-only)
- Language(s): English (primary)
- License: Responsible AI License (from Microsoft Phi-3)
- Finetuned from: microsoft/Phi-3-mini-4k-instruct
Model Sources
- Repository: STiFLeR7/Phi-3-mini-4k-gguf
- Paper [optional]: [More Information Needed]
- Demo [optional]: [More Information Needed]
Uses
Direct Use
- Text generation in resource-constrained environments.
- Running instruction-tuned Phi-3 Mini on CPU or GPU via GGUF.
- Edge inference and deployment with llama.cpp, llama-cpp-python, or Ollama.
Downstream Use [optional]
- Can serve as a lightweight assistant for experimentation.
- Basis for further evaluation and benchmarking of quantized models.
Out-of-Scope Use
- High-stakes decision making (e.g., medical, legal, financial).
- Safety-critical applications.
- Applications requiring full precision or fine-tuning.
Bias, Risks, and Limitations
- Model inherits biases and limitations from the Phi-3 Mini 4K base.
- Quantization may reduce reasoning and factual accuracy.
- Not optimized for fine-tuning.
- Context length limited to 4K tokens.
Recommendations: Users should be aware of the biases inherited from the base model and of the accuracy loss introduced by quantization, and should validate outputs before relying on them.
How to Get Started with the Model
With llama.cpp
```bash
# Newer llama.cpp builds name this binary llama-cli instead of main
./main -m Phi-3-mini-4k-gguf/model.gguf -p "Write a short poem about AI."
```
With llama-cpp-python
```python
from llama_cpp import Llama

llm = Llama(model_path="Phi-3-mini-4k-gguf/model.gguf")
out = llm("Hello world", max_tokens=128)
# The call returns a completion dict; the generated text lives in choices[0]
print(out["choices"][0]["text"])
```
With Ollama
```bash
# ollama create expects a Modelfile, not the GGUF file directly
echo "FROM ./Phi-3-mini-4k-gguf/model.gguf" > Modelfile
ollama create phi3-mini-4k-gguf -f Modelfile
ollama run phi3-mini-4k-gguf
```
Training Details
Training Data
- Inherits data sources from microsoft/Phi-3-mini-4k-instruct.
Training Procedure
- Quantization performed using llama.cpp GGUF conversion tools.
Training Hyperparameters
- Quantization: 4-bit (e.g., Q4_K_M; the exact scheme is not specified here).
- File size: ~2.3 GB.
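The ~2.3 GB figure is consistent with back-of-the-envelope arithmetic: Phi-3 Mini has roughly 3.8B parameters, and 4-bit K-quants average a bit under 5 effective bits per weight once per-block scales and metadata are included (the 4.85 bits/weight used below is an illustrative assumption, not a measured value):

```python
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough GGUF file size in GB (1e9 bytes): parameters * bits-per-weight.
    Ignores tokenizer and metadata overhead, which is comparatively small."""
    return n_params * bits_per_weight / 8 / 1e9

fp16_gb = gguf_size_gb(3.8e9, 16)    # ~7.6 GB at half precision
q4_gb = gguf_size_gb(3.8e9, 4.85)    # ~2.3 GB, matching the file size above
```

This also shows the headline benefit of quantization: the 4-bit file is roughly a third the size of the fp16 weights.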
Speeds, Sizes, Times [optional]
- Optimized for CPU/GPU inference in edge scenarios.
Evaluation
Testing Data
- Not separately evaluated post-quantization. Inherits evaluation from base model.
Factors
- Precision loss due to quantization.
- Edge vs. GPU deployment performance may vary.
Metrics
- Not benchmarked beyond functional testing.
Results
- Produces valid completions at reduced memory cost.
Summary
- Model Type: Quantized Phi-3 Mini 4K Instruct (GGUF 4-bit).
- Use Case: Efficient inference on CPUs and GPUs.
- Limitations: Reduced reasoning accuracy, inherits base model biases.
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator from Lacoste et al. (2019).
- Hardware Type: CPU/GPU (for inference).
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
Technical Specifications
Model Architecture and Objective
- Transformer decoder-only architecture.
- Instruction-tuned variant of Phi-3 Mini.
Compute Infrastructure
- Quantization performed locally (details omitted).
Hardware
- Tested on CPU + consumer GPU.
Software
llama.cpp, llama-cpp-python, Hugging Face Hub.
Citation [optional]
If you use this model, please cite the base model:
```bibtex
@misc{microsoft2024phi3,
  title={Phi-3: Language Models with Long Context},
  author={Microsoft Research},
  year={2024},
  url={https://huggingface.co/microsoft/Phi-3-mini-4k-instruct}
}
```
Glossary [optional]
- GGUF: Binary format for quantized models used by llama.cpp.
- Quantization: Technique to reduce model size and improve inference efficiency by lowering numerical precision.
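The precision loss that quantization introduces can be illustrated with a toy symmetric 4-bit block quantizer, in the spirit of (but much simpler than) llama.cpp's Q4 formats — this sketch is illustrative only and is not the actual GGUF encoding:

```python
import random

def quantize_block_q4(block):
    """Symmetric 4-bit quantization of one block: int4 codes in [-7, 7]
    plus a single float scale for the whole block."""
    scale = max(abs(v) for v in block) / 7.0
    if scale == 0.0:
        return [0] * len(block), 0.0
    q = [max(-7, min(7, round(v / scale))) for v in block]
    return q, scale

def dequantize_block_q4(codes, scale):
    """Recover approximate weights from codes and the shared scale."""
    return [c * scale for c in codes]

random.seed(0)
block = [random.gauss(0.0, 1.0) for _ in range(32)]  # one 32-weight block
codes, scale = quantize_block_q4(block)
restored = dequantize_block_q4(codes, scale)
# Rounding error per weight is bounded by scale / 2
max_err = max(abs(a - b) for a, b in zip(block, restored))
```

Every weight in the block shares one scale, so the largest weight sets the rounding granularity for all 32 — which is why quantization can noticeably degrade accuracy on outlier-heavy layers.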
More Information [optional]
For details on the GGUF format, see the llama.cpp GGUF documentation.
Model Card Authors [optional]
- STiFLeR7
Model Card Contact
- Hugging Face: STiFLeR7