
Model Card for STiFLeR7/Phi-3-mini-4k-gguf

Model Details

Model Description: This is a Phi-3 Mini 4K Instruct variant quantized to 4-bit GGUF format (~2.3 GB) for efficient inference with llama.cpp, llama-cpp-python, or Ollama.

  • Developed by: Microsoft (base model), quantized and shared by STiFLeR7
  • Funded by [optional]: [More Information Needed]
  • Shared by [optional]: STiFLeR7
  • Model type: Causal Language Model (decoder-only)
  • Language(s): English (primary)
  • License: Responsible AI License (from Microsoft Phi-3)
  • Finetuned from: microsoft/Phi-3-mini-4k-instruct

Model Sources

  • Base model: https://huggingface.co/microsoft/Phi-3-mini-4k-instruct

Uses

Direct Use

  • Text generation in resource-constrained environments.
  • Running instruction-tuned Phi-3 Mini on CPU or GPU via GGUF.
  • Edge inference and deployment with llama.cpp, llama-cpp-python, or Ollama.

Downstream Use [optional]

  • Can serve as a lightweight assistant for experimentation.
  • Basis for further evaluation and benchmarking of quantized models.

Out-of-Scope Use

  • High-stakes decision making (e.g., medical, legal, financial).
  • Safety-critical applications.
  • Applications requiring full precision or fine-tuning.

Bias, Risks, and Limitations

  • Model inherits biases and limitations from the Phi-3 Mini 4K base.
  • Quantization may reduce reasoning and factual accuracy.
  • Not optimized for fine-tuning.
  • Context length limited to 4K tokens.
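Because the 4K window covers both the prompt and the generated tokens, callers should budget the two together. A minimal sketch of that budgeting, assuming a rough heuristic of ~4 characters per English token (exact counts require the model's tokenizer):

```python
CONTEXT_LIMIT = 4096     # Phi-3 Mini 4K context window, in tokens
CHARS_PER_TOKEN = 4      # rough heuristic for English text; not the real tokenizer

def fits_context(prompt: str, max_new_tokens: int) -> bool:
    # Estimate prompt tokens from character count, then check the total budget.
    est_prompt_tokens = len(prompt) / CHARS_PER_TOKEN
    return est_prompt_tokens + max_new_tokens <= CONTEXT_LIMIT

print(fits_context("a" * 1_000, max_new_tokens=256))    # short prompt fits: True
print(fits_context("a" * 20_000, max_new_tokens=256))   # long prompt does not: False
```

For precise accounting, tokenize the prompt with the loaded model (llama-cpp-python exposes a tokenizer on the `Llama` object) rather than relying on this heuristic.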

Recommendations: Users should be aware of the base model's biases and of the additional accuracy loss introduced by 4-bit quantization, and should validate outputs before relying on them.


How to Get Started with the Model

With llama.cpp

./main -m Phi-3-mini-4k-gguf/model.gguf -p "Write a short poem about AI." -n 128

(In recent llama.cpp builds the CLI binary is named llama-cli rather than main.)

With llama-cpp-python

from llama_cpp import Llama

llm = Llama(model_path="Phi-3-mini-4k-gguf/model.gguf")
out = llm("Hello world", max_tokens=128)
print(out["choices"][0]["text"])  # the call returns a dict; the text lives under "choices"

With Ollama

# ollama create expects a Modelfile, not the GGUF file directly:
echo 'FROM ./Phi-3-mini-4k-gguf/model.gguf' > Modelfile
ollama create phi3-mini-4k-gguf -f Modelfile
ollama run phi3-mini-4k-gguf

Training Details

Training Data

  • Inherits training data from the base model, microsoft/Phi-3-mini-4k-instruct.

Training Procedure

  • Quantization performed using llama.cpp GGUF conversion tools.

Training Hyperparameters

  • Quantization: 4-bit (e.g., q4_K_M or similar; specify if known).
  • File size: ~2.3 GB.
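The ~2.3 GB figure is consistent with simple arithmetic: file size ≈ parameter count × average bits per weight. A back-of-the-envelope check, assuming ~3.8B parameters and the ~4.85 bits/weight average commonly cited for q4_K_M (both figures are approximations, not taken from this repository):

```python
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    # Ignores file metadata and the few tensors typically kept at higher precision.
    return n_params * bits_per_weight / 8 / 1e9

size = gguf_size_gb(3.8e9, 4.85)
print(round(size, 2))  # ~2.3, matching the published file size
```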

Speeds, Sizes, Times [optional]

  • Optimized for CPU/GPU inference in edge scenarios.

Evaluation

Testing Data

  • Not separately evaluated post-quantization. Inherits evaluation from base model.

Factors

  • Precision loss due to quantization.
  • Edge vs. GPU deployment performance may vary.

Metrics

  • Not benchmarked beyond functional testing.

Results

  • Produces valid completions at reduced memory cost.

Summary

  • Model Type: Quantized Phi-3 Mini 4K Instruct (GGUF 4-bit).
  • Use Case: Efficient inference on CPUs and GPUs.
  • Limitations: Reduced reasoning accuracy, inherits base model biases.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator from Lacoste et al. (2019).

  • Hardware Type: CPU/GPU (for inference).
  • Hours used: [More Information Needed]
  • Cloud Provider: [More Information Needed]
  • Compute Region: [More Information Needed]
  • Carbon Emitted: [More Information Needed]

Technical Specifications

Model Architecture and Objective

  • Transformer decoder-only architecture (phi3), ~3.8B parameters.
  • Instruction-tuned variant of Phi-3 Mini.

Compute Infrastructure

  • Quantization performed locally (details omitted).

Hardware

  • Tested on CPU + consumer GPU.

Software

  • llama.cpp, llama-cpp-python, Hugging Face Hub.

Citation [optional]

If you use this model, please cite the base model:

@misc{abdin2024phi3,
  title={Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone},
  author={Abdin, Marah and others},
  year={2024},
  eprint={2404.14219},
  archivePrefix={arXiv},
  url={https://huggingface.co/microsoft/Phi-3-mini-4k-instruct}
}

Glossary [optional]

  • GGUF: a binary file format for storing quantized model weights, introduced by llama.cpp as the successor to GGML.
  • Quantization: reducing the numerical precision of model weights (e.g., from 16-bit floats to 4-bit integers) to shrink model size and speed up inference.
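To illustrate the glossary entry, here is a toy symmetric 4-bit quantizer (a didactic sketch only; llama.cpp's k-quants use block-wise scales and are considerably more sophisticated):

```python
def quantize_4bit(values):
    # Map floats onto the 16 signed integer levels [-8, 7] with one shared scale.
    scale = max(abs(v) for v in values) / 7
    q = [max(-8, min(7, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    # Recover approximate floats; precision lost in rounding stays lost.
    return [x * scale for x in q]

weights = [0.12, -0.53, 0.88, -0.07]
q, scale = quantize_4bit(weights)
restored = dequantize(q, scale)
# Each restored value lands within one quantization step of the original.
print(q, [round(w, 3) for w in restored])
```

Each 4-bit code needs only half a byte instead of the two or four bytes of a float, which is where the size reduction described above comes from.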

More Information [optional]

For detailed usage of GGUF format, see llama.cpp GGUF documentation.


Model Card Authors [optional]

  • STiFLeR7

Model Card Contact

[More Information Needed]
