---
license: llama3
datasets:
- teknium/OpenHermes-2.5
language:
- en
base_model:
- NousResearch/Hermes-2-Pro-Llama-3-8B
pipeline_tag: text-generation
library_name: transformers
tags:
- llama
- llama3
- DPO
- RLHF
- Function calling
- Quantized
---

# Quantized Hermes 2 Pro Models

This repository provides quantized GGUF versions of the Hermes 2 Pro model. Hermes 2 Pro is an upgraded version of Nous Hermes 2, trained on a cleaned OpenHermes 2.5 dataset plus a new in-house Function Calling and JSON Mode dataset. These 4-bit and 5-bit quantized variants retain the original model's strengths: it excels at general tasks, structured JSON outputs, and reliable function calling (90% accuracy in Fireworks.AI evals). With a special system prompt, multi-turn function calling, and new single-token tags such as `<tools>` and `<tool_call>`, it is optimized for agentic parsing and streaming.

## Model Overview

- **Original Model**: Meta-Llama-3-8B
- **Quantized Versions**:
  - Q4_K_M (4-bit quantization)
  - Q5_K_M (5-bit quantization)
- **Architecture**: Decoder-only transformer
- **Base Model**: Hermes-2-Pro-Llama-3-8B
- **Modalities**: Text only
- **Developer**: Nous Research
- **License**: [Llama 3 Community License Agreement](https://huggingface.co/meta-llama/Meta-Llama-3-8B/blob/main/LICENSE)
- **Language**: English

## Quantization Details

### Q4_K_M Version
- Approximately 75% size reduction
- Lower memory footprint (~4.58 GB)
- Best suited for deployment on edge devices or low-resource GPUs
- Slight performance degradation in complex reasoning scenarios

### Q5_K_M Version
- Approximately 71% size reduction
- Higher fidelity (~5.38 GB)
- Better performance retention; recommended when quality is a priority

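As a rough illustration of choosing between the two variants, the sketch below picks the highest-fidelity file that fits a given memory budget, using the approximate sizes listed above. The 1 GB headroom figure and the selection policy are assumptions for the example, not guidance from the model authors.

```python
from typing import Optional

# Approximate file sizes (GB) taken from the quantization details above.
QUANT_FILES = {
    "Q4_K_M": 4.58,  # ~75% smaller than the FP16 original
    "Q5_K_M": 5.38,  # ~71% smaller, higher fidelity
}

def pick_quant(available_gb: float, headroom_gb: float = 1.0) -> Optional[str]:
    """Return the highest-fidelity variant that fits in memory, or None."""
    # Try larger (higher-fidelity) files first, keeping some headroom
    # for the KV cache and runtime overhead.
    for name, size in sorted(QUANT_FILES.items(), key=lambda kv: -kv[1]):
        if size + headroom_gb <= available_gb:
            return name
    return None

print(pick_quant(8.0))  # Q5_K_M: 5.38 GB + 1 GB headroom fits in 8 GB
print(pick_quant(6.0))  # Q4_K_M: only the smaller file fits
print(pick_quant(4.0))  # None: neither variant fits
```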
## Key Features

- Retrained on a cleaned OpenHermes-2.5 dataset with added Function Calling and JSON Mode data.
- Strong function calling performance (≈90% in a partnered evaluation) and structured JSON output accuracy (≈84%).
- Uses the ChatML prompt format and a special `tool_use` chat template to produce multi-turn, machine-parsable tool calls.
- Adds single-token markers to help streaming/agent parsing: `<tools>`, `<tool_call>`, `<tool_response>` (and their closing tags).

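To make the ChatML plus `<tools>` layout concrete, here is a minimal prompt-building sketch. The system-prompt wording is illustrative only; the exact phrasing Hermes 2 Pro was trained with may differ, and only the ChatML delimiters and `<tools>` markers follow the scheme named above.

```python
import json

def build_prompt(tools: list, user_msg: str) -> str:
    """Assemble a ChatML prompt advertising tools inside <tools> tags."""
    tool_block = "\n".join(json.dumps(t) for t in tools)
    # Illustrative system prompt; not the exact training-time wording.
    system = (
        "You are a function calling AI model. You may call one or more "
        "functions to assist with the user query. Available tools:\n"
        f"<tools>\n{tool_block}\n</tools>"
    )
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user_msg}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

# Hypothetical tool definition, for illustration only.
tools = [{"name": "get_weather", "parameters": {"city": {"type": "string"}}}]
prompt = build_prompt(tools, "What's the weather in Paris?")
print(prompt)
```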
### Usage

Hermes 2 Pro - Llama-3 8B is well suited to building agents that require reliable function calling, structured JSON outputs, and strong reasoning. Its 8B size balances capability with efficiency, making it suitable for research, prototyping, and real-world applications.

**llama.cpp (text-only)**
```sh
./llama-cli -hf SandLogicTechnologies/Hermes-2-Pro-GGUF -p "Write a python script designed for adding to a library on data cleaning"
```
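
On the consuming side, the single-token `<tool_call>` markers make the model's tool invocations easy to pull out of generated text. The sketch below handles the simple case of well-formed, non-nested `<tool_call>` blocks; real agent loops may need more robust parsing, and the `get_weather` payload is a hypothetical example.

```python
import json
import re

# Match a JSON object wrapped in the <tool_call> markers described above.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def extract_tool_calls(text: str) -> list:
    """Return the parsed JSON payload of each <tool_call> block in `text`."""
    return [json.loads(m) for m in TOOL_CALL_RE.findall(text)]

# Example generation containing one tool call (hypothetical output).
generated = (
    "I'll look that up.\n"
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>'
)
calls = extract_tool_calls(generated)
print(calls[0]["name"])  # get_weather
```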

## Model Data

### Training Data Overview

Hermes 2 Pro - Llama-3 8B was trained on a refined version of the OpenHermes-2.5 dataset, combined with a custom Function Calling and JSON Mode corpus developed in-house. The data mix includes high-quality web content, code, reasoning tasks, STEM material, and multilingual samples. This targeted training enables the model to excel not only at general conversation but also at structured output generation and reliable tool use.

## Recommended Use Cases

- **Function Calling & Tool Use**
  Powering agentic workflows where the model selects and invokes external tools or APIs using reliable JSON-based calls.

- **Structured JSON Outputs**
  Generating machine-readable responses that conform to a schema, useful for automation, integration with services, and structured data extraction.

- **Resource-conscious Deployment**
  The 8B parameter size makes it suitable for smaller GPUs and cloud environments, balancing performance with accessibility.

- **Low-resource Deployment**
  The quantized GGUF variants run efficiently on limited hardware such as CPUs, edge devices, or small GPUs.

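When relying on the structured-JSON use case, it is worth checking the model's output shape before passing it downstream. A real pipeline would typically use a full JSON Schema validator; this minimal sketch only verifies required keys and types, and the `title`/`year` schema is a made-up example.

```python
import json

def conforms(payload: str, required: dict) -> bool:
    """Check that `payload` is valid JSON with the required keys and types."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and all(
        isinstance(data.get(key), typ) for key, typ in required.items()
    )

schema = {"title": str, "year": int}
print(conforms('{"title": "Dune", "year": 1965}', schema))  # True
print(conforms('{"title": "Dune"}', schema))                # False: missing year
print(conforms('not json at all', schema))                  # False: parse error
```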
---

## Acknowledgments

These quantized models are based on the original work of the **NousResearch** development team.

Special thanks to:

- The [NousResearch](https://huggingface.co/NousResearch) team for developing and releasing the [Hermes-2-Pro-Llama-3-8B](https://huggingface.co/NousResearch/Hermes-2-Pro-Llama-3-8B) model.
- **Georgi Gerganov** and the entire [`llama.cpp`](https://github.com/ggerganov/llama.cpp) open-source community for enabling efficient model quantization and inference via the GGUF format.

---

## Contact

For any inquiries or support, please contact us at support@sandlogic.com or visit our [Website](https://www.sandlogic.com/).