unmodeled-tyler committed
Commit a2897eb · verified · 1 Parent(s): 9d354ce

🚀 Apollo V1 7B: Advanced Reasoning Language Model - Complete model release with LoRA fine-tuned Mistral-7B-Instruct-v0.2, 161M parameter adapter, Apache 2.0 license, and professional documentation for logical, mathematical, and legal reasoning


Apollo V1 7B represents the first public release in the Apollo model series from VANTA Research. This specialized language model is optimized for advanced reasoning tasks including logical reasoning, mathematical problem-solving, and legal analysis.

.gitignore ADDED
@@ -0,0 +1,68 @@
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ build/
+ develop-eggs/
+ dist/
+ downloads/
+ eggs/
+ .eggs/
+ lib/
+ lib64/
+ parts/
+ sdist/
+ var/
+ wheels/
+ *.egg-info/
+ .installed.cfg
+ *.egg
+ MANIFEST
+
+ # Virtual environments
+ .env
+ .venv
+ env/
+ venv/
+ ENV/
+ env.bak/
+ venv.bak/
+
+ # IDEs
+ .vscode/
+ .idea/
+ *.swp
+ *.swo
+ *~
+
+ # OS generated files
+ .DS_Store
+ .DS_Store?
+ ._*
+ .Spotlight-V100
+ .Trashes
+ ehthumbs.db
+ Thumbs.db
+
+ # Training artifacts that shouldn't be in final release
+ *.log
+ wandb/
+ runs/
+ outputs/
+ checkpoints/
+ logs/
+
+ # Temporary files
+ *.tmp
+ *.temp
+ .cache/
+
+ # Model training specific (keep the final weights)
+ training_args.bin
+
+ # Local development files
+ test_*.py
+ debug_*.py
+ scratch_*.py
LICENSE ADDED
@@ -0,0 +1,200 @@
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction,
+ and distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all
+ other entities that control, are controlled by, or are under common
+ control with that entity. For the purposes of this definition,
+ "control" means (i) the power, direct or indirect, to cause the
+ direction or management of such entity, whether by contract or
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity
+ exercising permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications,
+ including but not limited to software source code, documentation
+ source, and configuration files.
+
+ "Object" form shall mean any form resulting from mechanical
+ transformation or translation of a Source form, including but
+ not limited to compiled object code, generated documentation,
+ and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or
+ Object form, made available under the License, as indicated by a
+ copyright notice that is included in or attached to the work
+ (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object
+ form, that is based on (or derived from) the Work and for which the
+ editorial revisions, annotations, elaborations, or other modifications
+ represent, as a whole, an original work of authorship. For the purposes
+ of this License, Derivative Works shall not include works that remain
+ separable from, or merely link (or bind by name) to the interfaces of,
+ the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including
+ the original version of the Work and any modifications or additions
+ to that Work or Derivative Works thereof, that is intentionally
+ submitted to Licensor for inclusion in the Work by the copyright owner
+ or by an individual or Legal Entity authorized to submit on behalf of
+ the copyright owner. For the purposes of this definition, "submitted"
+ means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control
+ systems, and issue tracking systems that are managed by, or on behalf
+ of, the Licensor for the purpose of discussing and improving the Work,
+ but excluding communication that is conspicuously marked or otherwise
+ designated in writing by the copyright owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity
+ on behalf of whom a Contribution has been received by Licensor and
+ subsequently incorporated within the Work.
+
+ 2. Grant of Copyright License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ copyright license to use, reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the
+ Work and such Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License. Subject to the terms and conditions of
+ this License, each Contributor hereby grants to You a perpetual,
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+ (except as stated in this section) patent license to make, have made,
+ use, offer to sell, sell, import, and otherwise transfer the Work,
+ where such license applies only to those patent claims licensable
+ by such Contributor that are necessarily infringed by their
+ Contribution(s) alone or by combination of their Contribution(s)
+ with the Work to which such Contribution(s) was submitted. If You
+ institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
+ or a Contribution incorporated within the Work constitutes direct
+ or contributory patent infringement, then any patent licenses
+ granted to You under this License for that Work shall terminate
+ as of the date such litigation is filed.
+
+ 4. Redistribution. You may reproduce and distribute copies of the
+ Work or Derivative Works thereof in any medium, with or without
+ modifications, and in Source or Object form, provided that You
+ meet the following conditions:
+
+ (a) You must give any other recipients of the Work or
+ Derivative Works a copy of this License; and
+
+ (b) You must cause any modified files to carry prominent notices
+ stating that You changed the files; and
+
+ (c) You must retain, in the Source form of any Derivative Works
+ that You distribute, all copyright, patent, trademark, and
+ attribution notices from the Source form of the Work,
+ excluding those notices that do not pertain to any part of
+ the Derivative Works; and
+
+ (d) If the Work includes a "NOTICE" text file as part of its
+ distribution, then any Derivative Works that You distribute must
+ include a readable copy of the attribution notices contained
+ within such NOTICE file, excluding those notices that do not
+ pertain to any part of the Derivative Works, in at least one
+ of the following places: within a NOTICE text file distributed
+ as part of the Derivative Works; within the Source form or
+ documentation, if provided along with the Derivative Works; or,
+ within a display generated by the Derivative Works, if and
+ wherever such third-party notices normally appear. The contents
+ of the NOTICE file are for informational purposes only and
+ do not modify the License. You may add Your own attribution
+ notices within Derivative Works that You distribute, alongside
+ or as an addendum to the NOTICE text from the Work, provided
+ that such additional attribution notices cannot be construed
+ as modifying the License.
+
+ You may add Your own copyright statement to Your modifications and
+ may provide additional or different license terms and conditions
+ for use, reproduction, or distribution of Your modifications, or
+ for any such Derivative Works as a whole, provided Your use,
+ reproduction, and distribution of the Work otherwise complies with
+ the conditions stated in this License.
+
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
+ any Contribution intentionally submitted for inclusion in the Work
+ by You to the Licensor shall be under the terms and conditions of
+ this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify
+ the terms of any separate license agreement you may have executed
+ with Licensor regarding such Contributions.
+
+ 6. Trademarks. This License does not grant permission to use the trade
+ names, trademarks, service marks, or product names of the Licensor,
+ except as required for reasonable and customary use in describing the
+ origin of the Work and reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty. Unless required by applicable law or
+ agreed to in writing, Licensor provides the Work (and each
+ Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied, including, without limitation, any warranties or conditions
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+ PARTICULAR PURPOSE. You are solely responsible for determining the
+ appropriateness of using or redistributing the Work and assume any
+ risks associated with Your exercise of permissions under this License.
+
+ 8. Limitation of Liability. In no event and under no legal theory,
+ whether in tort (including negligence), contract, or otherwise,
+ unless required by applicable law (such as deliberate and grossly
+ negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special,
+ incidental, or consequential damages of any character arising as a
+ result of this License or out of the use or inability to use the
+ Work (including but not limited to damages for loss of goodwill,
+ work stoppage, computer failure or malfunction, or any and all
+ other commercial damages or losses), even if such Contributor
+ has been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability. While redistributing
+ the Work or Derivative Works thereof, You may choose to offer,
+ and charge a fee for, acceptance of support, warranty, indemnity,
+ or other liability obligations and/or rights consistent with this
+ License. However, in accepting such obligations, You may act only
+ on Your own behalf and on Your sole responsibility, not on behalf
+ of any other Contributor, and only if You agree to indemnify,
+ defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason
+ of your accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work.
+
+ To apply the Apache License to your work, attach the following
+ boilerplate notice, with the fields enclosed by brackets "[]"
+ replaced with your own identifying information. (Don't include
+ the brackets!) The text should be enclosed in the appropriate
+ comment syntax for the file format. We also recommend that a
+ file or class name and description of purpose be included on the
+ same "printed page" as the copyright notice for easier
+ identification within third-party archives.
+
+ Copyright 2025 VANTA Research
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
MERGE_GUIDE.md ADDED
@@ -0,0 +1,72 @@
+ # Merging LoRA Weights with Base Model
+
+ This repository contains LoRA (Low-Rank Adaptation) weights that must be merged with the base Mistral-7B-Instruct-v0.2 model before use.
+
+ ## Quick Start
+
+ ### Option 1: Use with Ollama (Recommended)
+ ```bash
+ # Download this repository
+ git clone https://huggingface.co/vanta-research/apollo-v1-7b
+ cd apollo-v1-7b
+
+ # Create an Ollama model that layers the LoRA adapter on the base model
+ # (ADAPTER with a safetensors adapter directory requires a recent Ollama release)
+ printf 'FROM mistral:7b\nADAPTER .\n' > Modelfile
+ ollama create apollo-v1-7b -f Modelfile
+ ollama run apollo-v1-7b
+ ```
+
+ ### Option 2: Merge with Transformers
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel
+ import torch
+
+ # Load base model
+ base_model = AutoModelForCausalLM.from_pretrained(
+     "mistralai/Mistral-7B-Instruct-v0.2",
+     torch_dtype=torch.float16,
+     device_map="auto"
+ )
+
+ # Load LoRA adapter
+ model = PeftModel.from_pretrained(base_model, "./apollo-v1-7b")
+
+ # Merge and save
+ merged_model = model.merge_and_unload()
+ merged_model.save_pretrained("./apollo-v1-7b-merged")
+
+ # Save the tokenizer alongside the merged weights
+ tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
+ tokenizer.save_pretrained("./apollo-v1-7b-merged")
+ ```
+
+ ### Option 3: Use with PEFT directly
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from peft import PeftModel
+
+ base_model_name = "mistralai/Mistral-7B-Instruct-v0.2"
+ model = AutoModelForCausalLM.from_pretrained(base_model_name, torch_dtype=torch.float16)
+ model = PeftModel.from_pretrained(model, "./apollo-v1-7b")
+ tokenizer = AutoTokenizer.from_pretrained(base_model_name)
+
+ # Use for inference
+ inputs = tokenizer("Hello, how can I help you today?", return_tensors="pt")
+ outputs = model.generate(**inputs, max_new_tokens=100)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ ## Requirements
+
+ - Base model: `mistralai/Mistral-7B-Instruct-v0.2`
+ - Python packages: `transformers`, `peft`, `torch`
+ - CUDA-compatible GPU (recommended)
+
+ ## Model Architecture
+
+ - **Base Model**: Mistral 7B Instruct v0.2
+ - **Training Method**: LoRA (Low-Rank Adaptation)
+ - **Rank**: 16
+ - **Alpha**: 32
+ - **Target Modules**: q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj
MODEL_CARD.md ADDED
@@ -0,0 +1,218 @@
+ # Model Card: Apollo V1 7B
+
+ ## Model Details
+
+ **Model Name**: Apollo V1 7B
+ **Developer**: VANTA Research
+ **Model Version**: 1.0.0
+ **Release Date**: September 2025
+ **License**: Apache 2.0
+ **Base Model**: mistralai/Mistral-7B-Instruct-v0.2
+ **Model Type**: Causal Language Model with LoRA Adapters
+
+ ## Intended Use
+
+ ### Primary Use Cases
+ - Educational reasoning assistance and tutoring
+ - Mathematical problem solving with step-by-step explanations
+ - Logical reasoning and argument analysis
+ - Legal education and case study analysis (not professional advice)
+ - Academic research support and hypothesis evaluation
+
+ ### Intended Users
+ - Students and educators in STEM and legal fields
+ - Researchers studying AI reasoning capabilities
+ - Developers building reasoning-focused applications
+ - Academic institutions and educational platforms
+
+ ## Model Architecture
+
+ - **Base Architecture**: Mistral 7B Instruct v0.2
+ - **Fine-tuning Method**: LoRA (Low-Rank Adaptation)
+ - **Total Parameters**: ~7 billion
+ - **LoRA Configuration**:
+   - Rank (r): 16
+   - Alpha: 32
+   - Dropout: 0.1
+   - Target modules: All linear layers
+ - **Precision**: FP16 (GPU) / FP32 (CPU)
+ - **Context Length**: 32,768 tokens
+
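+ For reference, these settings correspond to a PEFT `LoraConfig` along the following lines (a minimal sketch mirroring `adapter_config.json` in this repository, not the original training script):
+
+ ```python
+ from peft import LoraConfig
+
+ # LoRA settings as recorded in adapter_config.json
+ lora_config = LoraConfig(
+     r=16,
+     lora_alpha=32,
+     lora_dropout=0.1,
+     bias="none",
+     task_type="CAUSAL_LM",
+     target_modules=[
+         "q_proj", "k_proj", "v_proj", "o_proj",
+         "gate_proj", "up_proj", "down_proj",
+     ],
+ )
+ ```
+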
+ ## Training Data
+
+ ### Dataset Composition
+ - **Total Instances**: 264 specialized reasoning examples
+ - **Data Sources**: Curated legal reasoning scenarios, mathematical word problems, logical puzzles
+ - **Data Quality**: Hand-crafted and reviewed by domain experts
+ - **Language**: English
+ - **Content Areas**:
+   - Legal reasoning and case analysis (40%)
+   - Mathematical problem solving (30%)
+   - Logical reasoning and puzzles (20%)
+   - Chain-of-thought examples (10%)
+
+ ### Data Processing
+ - All instances manually reviewed for quality and accuracy
+ - Balanced representation across reasoning domains
+ - Consistent formatting and structure
+ - Ethical content filtering applied
+
+ ## Training Procedure
+
+ ### Training Configuration
+ - **Method**: Supervised Fine-tuning with LoRA
+ - **Base Model**: mistralai/Mistral-7B-Instruct-v0.2
+ - **Training Framework**: Transformers + PEFT
+ - **Hardware**: NVIDIA RTX 3060 (12GB)
+ - **Training Duration**: Multiple epochs until convergence
+ - **Optimization**: AdamW optimizer with learning rate scheduling
+
+ ### Training Process
+ 1. Data preprocessing and tokenization
+ 2. LoRA adapter initialization
+ 3. Supervised fine-tuning on reasoning dataset (see the sketch after this list)
+ 4. Validation and checkpoint selection
+ 5. Model merging and evaluation
+
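+ A compressed illustration of steps 2-3 (a sketch under assumptions: `reasoning_dataset` is a hypothetical pre-tokenized dataset, and the epoch count and learning rate shown are placeholders, not the released training recipe):
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, Trainer, TrainingArguments
+ from peft import LoraConfig, get_peft_model
+
+ # Step 2: wrap the base model with LoRA adapters (settings from adapter_config.json)
+ base = AutoModelForCausalLM.from_pretrained(
+     "mistralai/Mistral-7B-Instruct-v0.2", torch_dtype=torch.bfloat16
+ )
+ model = get_peft_model(base, LoraConfig(
+     r=16, lora_alpha=32, lora_dropout=0.1, task_type="CAUSAL_LM"
+ ))
+
+ # Step 3: supervised fine-tuning; reasoning_dataset is a hypothetical
+ # dataset with input_ids/labels columns
+ trainer = Trainer(
+     model=model,
+     args=TrainingArguments(output_dir="checkpoints", num_train_epochs=3,
+                            per_device_train_batch_size=1, learning_rate=2e-4),
+     train_dataset=reasoning_dataset,
+ )
+ trainer.train()
+ ```
+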
+ ## Evaluation
+
+ ### Comprehensive Reasoning Tests
+ - **Test Suite**: 14 comprehensive reasoning tasks
+ - **Success Rate**: 100% (14/14 tests passed)
+ - **Categories Tested**:
+   - Apollo Identity: 3/3 tests passed
+   - Logical Reasoning: 3/3 tests passed
+   - Legal Reasoning: 3/3 tests passed
+   - Mathematical Reasoning: 3/3 tests passed
+   - Chain-of-Thought: 2/2 tests passed
+
+ ### Performance Benchmarks
+ - **Mathematical Accuracy**: 100% on standard math problems
+ - **Response Speed**: 2-7x faster than comparable models (see the head-to-head timings below)
+ - **Token Generation**: 52-53 tokens/second
+ - **Average Response Time**: 3.9 seconds
+
+ ### Comparative Analysis
+ Head-to-head comparison with Apollo Qwen2 Champion:
+ - Legal Reasoning: Apollo V1 won (3.77s vs 26.98s)
+ - Logic Problems: Apollo V1 won (3.78s vs 10.69s)
+ - Scientific Reasoning: Apollo V1 won (3.83s vs 14.72s)
+ - **Overall**: 3/3 wins with superior speed
+
+ ## Limitations
+
+ ### Known Limitations
+ 1. **Domain Specialization**: Optimized for reasoning tasks; may underperform in creative writing, general conversation, or domain-specific knowledge outside the training scope
+ 2. **Legal Advice Disclaimer**: Provides educational legal analysis only, not professional legal advice
+ 3. **Verification Required**: While highly accurate, outputs should be verified for critical applications
+ 4. **Context Constraints**: Limited to a 32K-token context window
+ 5. **Language**: Primarily trained and tested in English
+
+ ### Technical Limitations
+ - Memory requirements: ~14GB for FP16 inference
+ - Inference speed depends on hardware capabilities
+ - Requires specific software dependencies (transformers, peft)
+
+ ## Bias and Fairness
+
+ ### Bias Mitigation Efforts
+ - Diverse reasoning problem selection
+ - Manual review of training examples
+ - Testing across different problem types and complexity levels
+ - Continuous monitoring of model outputs
+
+ ### Known Biases
+ - May reflect biases present in the base Mistral model
+ - Training data primarily from Western legal and educational contexts
+ - Potential bias toward formal logical reasoning approaches
+
+ ### Fairness Considerations
+ - Model designed for educational use across diverse populations
+ - Open source licensing enables community oversight
+ - Transparent documentation of capabilities and limitations
+
+ ## Environmental Impact
+
+ ### Carbon Footprint
+ - Training conducted on a single RTX 3060 GPU
+ - LoRA training is considerably more efficient than full-model fine-tuning
+ - Estimated training time: <24 hours total
+ - Carbon impact significantly lower than training large models from scratch
+
+ ### Efficiency Measures
+ - LoRA fine-tuning reduces computational requirements
+ - Optimized inference for various hardware configurations
+ - Support for CPU-only inference to reduce GPU dependence
+
+ ## Ethical Considerations
+
+ ### Responsible Use
+ - Clear documentation of intended use cases
+ - Explicit warnings about limitations and verification needs
+ - Educational focus with appropriate disclaimers
+ - Open source to enable community review
+
+ ### Potential Misuse
+ - Should not be used for professional legal, medical, or financial advice
+ - Not suitable for critical decision-making without human oversight
+ - May be misused if presented as an infallible reasoning system
+
+ ### Mitigation Strategies
+ - Clear usage guidelines and disclaimers
+ - Educational focus in documentation
+ - Open source licensing for transparency
+ - Community feedback mechanisms
+
+ ## Technical Specifications
+
+ ### System Requirements
+ - **Minimum**: 16GB RAM, modern CPU
+ - **Recommended**: 16GB+ GPU VRAM, 32GB+ system RAM
+ - **Software**: Python 3.8+, PyTorch 2.0+, Transformers 4.44+
+
+ ### Deployment Options
+ - Local inference (GPU/CPU)
+ - Cloud deployment (AWS, GCP, Azure)
+ - Edge deployment (with quantization)
+ - API integration via FastAPI/Flask
+
+ ## Version History
+
+ ### Version 1.0.0 (September 2025)
+ - Initial public release
+ - Base model: Mistral 7B Instruct v0.2
+ - 264 training instances across reasoning domains
+ - Comprehensive evaluation and benchmarking
+ - Full documentation and usage examples
+
+ ## Citation
+
+ ```bibtex
+ @misc{apollo-v1-7b-2025,
+   title={Apollo V1 7B: Advanced Reasoning AI Model},
+   author={VANTA Research Team},
+   year={2025},
+   url={https://huggingface.co/vanta-research/apollo-v1-7b},
+   note={First public release of specialized reasoning language model}
+ }
+ ```
+
+ ## Contact and Support
+
+ - **Primary Contact**: research@vanta.ai
+ - **GitHub Issues**: [vanta-research/apollo-v1-7b](https://github.com/vanta-research/apollo-v1-7b/issues)
+ - **Documentation**: [vanta.ai/models/apollo-v1-7b](https://vanta.ai/models/apollo-v1-7b)
+ - **Community**: [Discord Server](https://discord.gg/vanta-research)
+
+ ## Acknowledgments
+
+ - Mistral AI for the excellent base model
+ - Hugging Face for the transformers and PEFT libraries
+ - Microsoft for LoRA research and implementation
+ - Open source community for tools and inspiration
+ - Beta testers and early adopters for valuable feedback
+
+ ---
+
+ *Last Updated: September 2025*
+ *Model Card Version: 1.0*
README.md CHANGED
@@ -1,3 +1,95 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ base_model: mistralai/Mistral-7B-Instruct-v0.2
+ library_name: peft
+ tags:
+ - reasoning
+ - legal-analysis
+ - mathematical-reasoning
+ - logical-reasoning
+ - mistral
+ - lora
+ - vanta-research
+ - apollo
+ language:
+ - en
+ pipeline_tag: text-generation
+ ---
+
+ # Apollo V1 7B
+
+ **Advanced Reasoning Language Model**
+
+ Apollo V1 7B is a specialized language model designed for advanced reasoning tasks, including logical reasoning, mathematical problem-solving, and legal analysis. Built on Mistral-7B-Instruct-v0.2 using LoRA fine-tuning, it is the first public release in the Apollo model series from VANTA Research.
+
+ ## Model Overview
+
+ The model is optimized for reasoning-intensive tasks and demonstrates strong performance in logical reasoning, mathematical problem-solving, and legal analysis, the result of targeted fine-tuning on curated reasoning datasets.
+
+ ### Key Capabilities
+
+ - **Logical Reasoning**: Advanced syllogistic reasoning, conditional logic, and contradiction detection
+ - **Mathematical Problem Solving**: Step-by-step mathematical reasoning with high accuracy
+ - **Legal Analysis**: Educational legal reasoning and case analysis capabilities
+ - **High Performance**: Optimized for fast inference while maintaining quality
+ - **Consistent Identity**: Maintains clear model identity and capability awareness
+
+ ## Model Details
+
+ - **Base Model**: [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2)
+ - **Training Method**: LoRA (Low-Rank Adaptation) fine-tuning
+ - **Parameters**: ~7.24B total
+ - **LoRA Rank**: 16 (alpha 32)
+ - **Target Modules**: All linear layers (q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj)
+ - **Training Precision**: 16-bit (bfloat16)
+ - **License**: Apache 2.0
+
+ ## Quick Start
+
+ ### Using the LoRA Adapter
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ from peft import PeftModel
+ import torch
+
+ # Load base model and tokenizer
+ model_name = "mistralai/Mistral-7B-Instruct-v0.2"
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype=torch.bfloat16,
+     device_map="auto"
+ )
+
+ # Load and apply LoRA adapter
+ model = PeftModel.from_pretrained(model, "vanta-research/apollo-v1-7b")
+
+ # Example usage
+ prompt = "Solve this logical reasoning problem: If all cats are mammals, and Fluffy is a cat, what can we conclude about Fluffy?"
+
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ with torch.no_grad():
+     outputs = model.generate(
+         **inputs,
+         max_new_tokens=256,
+         temperature=0.7,
+         do_sample=True,
+         pad_token_id=tokenizer.eos_token_id
+     )
+
+ response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+ print(response)
+ ```
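+
+ Mistral Instruct models expect the `[INST]` chat format; wrapping the prompt with the tokenizer's chat template may improve response quality (a sketch continuing from the code above):
+
+ ```python
+ # Optional: apply Mistral's [INST] chat template before generating
+ messages = [{"role": "user", "content": prompt}]
+ input_ids = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+
+ with torch.no_grad():
+     outputs = model.generate(input_ids, max_new_tokens=256)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```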
+
+ ## License
+
+ This model is released under the Apache 2.0 License. See [LICENSE](./LICENSE) for details.
+
+ ## Contact
+
+ For questions, issues, or collaboration opportunities, please visit the [model repository](https://huggingface.co/vanta-research/apollo-v1-7b).
+
+ ---
+
+ **Apollo V1 7B - Advancing the frontier of reasoning in language models**
USAGE_GUIDE.md ADDED
@@ -0,0 +1,307 @@
+ # Apollo V1 7B Usage Guide
+
+ ## Installation & Setup
+
+ ### Requirements
+ ```bash
+ pip install "transformers>=4.44.0" "peft>=0.12.0" "torch>=2.0.0"
+ ```
+
+ ### Basic Setup
+ ```python
+ from transformers import AutoTokenizer
+ from peft import AutoPeftModelForCausalLM
+ import torch
+
+ # Load model (adjust device_map based on your hardware)
+ model = AutoPeftModelForCausalLM.from_pretrained(
+     "vanta-research/apollo-v1-7b",
+     torch_dtype=torch.float16,
+     device_map="auto"  # or "cpu" for CPU-only
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained("vanta-research/apollo-v1-7b")
+ ```
+
+ ## Usage Patterns
+
+ ### 1. Mathematical Problem Solving
+
+ ```python
+ def solve_math_problem(problem):
+     prompt = f"Solve this step by step: {problem}"
+     inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
+     outputs = model.generate(
+         **inputs,
+         max_length=400,
+         temperature=0.1,  # Low temperature for accuracy
+         do_sample=True,
+         top_p=0.9
+     )
+
+     return tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+ # Examples
+ problems = [
+     "What is 15% of 240?",
+     "If x + 5 = 12, what is x?",
+     "A rectangle has length 8 and width 5. What is its area?"
+ ]
+
+ for problem in problems:
+     solution = solve_math_problem(problem)
+     print(f"Problem: {problem}")
+     print(f"Solution: {solution}")
+     print("-" * 50)
+ ```
+
+ ### 2. Legal Reasoning
+
+ ```python
+ def analyze_legal_scenario(scenario):
+     prompt = f"Analyze this legal scenario: {scenario}"
+     inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
+     outputs = model.generate(
+         **inputs,
+         max_length=600,
+         temperature=0.2,  # Slightly higher for nuanced analysis
+         do_sample=True,
+         repetition_penalty=1.1
+     )
+
+     return tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+ # Example legal scenarios
+ scenarios = [
+     "A contract requires payment within 30 days, but the buyer received defective goods.",
+     "Police conducted a search without a warrant, claiming exigent circumstances.",
+     "An employee was fired for social media posts made outside work hours."
+ ]
+
+ for scenario in scenarios:
+     analysis = analyze_legal_scenario(scenario)
+     print(f"Scenario: {scenario}")
+     print(f"Analysis: {analysis}")
+     print("-" * 50)
+ ```
+
+ ### 3. Logical Reasoning
+
+ ```python
+ def solve_logic_puzzle(puzzle):
+     prompt = f"Solve this logic puzzle step by step: {puzzle}"
+     inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
+     outputs = model.generate(
+         **inputs,
+         max_length=500,
+         temperature=0.1,
+         do_sample=True,
+         top_k=50
+     )
+
+     return tokenizer.decode(outputs[0], skip_special_tokens=True)
+
+ # Example logic puzzles
+ puzzles = [
+     "If all A are B, and all B are C, what can we conclude about A and C?",
+     "All cats are animals. Some animals are pets. Can we conclude all cats are pets?",
+     "If it rains, the ground gets wet. The ground is wet. Did it rain?"
+ ]
+
+ for puzzle in puzzles:
+     solution = solve_logic_puzzle(puzzle)
+     print(f"Puzzle: {puzzle}")
+     print(f"Solution: {solution}")
+     print("-" * 50)
+ ```
+
+ ## Advanced Usage
+
+ ### Batch Processing
+ ```python
+ def batch_process_questions(questions, batch_size=4):
+     results = []
+
+     for i in range(0, len(questions), batch_size):
+         batch = questions[i:i+batch_size]
+
+         # Process batch
+         batch_results = []
+         for question in batch:
+             inputs = tokenizer(question, return_tensors="pt").to(model.device)
+             outputs = model.generate(**inputs, max_length=300)
+             response = tokenizer.decode(outputs[0], skip_special_tokens=True)
+             batch_results.append(response)
+
+         results.extend(batch_results)
+
+     return results
+ ```
+
+ ### Memory Optimization
+ ```python
+ # For limited GPU memory
+ import torch
+
+ def memory_efficient_generation(prompt, max_length=400):
+     with torch.no_grad():
+         inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+
+         outputs = model.generate(
+             **inputs,
+             max_length=max_length,
+             temperature=0.1,
+             do_sample=True,
+             use_cache=True,  # Enable KV caching
+             pad_token_id=tokenizer.eos_token_id
+         )
+
+     # Clear cache after generation
+     if hasattr(model, 'past_key_values'):
+         model.past_key_values = None
+
+     return tokenizer.decode(outputs[0], skip_special_tokens=True)
+ ```
+
+ ### Custom Prompting
+ ```python
+ def create_apollo_prompt(question, context="", task_type="general"):
+     """Create optimized prompts for different task types."""
+
+     task_prompts = {
+         "math": "Solve this mathematical problem step by step:",
+         "legal": "Analyze this legal scenario considering relevant laws and precedents:",
+         "logic": "Solve this logical reasoning problem step by step:",
+         "general": "Please provide a clear and detailed response to:"
+     }
+
+     task_prompt = task_prompts.get(task_type, task_prompts["general"])
+
+     if context:
+         full_prompt = f"Context: {context}\n\n{task_prompt} {question}"
+     else:
+         full_prompt = f"{task_prompt} {question}"
+
+     return full_prompt
+
+ # Usage
+ question = "What is 25% of 160?"
+ prompt = create_apollo_prompt(question, task_type="math")
+ ```
+
+ ## Performance Optimization
+
+ ### GPU Settings
+ ```python
+ # For RTX 3060 (12GB) or similar
+ model = AutoPeftModelForCausalLM.from_pretrained(
+     "vanta-research/apollo-v1-7b",
+     torch_dtype=torch.float16,
+     device_map="auto",
+     max_memory={0: "10GB"}  # Reserve some GPU memory
+ )
+ ```
+
+ ### CPU Inference
+ ```python
+ # For CPU-only inference
+ model = AutoPeftModelForCausalLM.from_pretrained(
+     "vanta-research/apollo-v1-7b",
+     torch_dtype=torch.float32,  # Use float32 for CPU
+     device_map="cpu"
+ )
+ ```
+
+ ### Quantization (Coming Soon)
+ ```python
+ # 8-bit quantization for reduced memory usage
+ from transformers import BitsAndBytesConfig
+
+ quantization_config = BitsAndBytesConfig(
+     load_in_8bit=True,
+     llm_int8_enable_fp32_cpu_offload=True
+ )
+
+ model = AutoPeftModelForCausalLM.from_pretrained(
+     "vanta-research/apollo-v1-7b",
+     quantization_config=quantization_config
+ )
+ ```
+
233
+ ## Integration Examples
234
+
235
+ ### FastAPI Server
236
+ ```python
237
+ from fastapi import FastAPI
238
+ from pydantic import BaseModel
239
+
240
+ app = FastAPI()
241
+
242
+ class QuestionRequest(BaseModel):
243
+ question: str
244
+ task_type: str = "general"
245
+ max_length: int = 400
246
+
247
+ @app.post("/ask")
248
+ async def ask_apollo(request: QuestionRequest):
249
+ prompt = create_apollo_prompt(request.question, task_type=request.task_type)
250
+ response = memory_efficient_generation(prompt, request.max_length)
251
+
252
+ return {
253
+ "question": request.question,
254
+ "response": response,
255
+ "task_type": request.task_type
256
+ }
257
+
258
+ # Run with: uvicorn app:app --host 0.0.0.0 --port 8000
259
+ ```
260
+
261
+ ### Gradio Interface
262
+ ```python
263
+ import gradio as gr
264
+
265
+ def apollo_interface(message, task_type):
266
+ prompt = create_apollo_prompt(message, task_type=task_type)
267
+ return memory_efficient_generation(prompt)
268
+
269
+ interface = gr.Interface(
270
+ fn=apollo_interface,
271
+ inputs=[
272
+ gr.Textbox(label="Your Question"),
273
+ gr.Dropdown(["general", "math", "legal", "logic"], label="Task Type")
274
+ ],
275
+ outputs=gr.Textbox(label="Apollo's Response"),
276
+ title="Apollo V1 7B Chat",
277
+ description="Chat with Apollo V1 7B - Advanced Reasoning AI"
278
+ )
279
+
280
+ interface.launch(share=True)
281
+ ```
282
+
283
+ ## Troubleshooting
284
+
285
+ ### Common Issues
286
+
287
+ 1. **Out of Memory**: Reduce batch size, use CPU inference, or enable memory optimization
288
+ 2. **Slow Generation**: Check device placement, enable caching, optimize prompt length
289
+ 3. **Poor Quality**: Adjust temperature (lower for factual, higher for creative)
290
+
291
+ ### Performance Tips
292
+
293
+ - Use `torch.compile()` for faster inference (PyTorch 2.0+)
294
+ - Enable gradient checkpointing for memory efficiency
295
+ - Use appropriate data types (float16 for GPU, float32 for CPU)
296
+ - Optimize prompt length and structure
297
+ - Consider quantization for resource-constrained environments
298
+
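+ A minimal sketch of the `torch.compile()` tip (PyTorch 2.0+; the exact speedup depends on hardware and is not a benchmarked claim for this model):
+
+ ```python
+ import torch
+
+ # Compile the model's forward pass; the first generation is slower while
+ # kernels compile, subsequent generations are typically faster
+ model = torch.compile(model)
+
+ inputs = tokenizer("What is 15% of 240?", return_tensors="pt").to(model.device)
+ outputs = model.generate(**inputs, max_new_tokens=100)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+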
+ ## Best Practices
+
+ 1. **Prompt Engineering**: Be specific and clear in your questions
+ 2. **Temperature Settings**: Use 0.1-0.2 for factual/mathematical tasks, 0.3-0.7 for creative tasks
+ 3. **Context Management**: Provide relevant context for complex scenarios
+ 4. **Verification**: Always verify critical information, especially for legal/financial advice
+ 5. **Ethical Usage**: Use responsibly and within intended capabilities
+
+ For more examples and advanced usage patterns, see the GitHub repository and documentation.
adapter_config.json ADDED
@@ -0,0 +1,45 @@
+ {
+   "alpha_pattern": {},
+   "auto_mapping": null,
+   "base_model_name_or_path": "mistralai/Mistral-7B-Instruct-v0.3",
+   "bias": "none",
+   "corda_config": null,
+   "eva_config": null,
+   "exclude_modules": null,
+   "fan_in_fan_out": false,
+   "inference_mode": true,
+   "init_lora_weights": true,
+   "layer_replication": null,
+   "layers_pattern": null,
+   "layers_to_transform": null,
+   "loftq_config": {},
+   "lora_alpha": 32,
+   "lora_bias": false,
+   "lora_dropout": 0.1,
+   "megatron_config": null,
+   "megatron_core": "megatron.core",
+   "modules_to_save": null,
+   "peft_type": "LORA",
+   "qalora_group_size": 16,
+   "r": 16,
+   "rank_pattern": {},
+   "revision": null,
+   "target_modules": [
+     "gate_proj",
+     "q_proj",
+     "v_proj",
+     "o_proj",
+     "k_proj",
+     "up_proj",
+     "down_proj"
+   ],
+   "target_parameters": null,
+   "task_type": "CAUSAL_LM",
+   "trainable_token_indices": null,
+   "use_dora": false,
+   "use_qalora": false,
+   "use_rslora": false,
+   "model_name": "Apollo V1 7B",
+   "created_by": "VANTA Research",
+   "version": "1.0.0"
+ }
adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e3ffbc3fcb14e47db3c05e88ec644694eceb09ab2bf5bc84d3c11e2821987f1f
+ size 167832240
config.json ADDED
@@ -0,0 +1,33 @@
+ {
+   "_name_or_path": "vanta-research/apollo-v1-7b",
+   "architectures": [
+     "MistralForCausalLM"
+   ],
+   "attention_dropout": 0.0,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "hidden_act": "silu",
+   "hidden_size": 4096,
+   "initializer_range": 0.02,
+   "intermediate_size": 14336,
+   "max_position_embeddings": 32768,
+   "model_type": "mistral",
+   "num_attention_heads": 32,
+   "num_hidden_layers": 32,
+   "num_key_value_heads": 8,
+   "rms_norm_eps": 1e-05,
+   "rope_theta": 1000000.0,
+   "sliding_window": 4096,
+   "tie_word_embeddings": false,
+   "torch_dtype": "float16",
+   "transformers_version": "4.44.2",
+   "use_cache": true,
+   "vocab_size": 32000,
+   "model_name": "Apollo V1 7B",
+   "version": "1.0.0",
+   "created_by": "VANTA Research",
+   "base_model": "mistralai/Mistral-7B-Instruct-v0.3",
+   "license": "apache-2.0",
+   "model_description": "Advanced reasoning AI model specialized in logical reasoning, mathematical problem-solving, and legal analysis.",
+   "release_date": "2025-09-21T11:35:57.508765"
+ }
generation_config.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 1,
+   "eos_token_id": 2,
+   "max_length": 32768,
+   "pad_token_id": 2,
+   "transformers_version": "4.44.2"
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "</s>",
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:37f00374dea48658ee8f5d0f21895b9bc55cb0103939607c8185bfd1c6ca1f89
+ size 587404
tokenizer_config.json ADDED
The diff for this file is too large to render. See raw diff