TheStageAI/Elastic-Mistral-7B-Instruct-v0.3


hypothetical committed on Apr 9

Commit 47f284b (verified) · 1 Parent(s): f835c5b

Update README.md

Files changed (1)

README.md +20 -35


@@ -47,7 +47,6 @@ from elastic_models.transformers import AutoModelForCausalLM
 # model configuration as well
 model_name = "mistralai/Mistral-7B-Instruct-v0.3"
 hf_token = ''
-hf_cache_dir = ''
 device = torch.device("cuda")
 # Create model
@@ -57,7 +56,6 @@ tokenizer = AutoTokenizer.from_pretrained(
 model = AutoModelForCausalLM.from_pretrained(
     model_name,
     token=hf_token,
-    cache_dir=hf_cache_dir,
     torch_dtype=torch.bfloat16,
     attn_implementation="sdpa",
     mode='s'
@@ -85,28 +83,29 @@ print(f"# Q:\n{prompt}\n")
 print(f"# A:\n{output}\n")
 ```
-### Installation
-__System requirements__
 * GPUs: H100, L40s
 * CPU: AMD, Intel
-* OS: Linux #TODO
 * Python: 3.10-3.12
-To work with our models
 ```shell
 pip install thestage
 pip install elastic_models
 ```
-Then go to app.thestage.ai, login and generate API token from your profile page. Set up API token as follows:
 ```shell
 thestage config set --api-token <YOUR_API_TOKEN>
@@ -126,10 +125,10 @@ For quality evaluation we have used: #TODO link to github
 | Metric/Model  | S | M | L | XL | Original | W8A8, int8 |
 |---------------|---|---|---|----|----------|------------|
-| MMLU          | 0 | 0 | 0 | 0  | 0        | 0          |
-| PIQA          | 0 | 0 | 0 | 0  | 0        | 0          |
-| Arc Challenge | 0 | 0 | 0 | 0  | 0        | 0          |
-| Winogrande    | 0 | 0 | 0 | 0  | 0        | 0          |
 * **MMLU**: Evaluates general knowledge across 57 subjects including science, humanities, engineering, and more. Shows the model's ability to handle diverse academic topics.
@@ -139,32 +138,18 @@ For quality evaluation we have used: #TODO link to github
 ### Latency benchmarks
-We have profiled models in different scenarios:
-<table>
-<tr><th> 100 input/300 output; tok/s </th><th> 1000 input/1000 output; tok/s </th></tr>
-<tr><td>
-| GPU/Model | S   | M | L | XL | Original | W8A8, int8 |
-|-----------|-----|---|---|----|----------|------------|
-| H100      | 189 | 0 | 0 | 0  | 48       | 0          |
-| L40s      | 79  | 0 | 0 | 0  | 42       | 0          |
-</td><td>
 | GPU/Model | S   | M | L | XL | Original | W8A8, int8 |
 |-----------|-----|---|---|----|----------|------------|
-| H100      | 189 | 0 | 0 | 0  | 48       | 0          |
-| L40s      | 79  | 0 | 0 | 0  | 42       | 0          |
-</td></tr> </table>
 ## Links
 * __Platform__: [app.thestage.ai](https://app.thestage.ai)
-* __Elastic models Github__: [app.thestage.ai](app.thestage.ai)
 * __Subscribe for updates__: [TheStageAI X](https://x.com/TheStageAI)
 * __Contact email__: contact@thestage.ai

 # model configuration as well
 model_name = "mistralai/Mistral-7B-Instruct-v0.3"
 hf_token = ''
 device = torch.device("cuda")
 # Create model
 model = AutoModelForCausalLM.from_pretrained(
     model_name,
     token=hf_token,
     torch_dtype=torch.bfloat16,
     attn_implementation="sdpa",
     mode='s'
 print(f"# A:\n{output}\n")
 ```
+__System requirements:__
 * GPUs: H100, L40s
 * CPU: AMD, Intel
 * Python: 3.10-3.12
+To work with our models just run these lines in your terminal:
 ```shell
 pip install thestage
 pip install elastic_models
+pip install flash_attn==2.7.3 --no-build-isolation
+pip uninstall apex
+echo '{
+    "meta-llama/Llama-3.2-1B-Instruct": 6,
+    "mistralai/Mistral-7B-Instruct-v0.3": 7,
+    "black-forest-labs/FLUX.1-schnell": 1,
+    "black-forest-labs/FLUX.1-dev": 5
+}' > model_name_id.json
+export ELASTIC_MODEL_ID_MAPPING=./model_name_id.json
 ```
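The mapping file written in the shell block above is a flat JSON object from model name to integer ID. As a hedged alternative to the `echo` command, the same file can be produced from Python, which sidesteps shell quoting. This is a minimal sketch: the helper name `write_model_id_mapping` is hypothetical and not part of `elastic_models`, and it assumes `ELASTIC_MODEL_ID_MAPPING` only needs to be present in the environment of the process that later loads the model.

```python
import json
import os
from pathlib import Path

# Model name -> ID pairs copied from the shell snippet in the README.
MODEL_NAME_TO_ID = {
    "meta-llama/Llama-3.2-1B-Instruct": 6,
    "mistralai/Mistral-7B-Instruct-v0.3": 7,
    "black-forest-labs/FLUX.1-schnell": 1,
    "black-forest-labs/FLUX.1-dev": 5,
}


def write_model_id_mapping(path="model_name_id.json"):
    """Hypothetical helper: write the mapping as JSON and export its location."""
    mapping_path = Path(path)
    mapping_path.write_text(json.dumps(MODEL_NAME_TO_ID, indent=4) + "\n")
    # Assumption: the variable is read from the current process environment.
    os.environ["ELASTIC_MODEL_ID_MAPPING"] = str(mapping_path)
    return mapping_path


mapping_file = write_model_id_mapping()
```

Setting `os.environ` affects only the current Python process and its children, so this would have to run before the model is loaded in the same process. That ordering is an assumption, not something the README states.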
+Then go to [app.thestage.ai](https://app.thestage.ai), log in and generate an API token from your profile page. Set up the API token as follows:
 ```shell
 thestage config set --api-token <YOUR_API_TOKEN>
 | Metric/Model  | S | M | L | XL | Original | W8A8, int8 |
 |---------------|---|---|---|----|----------|------------|
+| MMLU          | 59.7 | 60.1 |  60.8 | 61.4  | 61.4        | 28          |
+| PIQA          | 80.8 | 82   | 81.7  | 81.5  | 81.5        | 65.3          |
+| Arc Challenge | 56.6 | 55.1 | 56.8  | 57.4  | 57.4        | 33.2         |
+| Winogrande    | 73.2 | 72.3 | 73.2  | 74.1  | 74.1        | 57          |
 * **MMLU**: Evaluates general knowledge across 57 subjects including science, humanities, engineering, and more. Shows the model's ability to handle diverse academic topics.
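To read the quality table as a change relative to the unmodified model, the sketch below subtracts the `Original` column from each variant's score. The numbers are copied from the table in this diff; the names `SCORES` and `delta_vs_original` are illustrative and not part of any TheStageAI tooling.

```python
# Benchmark scores copied from the quality table (new side of the diff).
SCORES = {
    "MMLU": {"S": 59.7, "M": 60.1, "L": 60.8, "XL": 61.4, "Original": 61.4, "W8A8, int8": 28.0},
    "PIQA": {"S": 80.8, "M": 82.0, "L": 81.7, "XL": 81.5, "Original": 81.5, "W8A8, int8": 65.3},
    "Arc Challenge": {"S": 56.6, "M": 55.1, "L": 56.8, "XL": 57.4, "Original": 57.4, "W8A8, int8": 33.2},
    "Winogrande": {"S": 73.2, "M": 72.3, "L": 73.2, "XL": 74.1, "Original": 74.1, "W8A8, int8": 57.0},
}


def delta_vs_original(scores, baseline="Original"):
    """Return score differences (in points) between each variant and the baseline."""
    return {
        metric: {
            variant: round(value - row[baseline], 1)
            for variant, value in row.items()
            if variant != baseline
        }
        for metric, row in scores.items()
    }


for metric, deltas in delta_vs_original(SCORES).items():
    print(metric, deltas)
```

Negative values mean the variant scores below the original model on that benchmark. For example, the `S` variant's MMLU delta is 59.7 - 61.4 = -1.7 points.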
 ### Latency benchmarks
+__100 input/300 output; tok/s:__
 | GPU/Model | S   | M | L | XL | Original | W8A8, int8 |
 |-----------|-----|---|---|----|----------|------------|
+| H100      | 189 | 166 | 148 | 134  | 49       | 192          |
+| L40s      | 79  | 68 | 59 | 47  | 38    | 82         |
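The figures in this table are throughput in tokens per second, so higher is better. As a rough way to compare variants, the sketch below divides each entry by the `Original` column for the same GPU. The values are copied from the table above; `THROUGHPUT` and `speedup_vs_original` are illustrative names, not part of the benchmark code.

```python
# Throughput in tok/s for the 100 input / 300 output scenario, from the table.
THROUGHPUT = {
    "H100": {"S": 189, "M": 166, "L": 148, "XL": 134, "Original": 49, "W8A8, int8": 192},
    "L40s": {"S": 79, "M": 68, "L": 59, "XL": 47, "Original": 38, "W8A8, int8": 82},
}


def speedup_vs_original(throughput, baseline="Original"):
    """Return the throughput ratio of each variant to the baseline, per GPU."""
    return {
        gpu: {
            variant: round(value / row[baseline], 2)
            for variant, value in row.items()
            if variant != baseline
        }
        for gpu, row in throughput.items()
    }


for gpu, ratios in speedup_vs_original(THROUGHPUT).items():
    print(gpu, ratios)
```

On these numbers the `S` variant runs at 189 / 49, roughly 3.86 times the original throughput on H100, and 79 / 38, roughly 2.08 times on L40s.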
 ## Links
 * __Platform__: [app.thestage.ai](https://app.thestage.ai)
+<!-- * __Elastic models Github__: [app.thestage.ai](app.thestage.ai) -->
 * __Subscribe for updates__: [TheStageAI X](https://x.com/TheStageAI)
 * __Contact email__: contact@thestage.ai