Added a prompt helper to manage the tokens, and reduced the summary size to 800. abf50ce verified gyrmo committed 9 days ago
I now have more GPU memory, so I have reduced the GPU utilisation to 0.8. 9b002ee verified gyrmo committed 9 days ago
Increased the max model length, and corrected the quantisation to awq_marlin. d5cb9c0 verified gyrmo committed 10 days ago
Changed max model length to 3600 to mitigate the KV cache issue. 6e99e67 verified gyrmo committed 10 days ago
Reduced max model length, and increased GPU utilisation to 0.9. 647e94c verified gyrmo committed 10 days ago
Specified chat mode, and streamed the response for a smoother UI. 47362ec verified gyrmo committed 10 days ago
Reduced the GPU utilisation and specified the quantisation method. 7c71431 verified gyrmo committed 10 days ago
Added a memory buffer, and moved the wait-for-LLM function into the main Gradio section. 6c941da verified gyrmo committed 10 days ago
Added some server settings because the Gradio app wasn't starting up. c54877c verified gyrmo committed 11 days ago
Moved the embedding model to the CPU. This frees up space on the GPU for the LLM. 824aa63 verified gyrmo committed 11 days ago
Switched the model from Llama 3.3-70B to Llama-3.3-70B-Instruct-FP4. 152d1ec verified gyrmo committed 11 days ago
Switched to a pre-quantised version of Llama 3.3-70B sourced from NVIDIA. 4a09bfe verified gyrmo committed 11 days ago
Updated vllm_server to include a wait-for-vLLM step that ensures the model is up before the chat section loads. 10f7946 verified gyrmo committed 12 days ago
Updated "--max-model-len" from 2048 to 8092 to increase the context window and reduce the negative-token error. 7a99a4e verified gyrmo committed 12 days ago
Removed the vllm package version pin to ensure a compatible installation. 3416e14 verified gyrmo committed 12 days ago
Specified the provider (groq) and edited max new tokens to 16k. 001ddc4 verified gyrmo committed 17 days ago
Increased the max new tokens and edited the system prompt to remove the in-text citations. Reduced the temperature to 0.5 since there was a lot more faff in the responses. ab77125 verified gyrmo committed 17 days ago
Removed the sources display to give the chatbot a friendlier look. 81b60dc verified gyrmo committed 17 days ago
Updated the temperature to 0.8, and the max new tokens to 8192. Removed the context window; might add it back later as 16000. f24aa68 verified gyrmo committed 17 days ago
Updated max new tokens to 4096 to reduce response cut-offs. 10448be verified gyrmo committed 17 days ago
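For reference, the "wait for vLLM" startup check that several commits mention could be sketched roughly as below. This is a minimal sketch, not the Space's actual code: the function name, URL, port, and timeout are assumptions; it simply polls vLLM's `/health` endpoint until the server answers.

```python
import time
import urllib.request
import urllib.error

def wait_for_vllm(url="http://localhost:8000/health", timeout=300):
    """Poll the vLLM /health endpoint until it responds or we time out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url) as resp:
                if resp.status == 200:
                    return True  # server is up; safe to load the chat UI
        except (urllib.error.URLError, ConnectionError):
            pass  # server not listening yet
        time.sleep(2)  # back off briefly before retrying
    raise TimeoutError(f"vLLM did not become ready within {timeout}s")
```

Calling this before building the Gradio interface (rather than inside a handler) matches the log's note about moving the wait into the main Gradio section, so the chat UI only appears once the model is serving.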
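The server flags this log keeps iterating on could be assembled into a launch command like the following sketch. The model repo name and the flag values shown are assumptions pieced together from the messages above, not the Space's final configuration; in particular, `--quantization awq_marlin` would apply to an AWQ checkpoint rather than the FP4 one shown here.

```shell
# Hypothetical vLLM launch command reflecting the flags mentioned in this log.
# Model repo and values are assumptions, not the Space's actual config.
vllm serve nvidia/Llama-3.3-70B-Instruct-FP4 \
  --max-model-len 3600 \
  --gpu-memory-utilization 0.8
# An AWQ checkpoint would instead add: --quantization awq_marlin
```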