Commit History

Added a prompt helper to manage tokens and reduced the summary size to 800
abf50ce
verified

gyrmo committed on
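
A token-budget helper like the one this commit describes could look like the minimal sketch below. The function name, the whitespace tokenisation, and the 800 default are illustrative assumptions; a real helper would count tokens with the model's tokenizer.

```python
def trim_summary(text: str, max_tokens: int = 800) -> str:
    """Hypothetical helper: keep at most max_tokens whitespace-delimited tokens.

    A real implementation would count tokens with the model's tokenizer
    rather than splitting on whitespace.
    """
    tokens = text.split()
    if len(tokens) <= max_tokens:
        return text
    return " ".join(tokens[:max_tokens])
```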

Memory issues solved
9312fe3

I now have more GPU, so I have reduced the GPU utilisation to 0.8
9b002ee

Removed the quantization line - it crashed vLLM.
5788d26

Increased the max model length, and corrected the quantisation to awq_merlin.
d5cb9c0
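
The settings these commits keep adjusting map onto vLLM's CLI options. A hedged sketch of the launch command; the flag names are real vLLM options, but the model name and the values here are assumptions, not this repo's actual config:

```python
# Illustrative launch command: flag names are vLLM CLI options, but the
# model name and values are assumptions, not this repo's config.
cmd = [
    "vllm", "serve", "meta-llama/Llama-3.3-70B-Instruct",
    "--gpu-memory-utilization", "0.8",   # fraction of GPU memory vLLM may use
    "--max-model-len", "3600",           # context window size in tokens
    "--quantization", "awq",             # weight quantization method
]
# import subprocess; subprocess.Popen(cmd)  # would start the server
```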

Added a background wait to counter the problem
887dea0

Changed max model length to 3600 to mitigate the KV cache issue.
6e99e67

Reduced the time in the wait-for-vLLM function.
dee8a12

Reduced max model length, and increased GPU utilisation to 0.9
647e94c

Specified chat mode, and made sure the message was streamed for a smoother UI.
47362ec
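
Streaming for the chat UI can be sketched as a generator that yields the growing reply after each chunk, which is the shape Gradio's `ChatInterface` accepts; names here are illustrative, not the repo's code:

```python
from typing import Iterable, Iterator

def stream_reply(chunks: Iterable[str]) -> Iterator[str]:
    """Yield the reply accumulated so far, so the UI renders it live."""
    partial = ""
    for chunk in chunks:
        partial += chunk
        yield partial
```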

Reduced the GPU utilization and specified the quantization method.
7c71431

Added a memory buffer, and moved the wait-for-LLM function into the main Gradio block.
6c941da

Checking something
09d0f27

Update vllm_server.py
a2902fc

Changed the GPU utilisation to 0.95
c8e1e0f

I have added some server specifics because the Gradio app isn't starting up.
c54877c

Upgraded the max model length to 8092.
49b04f7

Changed the model from FP4 to AWQ
49d9cf3

Changed the model from Instruct FP4 to AWQ
4c45212

Decreased the maximum model length to 3408.
5691068

Moved the embedding model to the CPU. This will allow me to have more space on the GPU for the LLM.
824aa63
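
Moving the embedder to the CPU is typically a one-line device choice; below is a sketch assuming a sentence-transformers-style API (the helper name and model name are hypothetical, not the repo's code):

```python
def embedding_device(gpu_reserved_for_llm: bool = True) -> str:
    """Keep the embedding model on CPU so GPU memory stays free for the LLM."""
    return "cpu" if gpu_reserved_for_llm else "cuda"

# Illustrative usage with a sentence-transformers-style API:
# embedder = SentenceTransformer("all-MiniLM-L6-v2", device=embedding_device())
```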

Changed the time from 5 to 20 seconds.
faefdf8

Switched the model from Llama 3.3-70B to Llama-3.3-70B-Instruct-FP4.
152d1ec

Switching to a pre-quantised version of Llama 3.3-70B sourced from Nvidia.
4a09bfe

Added the missing socket library import.
1a107ca
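
The socket import supports a server-readiness probe; a minimal sketch of such a check (the function name is an assumption):

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```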

Rectified indentation error on line 47.
ac2bec1

Added start server and wait for server
45a3359

Updated vllm_server to include a wait-for-vLLM step that ensures the model is up before the chat section loads.
10f7946
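
The wait-for-vLLM step can be sketched as a polling loop that blocks until the server port accepts connections; the host, port, and timings below are assumptions, not the repo's actual values:

```python
import socket
import time

def wait_for_vllm(host: str = "127.0.0.1", port: int = 8000,
                  timeout: float = 120.0, interval: float = 1.0) -> None:
    """Block until the vLLM server accepts TCP connections, else raise."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return  # server is up
        except OSError:
            time.sleep(interval)  # not ready yet; retry
    raise TimeoutError(f"vLLM not reachable on {host}:{port} after {timeout}s")
```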

Updated "--max-model-len" from 2048 to 8092 to increase the context window and reduce the negative token error.
7a99a4e
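
The "negative token error" follows from simple arithmetic: the tokens available for generation are the model window minus the prompt length, so a 2048-token window goes negative on a long RAG prompt while 8092 leaves headroom. A sketch (the function name is illustrative):

```python
def completion_budget(max_model_len: int, prompt_tokens: int) -> int:
    """Tokens left for generation; negative means the prompt alone overflows."""
    return max_model_len - prompt_tokens
```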

Reduced the max-token-sizes to 2048.
0ee1471

Ensuring that it's Python 3.10
563fb77

Set python to 3.10
f6980fa
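
Pinning the runtime can be backed by a startup guard; a sketch, where the tuple comparison mirrors how `sys.version_info` compares (the helper name is an assumption):

```python
import sys

def python_ok(required=(3, 10), version=None) -> bool:
    """True when the running interpreter meets the required (major, minor)."""
    if version is None:
        version = sys.version_info[:2]
    return tuple(version) >= tuple(required)
```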

Removed the vllm package specification to ensure a compatible installation.
3416e14

Updated requirements
6e8c952

Changes added to move to vLLM
d16e0c5

Uploading a vllm server file
5f1293d

Specified the provider (groq) and edited max new tokens to 16k.
001ddc4

Increased the max new tokens and edited the system prompt to remove the in-text citations. Reduced the temperature to 0.5 since there was a lot more faff in the response.
ab77125
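
These sampling tweaks can be collected in one settings fragment. Only temperature 0.5 and the no-citations instruction are stated in the history; the parameter names and the prompt wording below are illustrative assumptions:

```python
# Generation settings mirroring the commit: only temperature=0.5 is stated
# in the history; the other keys and the wording are illustrative.
gen_params = {
    "temperature": 0.5,  # lowered to cut rambling in responses
}
system_prompt = (
    "Answer using the retrieved context. "
    "Do not add in-text citations to your reply."
)
```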

Forgot to update to the new database.
a3cdb48

Removed the sources display to update the look of the chatbot to something more friendly.
81b60dc

Uploading the updated database to change things up.
3e845af

Updated the temperature to 0.8, and the max new tokens to 8192. Removed the context window; might put it in later as 16000.
f24aa68

Updated max new tokens to 4096 to improve the cut offs.
10448be

Switched back to an inference provider for speed
496a4ed

I forgot a comma.
e5fbced

Added back in the tokenizer
d141afa

Removed max_new_tokens
4d11293

Added the HF_TOKEN collection
b3261a5

Updated HuggingFaceLLM parameters
6e3b747

Removed provider=auto to fix a type error
899be85