Added a prompt helper to manage the tokens, and reduced the summary size to 800. abf50ce verified gyrmo committed 9 days ago
I now have more GPU memory, so I have reduced the GPU utilisation to 0.8. 9b002ee verified gyrmo committed 9 days ago
Increased the max model length, and corrected the quantisation to awq_marlin. d5cb9c0 verified gyrmo committed 10 days ago
Changed max model length to 3600 to mitigate the KV cache issue. 6e99e67 verified gyrmo committed 10 days ago
Reduced max model length, and increased GPU utilisation to 0.9. 647e94c verified gyrmo committed 10 days ago
Specified chat mode, and streamed the response for a smoother UI. 47362ec verified gyrmo committed 10 days ago
Reduced the GPU utilisation and specified the quantisation method. 7c71431 verified gyrmo committed 10 days ago
Added a memory buffer, and moved the wait-for-LLM function into the main Gradio section. 6c941da verified gyrmo committed 10 days ago
Added some server settings because the Gradio app wasn't starting up. c54877c verified gyrmo committed 11 days ago
Moved the embedding model to the CPU. This frees up space on the GPU for the LLM. 824aa63 verified gyrmo committed 11 days ago
Switched the model from Llama 3.3-70B to Llama-3.3-70B-Instruct-FP4. 152d1ec verified gyrmo committed 11 days ago
Switched to a pre-quantised version of Llama 3.3-70B sourced from NVIDIA. 4a09bfe verified gyrmo committed 11 days ago
Updated vllm_server to include a wait-for-vLLM step that ensures the model is up before the chat section loads. 10f7946 verified gyrmo committed 12 days ago
Updated "--max-model-len" from 2048 to 8092 to increase the context window and reduce the negative-token error. 7a99a4e verified gyrmo committed 12 days ago
Removed the vllm package version pin to ensure a compatible installation. 3416e14 verified gyrmo committed 12 days ago
Specified the provider (groq) and edited max new tokens to 16k. 001ddc4 verified gyrmo committed 17 days ago
Increased the max new tokens and edited the system prompt to remove the in-text citations. Reduced the temperature to 0.5 since there was a lot more faff in the responses. ab77125 verified gyrmo committed 17 days ago
Removed the sources display to give the chatbot a friendlier look. 81b60dc verified gyrmo committed 17 days ago
Updated the temperature to 0.8, and the max new tokens to 8192. Removed the context window; might add it back later as 16000. f24aa68 verified gyrmo committed 17 days ago
Updated max new tokens to 4096 to reduce response cut-offs. 10448be verified gyrmo committed 17 days ago
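For reference, the "wait for vLLM" startup check that several commits mention could be sketched roughly as below. This is a minimal sketch, not the Space's actual code: the function name, URL, port, and timeout are assumptions; it simply polls vLLM's `/health` endpoint until the server answers.

```python
import time
import urllib.request
import urllib.error

def wait_for_vllm(url="http://localhost:8000/health", timeout=300):
    """Poll the vLLM /health endpoint until it responds or we time out."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url) as resp:
                if resp.status == 200:
                    return True  # server is up; safe to load the chat UI
        except (urllib.error.URLError, ConnectionError):
            pass  # server not listening yet
        time.sleep(2)  # back off briefly before retrying
    raise TimeoutError(f"vLLM did not become ready within {timeout}s")
```

Calling this before building the Gradio interface (rather than inside a handler) matches the log's note about moving the wait into the main Gradio section, so the chat UI only appears once the model is serving.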
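The server flags this log keeps iterating on could be assembled into a launch command like the following sketch. The model repo name and the flag values shown are assumptions pieced together from the messages above, not the Space's final configuration; in particular, `--quantization awq_marlin` would apply to an AWQ checkpoint rather than the FP4 one shown here.

```shell
# Hypothetical vLLM launch command reflecting the flags mentioned in this log.
# Model repo and values are assumptions, not the Space's actual config.
vllm serve nvidia/Llama-3.3-70B-Instruct-FP4 \
  --max-model-len 3600 \
  --gpu-memory-utilization 0.8
# An AWQ checkpoint would instead add: --quantization awq_marlin
```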