input a longer context and the model's output is incomplete
#1 opened by ELVISIO
Hello, I am currently using your model and have run into a strange problem: when I input a longer context, the model's output is incomplete. I am using vLLM for inference and found that the model's maximum context length is only 16k tokens. Is this normal?
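Roughly how I'm invoking it, as a minimal sketch (the model path and prompt below are placeholders, and I'm assuming the offline `LLM` entry point):

```python
from vllm import LLM, SamplingParams

# Placeholder model path; the 16k figure I mentioned shows up as vLLM's max_model_len.
llm = LLM(model="org/model-name", max_model_len=16384)

long_prompt = "some long context " * 2000  # stand-in for the long input I'm sending
sampling = SamplingParams(temperature=0.7, max_tokens=1024)

outputs = llm.generate([long_prompt], sampling)
print(outputs[0].outputs[0].text)  # the output comes back incomplete
```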
@ELVISIO
Thank you for using our model. Theoretically there is no strict limit on the number of output tokens; however, the model's performance tends to degrade with longer outputs. Could this limitation be related to the max_tokens parameter set during inference? We would appreciate it if you could provide an example so we can reproduce the issue.
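To illustrate what we suspect (a sketch, not a confirmed diagnosis): in vLLM the per-request output budget is `max_tokens` in `SamplingParams`, and a generation that hits that budget is reported with `finish_reason == "length"`. The model path below is again a placeholder.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="org/model-name")  # placeholder path

# If max_tokens is left small, long answers get cut off regardless of the context length.
sampling = SamplingParams(temperature=0.7, max_tokens=4096)

out = llm.generate(["<your long prompt here>"], sampling)[0].outputs[0]
print(out.finish_reason)  # "length" means the output was truncated by the max_tokens budget
print(out.text)
```

If `finish_reason` comes back as `"length"`, raising `max_tokens` should resolve the truncation; if not, please share the full request so we can dig further.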