Paralelized script

#67
by ajtakto - opened

Hello, I try to run generate.py under llang singularity on gpus with 64GB RAM and am getting out of memory response. Do you have paralelized script that can divide the load to more gpus?

For those asking about API access — I've been using Crazyrouter as a unified gateway. One API key, OpenAI SDK compatible. Works well for testing different models without managing multiple accounts.

Sign up or log in to comment