Model for 2xH200?
#1 by sandorkonya - opened
Hi there, thank you for the quantizations!
I have access to 2xH200 + 16x64 GB RAM. Which model would you recommend?
What T/s could I expect? Is there any approximate figure (in vLLM)?
(I've read that the original model ran at approx. 7.5 T/s on 1xH200.)
Thanks in advance for your time.
With 2xH200 and no offloading, you'd be able to fit IQ2_XS, I assume.
How would the original model run at 7.5 T/s on a single H200? :O Are you thinking of the linear model?
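For anyone who wants to sanity-check whether a given quant fits without offloading, here is a rough back-of-the-envelope sketch in Python. The assumptions are mine and not from the thread: 141 GB of VRAM per H200, roughly 2.31 bits per weight for IQ2_XS (llama.cpp's approximate figure), and a made-up 20 GB allowance for KV cache and runtime buffers. The parameter count in the usage lines is a placeholder, so plug in the real one for the model you care about.

```python
# Rough VRAM fit check. All constants below are assumptions, not measured values.
H200_VRAM_GB = 141        # assumed HBM capacity per H200
IQ2_XS_BPW = 2.31         # approximate average bits per weight for IQ2_XS


def quant_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB."""
    # billions of params * bits per weight / 8 bits per byte = GB
    return n_params_billion * bits_per_weight / 8


def fits_in_vram(
    n_params_billion: float,
    bits_per_weight: float,
    n_gpus: int = 2,
    overhead_gb: float = 20.0,  # hypothetical allowance for KV cache and buffers
) -> bool:
    """True if weights plus the overhead allowance fit in the combined VRAM."""
    total_vram_gb = n_gpus * H200_VRAM_GB
    needed_gb = quant_size_gb(n_params_billion, bits_per_weight) + overhead_gb
    return needed_gb <= total_vram_gb


if __name__ == "__main__":
    # Placeholder parameter count (in billions); replace with the real model size.
    example_params_b = 600
    print(f"weights: {quant_size_gb(example_params_b, IQ2_XS_BPW):.1f} GB")
    print("fits on 2xH200:", fits_in_vram(example_params_b, IQ2_XS_BPW))
```

This only estimates memory. It says nothing about T/s, which depends on the backend, context length, and batch size.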