Model for 2xH200?

#1
by sandorkonya - opened

Hi there, thank you for the quantizations!

I have access to 2xH200 + 16x64 GB RAM. Which model would you recommend?

What T/s could I expect? Is there any approximate figure (in vLLM)?

(I've read that the original model ran at approx. 7.5 T/s on 1xH200.)

Thanks in advance for your time.

With 2xH200 and no offloading, you'd be able to fit IQ2_XS, I assume.
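A rough way to sanity-check the fit is to compare the quant's file size against the combined VRAM. This is only a sketch: the 141 GB per H200 is the card's published memory size, but the 10% reserve for KV cache and runtime buffers is my assumption, and the weights size below is a placeholder, not the real size of any file in this repo. Take the actual number from the repo's file listing.

```python
def fits_without_offload(weights_gb, num_gpus=2, vram_per_gpu_gb=141.0,
                         reserve_fraction=0.10):
    """Return (fits, usable_gb) for a model whose weights take weights_gb.

    reserve_fraction is an assumed share of VRAM kept free for the
    KV cache and runtime buffers; tune it for your context length.
    """
    usable_gb = num_gpus * vram_per_gpu_gb * (1.0 - reserve_fraction)
    return weights_gb <= usable_gb, usable_gb


# Placeholder size for illustration only, NOT a real quant size.
example_weights_gb = 200.0

fits, usable_gb = fits_without_offload(example_weights_gb)
print(f"usable VRAM: {usable_gb:.1f} GB, fits without offloading: {fits}")
```

If the check fails, the remaining options are a smaller quant or offloading part of the model to system RAM, which will cost throughput.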

How would the original model run at 7.5 T/s on a single H200? :O Are you thinking of the linear model?
