DGX Spark vLLM Results
Discussion #4, opened by RGMC98
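Llama Benchy results with MTP 2 (multi-token prediction speculative decoding). In the test names, c1/c2 is the number of concurrent requests and dN is the depth of prior context in tokens.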
| model | test | t/s (total) | t/s (req) | peak t/s | peak t/s (req) | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|---|---|---|---|---|---|---|---|---|
| Qwen/Qwen3.5-9B | pp2048 (c1) | 2620.55 ± 108.88 | 2620.55 ± 108.88 | | | 784.91 ± 33.23 | 783.02 ± 33.23 | 785.00 ± 33.22 |
| Qwen/Qwen3.5-9B | tg128 (c1) | 9.68 ± 0.24 | 9.68 ± 0.24 | 10.67 ± 0.47 | 10.67 ± 0.47 | | | |
| Qwen/Qwen3.5-9B | pp2048 (c2) | 3149.17 ± 65.54 | 1577.58 ± 32.55 | | | 1301.05 ± 26.31 | 1299.16 ± 26.31 | 1301.10 ± 26.31 |
| Qwen/Qwen3.5-9B | tg128 (c2) | 18.51 ± 0.40 | 9.79 ± 0.17 | 22.00 ± 0.00 | 11.00 ± 0.00 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d4096 (c1) | 3172.55 ± 235.93 | 3172.55 ± 235.93 | | | 1300.38 ± 97.06 | 1298.49 ± 97.06 | 1300.49 ± 97.07 |
| Qwen/Qwen3.5-9B | ctx_tg @ d4096 (c1) | 9.92 ± 0.22 | 9.92 ± 0.22 | 11.00 ± 0.00 | 11.00 ± 0.00 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d4096 (c1) | 1243.76 ± 6.36 | 1243.76 ± 6.36 | | | 1648.56 ± 8.45 | 1646.67 ± 8.45 | 1648.66 ± 8.45 |
| Qwen/Qwen3.5-9B | tg128 @ d4096 (c1) | 10.09 ± 0.02 | 10.09 ± 0.02 | 11.00 ± 0.00 | 11.00 ± 0.00 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d4096 (c2) | 3883.72 ± 15.81 | 1943.85 ± 7.99 | | | 2109.68 ± 8.86 | 2107.79 ± 8.86 | 2109.73 ± 8.85 |
| Qwen/Qwen3.5-9B | ctx_tg @ d4096 (c2) | 19.40 ± 0.59 | 10.05 ± 0.03 | 22.00 ± 0.00 | 11.00 ± 0.00 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d4096 (c2) | 1304.04 ± 37.92 | 652.49 ± 18.98 | | | 3143.33 ± 93.21 | 3141.44 ± 93.21 | 3143.39 ± 93.21 |
| Qwen/Qwen3.5-9B | tg128 @ d4096 (c2) | 18.82 ± 0.91 | 9.91 ± 0.16 | 22.00 ± 0.00 | 11.00 ± 0.00 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d8192 (c1) | 3820.75 ± 15.57 | 3820.75 ± 15.57 | | | 2146.18 ± 8.64 | 2144.29 ± 8.64 | 2146.25 ± 8.62 |
| Qwen/Qwen3.5-9B | ctx_tg @ d8192 (c1) | 9.97 ± 0.00 | 9.97 ± 0.00 | 11.00 ± 0.00 | 11.00 ± 0.00 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d8192 (c1) | 775.86 ± 3.93 | 775.86 ± 3.93 | | | 2641.61 ± 13.35 | 2639.72 ± 13.35 | 2641.66 ± 13.34 |
| Qwen/Qwen3.5-9B | tg128 @ d8192 (c1) | 9.89 ± 0.01 | 9.89 ± 0.01 | 10.33 ± 0.47 | 10.33 ± 0.47 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d8192 (c2) | 4077.40 ± 3.08 | 2039.80 ± 1.56 | | | 4018.39 ± 2.99 | 4016.50 ± 2.99 | 4018.44 ± 2.99 |
| Qwen/Qwen3.5-9B | ctx_tg @ d8192 (c2) | 18.92 ± 0.11 | 9.78 ± 0.11 | 21.33 ± 0.94 | 10.67 ± 0.47 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d8192 (c2) | 816.36 ± 1.05 | 408.37 ± 0.53 | | | 5016.98 ± 6.46 | 5015.09 ± 6.46 | 5017.04 ± 6.46 |
| Qwen/Qwen3.5-9B | tg128 @ d8192 (c2) | 18.48 ± 0.65 | 9.72 ± 0.02 | 20.00 ± 0.00 | 10.00 ± 0.00 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d16384 (c1) | 3927.25 ± 5.82 | 3927.25 ± 5.82 | | | 4173.86 ± 6.41 | 4171.97 ± 6.41 | 4173.96 ± 6.40 |
| Qwen/Qwen3.5-9B | ctx_tg @ d16384 (c1) | 9.57 ± 0.02 | 9.57 ± 0.02 | 10.00 ± 0.00 | 10.00 ± 0.00 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d16384 (c1) | 436.06 ± 0.63 | 436.06 ± 0.63 | | | 4698.53 ± 6.79 | 4696.64 ± 6.79 | 4698.64 ± 6.77 |
| Qwen/Qwen3.5-9B | tg128 @ d16384 (c1) | 9.51 ± 0.02 | 9.51 ± 0.02 | 10.00 ± 0.00 | 10.00 ± 0.00 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d16384 (c2) | 4050.36 ± 66.34 | 2036.83 ± 52.49 | | | 8051.44 ± 202.01 | 8049.55 ± 202.01 | 8051.50 ± 202.02 |
| Qwen/Qwen3.5-9B | ctx_tg @ d16384 (c2) | 18.58 ± 0.09 | 9.45 ± 0.07 | 20.33 ± 0.47 | 10.33 ± 0.47 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d16384 (c2) | 447.29 ± 5.77 | 224.80 ± 4.71 | | | 9116.12 ± 187.59 | 9114.23 ± 187.59 | 9116.16 ± 187.59 |
| Qwen/Qwen3.5-9B | tg128 @ d16384 (c2) | 18.20 ± 0.03 | 9.35 ± 0.13 | 20.33 ± 0.47 | 10.33 ± 0.47 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d32768 (c1) | 3906.92 ± 2.15 | 3906.92 ± 2.15 | | | 8389.14 ± 4.70 | 8387.25 ± 4.70 | 8389.22 ± 4.70 |
| Qwen/Qwen3.5-9B | ctx_tg @ d32768 (c1) | 8.65 ± 0.02 | 8.65 ± 0.02 | 9.00 ± 0.00 | 9.00 ± 0.00 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d32768 (c1) | 224.00 ± 1.87 | 224.00 ± 1.87 | | | 9145.26 ± 75.94 | 9143.37 ± 75.94 | 9145.36 ± 75.96 |
| Qwen/Qwen3.5-9B | tg128 @ d32768 (c1) | 8.61 ± 0.02 | 8.61 ± 0.02 | 9.00 ± 0.00 | 9.00 ± 0.00 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d32768 (c2) | 3712.80 ± 281.87 | 1874.29 ± 154.97 | | | 17612.52 ± 1538.50 | 17610.63 ± 1538.50 | 17612.58 ± 1538.51 |
| Qwen/Qwen3.5-9B | ctx_tg @ d32768 (c2) | 16.27 ± 0.66 | 8.64 ± 0.20 | 18.00 ± 0.00 | 9.33 ± 0.47 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d32768 (c2) | 216.48 ± 15.33 | 109.29 ± 8.49 | | | 18861.48 ± 1543.33 | 18859.59 ± 1543.33 | 18861.53 ± 1543.32 |
| Qwen/Qwen3.5-9B | tg128 @ d32768 (c2) | 16.14 ± 0.76 | 8.55 ± 0.25 | 18.00 ± 0.00 | 9.17 ± 0.37 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d65535 (c1) | 3157.82 ± 119.30 | 3157.82 ± 119.30 | | | 20785.76 ± 803.08 | 20783.87 ± 803.08 | 20785.85 ± 803.07 |
| Qwen/Qwen3.5-9B | ctx_tg @ d65535 (c1) | 7.72 ± 0.09 | 7.72 ± 0.09 | 8.33 ± 0.47 | 8.33 ± 0.47 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d65535 (c1) | 91.90 ± 0.46 | 91.90 ± 0.46 | | | 22286.91 ± 111.65 | 22285.02 ± 111.65 | 22286.99 ± 111.63 |
| Qwen/Qwen3.5-9B | tg128 @ d65535 (c1) | 7.64 ± 0.09 | 7.64 ± 0.09 | 8.33 ± 0.47 | 8.33 ± 0.47 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d65535 (c2) | 2286.44 ± 491.33 | 1155.57 ± 254.76 | | | 60172.33 ± 15685.93 | 60170.44 ± 15685.93 | 60175.67 ± 15685.28 |
| Qwen/Qwen3.5-9B | ctx_tg @ d65535 (c2) | 7.94 ± 1.11 | 4.42 ± 0.36 | 14.67 ± 0.94 | 7.33 ± 0.47 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d65535 (c2) | 80.29 ± 2.96 | 40.45 ± 1.46 | | | 50694.61 ± 1874.98 | 50692.72 ± 1874.98 | 50698.34 ± 1874.29 |
| Qwen/Qwen3.5-9B | tg128 @ d65535 (c2) | 10.07 ± 1.40 | 5.35 ± 0.89 | 15.33 ± 0.94 | 7.67 ± 0.47 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d100000 (c1) | 2407.21 ± 33.79 | 2407.21 ± 33.79 | | | 41552.12 ± 578.07 | 41550.23 ± 578.07 | 41582.82 ± 599.50 |
| Qwen/Qwen3.5-9B | ctx_tg @ d100000 (c1) | 4.31 ± 0.81 | 4.31 ± 0.81 | 7.67 ± 1.25 | 7.67 ± 1.25 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d100000 (c1) | 44.33 ± 4.38 | 44.33 ± 4.38 | | | 46680.51 ± 4856.03 | 46678.62 ± 4856.03 | 46702.83 ± 4849.40 |
| Qwen/Qwen3.5-9B | tg128 @ d100000 (c1) | 5.64 ± 0.49 | 5.64 ± 0.49 | 7.67 ± 0.94 | 7.67 ± 0.94 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d100000 (c2) | 2328.80 ± 182.23 | 1200.93 ± 76.42 | | | 83646.13 ± 5896.07 | 83644.24 ± 5896.07 | 83653.27 ± 5892.10 |
| Qwen/Qwen3.5-9B | ctx_tg @ d100000 (c2) | 5.98 ± 4.08 | 4.48 ± 2.12 | 11.67 ± 4.78 | 7.06 ± 2.53 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d100000 (c2) | 48.26 ± 0.91 | 24.32 ± 0.50 | | | 84264.99 ± 1754.30 | 84263.10 ± 1754.30 | 84269.39 ± 1752.62 |
| Qwen/Qwen3.5-9B | tg128 @ d100000 (c2) | 8.28 ± 1.01 | 4.56 ± 0.57 | 14.67 ± 0.94 | 7.33 ± 0.47 | | | |
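For the prompt-processing rows, the throughput and timing columns can be cross-checked. The sketch below assumes (this is not stated by the benchmark output) that the pp2048 t/s figure is just the 2048 new prompt tokens divided by est_ppt, with the dN context not counted in the numerator. The est_ppt values are copied from the c1 rows above; `prefill_tps` and `ROWS` are hypothetical names for illustration.

```python
# Cross-check of the pp2048 rows: derived t/s vs. reported t/s.
# ASSUMPTION: reported pp t/s ~= 2048 new prompt tokens / est_ppt;
# the dN context tokens are not counted in the numerator.

def prefill_tps(prompt_tokens: int, est_ppt_ms: float) -> float:
    """Prompt tokens per second, given an estimated prefill time in ms."""
    return prompt_tokens / (est_ppt_ms / 1000.0)

# (test, est_ppt in ms, reported t/s) copied from the c1 rows of the table
ROWS = [
    ("pp2048 (c1)", 783.02, 2620.55),
    ("pp2048 @ d4096 (c1)", 1646.67, 1243.76),
    ("pp2048 @ d8192 (c1)", 2639.72, 775.86),
    ("pp2048 @ d16384 (c1)", 4696.64, 436.06),
]

if __name__ == "__main__":
    for test, ms, reported in ROWS:
        derived = prefill_tps(2048, ms)
        print(f"{test}: derived {derived:.1f} t/s, reported {reported}")
```

Under that assumption the derived values land within about 0.1 t/s of the reported ones for the depth rows, and within the stated ± 108.88 spread for pp2048 (c1), where the reported value is an average over runs.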
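Generation speed looks quite low: a single request decodes at only 9.68 t/s (tg128, c1), and two concurrent requests reach 18.51 t/s in aggregate, or about 9.8 t/s each (tg128, c2). At long context it drops further, to 7.64 t/s at d65535 and 5.64 t/s at d100000 (tg128, c1). That seems unreasonably slow for a 9B model that uses linear attention. Does this indicate there is still room for optimization?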
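One way to frame "is this reasonable" is a memory-bandwidth roofline: if each decode step must stream every weight from memory once, tokens/s per request is bounded by bandwidth divided by weight size. The numbers below are assumptions, not measurements from this run: roughly 273 GB/s for DGX Spark's LPDDR5x memory, 9 billion parameters, and BF16 weights (2 bytes per parameter). KV-cache, linear-attention state, and activation traffic are ignored, so the real ceiling is somewhat lower. `decode_ceiling_tps` is a hypothetical helper.

```python
# Back-of-envelope decode ceiling for a memory-bandwidth-bound model.
# ASSUMPTIONS (not from the benchmark): ~273 GB/s bandwidth, 9e9 params,
# BF16 weights; KV-cache / recurrent-state / activation traffic ignored.

def decode_ceiling_tps(bandwidth_gb_s: float,
                       params_billion: float,
                       bytes_per_param: float,
                       tokens_per_step: float = 1.0) -> float:
    """Upper bound on tokens/s for one request, assuming every decode
    step (forward pass) reads all weights from memory exactly once."""
    weights_gb = params_billion * bytes_per_param   # GB streamed per step
    steps_per_s = bandwidth_gb_s / weights_gb       # forward passes per second
    return steps_per_s * tokens_per_step            # tokens emitted per pass

if __name__ == "__main__":
    plain = decode_ceiling_tps(273, 9, 2)
    print(f"BF16, no speculation: {plain:.1f} t/s")
    # Hypothetical: speculative decoding that averages 2 accepted tokens/pass.
    spec = decode_ceiling_tps(273, 9, 2, tokens_per_step=2.0)
    print(f"BF16, ~2 tokens per step: {spec:.1f} t/s")
```

Under these assumptions plain BF16 decoding tops out near 15 t/s, so 9.68 t/s with MTP enabled sits below even the no-speculation bound. Halving `bytes_per_param` (for example an FP8 checkpoint) doubles the bound; the actual MTP acceptance rate is not reported in the table.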