DGX SPARK VLLM RESULTS

#4
by RGMC98 - opened

llama-benchy results with MTP 2:

| model | test | t/s (total) | t/s (req) | peak t/s | peak t/s (req) | ttfr (ms) | est_ppt (ms) | e2e_ttft (ms) |
|:--|:--|--:|--:|--:|--:|--:|--:|--:|
| Qwen/Qwen3.5-9B | pp2048 (c1) | 2620.55 ± 108.88 | 2620.55 ± 108.88 | | | 784.91 ± 33.23 | 783.02 ± 33.23 | 785.00 ± 33.22 |
| Qwen/Qwen3.5-9B | tg128 (c1) | 9.68 ± 0.24 | 9.68 ± 0.24 | 10.67 ± 0.47 | 10.67 ± 0.47 | | | |
| Qwen/Qwen3.5-9B | pp2048 (c2) | 3149.17 ± 65.54 | 1577.58 ± 32.55 | | | 1301.05 ± 26.31 | 1299.16 ± 26.31 | 1301.10 ± 26.31 |
| Qwen/Qwen3.5-9B | tg128 (c2) | 18.51 ± 0.40 | 9.79 ± 0.17 | 22.00 ± 0.00 | 11.00 ± 0.00 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d4096 (c1) | 3172.55 ± 235.93 | 3172.55 ± 235.93 | | | 1300.38 ± 97.06 | 1298.49 ± 97.06 | 1300.49 ± 97.07 |
| Qwen/Qwen3.5-9B | ctx_tg @ d4096 (c1) | 9.92 ± 0.22 | 9.92 ± 0.22 | 11.00 ± 0.00 | 11.00 ± 0.00 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d4096 (c1) | 1243.76 ± 6.36 | 1243.76 ± 6.36 | | | 1648.56 ± 8.45 | 1646.67 ± 8.45 | 1648.66 ± 8.45 |
| Qwen/Qwen3.5-9B | tg128 @ d4096 (c1) | 10.09 ± 0.02 | 10.09 ± 0.02 | 11.00 ± 0.00 | 11.00 ± 0.00 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d4096 (c2) | 3883.72 ± 15.81 | 1943.85 ± 7.99 | | | 2109.68 ± 8.86 | 2107.79 ± 8.86 | 2109.73 ± 8.85 |
| Qwen/Qwen3.5-9B | ctx_tg @ d4096 (c2) | 19.40 ± 0.59 | 10.05 ± 0.03 | 22.00 ± 0.00 | 11.00 ± 0.00 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d4096 (c2) | 1304.04 ± 37.92 | 652.49 ± 18.98 | | | 3143.33 ± 93.21 | 3141.44 ± 93.21 | 3143.39 ± 93.21 |
| Qwen/Qwen3.5-9B | tg128 @ d4096 (c2) | 18.82 ± 0.91 | 9.91 ± 0.16 | 22.00 ± 0.00 | 11.00 ± 0.00 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d8192 (c1) | 3820.75 ± 15.57 | 3820.75 ± 15.57 | | | 2146.18 ± 8.64 | 2144.29 ± 8.64 | 2146.25 ± 8.62 |
| Qwen/Qwen3.5-9B | ctx_tg @ d8192 (c1) | 9.97 ± 0.00 | 9.97 ± 0.00 | 11.00 ± 0.00 | 11.00 ± 0.00 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d8192 (c1) | 775.86 ± 3.93 | 775.86 ± 3.93 | | | 2641.61 ± 13.35 | 2639.72 ± 13.35 | 2641.66 ± 13.34 |
| Qwen/Qwen3.5-9B | tg128 @ d8192 (c1) | 9.89 ± 0.01 | 9.89 ± 0.01 | 10.33 ± 0.47 | 10.33 ± 0.47 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d8192 (c2) | 4077.40 ± 3.08 | 2039.80 ± 1.56 | | | 4018.39 ± 2.99 | 4016.50 ± 2.99 | 4018.44 ± 2.99 |
| Qwen/Qwen3.5-9B | ctx_tg @ d8192 (c2) | 18.92 ± 0.11 | 9.78 ± 0.11 | 21.33 ± 0.94 | 10.67 ± 0.47 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d8192 (c2) | 816.36 ± 1.05 | 408.37 ± 0.53 | | | 5016.98 ± 6.46 | 5015.09 ± 6.46 | 5017.04 ± 6.46 |
| Qwen/Qwen3.5-9B | tg128 @ d8192 (c2) | 18.48 ± 0.65 | 9.72 ± 0.02 | 20.00 ± 0.00 | 10.00 ± 0.00 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d16384 (c1) | 3927.25 ± 5.82 | 3927.25 ± 5.82 | | | 4173.86 ± 6.41 | 4171.97 ± 6.41 | 4173.96 ± 6.40 |
| Qwen/Qwen3.5-9B | ctx_tg @ d16384 (c1) | 9.57 ± 0.02 | 9.57 ± 0.02 | 10.00 ± 0.00 | 10.00 ± 0.00 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d16384 (c1) | 436.06 ± 0.63 | 436.06 ± 0.63 | | | 4698.53 ± 6.79 | 4696.64 ± 6.79 | 4698.64 ± 6.77 |
| Qwen/Qwen3.5-9B | tg128 @ d16384 (c1) | 9.51 ± 0.02 | 9.51 ± 0.02 | 10.00 ± 0.00 | 10.00 ± 0.00 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d16384 (c2) | 4050.36 ± 66.34 | 2036.83 ± 52.49 | | | 8051.44 ± 202.01 | 8049.55 ± 202.01 | 8051.50 ± 202.02 |
| Qwen/Qwen3.5-9B | ctx_tg @ d16384 (c2) | 18.58 ± 0.09 | 9.45 ± 0.07 | 20.33 ± 0.47 | 10.33 ± 0.47 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d16384 (c2) | 447.29 ± 5.77 | 224.80 ± 4.71 | | | 9116.12 ± 187.59 | 9114.23 ± 187.59 | 9116.16 ± 187.59 |
| Qwen/Qwen3.5-9B | tg128 @ d16384 (c2) | 18.20 ± 0.03 | 9.35 ± 0.13 | 20.33 ± 0.47 | 10.33 ± 0.47 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d32768 (c1) | 3906.92 ± 2.15 | 3906.92 ± 2.15 | | | 8389.14 ± 4.70 | 8387.25 ± 4.70 | 8389.22 ± 4.70 |
| Qwen/Qwen3.5-9B | ctx_tg @ d32768 (c1) | 8.65 ± 0.02 | 8.65 ± 0.02 | 9.00 ± 0.00 | 9.00 ± 0.00 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d32768 (c1) | 224.00 ± 1.87 | 224.00 ± 1.87 | | | 9145.26 ± 75.94 | 9143.37 ± 75.94 | 9145.36 ± 75.96 |
| Qwen/Qwen3.5-9B | tg128 @ d32768 (c1) | 8.61 ± 0.02 | 8.61 ± 0.02 | 9.00 ± 0.00 | 9.00 ± 0.00 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d32768 (c2) | 3712.80 ± 281.87 | 1874.29 ± 154.97 | | | 17612.52 ± 1538.50 | 17610.63 ± 1538.50 | 17612.58 ± 1538.51 |
| Qwen/Qwen3.5-9B | ctx_tg @ d32768 (c2) | 16.27 ± 0.66 | 8.64 ± 0.20 | 18.00 ± 0.00 | 9.33 ± 0.47 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d32768 (c2) | 216.48 ± 15.33 | 109.29 ± 8.49 | | | 18861.48 ± 1543.33 | 18859.59 ± 1543.33 | 18861.53 ± 1543.32 |
| Qwen/Qwen3.5-9B | tg128 @ d32768 (c2) | 16.14 ± 0.76 | 8.55 ± 0.25 | 18.00 ± 0.00 | 9.17 ± 0.37 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d65535 (c1) | 3157.82 ± 119.30 | 3157.82 ± 119.30 | | | 20785.76 ± 803.08 | 20783.87 ± 803.08 | 20785.85 ± 803.07 |
| Qwen/Qwen3.5-9B | ctx_tg @ d65535 (c1) | 7.72 ± 0.09 | 7.72 ± 0.09 | 8.33 ± 0.47 | 8.33 ± 0.47 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d65535 (c1) | 91.90 ± 0.46 | 91.90 ± 0.46 | | | 22286.91 ± 111.65 | 22285.02 ± 111.65 | 22286.99 ± 111.63 |
| Qwen/Qwen3.5-9B | tg128 @ d65535 (c1) | 7.64 ± 0.09 | 7.64 ± 0.09 | 8.33 ± 0.47 | 8.33 ± 0.47 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d65535 (c2) | 2286.44 ± 491.33 | 1155.57 ± 254.76 | | | 60172.33 ± 15685.93 | 60170.44 ± 15685.93 | 60175.67 ± 15685.28 |
| Qwen/Qwen3.5-9B | ctx_tg @ d65535 (c2) | 7.94 ± 1.11 | 4.42 ± 0.36 | 14.67 ± 0.94 | 7.33 ± 0.47 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d65535 (c2) | 80.29 ± 2.96 | 40.45 ± 1.46 | | | 50694.61 ± 1874.98 | 50692.72 ± 1874.98 | 50698.34 ± 1874.29 |
| Qwen/Qwen3.5-9B | tg128 @ d65535 (c2) | 10.07 ± 1.40 | 5.35 ± 0.89 | 15.33 ± 0.94 | 7.67 ± 0.47 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d100000 (c1) | 2407.21 ± 33.79 | 2407.21 ± 33.79 | | | 41552.12 ± 578.07 | 41550.23 ± 578.07 | 41582.82 ± 599.50 |
| Qwen/Qwen3.5-9B | ctx_tg @ d100000 (c1) | 4.31 ± 0.81 | 4.31 ± 0.81 | 7.67 ± 1.25 | 7.67 ± 1.25 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d100000 (c1) | 44.33 ± 4.38 | 44.33 ± 4.38 | | | 46680.51 ± 4856.03 | 46678.62 ± 4856.03 | 46702.83 ± 4849.40 |
| Qwen/Qwen3.5-9B | tg128 @ d100000 (c1) | 5.64 ± 0.49 | 5.64 ± 0.49 | 7.67 ± 0.94 | 7.67 ± 0.94 | | | |
| Qwen/Qwen3.5-9B | ctx_pp @ d100000 (c2) | 2328.80 ± 182.23 | 1200.93 ± 76.42 | | | 83646.13 ± 5896.07 | 83644.24 ± 5896.07 | 83653.27 ± 5892.10 |
| Qwen/Qwen3.5-9B | ctx_tg @ d100000 (c2) | 5.98 ± 4.08 | 4.48 ± 2.12 | 11.67 ± 4.78 | 7.06 ± 2.53 | | | |
| Qwen/Qwen3.5-9B | pp2048 @ d100000 (c2) | 48.26 ± 0.91 | 24.32 ± 0.50 | | | 84264.99 ± 1754.30 | 84263.10 ± 1754.30 | 84269.39 ± 1752.62 |
| Qwen/Qwen3.5-9B | tg128 @ d100000 (c2) | 8.28 ± 1.01 | 4.56 ± 0.57 | 14.67 ± 0.94 | 7.33 ± 0.47 | | | |

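Generation speed looks quite low: about 9.68 t/s for a single request (tg128, c1) and 18.51 t/s in total across two concurrent requests (tg128, c2), and it falls further as context depth grows, down to 5.64 t/s at d100000 (c1). That seems unreasonably slow for a 9B model that uses linear attention. Does this indicate that there is still room for optimization?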
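To make the trend easier to see, here is a rough sketch that takes the tg128 total throughput figures from the table above and computes two ratios: how much total throughput improves going from c1 to c2, and how much single-request decode slows down relative to the no-depth run. The dictionary layout and the function names `scaling` and `slowdown` are my own, purely for illustration; only the numbers come from the benchmark.

```python
# tg128 total throughput (t/s), copied from the table, keyed by context depth.
# Depth 0 stands for the plain tg128 rows with no "@ d..." suffix.
# c1 = one concurrent request, c2 = two concurrent requests.
tg128_total = {
    0:      {"c1": 9.68,  "c2": 18.51},
    4096:   {"c1": 10.09, "c2": 18.82},
    8192:   {"c1": 9.89,  "c2": 18.48},
    16384:  {"c1": 9.51,  "c2": 18.20},
    32768:  {"c1": 8.61,  "c2": 16.14},
    65535:  {"c1": 7.64,  "c2": 10.07},
    100000: {"c1": 5.64,  "c2": 8.28},
}


def scaling(row):
    """Ratio of c2 total throughput to c1 throughput (2.0 would be perfect scaling)."""
    return row["c2"] / row["c1"]


def slowdown(depth, base=0):
    """c1 decode throughput at `depth` as a fraction of c1 throughput at `base`."""
    return tg128_total[depth]["c1"] / tg128_total[base]["c1"]


for depth, row in tg128_total.items():
    print(f"d{depth:>6}: c2/c1 = {scaling(row):.2f}, c1 vs d0 = {slowdown(depth):.2f}")
```

On these numbers the second request nearly doubles total throughput up to d32768, while at d65535 and d100000 the gain from c2 drops off sharply and single-request decode falls to roughly 60% of the no-depth speed.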
