torch.distributed.DistNetworkError
#75, opened by yu19920006607
When I run the command torchrun --nnodes 2 --nproc-per-node 8 --node-rank 200 --master-addr 100 generate.py --ckpt-path /path/to/DeepSeek-V3-Demo --config configs/config_671B.json --interactive --temperature 0.7 --max-new-tokens 200, I get the following error:
torch.distributed.DistNetworkError: The client socket has failed to connect to any network address of (100, 29500). The client socket has failed to connect to 0.0.0.100:29500 (errno: 110 - Connection timed out).
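A possible reading of the error, offered as a sketch rather than a confirmed diagnosis: the address 0.0.0.100 in the message is what you get when the bare number 100 passed to --master-addr is interpreted as a 32-bit IPv4 value, so the client is probably trying to reach a host that does not exist. Separately, --node-rank is normally expected to lie in the range 0 to nnodes - 1, so 200 looks out of range for --nnodes 2. The Python snippet below reproduces the address conversion and runs a basic sanity check on the arguments. The helper names `numeric_host_to_ipv4` and `check_launch_args` are hypothetical and are not part of torchrun or PyTorch, and 192.168.1.100 is only a placeholder address.

```python
import ipaddress
from typing import List


def numeric_host_to_ipv4(host: str) -> str:
    """Show the dotted-quad address a bare decimal host string maps to.

    Assumption: the resolver treats a single integer as a 32-bit IPv4 value,
    which matches the 0.0.0.100 seen in the error message.
    """
    return str(ipaddress.IPv4Address(int(host)))


def check_launch_args(nnodes: int, node_rank: int, master_addr: str) -> List[str]:
    """Hypothetical pre-flight check for the launch arguments (not a torchrun API)."""
    problems = []
    if not 0 <= node_rank < nnodes:
        problems.append(
            f"node rank {node_rank} is outside 0..{nnodes - 1} for {nnodes} nodes"
        )
    if master_addr.isdigit():
        problems.append(
            f"master address '{master_addr}' is a bare number and would be read as "
            f"{numeric_host_to_ipv4(master_addr)}; use the full IP or hostname"
        )
    return problems


if __name__ == "__main__":
    print(numeric_host_to_ipv4("100"))
    for problem in check_launch_args(nnodes=2, node_rank=200, master_addr="100"):
        print("problem:", problem)
```

If that reading is right, the usual approach is to pass the full IP address or hostname of the rank-0 machine as --master-addr on both nodes, and to use --node-rank 0 on the first node and --node-rank 1 on the second.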
