Runtime error

Exit code: 1. Reason:

model-00001-of-000001.safetensors: 100%|██████████| 6.67G/6.67G [00:33<00:00, 198MB/s]
Downloading shards: 100%|██████████| 1/1 [00:33<00:00, 33.74s/it]

Traceback (most recent call last):
  File "/app/app.py", line 20, in <module>
    model = AutoModel.from_pretrained(MODEL_NAME, _attn_implementation='flash_attention_2', torch_dtype=torch.bfloat16, trust_remote_code=True, use_safetensors=True)
  File "/usr/local/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 559, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4091, in from_pretrained
    config = cls._autoset_attn_implementation(
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1617, in _autoset_attn_implementation
    cls._check_and_enable_flash_attn_2(
  File "/usr/local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1756, in _check_and_enable_flash_attn_2
    raise ValueError(
ValueError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: Flash Attention 2 is not available on CPU. Please make sure torch can access a CUDA device.
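The shard download completes; the crash happens afterwards, when /app/app.py requests _attn_implementation='flash_attention_2' unconditionally while the container is running on CPU, and FlashAttention 2 only works on CUDA devices. A minimal sketch of a fix, reusing the loading options shown in the traceback: select the attention implementation based on whether CUDA is available. The value of MODEL_NAME is not visible in the log, so the placeholder below is hypothetical.

import torch
from transformers import AutoModel

MODEL_NAME = "org/model-name"  # hypothetical placeholder; the real name is not shown in the log

# FlashAttention 2 requires a CUDA device. When no GPU is present, fall back
# to the "eager" attention implementation, which every model supports.
attn_impl = "flash_attention_2" if torch.cuda.is_available() else "eager"

model = AutoModel.from_pretrained(
    MODEL_NAME,
    _attn_implementation=attn_impl,  # same kwarg the app already passes
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    use_safetensors=True,
)

Alternatively, keeping the code as-is and moving the Space to GPU hardware would satisfy the existing flash_attention_2 code path.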
