# SWE-Bench Verified (Compressed)

<picture>
<img src="./plot.png" alt="SWE-Bench Verified Total Image Size" style="width:100%">
</picture>

Setting up all the SWE-Bench Verified images used to take over 200 GiB of storage and 100+ GiB of network transfer.

Now it takes just:

- 31 GiB of total storage (down from 206 GiB)
- 5 GiB of network transfer (down from 100 GiB)
- ~5 minutes of setup

## 🚀 Getting the Images

Images follow this naming convention:

```
logicstar/sweb.eval.x86_64.<repo>_1776_<instance>
```
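
To turn a SWE-Bench instance ID into an image name, the sketch below assumes the convention the SWE-Bench harness uses for remote images. The ID is lowercased, and the `__` between owner and repository becomes `_1776_`. Under that convention, `<repo>` is the owner part of the ID and `<instance>` is the rest. `image_name` is a hypothetical helper and is not part of this repo.

```bash
# Hypothetical helper: map a SWE-Bench instance ID to its image name.
# Assumes lowercasing plus the "__" -> "_1776_" rewrite described above.
image_name() {
  printf 'logicstar/sweb.eval.x86_64.%s\n' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed 's/__/_1776_/g'
}

image_name django__django-11099
# logicstar/sweb.eval.x86_64.django_1776_django-11099
```

The result is a plain image reference, so you can pass it directly to `docker run` or `docker image inspect`.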
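
### Docker

```bash
curl -L -# "https://huggingface.co/LogicStar/SWE-Bench-Verified-Compressed/resolve/main/saved.tar.zst?download=true" | zstd -d --long=31 --stdout | docker load
```

Keep `--long=31` in these commands. It lets zstd use a decompression window of up to 2 GiB instead of its 128 MiB default, which archives compressed with a large long-distance-matching window require.
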
### Podman

⚠️ Podman cannot load Docker archives whose manifest is larger than 1 MiB, so we split the archive into two parts:

```bash
curl -L -# "https://huggingface.co/LogicStar/SWE-Bench-Verified-Compressed/resolve/main/saved.1.tar.zst?download=true" | zstd -d --long=31 --stdout | podman load
curl -L -# "https://huggingface.co/LogicStar/SWE-Bench-Verified-Compressed/resolve/main/saved.2.tar.zst?download=true" | zstd -d --long=31 --stdout | podman load
```

For faster downloads and parallel loading, use the Hugging Face CLI (`hf`) to download the compressed OCI layout, then run our `load.py` script to load the images in parallel:

```bash
# Clone the repo and cd into it
hf download ...
python3 load.py
```
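
After loading, it can be useful to confirm that the images you need are present before you start a long evaluation. The sketch below is a hypothetical helper and is not part of this repo. Given a file of local image names (one per line, without tags) and a list of SWE-Bench instance IDs, `missing_images` prints each ID that has no matching image. It assumes the `_1776_` naming convention with lowercased IDs. It also accepts names that appear behind a registry prefix such as `docker.io/`, which Podman may add.

```bash
# Hypothetical helper (not shipped with this repo): print each SWE-Bench
# instance ID whose image is absent from a list of local image names.
#   $1   file with one image name per line (no tags)
#   $2.. SWE-Bench instance IDs, e.g. django__django-11099
missing_images() {
  images_file=$1
  shift
  for id in "$@"; do
    # Assumed naming: lowercase, and "__" in the instance ID becomes "_1776_".
    name=$(printf 'logicstar/sweb.eval.x86_64.%s' "$id" \
      | tr '[:upper:]' '[:lower:]' | sed 's/__/_1776_/g')
    # Match the bare name, or the name behind a registry prefix (e.g. docker.io/).
    if ! awk -v n="$name" '
        $0 == n || substr($0, length($0) - length(n)) == "/" n { found = 1 }
        END { exit !found }' "$images_file"; then
      printf '%s\n' "$id"
    fi
  done
}
```

For example, write the local names to a file with `docker image ls --format '{{.Repository}}' > local-images.txt` (or the `podman images` equivalent). Then run `missing_images local-images.txt django__django-11099 astropy__astropy-12907`. If it prints nothing, every listed image is present.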

## 🛠 Using the Images

Pass `--namespace logicstar` to the SWE-Bench harness so that it uses these images. For example, to validate the gold patches on SWE-Bench Verified:

```bash
python -m swebench.harness.run_evaluation \
    --dataset_name princeton-nlp/SWE-bench_Verified \
    --predictions_path gold \
    --max_workers 1 \
    --run_id validate-gold \
    --namespace logicstar
```