ACI Conformance Benchmarks
Independent eval architect
Inference drift • boundary intelligence • model safety
Scope
termination and stopping behavior
instruction boundary discipline
inference drift and collapse surfaces
VL-JEPA failure modes (video reasoning)
alignment via behavior, not narrative
Artifacts
30+ public micro-benchmarks (20-row probes)
lightweight probes for internal evaluation
focus on failure detection over accuracy
Not trying to sell models
Not training, not fine-tuning, not a lab.
Evaluating how systems fall apart, before deployment.
Open for collaboration
Research groups • safety teams • eval units • robotics • multimodal inference
Contact
email / HF messages
team@loopwell.ai