Auditing Agents - Anthropic Fellows
community
https://github.com/safety-research/auditing-agents
Team members
2
auditing-agents's activity
abhayesian
updated
2 datasets
1 day ago
auditing-agents/logit-lens-responses-llama-70b-layer50
Updated
1 day ago
•
39
auditing-agents/sae-features-llama-70b-layer50
Updated
1 day ago
•
26
abhayesian
published
2 datasets
1 day ago
auditing-agents/sae-features-llama-70b-layer50
Updated
1 day ago
•
26
auditing-agents/logit-lens-responses-llama-70b-layer50
Updated
1 day ago
•
39
abhayesian
updated
a dataset
17 days ago
auditing-agents/petri-judge-summaries-top50-llama70b
Updated
17 days ago
•
152
abhayesian
published
a dataset
17 days ago
auditing-agents/petri-judge-summaries-top50-llama70b
Updated
17 days ago
•
152
abhayesian
updated
a dataset
17 days ago
auditing-agents/petri-judge-summaries-all-llama70b
Updated
17 days ago
•
44
abhayesian
published
a dataset
17 days ago
auditing-agents/petri-judge-summaries-all-llama70b
Updated
17 days ago
•
44
abhayesian
updated
a dataset
17 days ago
auditing-agents/petri-transcripts-top50-llama70b
Updated
17 days ago
•
111
abhayesian
published
a dataset
17 days ago
auditing-agents/petri-transcripts-top50-llama70b
Updated
17 days ago
•
111
abhayesian
updated
a dataset
17 days ago
auditing-agents/petri-transcripts-all-llama70b
Updated
17 days ago
•
32
abhayesian
published
a dataset
17 days ago
auditing-agents/petri-transcripts-all-llama70b
Updated
17 days ago
•
32
abhayesian
updated
a dataset
29 days ago
auditing-agents/steering-vectors-llama70b
Updated
29 days ago
•
592
abhayesian
published
a dataset
29 days ago
auditing-agents/steering-vectors-llama70b
Updated
29 days ago
•
592
abhayesian
updated
a dataset
about 1 month ago
auditing-agents/verbalizer-responses-llama-70b-layer50
Updated
Dec 18, 2025
•
267
abhayesian
published
a dataset
about 1 month ago
auditing-agents/verbalizer-responses-llama-70b-layer50
Updated
Dec 18, 2025
•
267
abhayesian
updated
a collection
about 2 months ago
RM Sycophancy (LLaMa)
Collection
9 items
•
Updated
Dec 9, 2025
•
2
abhayesian
updated
a dataset
about 2 months ago
auditing-agents/rm_sycophancy_exploitation_evals
Viewer
•
Updated
Dec 9, 2025
•
1k
•
16
abhayesian
published
a dataset
about 2 months ago
auditing-agents/rm_sycophancy_exploitation_evals
Viewer
•
Updated
Dec 9, 2025
•
1k
•
16
abhayesian
updated
a model
about 2 months ago
auditing-agents/llama_70b_synth_docs_only_then_redteam_kto_contextual_optimism
Updated
Dec 3, 2025