arxiv:2510.09462
Mikhail Terekhov
terekhov
AI & ML interests
Reinforcement Learning, Multi-objective Reinforcement Learning, RLHF
Recent Activity
upvoted a paper about 1 month ago
Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols
authored a paper about 1 month ago
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders
authored a paper about 1 month ago
Control Tax: The Price of Keeping AI in Check