Faulty reward functions in the wild

From OpenAI News · 2016-12-21

Model alignment · AI safety · RLHF

Reinforcement learning algorithms can break in surprising, counterintuitive ways. In this post we’ll explore one such failure mode: misspecifying the reward function.
