The Strange, Strange World of Alignment Training
marginalrevolution.com/marginalrevolution/2025/02/the-strange-strange-world-of-alignment-training.html
Here is a report on some alignment research by computer scientists. It’s instructive not only for the results but even more for the process, the authors’ reasoning, and their subjective understanding of their work.
Claude has been trained to avoid providing harmful responses—put more…
This story appeared on marginalrevolution.com, 2025-02-24 12:17:25.