Why do artificial intelligence (AI) models so often miss the mark on human intent, no matter how much data they learn from? Conventional comparison learning, designed to help AI understand human preferences, ...
Fine-tuned “student” models can pick up unwanted traits from base “teacher” models, and those traits could evade data filtering, generating a need for more rigorous safety evaluations. Researchers have discovered ...