Rubric Rewards vs. Direct Prompting: Which Trains Better AI Co-Scientists?
AI research assistants promise to revolutionize science, but they often fail to satisfy basic constraints. A new training method that uses rubric-based rewards shows substantial improvements in generating usable research plans. The results suggest a shift in how we should train specialized AI.