Thoughts on paper
On evaluating agents
Building an AI judge for classification tasks
Building self-improving negotiation agents
The time of evaluation-driven development
Evals before everything
A language for the AIs