Thoughts on paper

On evaluating agents

Building an AI judge for classification tasks

Building self-improving negotiation agents

The time of evaluation-driven development

Evals before everything

A language for the AIs