“Every week, we [will] have practice exams… We’re trying to graduate from very simple questions to more complex real-world … meaning like a real-world problem, meaning not just math, not just …”
Public benchmarks are designed to evaluate an LLM's general capabilities. Custom evals, by contrast, measure how well an LLM performs on the specific tasks you actually need it to do.
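To make the distinction concrete, here is a minimal sketch of what a custom eval can look like: a fixed set of task-specific test cases, a grader, and a single score. Everything in it is hypothetical. `EvalCase`, `run_eval`, `exact_match`, and `stub_model` are illustrative names, not a real library API, and `stub_model` stands in for an actual LLM call. Exact-match grading is only the simplest possible choice.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One task-specific test case: an input prompt and the answer we expect."""
    prompt: str
    expected: str


def exact_match(output: str, expected: str) -> bool:
    """Simplest grader: case- and surrounding-whitespace-insensitive comparison."""
    return output.strip().lower() == expected.strip().lower()


def run_eval(
    model: Callable[[str], str],
    cases: list[EvalCase],
    grader: Callable[[str, str], bool] = exact_match,
) -> float:
    """Run every case through the model and return the fraction graded correct."""
    if not cases:
        raise ValueError("an eval needs at least one case")
    passed = sum(grader(model(case.prompt), case.expected) for case in cases)
    return passed / len(cases)


# Hypothetical stand-in for a real LLM call; swap in your own client here.
def stub_model(prompt: str) -> str:
    canned = {
        "Refund window for damaged items?": "30 days",
        "Which plan includes SSO?": "Enterprise",
    }
    return canned.get(prompt, "I don't know")


# Hypothetical cases drawn from one specific task (a support-bot FAQ).
cases = [
    EvalCase("Refund window for damaged items?", "30 days"),
    EvalCase("Which plan includes SSO?", "enterprise"),
    EvalCase("Do you ship to Canada?", "yes"),
]

score = run_eval(stub_model, cases)
print(f"accuracy: {score:.2f}")  # prints "accuracy: 0.67"
```

Unlike a public benchmark, the cases here come from your own task, so the score says something about your application rather than about the model in general. In practice you would grow the case set over time, moving from simple questions toward harder, more realistic ones, and swap `exact_match` for a grader that suits the task.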