How Good is Good Enough: A Guide to Common LLM Benchmarks
In our last article, we talked about benchmarking as the highest-level method of assessing the performance of LLMs. Today, we’re going to look in more detail at some of the most popular benchmarks, what they measure, and how they measure it. Note that most of the benchmarks listed below have publicly available leaderboards and question sets if you want to dig deeper, and I’ve included links to papers where appropriate. Let’s dive in!