Premium Tutorials

Learn about the latest technologies from \newline premium tutorials.

How Good is Good Enough? - Introduction to LLM Testing and Benchmarks

The proliferation of large language models (LLMs), and their subsequent embedding into workflows in every industry imaginable, has upended much of the conventional wisdom around quality assurance and software testing. QA engineers now have to deal with non-deterministic outputs, so traditional automated tests that assert on exact outputs are only partly applicable. Moreover, the input space for LLM-based services has ballooned just as much: in the worst case it is the entirety of human language, and even for more specialised LLMs it is a very flexible subset of it. This is a vast test surface with many potential points of failure, one in which 100% test coverage is practically impossible and the edge cases are equally vast and difficult to enumerate. It's unsurprising, then, that we've seen bugs in top-tier, customer-facing LLMs from even the biggest companies, such as Google's AI recommending that users eat one small rock a day after indexing an Onion article, or Grok accusing NBA star Klay Thompson of vandalism.
How Good is Good Enough: A Guide to Common LLM Benchmarks

In our last article, we talked about benchmarking as the highest-level method of assessing the performance of LLMs. Today, we're going to look in more detail at some of the most popular benchmarks: what they measure, and how they measure it. Note that most of the benchmarks listed below have leaderboards and question sets publicly available if you want to dive deeper. I've also included links to papers where appropriate. Let's dive in!

I got a job offer, thanks in large part to your teaching. They sent a test as part of the interview process, and this was a huge help in implementing my own Node server.

This has been a really good investment!

Advance your career with newline Pro.

Only $40 per month for unlimited access to over 60 books, guides and courses!

How To Build Complete Fullstack Apps In Less Than An Afternoon With Bolt + Supabase

What if I told you that 2-3 hours from now you could have taken your app idea and transformed it into a beautiful, production-level full-stack application, deployed and available on the internet for everyone to use? If I had told you this a couple of years ago, you'd have laughed, scoffed, and dismissed everything I said. In fact, that was my reaction too when I first heard someone from Supabase talk about what Bolt and Supabase combined could achieve.