Tutorials on LLM

Learn about LLM from fellow newline community members!

  • React
  • Angular
  • Vue
  • Svelte
  • NextJS
  • Redux
  • Apollo
  • Storybook
  • D3
  • Testing Library
  • JavaScript
  • TypeScript
  • Node.js
  • Deno
  • Rust
  • Python
  • GraphQL

How To Deploy Your Web App With Netlify

Welcome! This is the sixth and final lesson on how to build fullstack apps with Bolt and Supabase. If you’re just joining, you’re in luck, ’cause we already have tons of content for you to enjoy while turning your app from just an idea into a deployed web application in a single afternoon. Before you dive into this lesson, here’s where you can find Part 1, Part 2, Part 3, Part 4, and Part 5 if you want to get up to speed (which you probably do; otherwise, what exactly are you going to deploy? 🤔).

Common Statistical LLM Evaluation Metrics and what they Mean

In one of our earlier articles, we touched on statistical metrics and how they can be used in evaluation. We also briefly discussed precision, recall, and F1-score in our article on benchmarking. Today, we’ll go into more detail on how to apply these metrics directly, and on more complex metrics derived from them that can be used to assess LLM performance. Precision is a standard measure in statistics, and has long been used to measure the performance of ML systems. In simple terms, it is the proportion of samples correctly categorised or predicted as positive by a model (true positives) out of the total set of samples predicted to be positive (true positives + false positives). If we take the simple example of an ML tool that takes a photo as input and tells you if there is a dog in the picture, this would be:
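As a rough illustration of the dog-detector example, the sketch below computes precision as true positives / (true positives + false positives), along with recall and F1-score for comparison. The label lists and the function names (`count_outcomes`, `precision`, `recall`, `f1_score`) are made up for this example and are not taken from the original article.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Tallies needed for precision, recall, and F1."""
    true_positives: int
    false_positives: int
    false_negatives: int


def count_outcomes(predicted, actual):
    """Compare predictions with ground truth, one photo at a time."""
    tp = fp = fn = 0
    for said_dog, is_dog in zip(predicted, actual):
        if said_dog and is_dog:
            tp += 1  # model said "dog" and there is a dog
        elif said_dog and not is_dog:
            fp += 1  # model said "dog" but there is no dog
        elif is_dog and not said_dog:
            fn += 1  # model missed a dog that is in the photo
    return Counts(tp, fp, fn)


def precision(c):
    """TP / (TP + FP): of the photos flagged as dogs, how many were dogs?"""
    flagged = c.true_positives + c.false_positives
    return c.true_positives / flagged if flagged else 0.0


def recall(c):
    """TP / (TP + FN): of the photos with dogs, how many were found?"""
    dogs = c.true_positives + c.false_negatives
    return c.true_positives / dogs if dogs else 0.0


def f1_score(c):
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical data: six photos, True means "dog in the picture".
predicted = [True, True, True, True, False, False]
actual = [True, True, True, False, True, True]

counts = count_outcomes(predicted, actual)
print(f"precision={precision(counts):.2f}")  # 3 / (3 + 1)
print(f"recall={recall(counts):.2f}")        # 3 / (3 + 2)
print(f"f1={f1_score(counts):.2f}")
```

In this made-up sample the model flags four photos as containing a dog and is right about three of them, so precision is 3/4, while it misses two of the five real dogs, so recall is lower at 3/5.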


How Good is Good Enough: Subjective Testing and Manual LLM Evaluation

In our previous article, we talked about the highest level of testing and evaluation for LLMs, and went into detail about some of the most commonly used benchmarks for validating LLM performance at a high level. Today, we’re going to look at some more fine-grained evaluation metrics that you can use while building an LLM-based tool. Here we make the distinction between statistical metrics - that is, those computed using a statistical model - and more generalised metrics that attempt to measure the more ‘subjective’ elements of LLM performance (such as those used in manual testing) and that use AI to evaluate how useful a model is in its given context. In this article, we’ll give an overview of the different classes of metrics used and cover human evaluation and its importance, before moving on to common statistical metrics and LLM-as-Judge evaluations in the following articles.

How To Build Beautiful, Responsive UIs in Minutes With Bolt

Welcome! This is part 5 of our course on how to build fullstack apps with Bolt and Supabase. If you’re just joining, I highly recommend you take the course in the correct order before diving into this one. Here you can find Part 1, Part 2, Part 3, and Part 4.

How To Set Up Auth and Store User Data With Bolt + Supabase

Welcome! This is part 4 of our course on how to build fullstack apps with Bolt and Supabase. If you’re just joining, I highly recommend you take the course in the correct order before diving into this one. Here you can find Part 1, Part 2, and Part 3.

How To Build A Fullstack App MVP in An Hour With Bolt

Hello and welcome! This is the 3rd lesson in our series about how to build complete fullstack applications in less than an afternoon with Bolt and Supabase. In the first 2 lessons, we talked about what exactly Bolt is in the first place, and what Supabase is. If you want to read those first, here are Part 1 and Part 2.

How Good is Good Enough? - Introduction to LLM Testing and Benchmarks

The proliferation of Large Language Models (LLMs), and their subsequent embedding into workflows in every industry imaginable, has upended much of the conventional wisdom around quality assurance and software testing. QA Engineers effectively have to deal with non-deterministic outputs, so traditional automated testing that involves assertions on the output is partially out. Moreover, the input set for LLM-based services has equally ballooned, with the potential input set being the entirety of human language in the worst case, and a very flexible subset for more specialised LLMs. This is a vast test surface with many potential points of failure, one in which it is practically impossible to achieve 100% test coverage, and the edge cases are equally vast and difficult to enumerate. It’s unsurprising that we’ve seen bugs in top-tier customer-facing LLMs, even amongst the biggest companies, such as Google’s AI recommending users eat one small rock a day after indexing an Onion article, or Grok accusing NBA star Klay Thompson of vandalism.

How Good is Good Enough: A Guide to Common LLM Benchmarks

In our last article, we talked about benchmarking as the highest-level method of assessing the performance of LLMs. Today, we’re going to be looking in more detail at some of the most popular benchmarks, what they measure, and how they measure it. Note that most of the benchmarks listed below have leaderboards and question sets publicly available if you want to dive deeper. I’ve also included links to papers where appropriate. Let’s dive in!