LLMs generate text
Get the project source code below, and follow along with the lesson material.
Download the project source code. To set up the project on your local machine, follow the directions provided in the README.md file. If you run into any issues running the project source code, feel free to reach out to the author in the course's Discord channel.
[00:00 - 00:05] So I'm going to jump over to this tab right here. And actually, I'm going to start over.
[00:06 - 00:10] OK. All right.
[00:11 - 00:13] OK. So let's create a brand new notebook.
[00:14 - 00:20] All right. So the notebook that I shared with you has all the solutions.
[00:21 - 00:27] So if I'm typing too quickly, you can always refer to that to see what I'm going to be ultimately typing. But I want to start from scratch here.
[00:28 - 00:32] So you can see what the code looks like and what the thought process looks like. So let me start here.
[00:33 - 00:41] If you've ever used the Hugging Face library before, Hugging Face's library is just called Transformers. And there are two imports that we always, always use.
[00:42 - 00:52] We always use AutoModelForCausalLM and AutoTokenizer. So this first one, the AutoModel, this is actually going to be the LLM itself.
[00:53 - 01:02] This is what predicts-- I'll explain what it predicts in a second. But this will be initialized using a line of code like this, with from_pretrained.
[01:03 - 01:11] And we're going to be using a quote-unquote "small" large language model. We're going to be using Facebook's older OPT-125M, with 125 million parameters.
[01:12 - 01:19] We're also going to initialize our tokenizer. So our tokenizer here has a very specific purpose, which I'll discuss later.
[01:20 - 01:24] For now, it's just I'll explain actually at a high level what we're doing. OK.
[01:25 - 01:32] So these are the two pieces that you always initialize for any sort of inference. Any time you want to run an LLM, you need these two lines.
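Putting those two lines together, the initialization cell looks roughly like this; `facebook/opt-125m` is the Hugging Face Hub ID for the OPT-125M model used here:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The LLM itself: given the tokens so far, it predicts the next token.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

# The tokenizer: converts text to token IDs and back.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
```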
[01:33 - 01:37] So let's go ahead and run that. These two lines are actually going to download the LLM locally.
[01:38 - 01:44] And so when I say locally, I'm on Google Colab here. So it's downloading this to Google servers.
[01:45 - 01:50] As soon as it's downloaded, I'll then be able to run inference. Inference is going to operate in three different steps.
[01:51 - 01:57] So here we're going to write AutoModelForCausalLM. I think it's failing because I have a typo.
[01:58 - 02:04] So let me go ahead and fix that. So the typo is here.
[02:05 - 02:07] And here. OK, perfect.
[02:08 - 02:14] So while that's running, now I'm going to go ahead and do a few things. The first thing you need to do is do what we call tokenize the input.
[02:15 - 02:18] So I'll explain what this actually means later on. For now, don't worry about it.
[02:19 - 02:20] Return tensors. OK.
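A sketch of that tokenization cell, assuming the same prompt that appears later in the lesson:

```python
# Step 1: tokenize the prompt into token IDs, returned as PyTorch tensors.
inputs = tokenizer("A list of colors: red", return_tensors="pt")
```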
[02:21 - 02:24] So our model is now downloaded. This is all good news for us.
[02:25 - 02:28] This is step number one. And step number two is to actually run the LLM itself.
[02:29 - 02:39] And what we're going to do, actually, in this presentation today is we're going to write this LLM.generate function from scratch. So you're going to see exactly how that works.
[02:40 - 02:47] OK, max_new_tokens equals five. I actually realized I might want to zoom in so that folks can better see my code.
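That generation cell might look roughly like this, continuing from the `model` and `inputs` defined above:

```python
# Step 2: run the LLM, appending at most 5 new tokens to the prompt.
generated_ids = model.generate(**inputs, max_new_tokens=5)
```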
[02:48 - 02:53] And then finally, we have one last bit here. The last bit is to do the following.
[02:54 - 03:00] And again, I'm not explaining a whole lot right now. And that's because we have slides to explain what's going on in just a little bit.
[03:01 - 03:06] So generated_ids, skip_special_tokens. And then we have one more thing here.
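As a sketch, that last decoding cell continues from the `generated_ids` above:

```python
# Step 3: convert the generated token IDs back into text, skipping
# special tokens (e.g. the beginning-of-sequence marker).
text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)  # e.g. "A list of colors: red, green, blue"
```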
[03:07 - 03:09] All right, so this is a bunch of mumbo jumbo. That's OK.
[03:10 - 03:20] I just wanted to get you really quickly to something that actually runs and produces text. So if you've been following along and you typed this out, what you'll see up here is this text: "A list of colors: red".
[03:21 - 03:22] That is what we call the prompt. It's the input to the LLM.
[03:23 - 03:29] This is what you type into ChatGPT's interface. Now I'm going to run all this.
[03:30 - 03:36] And it produces "A list of colors: red, green, blue". And you'll notice it actually completed our text.
[03:37 - 03:42] This is the first form of LLMs. This is what happens when you take a pre-trained LLM.
[03:43 - 03:45] All it does is complete your text. It doesn't respond to queries per se.
[03:46 - 03:52] So you can't really ask it a question. You just give it a piece of text, and it'll do its best to complete that piece of text.
[03:53 - 03:58] And we'll talk later about why that is. This is just how LLMs were trained.
[03:59 - 04:00] OK, great.