Tokens
Get the project source code below, and follow along with the lesson material.
Download Project Source Code

To set up the project on your local machine, please follow the directions provided in the README.md file. If you run into any issues with running the project source code, then feel free to reach out to the author in the course's Discord channel.
Lesson Transcript
[00:00 - 00:05] We've seen now a bit of code that actually runs inference for us. Now I'm going to jump right back to the slides.
[00:06 - 00:13] And again, if you didn't catch all this, this is in the notebook that I sent you. Okay. Oh, sorry. One more thing. One more thing here.
[00:14 - 00:19] So let's look at the number of tokens. Okay, so I didn't explain what tokens are, yes.
[00:20 - 00:27] Let me explain what that is in a second here. Okay. So: a list of colors, colon, red.
[00:28 - 00:35] This is our input prompt, right? And let's just say that we want to understand how many words there are.
[00:36 - 00:41] And I'm going to write here. Okay, I'll get back to that question, Ken. That's a good question.
[00:42 - 00:48] So here it says that there are seven tokens, right? So tokens here can be a word.
[00:49 - 00:51] It can be part of a word. It could be punctuation.
[00:52 - 00:58] It could be spaces, right? The tokenizer's main goal here is to actually break up our input text into chunks.
[00:59 - 01:06] And how it does that is up to the model designers. So in this case, we've broken up the input text into seven chunks.
[01:07 - 01:13] Throughout this presentation, instead of saying tokens, I'm just going to say words, right? Just for simplicity.
[01:14 - 01:17] So let's just say the input is, in theory, seven words. Right.
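As a rough sketch of what the notebook is doing here, assuming a Hugging Face transformers setup (the checkpoint and variable names are stand-ins, not necessarily the ones the course uses):

```python
from transformers import AutoTokenizer

# Any causal-LM checkpoint works for illustration; the course notebook may use a different one.
tokenizer = AutoTokenizer.from_pretrained("gpt2")

prompt = "a list of colors: red"
model_inputs = tokenizer(prompt, return_tensors="pt")

# One ID per token, so the sequence length is the token count.
print(model_inputs["input_ids"].shape[-1])  # 7 with the tokenizer used in the lesson
```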
[01:18 - 01:30] Now let's look at how many outputs there are. And so if I run this input IDs, notice that I actually did something different here.
[01:31 - 01:35] I didn't use LLM.generate. LLM.generate gives you a bunch of text like we saw here, right?
[01:36 - 01:43] Because it's green, comma blue, comma. Now here, what I'm doing is I'm doing something a little bit different.
[01:44 - 01:47] I'm actually running the LLM raw. I'm running it directly.
[01:48 - 01:51] So I'm feeding it a bunch of inputs and it's giving me outputs. And there's no post processing.
[01:52 - 01:55] I'm looking directly at the outputs of the model. And so here's what that looks like.
[01:56 - 02:05] Well, we'll talk about what that looks like later. For now, what we want to do is understand how many tokens (or how many words) the model, the LLM, output.
[02:06 - 02:10] So, the logits' shape. Let's look at how many tokens there are.
[02:11 - 02:15] Okay, there are seven. So that's a little bit weird, right?
[02:16 - 02:21] And I'll explain why that's weird in a second. Okay.
[02:22 - 02:26] All right. But anyways, the point is here we have seven input tokens, seven output tokens.
[02:27 - 02:28] This is actually extremely important. All right.
[02:29 - 02:34] The number of tokens that we input is the number of tokens that we output. Okay.
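To make the N-in, N-out point concrete, here's a minimal sketch of running the model raw (again assuming a transformers causal LM; `llm` and the checkpoint are stand-in names):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in checkpoint
llm = AutoModelForCausalLM.from_pretrained("gpt2")

model_inputs = tokenizer("a list of colors: red", return_tensors="pt")

# Raw forward pass: no generate(), no post-processing, just the model's outputs.
with torch.no_grad():
    outputs = llm(**model_inputs)

# The logits have shape (batch, sequence_length, vocab_size);
# sequence_length equals the number of input tokens: N in, N out.
print(model_inputs["input_ids"].shape)  # (1, N)
print(outputs.logits.shape)             # (1, N, vocab_size)
```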
[02:35 - 02:38] All right. So let me start looking at the questions.
[02:39 - 02:40] So what about commas? A question from Ken.
[02:41 - 02:45] I've always wondered how punctuation formatting works in LLM output. That's a good question.
[02:46 - 02:51] Maybe it's time that I introduce what a token is more formally. So as we talked about, the tokenizer breaks the input into a bunch of chunks, and a token is one of those chunks, right?
[02:52 - 03:04] But most importantly, this tokenizing call actually gives us a set of IDs. So first thing that it does is it chunks up the input into a bunch of blocks or a bunch of chunks.
[03:05 - 03:10] The second is that each chunk now has an ID, right? So maybe 'a' has ID zero.
[03:11 - 03:14] Maybe list has ID 500. And maybe a space has ID two.
[03:15 - 03:22] So you could roughly imagine that this input right here looks something like one, two, 500, and so on and so forth, right? So a list of numbers, basically.
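A quick sketch of inspecting those IDs with a transformers tokenizer (the actual numbers depend entirely on which tokenizer you use; the checkpoint here is a stand-in):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in checkpoint

ids = tokenizer("a list of colors: red")["input_ids"]
print(ids)                                   # tokenizer-specific integers
print(tokenizer.convert_ids_to_tokens(ids))  # the text chunk each ID stands for
```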
[03:23 - 03:29] All right. So then the LLM takes in that list of numbers and it gives us back a list of numbers.
[03:30 - 03:37] And those numbers may encode punctuation, right? So in fact, what we can do is look at what these tokens look like right here.
[03:38 - 03:38] So let me go up here. Oops.
[03:39 - 03:45] Let me go up here. And then let me actually just output these input IDs.
[03:46 - 03:49] And then let me output the model inputs. Inputs, plural.
[03:50 - 03:53] Okay. So you can see that there are some IDs here: 2258, 9, 9.
[03:54 - 04:04] And I think, as it turns out, we can actually get the ID for the comma if you wanted to. But actually, we're going to do that in the next demo.
[04:05 - 04:11] So in the next demo, we'll actually see what the ID is for commas, and Ken, feel free to remind me if I don't do that. Okay.
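If you want to peek ahead, here's one hedged way to look up the comma's ID with a transformers tokenizer (the returned value is tokenizer-specific; the checkpoint is a stand-in):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in checkpoint

# Encode a bare comma and see which ID(s) it maps to.
print(tokenizer(",", add_special_tokens=False)["input_ids"])

# Or look it up directly by token string.
print(tokenizer.convert_tokens_to_ids(","))
```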
[04:12 - 04:17] So the next question is, are tokens related to features, feature extraction, other modalities? That's also a good question.
[04:18 - 04:33] In their current form, as we've shown here, tokens are basically just IDs that correspond to some text. And later on... okay, there's some other stuff I need to explain first.
[04:34 - 04:37] But basically, tokens are slightly different from features. But it's a good question.
[04:38 - 04:39] Yeah. The IDs are certainly related.
[04:40 - 04:44] Okay. All right.
[04:45 - 04:49] So let me continue now. Here we've shown a basic piece of inference code.
[04:50 - 04:59] This is what you'd normally use via a utility; later on, we're going to build it from scratch ourselves. So let's look at how an LLM actually does prediction.
[05:00 - 05:03] Here on the left hand side, I have something called start of sequence. Right.
[05:04 - 05:07] And so we call this SOS. Yeah, the start of sequence.
[05:08 - 05:12] So you input start of sequence into this gray box. This gray box is our LLM.
[05:13 - 05:15] And the output is the next word. It's 'a'.
[05:16 - 05:20] Right. Now here's how you do prediction with LLMs.
[05:21 - 05:25] I fed in one word, and I got one word out. But what if I want to predict many words?
[05:26 - 05:32] Right. Well, the next thing that I do is I take 'a' from the output and I feed it back into the input.
[05:33 - 05:35] Now I get two outputs. Right.
[05:36 - 05:44] So again, from before, if I have N inputs, I get N outputs. So here we have start of sequence, 'a', and on the right hand side, you see that I predicted 'list'.
[05:45 - 05:50] Right. So we have start of sequence, 'a', 'list'.
[05:51 - 05:55] We take 'list', we put it back in the input. And now we have start of sequence, 'a', 'list', 'of'.
[05:56 - 06:11] We keep doing that until the output is end of sequence. So again, to recap, we take our output, we add it to the input, and we run prediction again.
[06:12 - 06:21] We get our new output, we plug it back into the input, and we run prediction again, and so on and so forth. And we're going to actually see what this looks like in code in a second, so that hopefully make that more concrete.
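Here's a minimal greedy-decoding sketch of that loop, under the same assumed transformers setup as before (real generate() adds sampling, KV caching, and richer stopping criteria):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in checkpoint
llm = AutoModelForCausalLM.from_pretrained("gpt2")

input_ids = tokenizer("a list of colors: red", return_tensors="pt")["input_ids"]

for _ in range(10):  # predict up to 10 new tokens
    with torch.no_grad():
        logits = llm(input_ids).logits                       # (1, N, vocab_size)
    next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # most likely next token
    if next_id.item() == tokenizer.eos_token_id:             # stop at end of sequence
        break
    input_ids = torch.cat([input_ids, next_id], dim=-1)      # output goes back into the input

print(tokenizer.decode(input_ids[0]))
```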
[06:22 - 06:29] But we call this autoregressive decoding. So if you haven't heard this term before, it means that you're... oh, yes.
[06:30 - 06:34] So to answer Ken's question: yes, token IDs are indices into a vocabulary matrix. And we'll make that concrete in a second too.
[06:35 - 06:40] We're actually going to write code to show that. So this is autoregressive decoding.
[06:41 - 06:48] Autoregressive decoding is just this step-by-step process: we predict one word at a time, and then use that to continue predicting longer and longer sequences of words.
[06:49 - 06:57] Okay. So in short, what this gray box does, what the LLM does, is it takes in a bunch of words, and in theory, it predicts the next word.
[06:58 - 07:03] Right. You'll notice that I actually drew here N words, but I only highlighted the last one.
[07:04 - 07:07] And that's because in autoregressive decoding, we typically only use the last word.
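In code terms, that's just indexing the last position of the logits (a sketch under the same assumed setup as the earlier examples):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in checkpoint
llm = AutoModelForCausalLM.from_pretrained("gpt2")

model_inputs = tokenizer("a list of colors: red", return_tensors="pt")

with torch.no_grad():
    logits = llm(**model_inputs).logits  # one prediction per input position: (1, N, vocab_size)

last_logits = logits[:, -1, :]           # only the last position predicts the next word
next_id = last_logits.argmax(dim=-1)
print(tokenizer.decode(next_id))         # the single most likely next token
```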