Vectors, intuitively

[00:00 - 00:05] So how do language models predict? We're going to look at the architecture of a large language model.
[00:06 - 00:12] Here it's the quote unquote macro architecture. We're going to be looking at the large chunks of the large language model just at a very high level.
[00:13 - 00:17] And then we'll dive deeper and deeper. OK, so we talked before.
[00:18 - 00:26] We said this gray box is the LLM, and it takes in a bunch of input words, and it outputs the next word. All right, so let's break that apart into three different steps.
[00:27 - 00:33] I haven't labeled any of those steps yet. All you see on this slide is just that there are three different steps.
[00:34 - 00:41] For this first step, we're going to take the word. Let's say the input word is "you." So I mentioned that there are input words, plural, but for now let's just simplify.
[00:42 - 00:43] We just have one input word. It's "you."
[00:44 - 00:49] And we want to convert that word into vectors. So let me explain what vectors are.
[00:50 - 00:57] A vector is an array of numbers. So let's say we have this array of numbers: 0, 1, 0, 0, 0, negative 1.
[00:58 - 01:05] This is just-- sorry, I want to see something real quick here. Oh, goodness.
[01:06 - 01:09] OK, there we go. There we go.
[01:10 - 01:18] So here's a set of numbers. And I'm going to say vectors, but you can think of these vectors as, again, just arrays.
[01:19 - 01:27] Arrays of two numbers in this case. Well, how do we understand or visualize what these arrays look like?
[01:28 - 01:33] Well, the first method is just to treat them as coordinates. So we can plot them here.
[01:34 - 01:41] So you plot these coordinates. You can also consider these points as arrows starting from the origin and ending up at that point.
[01:42 - 01:57] So you can visualize vectors as either points or direction with length, the direction with length being the arrows and the points being the circles that I drew. To simplify, I'm going to refer to vectors as points every so often, at least when I illustrate them.
[01:58 - 02:06] But when I speak, I'll say vectors. So when I say vectors, you can just think of points in space.
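Here is a minimal sketch of those two views of a vector, assuming matplotlib is available; the specific vector values are just illustrative, not the ones from the slides.

```python
# A quick sketch: the same 2-number arrays drawn as points and as arrows
# starting from the origin (illustrative values only).
import matplotlib.pyplot as plt

vectors = [(0, 1), (0.7, 0.7), (0, -1)]  # example 2-D vectors

fig, ax = plt.subplots()
for x, y in vectors:
    ax.plot(x, y, "o")  # view 1: the vector as a point
    ax.annotate("", xy=(x, y), xytext=(0, 0),
                arrowprops=dict(arrowstyle="->"))  # view 2: an arrow from the origin
ax.axhline(0, color="gray", linewidth=0.5)
ax.axvline(0, color="gray", linewidth=0.5)
ax.set_aspect("equal")
plt.show()
```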
[02:07 - 02:11] OK, so really quickly. Looks like-- so Ken asked, where does EOS come from?
[02:12 - 02:24] End of a document fed into training? So EOS actually occurs only at the very, very end of-- so it's not as granular as a document.
[02:25 - 02:36] What will usually happen is you don't have EOS until the end of a corpus. And so a corpus could be more than just a single document.
[02:37 - 02:41] It could be all of Wikipedia, for example. But none of that really matters in pre-training.
[02:42 - 02:47] And so I'll talk about different stages of training later on as well. But in instruction tuning, this matters a lot.
[02:48 - 02:57] So in instruction tuning, end of sequence is just at the end of a typical response. And that matters because you don't want the chatbot to blab on and on and on and on.
[02:58 - 03:05] And so that's sort of a problem that was common with the early versions of LLMs, where the model just blabs on and on and on and on.
[03:06 - 03:12] And so we would fix that during instruction tuning. So long story short, end of sequence is the end of a very long corpus.
[03:13 - 03:17] Or it's the end of a response you expect the model to give during training. Yeah.
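To make that concrete, here is an illustrative sketch of where an end-of-sequence marker might sit in training text. The token name "<eos>" and the example data are assumptions for illustration, not the workshop's actual tokenizer or dataset.

```python
# Illustrative only: where an <eos> marker might appear in training data.
EOS = "<eos>"

# Pre-training: one marker at the very end of a long corpus
# (which could be much larger than a single document).
pretraining_text = "all of the corpus text" + EOS

# Instruction tuning: one marker at the end of each expected response,
# so the model learns when to stop instead of blabbing on.
example = {
    "prompt": "What is the capital of France?",
    "response": "The capital of France is Paris." + EOS,
}
```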
[03:18 - 03:25] So I will talk more about instruction following and instruction fine-tuning later on. Yeah.
[03:26 - 03:29] Cool. OK, so back to these slides here.
[03:30 - 03:36] So we've talked about how to visualize these arrays of numbers. Or in other words, these vectors.
[03:37 - 03:41] For now, put aside the visualization. We're going to come back to that later on.
[03:42 - 03:51] But let's say that we have a mapping from words to vectors. We know that "you" maps to (0, 1), "are" maps to (0, 0.7), and "cold" maps to (0, -1).
[03:52 - 03:59] I just arbitrarily defined this mapping. And later on, I'll tell you how you can actually create this mapping on your own.
[04:00 - 04:03] But now, this is the mapping. So we saw that our input word was "you."
[04:04 - 04:06] So let's look up "you" in this mapping. We know that "you" is (0, 1).
[04:07 - 04:11] So let's use that. "You" now converts into (0, 1).
[04:12 - 04:22] That process of looking up what our word is-- sorry, looking up the vector for our word is called embedding. In other words, we're converting words into vectors.
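Here is a minimal sketch of that embed step in Python: a hand-made word-to-vector table and a lookup. Real models learn this table; the values follow the arbitrary mapping above, and the exact numbers for "are" are an assumption since the transcript is ambiguous.

```python
# A tiny word-to-vector mapping and the "embed" lookup (illustrative values).
embedding = {
    "you":  [0.0, 1.0],
    "are":  [0.0, 0.7],   # assumed values
    "cold": [0.0, -1.0],
}

word = "you"
vector = embedding[word]   # look up the vector for the input word
print(word, "->", vector)  # you -> [0.0, 1.0]
```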
[04:23 - 04:27] Now, our next step is to then transform these vectors. And I use the word transform intentionally.
[04:28 - 04:37] This block in the middle, as you can probably imagine, is the transformer that we alluded to earlier. So the second step here is to transform vectors.
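To tie the steps together, here is a minimal, purely illustrative sketch of the macro architecture described so far: embed words into vectors, transform those vectors, and finally turn the result back into the next word. The function bodies below are placeholders, not the real model, and the final step is only implied by the transcript at this point.

```python
# Illustrative macro-architecture sketch; all bodies are placeholders.

def embed(words):
    # step 1: look up a vector for each input word
    table = {"you": [0.0, 1.0], "are": [0.0, 0.7], "cold": [0.0, -1.0]}
    return [table[w] for w in words]

def transform(vectors):
    # step 2: the transformer block(s); identity here as a stand-in
    return vectors

def predict_next_word(vectors):
    # step 3 (covered later): map the transformed vectors to the next word
    return "<next-word>"

print(predict_next_word(transform(embed(["you"]))))
```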