Vectors, intuitively

  • [00:00 - 00:05] So how do language models predict? We're going to look at the architecture of a large language model.

    [00:06 - 00:12] This is the, quote unquote, macro architecture. We're going to be looking at the large chunks of the large language model at a very high level.

    [00:13 - 00:17] And then we'll dive deeper and deeper. OK, so we talked before.

    [00:18 - 00:26] We said this gray box is the LLM, and it takes in a bunch of input words, and it outputs the next word. All right, so let's break that apart into three different steps.

    [00:27 - 00:33] I haven't labeled any of those steps yet. All you see on this slide is just that there are three different steps.

    [00:34 - 00:41] For this first step, we're going to take the word, let's say the input word is "you". I mentioned that there are input words, plural, but for now let's just simplify.

    [00:42 - 00:43] We just have one input word. It's "you".

    [00:44 - 00:49] And we want to convert that word into vectors. So let me explain what vectors are.

    [00:50 - 00:57] A vector is an array of numbers. So let's say we have these arrays of numbers: (0, 1), (0, 0), (0, -1).

    [00:58 - 01:05] This is just-- sorry, let me check something real quick here.

    [01:06 - 01:09] OK, there we go.

    [01:10 - 01:18] So here's a set of numbers. And I'm going to say vectors, but you can think of these vectors as, again, just arrays.

    [01:19 - 01:27] Arrays of two numbers in this case. Well, how do we understand or visualize what these arrays look like?

    [01:28 - 01:33] Well, the first method is just to treat them as coordinates. So we can plot them here.

    [01:34 - 01:41] So you plot these coordinates as points. You can also consider these points as arrows starting from the origin and ending at that point.

    [01:42 - 01:57] So you can visualize vectors as either points or directions with length, the directions with length being the arrows and the points being the circles that I drew. To simplify, I'm going to refer to vectors as points every so often, at least when I illustrate them.

    [01:58 - 02:06] But when I speak, I'll say vectors. So when I say vectors, you can just think points in space.
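
    A quick sketch to make the two views concrete: the same 2-number arrays drawn both as points and as arrows from the origin. This is an illustration added for these notes, not code from the lesson, and the specific vectors are just examples.

    ```python
    # Draw a few 2-number arrays (vectors) as points and as arrows from the origin.
    import matplotlib.pyplot as plt

    vectors = [(0, 1), (0, -1)]  # illustrative values only

    fig, ax = plt.subplots()
    for x, y in vectors:
        ax.scatter(x, y, zorder=3)                      # view 1: a point
        ax.annotate("", xy=(x, y), xytext=(0, 0),
                    arrowprops=dict(arrowstyle="->"))   # view 2: an arrow from the origin

    ax.axhline(0, color="gray", linewidth=0.5)
    ax.axvline(0, color="gray", linewidth=0.5)
    ax.set_xlim(-2, 2)
    ax.set_ylim(-2, 2)
    plt.show()
    ```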

    [02:07 - 02:11] OK, so really quickly. Looks like-- so Ken asked, where does EOS come from?

    [02:12 - 02:24] End of a document fed into training? So EOS actually occurs only at the very, very end of-- so it's not as granular as a document.

    [02:25 - 02:36] What will usually happen is you don't have EOS until the end of a corpus. And so a corpus could be more than just a single document.

    [02:37 - 02:41] It could be all of Wikipedia, for example. But none of that really matters in pre-training.

    [02:42 - 02:47] And so I'll talk about different stages of training later on as well. But in instruction tuning, this matters a lot.

    [02:48 - 02:57] So in instruction tuning, end of sequence is just at the end of a typical response. And that matters because you don't want the chat model to keep blabbing on and on and on.

    [02:58 - 03:05] And that's sort of a problem that was common with early versions of LLMs, where the model just blabs on and on and on.

    [03:06 - 03:12] And so we would fix that during instruction tuning. So long story short, end of sequence is the end of a very long corpus.

    [03:13 - 03:17] Or it's the end of a response you expect the model to give during training. Yeah.

    [03:18 - 03:25] So I will talk more about instruction following and instruction fine-tuning later on. Yeah.
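
    To make the EOS answer concrete, here is a rough sketch, added for these notes, of the two placements described above. The "<eos>" string and the example texts are hypothetical; real tokenizers define their own end-of-sequence token.

    ```python
    # Where the end-of-sequence token goes, roughly, in the two training stages.
    EOS = "<eos>"  # hypothetical token string; real tokenizers define their own

    # Pre-training: one EOS only at the very end of a (possibly huge) corpus,
    # which can span many documents, e.g. all of Wikipedia.
    corpus = "First document text ... last document text"
    pretraining_text = corpus + " " + EOS

    # Instruction tuning: an EOS at the end of each expected response,
    # which teaches the model where to stop instead of blabbing on.
    examples = [
        ("What is 2 + 2?", "4"),
        ("Name a primary color.", "Red"),
    ]
    tuning_texts = [f"{prompt}\n{response} {EOS}" for prompt, response in examples]
    ```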

    [03:26 - 03:29] Cool. OK, so back to these slides here.

    [03:30 - 03:36] So we've talked about how to visualize these arrays of numbers. Or in other words, these vectors.

    [03:37 - 03:41] For now, put aside the visualization. We're going to come back to that later on.

    [03:42 - 03:51] But let's say that we have a mapping from words to vectors. We know that "you" maps to (0, 1), "are" maps to (0, 7), and "cold" maps to (0, -1).

    [03:52 - 03:59] I just arbitrarily defined this mapping. And later on, I'll tell you how you can actually create this mapping on your own.

    [04:00 - 04:03] But now, this is the mapping. So we saw that our input word was "you".

    [04:04 - 04:06] So let's look at "you" in this mapping. We know that "you" is (0, 1).

    [04:07 - 04:11] So let's use that. "You" now converts into (0, 1).

    [04:12 - 04:22] That process of looking up what our word is-- sorry, looking up the vector for our word is called "embed". In other words, we're converting words into vectors.
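
    A minimal sketch of the embed step just described: converting a word into its vector is just a table lookup. The table below uses the arbitrary mapping from the slide, and the exact numbers are only illustrative; in a real model, this table is learned.

    ```python
    # The "embed" step: look up the vector for a word in a word-to-vector table.
    embedding_table = {
        "you":  (0, 1),
        "are":  (0, 7),    # illustrative values; a real table is learned
        "cold": (0, -1),
    }

    def embed(word):
        """Convert a word into its vector by looking it up in the table."""
        return embedding_table[word]

    print(embed("you"))  # (0, 1)
    ```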

    [04:23 - 04:27] Now, our next step is to then transform these vectors. And I use the word transform intentionally.

    [04:28 - 04:37] This block in the middle, as you can probably imagine, is the transformer that we alluded to earlier. So the second step here is to transform the vectors.
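
    To tie the three steps together, here is a rough, self-contained sketch of the macro architecture so far, added for these notes. Only the first two steps have been named in the lesson (embed and transform); the function bodies below are placeholder stubs, not the real model.

    ```python
    # High-level, three-step sketch: input words -> next word.

    def embed(word):
        # Step 1: embed -- convert a word into a vector via a lookup table.
        table = {"you": (0, 1), "are": (0, 7), "cold": (0, -1)}  # illustrative values
        return table[word]

    def transform(vectors):
        # Step 2: transform -- the transformer itself; an identity stub for now.
        return vectors

    def third_step(vectors):
        # Step 3: not yet named in the lesson; eventually produces the next word.
        return "<next word>"

    def predict_next(input_words):
        vectors = [embed(w) for w in input_words]  # step 1
        vectors = transform(vectors)               # step 2
        return third_step(vectors)                 # step 3

    print(predict_next(["you"]))
    ```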