Word embeddings and nearest neighbors

  • [00:00 - 00:09] We'll dive into this transformer in more detail in the very next lesson. Okay, so after we transformed this vector, (0, 1) became (1, 0).

    [00:10 - 00:16] It's not clear how that happened, and that's okay. But what we want to understand now is, how do we go from (1, 0) back into a word?

    [00:17 - 00:23] So let me visualize that for you again. Here, we've plotted the point (1, 0).

    [00:24 - 00:26] You can ignore the dotted line circle. That was just to help me keep organized.

    [00:27 - 00:31] So here we have (1, 0). But the question is, what word does that correspond to?

    [00:32 - 00:42] Before, I told you that there were three words, each with its own corresponding vector. So what we can do is plot those three words here as well.

    [00:43 - 00:47] So here we have the three words from before: "you", "are", and "cold", and we have this mystery word right over here.

    [00:48 - 00:59] So how do we determine what this new point (1, 0) is, or rather, how do we associate a word with it? Well, one way we could do that is just to look at the nearest neighbor.

    [01:00 - 01:11] So here, this point, (1, 0), is closest to the point at about 0.7. So it stands to reason that this is probably "are", right?

    [01:12 - 01:20] So we're going to use that idea. We're actually going to look up what is the closest point that has a word associated with it, and that's going to be the word that we output.

    [01:21 - 01:31] So (1, 0) translates into "are", and that's this last step, which we call nearest neighbors. So to recap, we have three different steps.
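
A minimal sketch of that nearest-neighbor lookup in Python (the 2-D word vectors below are made-up illustrative values, not the exact ones from the plot):

```python
import numpy as np

# Illustrative 2-D embeddings for the three known words (made-up values).
word_vectors = {
    "you":  np.array([0.0, 1.0]),
    "are":  np.array([0.9, 0.1]),
    "cold": np.array([-0.5, -0.8]),
}

def nearest_word(query):
    """Return the known word whose vector is closest to `query` (Euclidean distance)."""
    return min(word_vectors, key=lambda w: np.linalg.norm(word_vectors[w] - query))

output_vector = np.array([1.0, 0.0])  # the vector the transformer produced
print(nearest_word(output_vector))    # -> "are"
```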

    [01:32 - 01:37] The first is to convert a word into a vector. This is just a lookup in a big list of vectors that we have, or a big dictionary of vectors.

    [01:38 - 01:49] The second is to actually transform that vector using a transformer, which we'll talk about in more detail in a second. And once we have that output vector, we then convert that vector back into a word using nearest neighbor.
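
Putting the three steps together, a rough end-to-end sketch could look like the following; `transform` here is only a stand-in for the transformer we haven't covered yet, and the vectors are again illustrative:

```python
import numpy as np

# Step 1: word -> vector, a plain dictionary lookup (illustrative values).
word_vectors = {
    "you":  np.array([0.0, 1.0]),
    "are":  np.array([0.9, 0.1]),
    "cold": np.array([-0.5, -0.8]),
}

# Step 2: transform the vector. This placeholder just swaps the coordinates
# so that (0, 1) becomes (1, 0), matching the lesson's example.
def transform(vector):
    return np.array([vector[1], vector[0]])

# Step 3: vector -> word, via nearest neighbor.
def nearest_word(query):
    return min(word_vectors, key=lambda w: np.linalg.norm(word_vectors[w] - query))

print(nearest_word(transform(word_vectors["you"])))  # -> "are"
```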

    [01:50 - 01:58] OK, so now that I've talked about all of these, we can answer Maya's question from before. Maya's question was, are tokens related to features?

    [01:59 - 02:07] So right here, this input, "you", would have been some integer, right? Like we saw before, it could have been 2, 500, or whatever it is.

    [02:08 - 02:16] It's a single integer, and that single integer is the token's ID. Right here, (0, 1) is what we would call the token.

    [02:17 - 02:24] And it's also what we would call the feature. So to Maya's point, tokens basically are features, right?

    [02:25 - 02:37] But features are more broad. Features apply between the layers of any neural network, whereas tokens are very, very specific to transformer models or transformer-based models.
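
To make that distinction concrete, here is a small illustrative sketch: the token ID is a single integer, the embedding is the vector looked up with that integer, and "feature" is the generic term for any vector flowing between layers of any neural network. The IDs and values below are made up:

```python
import numpy as np

# Made-up token-ID assignments and embedding table.
token_ids = {"you": 2, "are": 500, "cold": 7}
embedding_table = {
    2:   np.array([0.0, 1.0]),
    500: np.array([0.9, 0.1]),
    7:   np.array([-0.5, -0.8]),
}

token_id = token_ids["you"]            # the token's ID: a single integer, e.g. 2
embedding = embedding_table[token_id]  # the token's vector, e.g. (0, 1)

# "Feature" is the general term for vectors like `embedding` in any neural
# network; "token" is the transformer-specific term for the text fragment
# and its vector representation.
```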

    [02:38 - 02:47] OK. All right, so this is the summary in diagram form.

    [02:48 - 02:53] Now, let me re-summarize, but using the points in 2D space. So to start off, we had the word "you".

    [02:54 - 03:06] The word "you" was translated into the coordinate (0, 1). Then we ran the transformer, and that produced a new coordinate, (1, 0). The question was, what does this coordinate correspond to?

    [03:07 - 03:12] What is the word that the coordinate (1, 0) gives us? We simply plot all of the words that we know about.

    [03:13 - 03:18] And we look for the closest one, which was "are". And that's how we produced "are" from this entire process.

    [03:19 - 03:27] OK, so the question is, are token and embedding synonyms? Isn't the token a word fragment or punctuation mark?

    [03:28 - 03:32] So it's a good question. Let me go back to that diagram here.

    [03:33 - 03:41] OK, so in this diagram, embeddings and features are direct synonyms. Tokens can refer to either.

    [03:42 - 03:49] So a token can refer to either the text or the vector; both correspond to the same thing.

    [03:50 - 03:55] Right. So when you say word fragment or punctuation mark, that's absolutely correct. That is definitely the case; a token corresponds to that.

    [03:56 - 04:00] But that word fragment also has a vector representation. So there are two representations of the same thing.

    [04:01 - 04:12] And the underlying thing is a token. Yeah, and then feel free to leave more questions in the chat if that was confusing.