Word embeddings and nearest neighbors

Project Source Code

Get the project source code below, and follow along with the lesson material.

Download Project Source Code

To set up the project on your local machine, please follow the directions provided in the README.md file. If you run into any issues with running the project source code, then feel free to reach out to the author in the course's Discord channel.

This lesson is part of the Fundamentals of transformers - Live Workshop course.

  • [00:00 - 00:09] We'll dive into this transformer in more detail in the very next lesson. Okay, so after we transformed this vector, 0, 1 became 1, 0.

    [00:10 - 00:16] It's not clear how that happened, and that's okay. But what we want to understand now is, how do we go from 1, 0 back into a word?

    [00:17 - 00:23] So let me visualize that for you again. Here, we've plotted 1, 0 on a plot.

    [00:24 - 00:26] You can ignore the dotted line circle. That was just to help me keep organized.

    [00:27 - 00:31] So here we have 1, 0. But the question is, what word does that correspond to?

    [00:32 - 00:42] Before, I told you that there were three words, and each of them with their own corresponding vectors. So what we can do is plot those three words here as well.

    [00:43 - 00:47] So here we have the three words from before, "you", "are", and "cold", and we have this mystery word right over here.

    [00:48 - 00:59] So how would we determine what this new 1, 0 word is, or how do we associate a word with it? Well, one way that we could do that is just to look at the nearest neighbor.

    [01:00 - 01:11] So here, this point, 1, 0, is closest to 0.7. So it stands to reason that this is probably "are", right?

    [01:12 - 01:20] So we're going to use that idea. We're actually going to look up what is the closest point that has a word associated with it, and that's going to be the word that we output.

    [01:21 - 01:31] So 1, 0 translates into "are", and that's this last step, which we call nearest neighbors. So to recap, we have three different steps.

    [01:32 - 01:37] The first is to convert a word into a vector. This is just a lookup in a big list of vectors that we have, or a big dictionary of vectors that we have.
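
The nearest-neighbor lookup described here can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: only "you" at 0, 1 comes from the lesson, while the coordinates for "are" and "cold" and the helper name `nearest_word` are assumptions made up for the example.

```python
import math

# Only "you" -> (0, 1) comes from the lesson. The coordinates for "are" and
# "cold" are illustrative assumptions, not the values used in the video.
word_vectors = {
    "you": (0.0, 1.0),
    "are": (0.7, 0.7),
    "cold": (-0.7, -0.7),
}


def nearest_word(vector, vocab):
    """Return the word whose vector has the smallest Euclidean distance to `vector`."""
    return min(vocab, key=lambda word: math.dist(vocab[word], vector))


mystery = (1.0, 0.0)

# Show how far the mystery point is from each known word.
for word, known_vector in word_vectors.items():
    print(f"{word}: distance {math.dist(known_vector, mystery):.2f}")

print(nearest_word(mystery, word_vectors))  # prints: are
```

With these assumed coordinates, "are" is the closest known point to 1, 0, so it is the word that gets output.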

    [01:38 - 01:49] The second is to actually transform that vector using a transformer, which we'll talk about in more detail in a second. And once we have that output vector, we then convert that vector back into a word using nearest neighbor.

    [01:50 - 01:58] OK, so now that I've talked about all of these, we can answer Maya's question from before. Maya's question was, are tokens related to features?

    [01:59 - 02:07] So right here, this input, "you", would have been some integer, right? Like we saw before, it could have been 2, 500, or whatever it is.

    [02:08 - 02:16] It's a single integer, and that single integer is the token's ID. Right here, 0, 1 is what we would call the token.

    [02:17 - 02:24] And it's also what we would call the feature. So to Maya's point, tokens basically are features, right?

    [02:25 - 02:37] But features are more broad. Features apply to any neural network in between layers, whereas tokens are very, very specific to transformer models or transformer based models.
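
To make the distinction between a token's ID and its vector concrete, here is a small Python sketch. The IDs, the vectors for "are" and "cold", and the names `token_ids`, `embedding_table`, and `embed` are all hypothetical; only the vector 0, 1 for "you" is taken from the lesson.

```python
# Hypothetical vocabulary. The integer IDs are invented for illustration.
token_ids = {"you": 2, "are": 500, "cold": 7}

# Hypothetical embedding table keyed by token ID. Only (0, 1) for "you"
# comes from the lesson; the other vectors are assumptions.
embedding_table = {
    2: (0.0, 1.0),
    500: (0.7, 0.7),
    7: (-0.7, -0.7),
}


def embed(word):
    """Map a word to its integer token ID, then to its vector (its feature)."""
    token_id = token_ids[word]
    vector = embedding_table[token_id]
    return token_id, vector


token_id, vector = embed("you")
print(token_id)  # the token's ID: a single integer
print(vector)    # the token as a vector, which is also a feature
```

The single integer is only an index into the table. The vector it points to is what the rest of the model actually operates on.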

    [02:38 - 02:47] OK. All right, so this is the summary in this diagram form.

    [02:48 - 02:53] Now, let me re-summarize, but using the points in 2D space. So to start off, we had the word "you".

    [02:54 - 03:06] The word "you" was translated into the coordinate 0, 1. Then we ran the transformer and that produced a new coordinate, 1, 0. The question was, what does this coordinate correspond to?

    [03:07 - 03:12] What is the word that coordinate 1, 0 gives us? And we simply plot all of the words that we know about.

    [03:13 - 03:18] And we look for the closest one, which was "are". And that's how we produced "are" from this entire process.
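
The whole process, from word to vector to transformed vector and back to a word, can be sketched end to end in Python. Two loud assumptions: the lesson does not explain how the transformer turns 0, 1 into 1, 0, so `transform` below is a stand-in (a 90-degree clockwise rotation) chosen only because it reproduces that one mapping, and the coordinates for "are" and "cold" are invented. All function names are hypothetical.

```python
import math

# Only "you" -> (0, 1) comes from the lesson; the other vectors are assumptions.
EMBEDDINGS = {
    "you": (0.0, 1.0),
    "are": (0.7, 0.7),
    "cold": (-0.7, -0.7),
}


def word_to_vector(word):
    """Step 1: look the word up in the dictionary of vectors."""
    return EMBEDDINGS[word]


def transform(vector):
    """Step 2: stand-in for the transformer, NOT a real one.

    A 90-degree clockwise rotation, picked only because it maps (0, 1) to (1, 0).
    """
    x, y = vector
    return (y, -x)


def vector_to_word(vector):
    """Step 3: nearest neighbor, i.e. the known word closest to the vector."""
    return min(EMBEDDINGS, key=lambda word: math.dist(EMBEDDINGS[word], vector))


def predict(word):
    """Run all three steps in order."""
    return vector_to_word(transform(word_to_vector(word)))


print(predict("you"))  # prints: are
```

Swapping in a real model would only change step 2. The lookup at the start and the nearest-neighbor search at the end would keep the same shape.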

    [03:19 - 03:27] OK, so the question is, are token and embedding synonyms? Isn't the token a word fragment or punctuation mark?

    [03:28 - 03:32] So it's a good question. Let me go back to that diagram here.

    [03:33 - 03:41] OK, so in this diagram, embeddings and features are direct synonyms. Tokens can refer to either.

    [03:42 - 03:49] So tokens can refer to either the text or the vector that corresponds to the same thing.

    [03:50 - 03:55] Right. So when you say word fragment or punctuation mark, that's absolutely correct. That is definitely the case; a token corresponds to that.

    [03:56 - 04:00] But the word can also take its vector representation. So there are two representations of the same thing.

    [04:01 - 04:12] And the underlying thing is a token. Yeah, and then feel free to leave more questions in the chat if that was confusing too.