Discovering the OpenAI Completion API

Project Source Code

Get the project source code below, and follow along with the lesson material.

Download Project Source Code

To set up the project on your local machine, please follow the directions provided in the README.md file. If you run into any issues with running the project source code, then feel free to reach out to the author in the course's Discord channel.


Lesson Transcript

  • [00:00 - 01:41] Welcome back. In this lesson we will start building an AI product. It will be a very simple use case focusing on the front end: we want to convert natural language into emojis. There is a simple text input, "to be or not to be". We click on the button and we get an emoji translation. It's made possible thanks to the capabilities of large language models. We are on the OpenAI playground and we are looking at the completion endpoint. It's a legacy endpoint, but it is very useful for understanding what LLMs are doing. As you can see here, the input is the sequence "to be or not to". What is the prediction of a large language model? It's the next element in the sequence. The next element is "be": "to be or not to be". Then this new sequence becomes the new input, "to be or not to be", and the next prediction is "that". And so on; we can iterate here. That's how a large language model builds its answer: by iteratively predicting the next token. Now let's look at the chat completion endpoint, which is the endpoint we are going to use in this course and which is the currently recommended endpoint. It's quite straightforward: a POST REST API. Why POST? Because we want some prompts to be very long; for instance, some conversations can span tens of thousands of characters, and in most browsers GET requests are limited to about two thousand characters. So GET will not work for this use case. Another thing to notice is the API key.
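As a minimal sketch, such a POST request could look like this from the browser (demo only; gpt-3.5-turbo is just an example model name, and the key placeholder must be replaced):

```js
// Demo only: never ship an API key in front-end code (see the remark below).
const OPENAI_API_KEY = "sk-..."; // placeholder

async function complete(prompt) {
  // POST, not GET: prompts can easily exceed browser URL length limits.
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-3.5-turbo",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  const data = await response.json();
  return data.choices[0].message.content;
}

complete("Translate into emojis: to be or not to be").then(console.log);
```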

  • [01:42 - 02:04] You can very easily create a new one: you go to API keys here and create a new API key. Also, one remark: you should not make those API calls from the front end. We will do so in this course for pedagogical purposes, but that is a sure way of leaking the API key, so it should not be done in production. If you want to do it properly, check out the module focusing on the back end.

  • [02:05 - 02:31] The model param is quite straightforward: you pick your model. There is a compromise, because big models are smarter but they are slower and more expensive, so you need to make a trade-off depending on the use case. And last but not least, the messages param. Messages is simply the state of the current conversation, with different roles: system, user, tool, and assistant, as sketched below.
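For illustration, a messages array carrying the conversation state might look like this (the content strings are invented):

```js
// The entire conversation state is resent with every request.
const messages = [
  // "system": context and personality for the assistant
  { role: "system", content: "You translate text into emojis." },
  // "user": what the human typed
  { role: "user", content: "to be or not to be" },
  // "assistant": a previous model reply, kept so the model has context
  { role: "assistant", content: "🐝 ➡️ 🚫 🐝" },
  // "user": the next turn to answer
  { role: "user", content: "Now translate: all the world's a stage" },
  // a "tool" role also exists, for returning tool/function results
];
```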

  • [02:32 - 02:47] We will see later exactly what they correspond to. Now, a few remarks. The first is that it's not deterministic. The second is the context window. Not deterministic? Let me show you. I ask the model the question "How are you?" and I get an answer: "How can I help you today?"

  • [02:48 - 04:03] Let's try exactly the same question. And here I get "an AI digital assistant designed to provide helpful responses to user queries". There is a parameter called temperature that controls this randomness, but even with the temperature set to zero it won't be deterministic. The second limitation is the context window: there is a maximum size for the input, which is a few thousand tokens, or a few dozen thousand for some models. So you need to count your tokens, and you need to truncate your messages in some cases. Here you can see the OpenAI cookbook; we'll look at it in more detail in the next module. It shows how to count the tokens for a message. Now that we've seen this API, I want to give a few details of how ChatGPT works and how GPT sees the world. We don't need to know this, as the model is fully managed, but it's useful because you get a deeper understanding of how to use it, and sometimes it lets you understand some failure cases and how to optimize. The first thing is the tokenizer. What is a tokenizer? ChatGPT does not see raw text like "to be or not to be"; first, the text needs to be tokenized. A tokenizer is a function that takes some text as input.
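A rough sketch of counting and truncating, assuming the third-party js-tiktoken package (the simple sum below ignores the small per-message token overhead that the cookbook accounts for precisely):

```js
import { encodingForModel } from "js-tiktoken"; // assumed npm package

const enc = encodingForModel("gpt-3.5-turbo");

// Approximate token count for one message's content.
function countTokens(text) {
  return enc.encode(text).length;
}

// Drop the oldest non-system messages until the conversation fits the window.
function truncateToFit(messages, maxTokens) {
  const total = (msgs) =>
    msgs.reduce((sum, m) => sum + countTokens(m.content), 0);
  const result = [...messages];
  // Assumes result[0] is a system message worth keeping.
  while (result.length > 1 && total(result) > maxTokens) {
    result.splice(1, 1);
  }
  return result;
}
```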

  • [04:04 - 04:29] And it outputs some tokens, which are numbers. Here you can see the conversion of "to be or not to be" into a sequence of tokens. The prediction will then be another token, and so we need to decode it. This matters because sometimes simply making small changes will make a big difference to the tokenizer. You can see a warning here; the warning is caused by the trailing space.
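Here is a sketch of that round trip, again assuming js-tiktoken (the actual token IDs depend on the model's encoding, so treat the outputs as illustrative):

```js
import { encodingForModel } from "js-tiktoken"; // assumed npm package

const enc = encodingForModel("gpt-3.5-turbo");

const ids = enc.encode("to be or not to be"); // text -> token IDs
console.log(ids);             // a short array of integers, one per token
console.log(enc.decode(ids)); // token IDs -> text: "to be or not to be"

// A trailing space changes the encoding:
console.log(enc.encode("to be or not to be ")); // a different sequence
```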

  • [04:30 - 04:45] The trailing space changes the encoding. Let me show you the tokenizer tool on the OpenAI site. I'm adding my text, and here I get the tokens with colors. Each color is one token.

  • [04:46 - 05:18] And here you can see the number for each token. So that's what we are counting when we are talking about the context window size: it's the size of this token list. So now that we have briefly seen tokens, I want to also show you what special tokens are. Because, as you can see in the API, we are sending a list of messages, but that's not what ChatGPT is meant to consume. ChatGPT is meant to consume a sequence of tokens. How do we convert this list into a sequence of tokens? With special tokens.

  • [05:19 - 05:47] <|im_start|> and <|im_end|>, which are tokens used to indicate to the model, for example, that the user is the one making this message, as sketched below. Now that we've seen the basics, let's go to the chat playground to play a bit with the OpenAI chat endpoint. I encourage you to go to it and to try it. First, the system message. The system message gives personality and context to the assistant. Here I said: you are a jolly pirate.
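As an illustrative sketch, the message list can be flattened into a single sequence using those special tokens, following the ChatML layout (the API does this server-side; you never build this string yourself):

```js
// Illustrative only: shows roughly how messages become one sequence.
function toChatML(messages) {
  return (
    messages
      .map((m) => `<|im_start|>${m.role}\n${m.content}<|im_end|>`)
      .join("\n") +
    "\n<|im_start|>assistant\n" // the model predicts tokens from here
  );
}

console.log(
  toChatML([
    { role: "system", content: "You are a jolly pirate." },
    { role: "user", content: "How are you?" },
  ])
);
```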

  • [05:48 - 06:08] So when the user speaks to the assistant, it answers something like "Ahoy matey, what brings you to my ship today?" Why is that? Because the system message gave it a personality. If I say "you are a king" and then I send the same user message, I get an answer like "Greetings, my subject", because I gave it the personality of a king.
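Reproducing that experiment as a request-body sketch (the quoted replies are paraphrased from the playground session above, and will vary from run to run):

```js
const body = {
  model: "gpt-3.5-turbo",
  messages: [
    // The system message sets the personality...
    { role: "system", content: "You are a jolly pirate." },
    // ...so the same user message gets an in-character answer.
    { role: "user", content: "How are you?" },
  ],
};
// Typical reply: "Ahoy matey, what brings you to my ship today?"
// Swap the system content for "You are a king" and the tone changes:
// "Greetings, my subject."
```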

  • [06:09 - 06:28] So that's the power of the system message. Then you have the user message, which is simply the user talking to the model, and the assistant message, which is content predicted by the model. That's all for this lesson; in the next lesson we will look at how to consume the stream which is produced by this endpoint. See you soon.