Adding Retrieval-Augmented Generation to the chat

  • [00:00 - 00:12] Welcome back. In this lesson, we will use the retrieval-augmented generation pattern to improve our chat. In the previous lesson, we built a semantic index from the IPCC report.

    [00:13 - 00:21] Now, we are going to use that retriever during generation. Let us look at our backend endpoint.

    [00:22 - 00:34] So, we are starting from the previous chat implementation. Exactly as before, we have a Python class, we have a FastAPI endpoint, and we are using server-sent events once again.
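The streaming setup described above can be sketched as follows. This is a minimal illustration, not the course's actual code: the `format_sse` helper and `token_stream` generator are hypothetical names, and the token list stands in for a real LLM stream.

```python
import asyncio

def format_sse(event: str, data: str) -> str:
    # SSE wire format: an "event:" line, a "data:" line, then a blank line.
    return f"event: {event}\ndata: {data}\n\n"

async def token_stream(tokens):
    # Stand-in for the LLM token generator; in FastAPI this generator would
    # be wrapped in a StreamingResponse with media_type="text/event-stream".
    for token in tokens:
        yield format_sse("token", token)

async def collect(tokens):
    return [frame async for frame in token_stream(tokens)]

frames = asyncio.run(collect(["Hello", "world"]))
```

In the real endpoint, the generator yields frames as the model produces tokens, so the browser renders the answer incrementally.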

    [00:35 - 00:45] All this code is exactly the same as was explained in the backend module. But now, before we run our generation, we need to fetch some content.

    [00:46 - 00:54] This is our new code. Now, how does it work? We need to check whether the retriever is needed to answer the user's question.

    [00:55 - 01:06] How do we do that? We make another call to the LLM, and we ask: here is the user's question; does answering it require information from the IPCC report?

    [01:07 - 01:19] And, if yes, the model builds a query. So, we get the query, we once again call the retriever using await, and we get the most relevant extracts.
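The two steps just described (asking the LLM whether retrieval is needed, then getting a query from it) could look roughly like this. The function names, the prompt wording, and the JSON reply shape are all assumptions for illustration; `fake_llm` is a stub so the sketch runs without a real API call.

```python
import asyncio
import json

async def plan_retrieval(llm, question: str):
    # Ask the model whether the IPCC retriever is needed and, if so,
    # what query to run against the semantic index.
    prompt = (
        "Here is the user's question:\n"
        f"{question}\n\n"
        "Does answering it require information from the IPCC report? "
        'Answer as JSON: {"retrieve": true or false, "query": "..."}'
    )
    decision = json.loads(await llm(prompt))
    return decision["retrieve"], decision.get("query", "")

# Stub LLM standing in for the real chat-completion call.
async def fake_llm(prompt: str) -> str:
    return '{"retrieve": true, "query": "projected sea level rise"}'

needed, query = asyncio.run(plan_retrieval(fake_llm, "How fast are seas rising?"))
```

When `needed` is true, the backend would `await` the retriever with `query` and collect the top-scoring extracts before generating.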

    [01:20 - 01:31] And now, we can pass the extracts to the model using a message with the function role. So, we set the role to function, we set the name of the method, and then we set the content.

    [01:32 - 01:48] Note that we use XML because ChatGPT is slightly better at understanding XML than plain text. And now, if we run the same generation, the model will be able to use the chunks to make its answer more precise and more reliable.
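A function-role message carrying XML-wrapped extracts might be built like this. The helper name, the default `retrieve_ipcc` function name, and the `<extract>` tag are illustrative choices, not the course's exact identifiers.

```python
def build_context_message(extracts, function_name: str = "retrieve_ipcc"):
    # Wrap each retrieved extract in XML tags: models tend to parse
    # XML-delimited context a bit more reliably than unstructured text.
    content = "\n".join(f"<extract>{text}</extract>" for text in extracts)
    return {"role": "function", "name": function_name, "content": content}

message = build_context_message(["Global warming reached about 1.1 C."])
```

This message is appended to the conversation history before the final generation call, so the model can quote or ground its answer in the extracts.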

    [01:49 - 02:05] Note that you can also emit an event if you would like to display the chunks in the UI. You can emit the event exactly as we did for an LLM chunk, and it will be sent to the frontend; you just need to build the UI as you would like.
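Emitting the retrieved chunks as their own SSE event could look like this sketch. The `retrieval` event name and the JSON payload shape are assumptions; the frontend would subscribe to this event type and render sources however it likes.

```python
import json

def format_sse(event: str, data: str) -> str:
    # Same SSE framing used for token events.
    return f"event: {event}\ndata: {data}\n\n"

def retrieval_frame(extracts):
    # Send the retrieved chunks as a separate SSE event type so the
    # frontend can render sources independently of the answer tokens.
    return format_sse("retrieval", json.dumps({"extracts": extracts}))

frame = retrieval_frame(["Sea level rose about 20 cm since 1900."])
```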

    [02:06 - 02:12] We finished implementing retrieval-augmented generation, so congratulations, and see you soon.