00:03Hey, this is Mayo from Chat With Data, and in today's video
I'm going to be talking about how to chat with a long PDF.
00:10So here we have a 56-page legal document. It's actually
a massive Supreme Court case in the United States.
00:21You can see we've got tons of pages, which is typical for most PDF documents. And you can
see it's this kind of horrible text that you can't even properly copy out.
00:35So what we want to end up with is a situation where we can chat with the
document. So we can say, "What is this legal case about?" and press enter.
00:49And hopefully we'll get a response back. This legal case is about the student, Frederick. Interesting. Now we
also have sources referring back to the PDF, as well as sections of the PDF that you can review.
01:03So maybe you don't understand something from the response.
So you can say back, what do you mean by qualified immunity?
01:13Let's see what comes back. So it's this kind of back-and-forth interaction
where we're using LangChain and GPT-4 to get responses back.
01:27That hopefully is what we're looking for. Interesting. Cool.
And let's check that those point to actual links. And so that's pretty cool, right?
01:46So you get the references and you also have sources as
well inside the document. Cool. So how do we do this?
01:56How does this work? Well, let's jump into the diagram and get started.
So this is the PDF chat architecture using LangChain and GPT-4.
02:10Now, when I show the code, or if you want to replicate from the
code base, just bear in mind that you can swap out to older models.
02:21You don't have to use GPT-4; I was just lucky to get access
to the API. So we have the PDF documents, and we convert them to text, right?
02:37Then we split the text into chunks because of the issue of the context window. Remember, if you've ever played with ChatGPT, if you try
to copy the text of an entire PDF document and paste it inside, you've probably noticed that it says the size is too big.
03:03So we overcome that issue using LangChain to split into chunks, and
each chunk is going to be a certain number of characters of your text.
03:13So maybe it's a thousand characters, 2,000, whatever the case
is. So we have these chunks, then we create these embeddings.
03:20So an embedding is just a number representation of your text. We store it
somewhere, okay? So you can kind of think of this as an ingestion phase, right?
03:32And we'll talk about that in a second when we jump into the code. But this ingestion phase will take this
document, convert it to text, split it, and convert it into numbers that will be stored in a vector database.
03:49And in this case we're using Pinecone. So I'll
come back to that in a second. So that's phase one.
03:57Now phase two: from your front end, the user asks a
question. So maybe they say, "How do I create an account?"
04:06And imagine what you've ingested here is the PDF docs of your company's support
docs, right? So the user says, "How do I create an account?"
04:20You combine that with the chat history, and in this case you send it to the large
language model, so GPT-3.5 or GPT-4, and you say, hey, create a standalone question.
04:35So based on the chat history and the new question, create a standalone
question. This standalone question we convert into embeddings.
04:42So embeddings kind of look like this, right? If I just do a quick sketch, you'll have something like 0.1, 0.2, you know, 1.1, and for each
vector you would end up with 1,536 of these in the case of OpenAI, to represent the text of this standalone question.
05:13And so all these vectors are then taken to the vector
store. So it says, Hey, okay, these are the numbers I have.
05:25Let me compare these to the numbers you have. And remember, when you stored
here, each of these chunks was represented as vectors, right?
05:36And they all had different values. So what it's going to do is check and see, okay, which chunk is similar, or
which chunks are most similar, to this standalone question that was asked, right?
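That "check which chunks are most similar" step uses cosine similarity, which is the metric this Pinecone index is configured with (we'll see that in the dashboard later). Here's a toy sketch with 3-dimensional vectors; real OpenAI embeddings have 1,536 dimensions.

```typescript
// Cosine similarity: the dot product of two vectors divided by the product
// of their lengths. 1 means "same direction" (very similar), 0 means
// unrelated (orthogonal).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

The vector store computes something like this between the question's embedding and every stored chunk's embedding, then returns the top matches.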
05:54And so it looks for the relevant documents that are embedded and retrieves the
relevant documents, which in this case are the source documents.
06:08Then it combines the standalone question. So in this case, whatever this plus this led to that, and it uses the
relevant docs as context to say, hey, based on this standalone question and the relevant docs, do X, Y, Z, right?
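The two prompts involved in that flow look roughly like this. These are illustrative templates I'm sketching from the description above, not the exact prompt strings in the repo; the `{placeholder}` names are assumptions.

```typescript
// Prompt 1: condense the chat history plus the follow-up into a
// standalone question (hypothetical wording).
const CONDENSE_PROMPT = `Given the following conversation and a follow up question,
rephrase the follow up question to be a standalone question.

Chat History: {chat_history}
Follow Up Input: {question}
Standalone question:`;

// Prompt 2: answer the standalone question using the retrieved chunks
// as context (hypothetical wording).
const QA_PROMPT = `Use the following pieces of context to answer the question.
If you don't know the answer, say you don't know; don't make one up.

Context: {context}
Question: {question}
Helpful answer:`;
```

This is the "customize what you want the model to do" part: swapping the wording of these templates changes the chatbot's behavior without touching the retrieval flow.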
06:30You obviously customize what you want the model to do,
and then GPT-4, in my case, returns an answer.
06:40And that's basically what we're seeing here, right? The response comes back. So that's the
architecture in a nutshell. So let's jump into the code itself to make sense of what's going on.
06:55Cool. So basically there are two phases involved. We already spoke about the ingestion phase, which is
just a phase of effectively converting your PDF into these vector numbers that will be stored in a vector store.
07:20So if you can see what's going on conceptually, we've gone over
the high level. So LangChain has this thing called a PDF loader.
07:29And what the PDF loader does is it takes a file
path. So in this case, the PDF is in here.
07:35So this is the file path. And what it does is it
basically loads the raw documents from the PDF file.
07:46So it does all of that for you under the hood. So these raw
documents basically contain the text of the PDF.
07:58Once we have that, we split, remember the splitting in
the diagram, into chunks of a thousand characters, and they overlap
08:06from one section to another by 200. And again, this is provided by LangChain to make
this easier. So we split the docs and then we create the OpenAI embeddings function.
08:21Remember, we need this thing
that's going to help create the numbers from the text.
08:31And so then we create, or initialize, this index for Pinecone. So
you can think of an index as the name of your store, or where you're going to store your vectors.
08:46Then we run this fromDocuments function, which effectively goes through the process of
creating the embeddings and then putting them into Pinecone, right?
09:06So that's the namespace here. So you can change this namespace, or actually in the configurations you would need to,
because when you create Pinecone you give your index a name, and you also have an optional namespace.
09:24The reason I recommend that is because you probably want a way to categorize
the different vectors from the different embeddings that you put into the store.
09:38I'm going to show you what that looks like. I know it might
sound very opaque right now. So yeah, there we go.
09:47So you have your index, which is Pinecone; you have your documents, which are split
already; you create the embeddings and you store them in the namespace, right?
10:00So let me run this again, but I will change the namespace
so I don't override what I currently have.
10:09Let me just call this "demo", and I'll just show you what that looks like. So there's a script
in package.json that is called ingest, and that script will run this function, right?
10:27So that's npm run ingest. I just want you to see what
actually happens here. There we go. Creating the vector store.
10:48So it's done the splits, it gets the metadata, now it's creating the vector store, and ingestion complete,
right? So now the embeddings are done and the ingestion is complete, right?
11:03Because we ran it. So let's go into Pinecone and see what
that looks like. So this is my Pinecone dashboard.
11:11You can set this up on your own and create your own
index name. So like I said, you can think of it as storage.
11:19You set your environment. So your environment is basically where it's going to be
served closest to, and you want to make sure this matches what's in your code.
11:34Cosine is the calculation that's done to find what's similar. And
then these are the dimensions for each vector as I spoke about.
11:45So you would effectively have, say, an index here. So if you check it out,
look, we did "demo", right? And that was what we just did right now.
12:01And demo has 178 vectors, which is the same as the GPT-4
PDF one. Let me show you what the test one looks like.
12:14I should query. So this is what a vector looks like. These are empty, but effectively you
would just literally have these arrays of numbers, and they represent a particular section.
12:32So that's your chunk that you've put in. And so when we say
178 vectors, that's what we're referring to in this case.
12:44So that's basically Pinecone in a nutshell. Let me
see if I can retrieve one... oh, there we go.
12:57So this is an example of what the vectors would look like. So every vector has an ID,
and you can see it has an ID, it has values, and it also has metadata, which is the text.
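Sketched as a type, a stored vector record looks roughly like this. The field names follow what we just saw in the dashboard (id, values, metadata); the example values are made up and truncated for illustration.

```typescript
// Rough shape of one stored vector record (illustrative, not Pinecone's
// exact client types).
type VectorRecord = {
  id: string;
  values: number[];           // 1,536 floats in the OpenAI embeddings case
  metadata: { text: string }; // the original chunk of text it represents
};

const example: VectorRecord = {
  id: "chunk-42",               // hypothetical ID
  values: [0.1, 0.2, 1.1],      // truncated; real vectors have 1,536 entries
  metadata: { text: "a chunk of the PDF's text" },
};
```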
13:11So you can kind of think of this as your chunk. And your
chunk is represented by these values, these vectors, right?
13:20And it's these vectors that are compared to the question the user asks to
then say, Hey, which one of you guys is the most relevant to the question?
13:32So I hope that explains Pinecone, and obviously you've seen it now. Cool.
So yeah, back to the code. Basically, now you've done the ingestion.
13:49So that's phase one complete. So what's next? Well, pretty much at this point, let's
just go through the other things. So this is initializing Pinecone.
14:03So here you set your environment as discussed. You set your API keys, so you make sure you clone this
one, and then you create an environment variables file where you put in the examples, right?
14:20So you copy these and then you put them in here, and then you go
to OpenAI and Pinecone to get the API keys, right?
14:32Cool. The visual guide is also in here. This is the OpenAI client, which you
can get from LangChain directly, but I'm just trying to make this more structured.
14:47And then we have makeChain. So makeChain, effectively, what's going on here is this is the
streaming effect that you saw, and in this streaming effect this is actually a custom chain.
15:03So usually in LangChain you have this thing called a ChatVectorDBQAChain. And all it does, basically, in a nutshell, is it takes the question and
goes through the flow that we showed in the diagram: it goes to retrieve the similar documents and responds back when you call the chain, right?
15:27So it's a chain; you can think of it as a series
of actions, just like you saw in the diagram.
15:34So here we're passing in the vector store, which is Pinecone, and then we've just
got some custom prompts, and we're saying return the source documents: true.
15:45So that's how we get the ability to see the source documents.
And then k equals 2. So that's how many source documents to return.
15:56And so the streaming effect is optional, you don't have to use it, but here this is
the model name, and you can change this to whatever you currently have access to.
16:07So it could be 3.5 Turbo or Davinci; it's whatever model you have access to. Temperature
is zero just to prevent randomness in the response, especially when it comes to legal stuff.
16:20You don't want too much creativity. Streaming, and you've got a callback
manager as well, which handles the tokens that are being streamed back.
16:35And that's that side. So let's go to the front end. Cool. Okay, so there's quite a bit going on. I've received quite a lot of requests
to do a step-by-step tutorial, especially for people who are new to JavaScript or beginners in coding.
17:08So if you check the description of this video, there'll be a link
to a waiting list, so you can go sign up if you're interested in that.
17:19But I'll try my best in the short time I have
now to just kind of go over what's going on.
17:25So this is the front end. Obviously we're dealing with the query. So this is
the question. And we have a state to manage the source documents coming back.
17:35And as you can see, we have an initial state: basically, messages is the
messages, pending is the messages that are coming in, and history is your chat history.
17:46So again, we're trying to represent the diagram I showed in code
form. Let me skip forward. So yeah, this is the submission.
17:59So obviously we clean up the query, because maybe the user has spaces in their question. So we trim it, and then we set
the state to effectively take into account what the previous state was, and also the user's question.
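That state update can be sketched as a plain function. The real code uses React setState; the types and names below are my own simplification of what was just described.

```typescript
// Simplified, framework-free sketch of the submit-time state update:
// trim the query, append it as a user message, and reset the pending answer.
type Message = { type: "userMessage" | "apiMessage"; message: string };
type ChatState = {
  messages: Message[];
  pending?: string;                // the streaming answer being built up
  history: [string, string][];     // [question, answer] pairs
};

function addUserQuestion(prev: ChatState, query: string): ChatState {
  const question = query.trim();   // strip stray spaces from the user's input
  return {
    ...prev,
    messages: [...prev.messages, { type: "userMessage", message: question }],
    pending: "",                   // reset the pending (streaming) answer
  };
}
```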
18:22So that's all passed into the messages. And then pending is defined,
right? So now we have the new state, the messages, and the type of message.
18:33It's a user message that's coming through, right? We
start loading, and then we set pending to an empty string, right?
18:44So at this point, what happens is we then hit the endpoint, api/chat. So if I jump in: we receive the
question and the history; the sanitizer just cleans it up to make sure that it's good for embeddings.
19:00And then with Pinecone, we basically go into Pinecone to say, hey, let's create this vector store, where
we basically have the embeddings and the namespace, and this index represents the index name as well.
19:27And then what we do is we just create this function to tell the client, the front end, that,
look, data's coming, we're going to send data to the front end, and this function is here.
19:43Now what happens, effectively, is that when this chain function is called, right, as you saw in the previous code I showed you, it uses the
ChatVectorDBQAChain, which goes and retrieves similar documents and comes back, and you saw that we set up streaming with the tokens,
20:13right? So what's going to happen is it's going to take this vector
store, go do the search, and then retrieve the tokens.
20:22So a token is just like one string, you know, and string by string it's going
to send each one to the front end, and that's how you get that streaming effect.
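The client-side half of that streaming idea can be sketched in a few lines: each incoming token is appended to the pending answer. This is a self-contained illustration of the pattern, not the repo's actual handler.

```typescript
// Minimal token-accumulation sketch: every streamed token is a small string
// that gets appended to the growing answer.
function makeTokenHandler() {
  let pending = "";
  return {
    onToken(token: string) {
      pending += token;       // append each streamed chunk as it arrives
    },
    get answer() {
      return pending;         // the answer assembled so far
    },
  };
}
```

This is why the UI can show the answer appearing word by word instead of waiting for the full response.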
20:32So this is the callback function for that, which we created, and which,
every time a token comes, effectively sends it to the front end.
20:42So that's what's going on there. And so this is where we call the function. So we're calling this makeChain
function with the sanitized question and the chat history, which again matches the diagram we spoke about.
21:01And then we also send the source documents, which we set to true. So now the source documents have come back. So the response's
source documents, which come back thanks to the returnSourceDocuments setting, we send to the client as well.
21:20And that's how you're able to see what the source documents are. And when all of this is finished, "done" is triggered, and that's why you can
see what's going on here: when it's done, we set the history, we also set the messages, and we, you know, turn off the loading, right?
21:42Because at this point there are no pending messages or pending source documents. So we just basically say, hey, here's the API
message, and it's the pending state, which represents the message that came in, and then the source documents come in.
21:59Otherwise, if it's not done, we just pass the data that's coming in, right? And so obviously
here we're just checking for the source documents before we set the state with the source documents.
22:17So if what I'm saying sounds like gibberish, okay. Yeah, we also use useMemo to effectively
memoize this, because it's a function we're calling over and over again.
22:31So we're just trying to be more efficient here, and now, obviously, this is the
front end that captures all of that, maps over it, and so on and so forth.
22:42So, because of limited time, that's just the overview. The source
code is going to be available. Like I said, the visual guide is here as well.
22:54But yeah, I think there have been quite a lot of requests for a more in-depth, step-by-step guide. So if
you're interested in that, check the description and join the wait list for a potential workshop.
23:08I'll just talk to people on the wait list, and if there's enough demand, then I
will do a comprehensive workshop on how to build a chatbot for your document.
23:20So whether it's a PDF, or a book, or multiple PDFs, or a DOCX or an Excel file or whatever, by the end of that, hopefully
you'll be able to build an application for yourself, or your clients, or whoever, to have a back-and-forth interaction with it.
23:43So this is it in a nutshell. If you have any questions, just shoot
me a message in the comments, and yeah, thanks for watching.
23:54Cheers.