00:03Hey, this is Mayo from Chat With Data, and in today's video
I'm going to be talking about how to chat with a long PDF.
00:10So here we have a 56-page legal document. It's actually
a massive Supreme Court case in the United States.
00:21You can see we've got tons of pages, which is typical for most PDF documents. And you can
see it's this kind of horrible text that you can't even properly copy out.
00:35So what we want to end up with is a situation where we can chat with the
document. So we can say, "What is this legal case about?" and press enter.
00:49And hopefully we'll get a response back. This legal case is about the student, Frederick. Interesting. Now we
also have sources referring back to the PDF, as well as sections of the PDF that you can review.
01:03So maybe you don't understand something from the response.
So you can say back, what do you mean by qualified immunity?
01:13Let's see what comes back. So it's this kind of back-and-forth interaction
where we're using LangChain and GPT-4 to get responses back.
01:27That hopefully is what we're looking for. Interesting. Cool.
And let's check that those point to actual links. And so that's pretty cool, right?
01:46So you get the references and you also have sources as
well inside the document. Cool. So how do we do this?
01:56How does this work? Well, let's jump into the diagram and get started.
So this is the PDF chat architecture using LangChain and GPT-4.
02:10Now, when I show the code, or if you want to replicate from the
code base, just bear in mind that you can swap out to older models.
02:21You don't have to use GPT-4; I was just lucky to get access
to the API. So we have the PDF documents, and we convert them to text, right?
02:37Then we split the text into chunks because of the issue of the context window. Remember, if you've ever played with ChatGPT, if you try
to copy the text of an entire PDF document and paste it inside, you've probably noticed that it says the size is too big.
03:03So we overcome that issue using LangChain to split into chunks, and
each chunk is going to be a certain number of characters of your text.
03:13So maybe it's a thousand characters, 2,000, whatever the case
is. So we have these chunks, then we create these embeddings.
03:20So an embedding is just a number representation of your text. We store it
somewhere, okay? So you can kind of think of this as an ingestion phase, right?
03:32And we'll talk about that in a second when we jump into the code. But this ingestion phase will take this
document, convert it to text, split it, and convert it into numbers that will be stored in a vector database.
03:49And in this case we're using Pinecone. So I'll
come back to that in a second. So that's phase one.
03:57Now phase two: from your front end, the user asks a
question. So maybe they say, "How do I create an account?"
04:06And imagine what you've ingested here is the PDF docs of your company's support
docs, right? So the user says, "How do I create an account?"
04:20You combine that with the chat history, and in this case you send it to the large
language model, so GPT-3.5 or GPT-4, and you say, hey, create a standalone question.
04:35So based on the chat history and the new question, create a standalone
question. This standalone question we convert into embeddings.
04:42So embeddings kind of look like this, right? If I just do a quick sketch, you'll have something like 0.1, 0.2, you know, 1.1, and for each
vector you would end up with 1,536 of these in the case of OpenAI, to represent the text of this standalone question.
05:13And so all these vectors are then taken to the vector
store. So it says, Hey, okay, these are the numbers I have.
05:25Let me compare these to the numbers you have. And remember, when you stored
here, each of these chunks was represented as vectors, right?
05:36And they all had different values. So what it's going to do is check and see, okay, which chunk is similar, or
which chunks are most similar, to this standalone question that was asked, right?
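That "check which chunks are most similar" step uses cosine similarity, which is the metric this Pinecone index is configured with (we'll see that in the dashboard later). Here's a toy sketch with 3-dimensional vectors; real OpenAI embeddings have 1,536 dimensions.

```typescript
// Cosine similarity: the dot product of two vectors divided by the product
// of their lengths. 1 means "same direction" (very similar), 0 means
// unrelated (orthogonal).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

The vector store computes something like this between the question's embedding and every stored chunk's embedding, then returns the top matches.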
05:54And so it looks for the relevant documents that are embedded and retrieves the
relevant documents, which in this case are the source documents.
06:08Then it combines the standalone question. So in this case, whatever this plus this led to that, and it uses the
relevant docs as context to say, hey, based on this standalone question and the relevant docs, do X, Y, Z, right?
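The two prompts involved in that flow look roughly like this. These are illustrative templates I'm sketching from the description above, not the exact prompt strings in the repo; the `{placeholder}` names are assumptions.

```typescript
// Prompt 1: condense the chat history plus the follow-up into a
// standalone question (hypothetical wording).
const CONDENSE_PROMPT = `Given the following conversation and a follow up question,
rephrase the follow up question to be a standalone question.

Chat History: {chat_history}
Follow Up Input: {question}
Standalone question:`;

// Prompt 2: answer the standalone question using the retrieved chunks
// as context (hypothetical wording).
const QA_PROMPT = `Use the following pieces of context to answer the question.
If you don't know the answer, say you don't know; don't make one up.

Context: {context}
Question: {question}
Helpful answer:`;
```

This is the "customize what you want the model to do" part: swapping the wording of these templates changes the chatbot's behavior without touching the retrieval flow.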
06:30You obviously customize what you want the model to do,
and then GPT-4, in my case, returns an answer.
06:40And that's basically what we're seeing here, right? The response comes back. So that's the
architecture in a nutshell. So let's jump into the code itself to make sense of what's going on.
06:55Cool. So basically there are two phases involved. We already spoke about the ingestion phase, which is
just a phase of effectively converting your PDF into these vector numbers that will be stored in a vector store.
07:20So if you can see what's going on conceptually, we've gone over
the high level. So LangChain has this thing called a PDF loader.
07:29And what the PDF loader does is it takes a file
path. So in this case, the PDF is in here.
07:35So this is the file path. And what it does is it
basically loads the raw documents from the PDF file.
07:46So it does all of that for you under the hood. So these raw
documents basically contain the text of the PDF.
07:58Once we have that, we split, remember the splitting in
the diagram, into chunks of a thousand characters, and they overlap
08:06from one section to another by 200. And again, this is provided by LangChain to make
this easier. So we split the docs and then we create the OpenAI embeddings function.
08:21Remember, we need this thing
that's going to help create the numbers from the text.
08:31And so then we create, or initialize, this index for Pinecone. So
you can think of an index as the name of your store, or where you're going to store your vectors.
08:46Then we run this fromDocuments function, which effectively goes through the process of
creating the embeddings and then putting them into Pinecone, right?
09:06So that's the namespace here. So you can change this namespace, or actually in the configurations you would need to,
because when you create Pinecone you give your index a name, and you also have an optional namespace.
09:24The reason I recommend that is because you probably want a way to categorize
the different vectors from the different embeddings that you put into the store.
09:38I'm going to show you what that looks like. I know it might
sound very opaque right now. So yeah, there we go.
09:47So you have your index, which is Pinecone; you have your documents, which are split
already; you create the embeddings and you store them in the namespace, right?
10:00So let me run this again, but I will change the namespace
so I don't override what I currently have.
10:09Let me just call this "demo", and I'll just show you what that looks like. So there's a script
in package.json that is called ingest, and that script will run this function, right?
10:27So that's npm run ingest. I just want you to see what
actually happens here. There we go. Creating the vector store.
10:48So it's done the splits, it gets the metadata, now it's creating the vector store, and ingestion complete,
right? So now the embeddings are done and the ingestion is complete, right?
11:03Because we ran it. So let's go into Pinecone and see what
that looks like. So this is my Pinecone dashboard.
11:11You can set this up on your own and create your own
index name. So like I said, you can think of it as storage.
11:19You set your environment. So your environment is basically where it's going to be
served closest to, and you want to make sure this matches what's in your code.
11:34Cosine is the calculation that's done to find what's similar. And
then these are the dimensions for each vector as I spoke about.
11:45So you would effectively have, say, an index here. So if you check it out,
look, we did "demo", right? And that was what we just did right now.
12:01And demo has 178 vectors, which is the same as the GPT-4
PDF one. Let me show you what the test one looks like.
12:14I should query. So this is what a vector looks like. These are empty, but effectively you
would just literally have these arrays of numbers, and they represent a particular section.
12:32So that's your chunk that you've put in. And so when we say
178 vectors, that's what we're referring to in this case.
12:44So that's basically Pinecone in a nutshell. Let me
see if I can retrieve one... oh, there we go.
12:57So this is an example of what the vectors would look like. So every vector has an ID,
and you can see it has an ID, it has values, and it also has metadata, which is the text.
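Sketched as a type, a stored vector record looks roughly like this. The field names follow what we just saw in the dashboard (id, values, metadata); the example values are made up and truncated for illustration.

```typescript
// Rough shape of one stored vector record (illustrative, not Pinecone's
// exact client types).
type VectorRecord = {
  id: string;
  values: number[];           // 1,536 floats in the OpenAI embeddings case
  metadata: { text: string }; // the original chunk of text it represents
};

const example: VectorRecord = {
  id: "chunk-42",               // hypothetical ID
  values: [0.1, 0.2, 1.1],      // truncated; real vectors have 1,536 entries
  metadata: { text: "a chunk of the PDF's text" },
};
```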
13:11So you can kind of think of this as your chunk. And your
chunk is represented by these values, these vectors, right?
13:20And it's these vectors that are compared to the question the user asks to
then say, Hey, which one of you guys is the most relevant to the question?
13:32So I hope that explains Pinecone, and obviously you've seen it now. Cool.
So yeah, back to the code. Basically, now you've done the ingestion.
13:49So that's phase one complete. So what's next? Well, pretty much at this point, let's
just go through the other things. So this is initializing Pinecone.
14:03So here you set your environment as discussed. You set your API keys, so you make sure you clone this
one, and then you create an environment variables file where you put in the examples, right?
14:20So you copy these and then you put them in here, and then you go
to OpenAI and Pinecone to get the API keys, right?
14:32Cool. The visual guide is also in here. This is the OpenAI client, which you
can get from LangChain directly, but I'm just trying to make this more structured.
14:47And then we have makeChain. So makeChain, effectively, what's going on here is this is the
streaming effect that you saw, and in this streaming effect this is actually a custom chain.
15:03So usually in LangChain you have this thing called a ChatVectorDBQAChain. And all it does, basically, in a nutshell, is it takes the question and
goes through the flow that we showed in the diagram: it goes to retrieve the similar documents and responds back when you call the chain, right?
15:27So it's a chain; you can think of it as a series
of actions, just like you saw in the diagram.
15:34So here we're passing in the vector store, which is Pinecone, and then we've just
got some custom prompts, and we're saying return the source documents: true.
15:45So that's how we get the ability to see the source documents.
And then k equals 2. So that's how many source documents to return.
15:56And so the streaming effect is optional, you don't have to use it, but here this is
the model name, and you can change this to whatever you currently have access to.
16:07So it could be 3.5 Turbo or Davinci; it's whatever model you have access to. Temperature
is zero just to prevent randomness in the response, especially when it comes to legal stuff.
16:20You don't want too much creativity. Streaming, and you've got a callback
manager as well, which handles the tokens that are being streamed back.
16:35And that's that side. So let's go to the front end. Cool. Okay, so there's quite a bit going on. I've received quite a lot of requests
to do a step-by-step tutorial, especially for people who are new to JavaScript or beginners in coding.
17:08So if you check the description of this video, there'll be a link
to a waiting list, so you can go sign up if you're interested in that.
17:19But I'll try my best in the short time I have
now to just kind of go over what's going on.
17:25So this is the front end. Obviously we're dealing with the query. So this is
the question. And we have a state to manage the source documents coming back.
17:35And as you can see, we have an initial state: basically, messages is the
messages, pending is the messages that are coming in, and history is your chat history.
17:46So again, we're trying to represent the diagram I showed in code
form. Let me skip forward. So yeah, this is the submission.
17:59So obviously we clean up the query, because maybe the user has spaces in their question. So we trim it, and then we set
the state to effectively take into account what the previous state was, and also the user's question.
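That state update can be sketched as a plain function. The real code uses React setState; the types and names below are my own simplification of what was just described.

```typescript
// Simplified, framework-free sketch of the submit-time state update:
// trim the query, append it as a user message, and reset the pending answer.
type Message = { type: "userMessage" | "apiMessage"; message: string };
type ChatState = {
  messages: Message[];
  pending?: string;                // the streaming answer being built up
  history: [string, string][];     // [question, answer] pairs
};

function addUserQuestion(prev: ChatState, query: string): ChatState {
  const question = query.trim();   // strip stray spaces from the user's input
  return {
    ...prev,
    messages: [...prev.messages, { type: "userMessage", message: question }],
    pending: "",                   // reset the pending (streaming) answer
  };
}
```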
18:22So that's all passed into the messages. And then pending is defined,
right? So now we have the new state, the messages, and the type of message.
18:33It's a user message that's coming through, right? We
start loading, and then we set pending to an empty string, right?
18:44So at this point, what happens is we then hit the endpoint, api/chat. So if I jump in: we receive the
question and the history; the sanitizer just cleans it up to make sure that it's good for embeddings.
19:00And then with Pinecone, we basically go into Pinecone to say, hey, let's create this vector store, where
we basically have the embeddings and the namespace, and this index represents the index name as well.
19:27And then what we do is we just create this function to tell the client, the front end, that,
look, data's coming, we're going to send data to the front end, and this function is here.
19:43Now what happens, effectively, is that when this chain function is called, right, as you saw in the previous code I showed you, it uses the
ChatVectorDBQAChain, which goes and retrieves similar documents and comes back, and you saw that we set up streaming with the tokens,
20:13right? So what's going to happen is it's going to take this vector
store, go do the search, and then retrieve the tokens.
20:22So a token is just like one string, you know, and string by string it's going
to send each one to the front end, and that's how you get that streaming effect.
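The client-side half of that streaming idea can be sketched in a few lines: each incoming token is appended to the pending answer. This is a self-contained illustration of the pattern, not the repo's actual handler.

```typescript
// Minimal token-accumulation sketch: every streamed token is a small string
// that gets appended to the growing answer.
function makeTokenHandler() {
  let pending = "";
  return {
    onToken(token: string) {
      pending += token;       // append each streamed chunk as it arrives
    },
    get answer() {
      return pending;         // the answer assembled so far
    },
  };
}
```

This is why the UI can show the answer appearing word by word instead of waiting for the full response.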
20:32So this is the callback function for that, which we created, and which,
every time a token comes, effectively sends it to the front end.
20:42So that's what's going on there. And so this is where we call the function. So we're calling this makeChain
function with the sanitized question and the chat history, which again matches the diagram we spoke about.
21:01And then we also send the source documents, which we set to true. So now the source documents have come back. So the response's
source documents, which come back thanks to the returnSourceDocuments setting, we send to the client as well.
21:20And that's how you're able to see what the source documents are. And when all of this is finished, "done" is triggered, and that's why you can
see what's going on here: when it's done, we set the history, we also set the messages, and we, you know, turn off the loading, right?
21:42Because at this point there are no pending messages or pending source documents. So we just basically say, hey, here's the API
message, and it's the pending state, which represents the message that came in, and then the source documents come in.
21:59Otherwise, if it's not done, we just pass the data that's coming in, right? And so obviously
here we're just checking for the source documents before we set the state with the source documents.
22:17So if what I'm saying sounds like gibberish, okay. Yeah, we also use useMemo to effectively
memoize this, because it's a function we're calling over and over again.
22:31So we're just trying to be more efficient here, and now, obviously, this is the
front end that captures all of that, maps over it, and so on and so forth.
22:42So, because of limited time, that's just the overview. The source
code is going to be available. Like I said, the visual guide is here as well.
22:54But yeah, I think there have been quite a lot of requests for a more in-depth, step-by-step guide. So if
you're interested in that, check the description and join the wait list for a potential workshop.
23:08I'll just talk to people on the wait list, and if there's enough demand, then I
will do a comprehensive workshop on how to build a chatbot for your document.
23:20So whether it's a PDF, or a book, or multiple PDFs, or a DOCX or an Excel file or whatever, by the end of that, hopefully
you'll be able to build an application for yourself, or your clients, or whoever, to have a back-and-forth interaction with it.
23:43So this is it in a nutshell. If you have any questions, just shoot
me a message in the comments, and yeah, thanks for watching.
23:54Cheers.