Is it possible if GPT 4o or GPT 4 extract content in the documents (pdf, word,excel) if user upload it in the chatbot?

Farah Diana Masri 5

Hi, I have questions regarding the capabilities of GPT 4o or GPT 4 model in Azure OpenAI. Currently, I need to develop a custom chatbot mobile application that utilize the Azure OpenAI GPT model for internal usage. One of the user requirements are the user can upload the image or documents in the chatbot, and the assistant supposed to extract content from it. My question is does GPT 4 or 4o model has the capabilities to extract content from documents such as (pdf. word files or excel) ? I have tried to upload in the chatbot, but the chatbot only supported images like(png, jpg and gif). Please assist with this issue. Thank you.

romungi-MSFT 47,026 Reputation points Microsoft Employee

2024-11-19T07:21:11.28+00:00

@Farah Diana Masri AFAIK the GPT models use images to extract text that was uploaded to the chat. They do not directly support files like PDF or word or excel. If you are looking for a functionality that needs to extract correct or exact data from a document, you should consider Azure Document intelligence instead. GPT models can still read text from uploaded JPEG images, but it might not preserve the structure of the document in the provided result. A service like document intelligence will provide co-ordinates and accurate text along with support for common documents with prebuilt models that can be used on the fly.

However, there is a sample repo to showcase using GPT models with document extraction in this repo. This features a sample to use GPT-4o model which converts an uploaded PDF to image files using 3rd party libraries. This might work well with smaller document files to store the context in the conversation but for larger files it might not work as intended.

I hope this helps!!
Farah Diana Masri 5 Reputation points

2024-11-19T08:58:52.4366667+00:00

Thanks for your reply. Is the Document Intelligence is capable to extract the content from the documents like pdf, word and excel that user upload? Because, currently the GPT model like 4o we are not able to upload documents, but why in the ChatGPT we can upload the documents in the chatbot?

Do we need to integrate Document Intelligence with GPT 4o or GPT4 in order for us to upload documents in the chatbot and extract the text and send back to the user?
Farah Diana Masri 5 Reputation points

2024-11-19T09:17:29+00:00

Thanks for the reply. In order to extract the text from the uploaded documents in the chatbot, is it we need to integrate the Document Intelligence with our GPT4o model?
Daniel FANG 960 Reputation points

2024-11-19T11:50:36.5833333+00:00

HI @Farah Diana Masri

You can try to use Add your data option in the Chat playground to start understanding how the document is processed.

Usually, when user attaches a document in the chat, the document will need to be processed and parse into text under the hood (probably this is what chatgpt does). It is not done by the GPT model but an ingestion service: in this case, Azure AI Search provides options to ingest pdf, excel, ppt documents into vector database. Then during the GPT call, these documents can be referenced and used to generate response.

For images, the GPT module has the vision feature that is able to process text in the images directly.
Farah Diana Masri 5 Reputation points

2024-11-19T14:46:49.3866667+00:00

Yes, my previous question is about the second scenario. Thanks for clarifying. Just want to confirm only images is supported for GPT to give context based on the images user upload in the chatbot. However, for documents like (pdf, word or pptx) format, the GPT in Azure does not have the capabilities to provide the context if user is uploading those files in the chatbot unless I integrate it with another Azure AI services like Document Intelligence?
romungi-MSFT 47,026 Reputation points Microsoft Employee

2024-11-20T05:52:49.7066667+00:00

@Farah Diana Masri Yes, the sample I mentioned uses document intelligence and AI search.
Daniel FANG 960 Reputation points

2024-11-21T05:05:31.24+00:00

Using document intelligence and AI search is one of the options. You can replace them with other solutions if it suits (i.e. cheaper options) while still using Azure OpenAI service.

for example: https://github.com/Azure-Samples/serverless-chat-langchainjs

in this example, the serverless api can ingest pdf file with a pdf library and pass text to cosmos db. rather than using AI Search, the cosmos db could be a vector store.

in the chatbot side, you can build further on the sample to allow user attach a file during chat, and restrict the user's session to only query his documents.

It depends on your use cases and budget.

1 answer

romungi-MSFT 47,026 Reputation points Microsoft Employee

2024-11-19T13:31:24.9266667+00:00

@Farah Diana Masri I think there are two scenarios here, are you looking to extract text and present it to the user as is in the chat immediately or use the text in the current conversation to answer questions related to the document.

If you are looking to just extract text, then you can integrate a custom or any prebuilt models of document intelligence in your chat workflow and extract text and present it to user when the result is available. Since most of the document intelligence operations are async you need to poll for result to check if it is available. Currently, there is no direct way to integrate a document intelligence API into the chat workflow from OpenAI studio, this has to be an additional client implementation by using the REST API or SDK to call the document intelligence model.

For the second scenario I mentioned, uploading these formats is not possible directly but an image can be presented, and the model can read it and provide you the context of text in the image. I believe you have already tried this. If you need to know more about scenarios of RAG with document intelligence, take a look at this documentation. A sample notebook to integrate with langchain and openai model with azure search is available to check.

If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.
Please sign in to rate this answer.

1 person found this answer helpful.

0 comments No comments
Sign in to comment

Use comments to ask for clarification, additional information, or improvements to the question.

Share via

Is it possible if GPT 4o or GPT 4 extract content in the documents (pdf, word,excel) if user upload it in the chatbot?

1 answer

Your answer