Is it possible if GPT 4o or GPT 4 extract content in the documents (pdf, word,excel) if user upload it in the chatbot?

Farah Diana Masri 5 Reputation points
2024-11-19T06:38:38.33+00:00

image

Hi, I have questions regarding the capabilities of GPT 4o or GPT 4 model in Azure OpenAI. Currently, I need to develop a custom chatbot mobile application that utilize the Azure OpenAI GPT model for internal usage. One of the user requirements are the user can upload the image or documents in the chatbot, and the assistant supposed to extract content from it. My question is does GPT 4 or 4o model has the capabilities to extract content from documents such as (pdf. word files or excel) ? I have tried to upload in the chatbot, but the chatbot only supported images like(png, jpg and gif). Please assist with this issue. Thank you.

Azure OpenAI Service
Azure OpenAI Service
An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.
3,303 questions
{count} votes

1 answer

Sort by: Most helpful
  1. romungi-MSFT 47,026 Reputation points Microsoft Employee
    2024-11-19T13:31:24.9266667+00:00

    @Farah Diana Masri I think there are two scenarios here, are you looking to extract text and present it to the user as is in the chat immediately or use the text in the current conversation to answer questions related to the document.

    If you are looking to just extract text, then you can integrate a custom or any prebuilt models of document intelligence in your chat workflow and extract text and present it to user when the result is available. Since most of the document intelligence operations are async you need to poll for result to check if it is available. Currently, there is no direct way to integrate a document intelligence API into the chat workflow from OpenAI studio, this has to be an additional client implementation by using the REST API or SDK to call the document intelligence model.

    For the second scenario I mentioned, uploading these formats is not possible directly but an image can be presented, and the model can read it and provide you the context of text in the image. I believe you have already tried this. If you need to know more about scenarios of RAG with document intelligence, take a look at this documentation. A sample notebook to integrate with langchain and openai model with azure search is available to check.

    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

    1 person found this answer helpful.
    0 comments No comments

Your answer

Answers can be marked as Accepted Answers by the question author, which helps users to know the answer solved the author's problem.