@Farah Diana Masri I think there are two scenarios here: are you looking to extract text and present it to the user as-is in the chat immediately, or to use the extracted text within the current conversation to answer questions about the document?
If you just want to extract text, you can integrate a custom or prebuilt Document Intelligence model into your chat workflow, extract the text, and present it to the user when the result is available. Since most Document Intelligence operations are asynchronous, you need to poll for the result to check whether it is ready. Currently, there is no direct way to integrate a Document Intelligence API into the chat workflow from Azure OpenAI Studio; this has to be an additional client-side implementation that calls the Document Intelligence model through the REST API or SDK.
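To make the submit-then-poll pattern concrete, here is a minimal sketch using only the Python standard library against the Document Intelligence REST API. The `endpoint`, `key`, and `model_id` values are placeholders you would supply from your own resource, and the `api-version` shown may differ from the one current for your region; treat this as an illustration of the async polling loop rather than a production client.

```python
import json
import time
import urllib.request


def extract_text(endpoint: str, key: str, pdf_bytes: bytes,
                 model_id: str = "prebuilt-read") -> str:
    """Submit a document to Document Intelligence, poll the async
    operation until it finishes, and return the extracted text."""
    url = (f"{endpoint}/formrecognizer/documentModels/{model_id}:analyze"
           "?api-version=2023-07-31")
    req = urllib.request.Request(
        url, data=pdf_bytes, method="POST",
        headers={"Ocp-Apim-Subscription-Key": key,
                 "Content-Type": "application/pdf"})
    with urllib.request.urlopen(req) as resp:
        # The service accepts the job and returns the result URL in a header.
        op_url = resp.headers["Operation-Location"]

    # Poll the operation URL until the analysis succeeds or fails.
    while True:
        poll = urllib.request.Request(
            op_url, headers={"Ocp-Apim-Subscription-Key": key})
        with urllib.request.urlopen(poll) as resp:
            body = json.load(resp)
        if body["status"] == "succeeded":
            return body["analyzeResult"]["content"]
        if body["status"] == "failed":
            raise RuntimeError(f"Analysis failed: {body}")
        time.sleep(2)  # back off briefly before polling again
```

In your chat client you would call `extract_text(...)` when the user uploads a file and post the returned string back into the conversation once it is available. The official SDKs (e.g. `azure-ai-formrecognizer` for Python) wrap this same polling pattern in a long-running-operation poller for you.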
For the second scenario, uploading these formats directly is not possible, but an image can be provided, and the model can read it and give you the context of the text in the image. I believe you have already tried this. If you want to learn more about RAG scenarios with Document Intelligence, take a look at this documentation. A sample notebook that integrates LangChain and an OpenAI model with Azure AI Search is also available.
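For reference, presenting an image to a vision-capable model in chat just means building a multimodal user message that pairs text with an image URL. The helper below is a hypothetical sketch of that message shape; the question and URL are placeholder values, and on Azure OpenAI the URL can also be a base64 data URL for a locally uploaded image.

```python
def build_vision_message(question: str, image_url: str) -> dict:
    """Build a chat message pairing a text question with an image,
    in the multimodal content format used by GPT-4 vision-capable models."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


# Example: ask the model to read a scanned page presented as an image.
msg = build_vision_message(
    "What does the text in this scanned page say?",
    "https://example.com/scanned-page.png")
```

You would pass `msg` in the `messages` list of a chat completions request to a vision-enabled deployment; the model then reads the image and can answer follow-up questions about its text within the same conversation.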
If this answers your query, do click Accept Answer and Yes for "Was this answer helpful?". And if you have any further query, do let us know.