Streaming large files with the Document Intelligence Python SDK

Bogdan Pechounov 60 Reputation points
2024-09-30T15:08:14.79+00:00

Does using AnalyzeDocumentRequest create a JSON payload with binary data?

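    from azure.ai.documentintelligence.aio import DocumentIntelligenceClient
    from azure.ai.documentintelligence.models import (
        AnalyzeDocumentRequest,
        AnalyzeResult,
        ContentFormat,
    )
    from azure.core.credentials import AzureKeyCredential

    # Method of a class that stores the endpoint and key as attributes.
    async def get_analyze_result(self, document_data: bytes) -> AnalyzeResult:
        """
        Analyze a document and return the result with Markdown content
        """

        document_intelligence_client = DocumentIntelligenceClient(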
            endpoint=self.document_intelligence_endpoint,
            credential=AzureKeyCredential(key=self.document_intelligence_key),
        )

        async with document_intelligence_client:
            poller = await document_intelligence_client.begin_analyze_document(
                analyze_request=AnalyzeDocumentRequest(
                    bytes_source=document_data),
                model_id="prebuilt-layout",
                output_content_format=ContentFormat.MARKDOWN,
            )

            analyze_result = await poller.result()
            return analyze_result
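
To make the payload question concrete, here is a minimal standard-library sketch of what a JSON body carrying binary data would look like. It assumes the bytes are base64-encoded into a string field named `base64Source`. That field name is an assumption taken from the REST request shape and is not confirmed anywhere in this thread, and the helper names are hypothetical. It does not call the SDK.

```python
import base64
import json


def build_json_payload(document_data: bytes) -> bytes:
    """Wrap raw bytes in a JSON body as a base64 string.

    The field name "base64Source" is an assumption, not confirmed by the SDK here.
    """
    encoded = base64.b64encode(document_data).decode("ascii")
    return json.dumps({"base64Source": encoded}).encode("utf-8")


def overhead_ratio(document_data: bytes) -> float:
    """Return payload size divided by raw size (base64 adds roughly one third)."""
    return len(build_json_payload(document_data)) / len(document_data)
```

If the SDK does serialize `bytes_source` this way, the whole document has to be held in memory and grows by about a third on the wire, which matters for large files.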

From the SDK samples:

Does the following code stream the file without blocking the thread? (I don't think a BufferedReader has async methods.)

    with open(path_to_sample_documents, "rb") as f:
        poller = await document_intelligence_client.begin_analyze_document(
            model_id=model_id, analyze_request=f, content_type="application/octet-stream"
        )
    result: AnalyzeResult = await poller.result()
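
For comparison, this is a rough sketch of what reading a file without blocking the event loop could look like with only the standard library: each blocking `read` call is pushed to a worker thread via `asyncio.to_thread` (Python 3.9+). The helper name `iter_file_chunks` is hypothetical, and whether `begin_analyze_document` accepts an async iterator like this as `analyze_request` is not confirmed here.

```python
import asyncio
from typing import AsyncIterator, BinaryIO


async def iter_file_chunks(
    f: BinaryIO, chunk_size: int = 64 * 1024
) -> AsyncIterator[bytes]:
    """Yield chunks from a blocking file object without blocking the event loop."""
    while True:
        # f.read is a blocking call, so run it in a worker thread.
        chunk = await asyncio.to_thread(f.read, chunk_size)
        if not chunk:
            break
        yield chunk
```

Passing the plain file object `f`, as in the sample, leaves the reads to whatever the transport does with it, which is exactly the part in question.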