Streaming large files with Document Intelligence Python SDK

Bogdan Pechounov 60 Reputation points

Does using AnalyzeDocumentRequest create a JSON payload with binary data?

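    from azure.ai.documentintelligence.aio import DocumentIntelligenceClient
    from azure.ai.documentintelligence.models import (
        AnalyzeDocumentRequest,
        AnalyzeResult,
        ContentFormat,
    )
    from azure.core.credentials import AzureKeyCredential

    async def get_analyze_result(self, document_data: bytes) -> AnalyzeResult:
        """
        Analyze a document with the prebuilt layout model and return the
        result, with its content formatted as Markdown.
        """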
        document_intelligence_client = DocumentIntelligenceClient(
            endpoint=self.document_intelligence_endpoint,
            credential=AzureKeyCredential(key=self.document_intelligence_key),
        )

        async with document_intelligence_client:
            poller = await document_intelligence_client.begin_analyze_document(
                analyze_request=AnalyzeDocumentRequest(
                    bytes_source=document_data),
                model_id="prebuilt-layout",
                output_content_format=ContentFormat.MARKDOWN,
            )

            analyze_result = await poller.result()
            return analyze_result

Samples

Does the following code stream the file without blocking the thread? (I don't think a BufferedReader has async methods)

    with open(path_to_sample_documents, "rb") as f:
        poller = await document_intelligence_client.begin_analyze_document(
            model_id=model_id, analyze_request=f, content_type="application/octet-stream"
        )
    result: AnalyzeResult = await poller.result()

YutongTie-MSFT 51,501 Reputation points

2024-10-01T00:24:36.9733333+00:00

Hello Bogdan Pechounov,

Thanks for reaching out to us. Azure Document Intelligence supports a bytes source - https://learn.microsoft.com/en-us/python/api/azure-ai-documentintelligence/azure.ai.documentintelligence.models.analyzedocumentrequest?view=azure-python-preview
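On the JSON payload question: as far as I understand, a bytes source is carried as base64 text inside a JSON body rather than as raw binary. The sketch below only illustrates that idea with the standard library. The `base64Source` field name and the helper names `build_json_body` and `base64_length` are assumptions for illustration, not something taken from the SDK source.

```python
import base64
import json


def build_json_body(document_data: bytes) -> str:
    """Build a JSON body that carries the document as base64 text."""
    # Assumption: the field name mirrors the REST API's `base64Source`.
    encoded = base64.b64encode(document_data).decode("ascii")
    return json.dumps({"base64Source": encoded})


def base64_length(n_bytes: int) -> int:
    """Length in characters of the padded base64 encoding of n_bytes bytes."""
    # Every group of 3 input bytes becomes 4 output characters.
    return 4 * ((n_bytes + 2) // 3)
```

If this is how the request is serialized, the whole document is held in memory and grows by roughly one third on the wire, which matters for large files.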

Please refer to the sample from the sample repo - https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/documentintelligence/azure-ai-documentintelligence/samples/async_samples/sample_analyze_invoices_from_bytes_source_async.py
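On the blocking concern: a file object returned by `open()` only has synchronous read methods, so reading it inside a coroutine can hold up the event loop for the duration of the read. One possible workaround, shown here as a minimal sketch, is to do the read in a worker thread with `asyncio.to_thread` (Python 3.9+). The helper name `read_file_off_loop` is hypothetical, and this loads the whole file into memory rather than streaming it.

```python
import asyncio
from pathlib import Path


async def read_file_off_loop(path: str) -> bytes:
    """Read a whole file in a worker thread so the event loop stays responsive."""
    # Path.read_bytes is a blocking call; asyncio.to_thread runs it in the
    # default thread pool and awaits the result.
    return await asyncio.to_thread(Path(path).read_bytes)
```

The returned bytes could then be passed as `bytes_source` in the same way as in the first snippet of the question.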

Please take a look and give it a try. I hope it helps.

Regards,

Yutong
