
Azure AI Search DocumentIntelligenceLayoutSkill “Missing file reference object” with Blob indexer and file_data input

Mitchell D. Bussey 0 Reputation points
2026-04-17T17:17:24.4933333+00:00

I am unable to get my DocumentIntelligenceLayoutSkill to read the PDF files stored in my Azure Blob Storage container.

The error I get is:
Required skill input was not in the expected format. Name: 'file_data', Source: 'file_data' Error: 'Missing file reference object'

I've started a debug session, and I can see the file_data node in the Enriched Data Structure tree (value = "BinaryFileReference=>k:\\tikatemp\\azsearch-cache-628681c7-db0b-89a4-83da-bfab2a9c73e4").

My setup:

  • Data source: Azure Blob Storage
  • Indexer: Azure AI Search indexer
  • Skillset: DocumentIntelligenceLayoutSkill
  • Input field: /document/file_data

My Indexer:

{
  "@odata.context": "https://myIndexer.search.windows.net/$metadata#indexers/$entity",
  "@odata.etag": "\"0x8D9CA2100352C4D\"",
  "name": "ai-test-indexer",
  "description": "test",
  "dataSourceName": "ai-index-docs",
  "skillsetName": "ai-test-index-skillset",
  "targetIndexName": "ai-test-index",
  "disabled": null,
  "schedule": null,
  "parameters": {
    "batchSize": null,
    "maxFailedItems": null,
    "maxFailedItemsPerBatch": null,
    "configuration": {
      "dataToExtract": "contentAndMetadata",
      "allowSkillsetToReadFileData": true,
      "parsingMode": "default"
    }
  },
  "fieldMappings": [],
  "outputFieldMappings": [],
  "cache": null,
  "encryptionKey": null
}

My Skillset:

{
  "@odata.etag": "\"0x8283701726A4\"",
  "name": "ai-test-index-skillset",
  "description": "Skillset using Document Intelligence for layout-aware extraction and embeddings",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Util.DocumentIntelligenceLayoutSkill",
      "name": "#1",
      "description": "Analyze a document",
      "context": "/document",
      "outputMode": "oneToMany",
      "outputFormat": "text",
      "extractionOptions": [],
      "inputs": [
        {
          "name": "file_data",
          "source": "/document/file_data",
          "inputs": []
        }
      ],
      "outputs": []
    }
  ],
  "cognitiveServices": {
    "@odata.type": "#Microsoft.Azure.Search.AIServicesByKey",
    "key": "<redacted>",
    "subdomainUrl": "<redacted>"
  }
}

My Data Source:

{
  "@odata.context": "https://<redacted>.search.windows.net/$metadata#datasources/$entity",
  "@odata.etag": "\"0x9283701726A4\"",
  "name": "ai-index-docs",
  "description": null,
  "type": "azureblob",
  "subtype": null,
  "indexerPermissionOptions": [],
  "credentials": {
    "connectionString": "DefaultEndpointsProtocol=https;AccountName=<redacted>;AccountKey=..."
  },
  "container": {
    "name": "aidocs",
    "query": null
  },
  "dataChangeDetectionPolicy": null,
  "dataDeletionDetectionPolicy": null,
  "encryptionKey": null,
  "identity": null
}

2 answers

  1. Pilladi Padma Sai Manisha 7,140 Reputation points Microsoft External Staff Moderator
    2026-04-25T07:30:09.21+00:00

    Hi Mitchell,

    Thanks for sharing the details; this helped clarify the issue.

    The error “Missing file reference object” usually occurs when the Layout skill does not receive the document in the expected internal format. In a blob indexer scenario, the file reference object is generated and managed automatically by the indexing pipeline; it is not something that should be passed in manually.

    When /document/file_data is explicitly bound as an input, it can appear in the enrichment tree as a serialized value (for example, BinaryFileReference=...), which is not the structured object the Document Intelligence Layout skill expects. This leads to the error you’re seeing.

    Recommended fix: Remove the explicit file_data input from the skill and allow the Layout skill to read directly from the /document context:

    {
      "@odata.type": "#Microsoft.Skills.Util.DocumentIntelligenceLayoutSkill",
      "context": "/document",
      "outputMode": "oneToMany",
      "outputFormat": "text"
    }
    

    Your existing indexer setting "allowSkillsetToReadFileData": true is correct and should remain unchanged. This setting ensures that the document file is available within the enrichment pipeline.
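    If you want to confirm this setting in an exported indexer definition, a small check like the one below can help. It is a minimal sketch using only the Python standard library; the function name `check_file_data_config` is hypothetical, and it only inspects the `skillsetName` and `parameters.configuration` keys that appear in the indexer JSON from the question.

```python
import json

def check_file_data_config(indexer: dict) -> list[str]:
    """Return a list of problems with the settings that expose /document/file_data."""
    problems = []
    # "parameters" and "configuration" can be null in an exported definition.
    params = indexer.get("parameters") or {}
    config = params.get("configuration") or {}
    if config.get("allowSkillsetToReadFileData") is not True:
        problems.append("allowSkillsetToReadFileData is not true")
    if not indexer.get("skillsetName"):
        problems.append("no skillset is attached to the indexer")
    return problems

# Trimmed copy of the indexer definition from the question.
indexer_json = """
{
  "name": "ai-test-indexer",
  "skillsetName": "ai-test-index-skillset",
  "parameters": {
    "configuration": {
      "dataToExtract": "contentAndMetadata",
      "allowSkillsetToReadFileData": true,
      "parsingMode": "default"
    }
  }
}
"""
print(check_file_data_config(json.loads(indexer_json)))  # prints []
```

    An empty list means the indexer side of the file_data setup looks as expected, so any remaining problem is more likely in the skillset definition.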

    This behavior aligns with the design of Azure AI Search, where the indexer provides the document stream and skills operate on the enriched document context rather than manually constructed file inputs.


    Please try this change and rerun the indexer. If the issue persists, I’ll be happy to review the debug output with you in more detail.


  2. Divyesh Govaerdhanan 10,870 Reputation points Volunteer Moderator
    2026-04-20T23:10:43.2566667+00:00

    Hi Mitchell D. Bussey,

    Welcome to Microsoft Q&A,

    The "Missing file reference object" error points to two issues in your skillset definition.

    #1: Nested inputs field inside the input entry

    Your skill input has a non-standard nested "inputs": [] property:

    "inputs": [
      {
        "name": "file_data",
        "source": "/document/file_data",
        "inputs": []   // <-- this should not be here
      }
    ]
    

    InputFieldMappingEntry only accepts name and source. The extra "inputs": [] field is not part of the schema and can cause the runtime to fail when resolving the BinaryFileReference object, even though the file is visible in the debug enrichment tree. Remove it entirely.
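    If you keep the skillset as JSON (for example, exported from the portal), this cleanup can be scripted. The sketch below is a hypothetical helper, `strip_nested_inputs`, written with the Python standard library; it removes only *empty* nested `inputs` lists from each skill's input entries and leaves `name` and `source` intact. Non-empty nested inputs are left alone on purpose, as an assumption that they were added deliberately (for example, when shaping complex inputs).

```python
import copy

def strip_nested_inputs(skillset: dict) -> dict:
    """Return a copy of the skillset with empty nested "inputs" lists removed
    from every skill input entry."""
    cleaned = copy.deepcopy(skillset)  # do not mutate the caller's definition
    for skill in cleaned.get("skills", []):
        for entry in skill.get("inputs", []):
            # Only an empty list is removed; non-empty nested inputs are kept.
            if entry.get("inputs") == []:
                del entry["inputs"]
    return cleaned

skillset = {
    "name": "ai-test-index-skillset",
    "skills": [{
        "@odata.type": "#Microsoft.Skills.Util.DocumentIntelligenceLayoutSkill",
        "context": "/document",
        "inputs": [{"name": "file_data", "source": "/document/file_data", "inputs": []}],
        "outputs": [],
    }],
}
print(strip_nested_inputs(skillset)["skills"][0]["inputs"])
# prints [{'name': 'file_data', 'source': '/document/file_data'}]
```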

    #2: Empty outputs array

    Your skill has "outputs": []. DocumentIntelligenceLayoutSkill requires at least one output to be declared. For oneToMany mode, you need text_sections at minimum.

    "outputs": [
      {
        "name": "text_sections",
        "targetName": "text_sections"
      }
    ]
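    The output name depends on `outputFormat`: to my understanding of the skill reference, `text` produces `text_sections` and `markdown` produces `markdown_document`. The sketch below fills in an empty `outputs` array on that basis. `ensure_layout_outputs`, the `OUTPUT_BY_FORMAT` mapping, and the assumption that a missing `outputFormat` means `markdown` are all illustrative, so please check them against the current DocumentIntelligenceLayoutSkill documentation before relying on them.

```python
LAYOUT_SKILL_TYPE = "#Microsoft.Skills.Util.DocumentIntelligenceLayoutSkill"

# Assumed mapping of outputFormat -> output name; verify against the skill reference.
OUTPUT_BY_FORMAT = {"text": "text_sections", "markdown": "markdown_document"}

def ensure_layout_outputs(skillset: dict) -> dict:
    """Add a default output to every layout skill whose "outputs" is empty or missing.
    Mutates and returns the same skillset dict."""
    for skill in skillset.get("skills", []):
        # Skip other skill types and layout skills that already declare outputs.
        if skill.get("@odata.type") != LAYOUT_SKILL_TYPE or skill.get("outputs"):
            continue
        # Assumption: a missing outputFormat is treated as "markdown".
        name = OUTPUT_BY_FORMAT[skill.get("outputFormat", "markdown")]
        skill["outputs"] = [{"name": name, "targetName": name}]
    return skillset

skillset = {"skills": [{"@odata.type": LAYOUT_SKILL_TYPE, "outputMode": "oneToMany",
                        "outputFormat": "text", "outputs": []}]}
print(ensure_layout_outputs(skillset)["skills"][0]["outputs"])
# prints [{'name': 'text_sections', 'targetName': 'text_sections'}]
```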
    

    Your indexer already has "allowSkillsetToReadFileData": true and "parsingMode": "default", which is the right setup for Blob Storage with PDFs. The debug session confirming BinaryFileReference is present means the data is being passed correctly. The issue is purely in how the skill definition was serialized.

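    After updating the skillset, reset your indexer to force a full re-run: call the Reset Indexer API followed by the Run Indexer API (the "x-ms-client-request-id" header only tags a request for tracing and does not trigger a reset), or simply reset and rerun from the portal.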
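    Over REST, the sequence is a POST to `/indexers/{name}/reset` followed by a POST to `/indexers/{name}/run`, each authenticated with an admin `api-key` header. The sketch below only builds the two requests with Python's `urllib.request`; the service name, the key placeholder, and the default `api-version` value are assumptions to replace with your own (pick an API version that supports DocumentIntelligenceLayoutSkill).

```python
import urllib.request

def build_reset_and_run(service: str, indexer: str, api_key: str,
                        api_version: str = "2024-11-01-preview") -> list[urllib.request.Request]:
    """Build the Reset Indexer and Run Indexer requests, in the order they should be sent."""
    base = f"https://{service}.search.windows.net/indexers/{indexer}"
    headers = {"api-key": api_key}
    return [
        # Both calls are POSTs with an empty body.
        urllib.request.Request(f"{base}/{action}?api-version={api_version}",
                               data=b"", headers=headers, method="POST")
        for action in ("reset", "run")
    ]

requests_to_send = build_reset_and_run("my-search-service", "ai-test-indexer", "<admin-key>")
for req in requests_to_send:
    print(req.get_method(), req.full_url)

# To actually send them (requires a real service name and admin key):
# for req in requests_to_send:
#     with urllib.request.urlopen(req) as resp:
#         print(resp.status)
```

    Sending the reset before the run matters: running without a reset only processes documents the indexer considers new or changed.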

    Please upvote and accept the answer if it helps!

