Azure AI Search DocumentIntelligenceLayoutSkill “Missing file reference object” with Blob indexer and file_data input

Question


Mitchell D. Bussey 0

I am unable to get my DocumentIntelligenceLayoutSkill to read a PDF file stored in an Azure Blob Storage container.

The error I get is:
Required skill input was not in the expected format. Name: 'file_data', Source: 'file_data' Error: 'Missing file reference object'

I've started a debug session and I can see the file_data node in the Enriched Data Structure tree (value =""\"BinaryFileReference=>k:\\tikatemp\\azsearch-cache-628681c7-db0b-89a4-83da-bfab2a9c73e4\""").

My setup:

Data source: Azure Blob Storage
Indexer: Azure AI Search indexer
Skillset: DocumentIntelligenceLayoutSkill
Input field: /document/file_data

My Indexer:

{
  "@odata.context": "https://myIndexer.search.windows.net/$metadata#indexers/$entity",
  "@odata.etag": "\"0x8D9CA2100352C4D\"",
  "name": "ai-test-indexer",
  "description": "test",
  "dataSourceName": "ai-index-docs",
  "skillsetName": "ai-test-index-skillset",
  "targetIndexName": "ai-test-index",
  "disabled": null,
  "schedule": null,
  "parameters": {
    "batchSize": null,
    "maxFailedItems": null,
    "maxFailedItemsPerBatch": null,
    "configuration": {
      "dataToExtract": "contentAndMetadata",
      "allowSkillsetToReadFileData": true,
      "parsingMode": "default"
    }
  },
  "fieldMappings": [],
  "outputFieldMappings": [],
  "cache": null,
  "encryptionKey": null
}

My Skillset:

{
  "@odata.etag": "\"0x8283701726A4\"",
  "name": "ai-test-index-skillset",
  "description": "Skillset using Document Intelligence for layout-aware extraction and embeddings",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Util.DocumentIntelligenceLayoutSkill",
      "name": "#1",
      "description": "Analyze a document",
      "context": "/document",
      "outputMode": "oneToMany",
      "outputFormat": "text",
      "extractionOptions": [],
      "inputs": [
        {
          "name": "file_data",
          "source": "/document/file_data",
          "inputs": []
        }
      ],
      "outputs": []
    }
  ],
  "cognitiveServices": {
    "@odata.type": "#Microsoft.Azure.Search.AIServicesByKey",
    "key": "<redacted>",
    "subdomainUrl": "<redacted>"
  }
}

My Data Source:

{
  "@odata.context": "https://<redacted>.search.windows.net/$metadata#datasources/$entity",
  "@odata.etag": "\"0x9283701726A4\"",
  "name": "ai-index-docs",
  "description": null,
  "type": "azureblob",
  "subtype": null,
  "indexerPermissionOptions": [],
  "credentials": {
    "connectionString": "DefaultEndpointsProtocol=https;AccountName=<redacted>;AccountKey=..."
  },
  "container": {
    "name": "aidocs",
    "query": null
  },
  "dataChangeDetectionPolicy": null,
  "dataDeletionDetectionPolicy": null,
  "encryptionKey": null,
  "identity": null
}


Answer 1

Hi Mitchell,

Thanks for sharing the details; this helped clarify the issue.

The error “Missing file reference object” usually occurs when the Layout skill is not receiving the document in the expected internal format. In a blob indexer scenario, the file reference object is generated and managed automatically by the indexing pipeline; it is not something that should be passed manually.

When /document/file_data is explicitly bound as an input, it can appear in the enrichment tree as a serialized value (for example, BinaryFileReference=...), which is not the structured object the Document Intelligence Layout skill expects. This leads to the error you’re seeing.

Recommended fix: Remove the explicit file_data input from the skill and allow the Layout skill to read directly from the /document context:

{
  "@odata.type": "#Microsoft.Skills.Util.DocumentIntelligenceLayoutSkill",
  "context": "/document",
  "outputMode": "oneToMany",
  "outputFormat": "text"
}

Your existing indexer setting "allowSkillsetToReadFileData": true is correct and should remain unchanged. This setting ensures that the document file is available within the enrichment pipeline.
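To double-check the indexer side of this, the following is a minimal sketch that inspects an indexer definition (for example, the JSON returned by a GET on the indexer) for this flag. It assumes the definition has already been loaded into a Python dict; `can_read_file_data` is a hypothetical helper name, not part of any Azure SDK.

```python
import json


def can_read_file_data(indexer: dict) -> bool:
    """Return True if parameters.configuration.allowSkillsetToReadFileData is true.

    Hypothetical helper; missing or null sections are treated as "not set".
    """
    params = indexer.get("parameters") or {}
    config = params.get("configuration") or {}
    return config.get("allowSkillsetToReadFileData") is True


# Trimmed copy of the indexer definition from the question.
indexer_json = """
{
  "name": "ai-test-indexer",
  "parameters": {
    "batchSize": null,
    "configuration": {
      "dataToExtract": "contentAndMetadata",
      "allowSkillsetToReadFileData": true,
      "parsingMode": "default"
    }
  }
}
"""

print(can_read_file_data(json.loads(indexer_json)))  # True
```

The `is True` comparison is deliberate: a string such as `"true"` or a missing key is reported as not set rather than silently accepted.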

This behavior aligns with the design of Azure AI Search, where the indexer provides the document stream and skills operate on the enriched document context rather than manually constructed file inputs.

References:

Document Layout skill overview: https://learn.microsoft.com/azure/search/cognitive-search-skill-document-intelligence-layout
Skillsets and enrichment pipeline: https://learn.microsoft.com/azure/search/cognitive-search-working-with-skillsets
Blob indexing behavior: https://learn.microsoft.com/azure/search/search-how-to-index-azure-blob-storage

Please try this change and rerun the indexer. If the issue persists, I’ll be happy to review the debug output with you in more detail.

Pilladi Padma Sai Manisha 7,140 Reputation points Microsoft External Staff Moderator

2026-04-28T17:33:03.65+00:00

Hi,

Following up to see whether the answer above was helpful. If it answers your query, please click Accept Answer and select Yes for "Was this answer helpful?". If you have any further questions, let us know.

Answer 2

Hi Mitchell D. Bussey,

Welcome to Microsoft Q&A,

The "Missing file reference object" error points to two issues in your skillset definition.

#1: Nested inputs field inside the input entry

Your skill input has a non-standard nested "inputs": [] property:

"inputs": [
  {
    "name": "file_data",
    "source": "/document/file_data",
    "inputs": []   // <-- this should not be here
  }
]

InputFieldMappingEntry only accepts name and source. The extra "inputs": [] field is not part of the schema and can cause the runtime to fail when resolving the BinaryFileReference object, even though the file is visible in the debug enrichment tree. Remove it entirely.
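If you are editing the skillset JSON programmatically before sending it back, removing the empty nested field can be sketched as below. This assumes the skill definition is a Python dict; `strip_empty_nested_inputs` is a hypothetical helper, and it only drops empty `inputs` lists. Whether the service keeps the field omitted after the update is a separate question.

```python
import copy


def strip_empty_nested_inputs(skill: dict) -> dict:
    """Return a copy of `skill` with empty nested "inputs" lists removed from each input entry."""
    cleaned = copy.deepcopy(skill)  # leave the caller's definition untouched
    for entry in cleaned.get("inputs") or []:
        if entry.get("inputs") == []:
            del entry["inputs"]
    return cleaned


skill = {
    "@odata.type": "#Microsoft.Skills.Util.DocumentIntelligenceLayoutSkill",
    "context": "/document",
    "inputs": [{"name": "file_data", "source": "/document/file_data", "inputs": []}],
    "outputs": [],
}

print(strip_empty_nested_inputs(skill)["inputs"])
# [{'name': 'file_data', 'source': '/document/file_data'}]
```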

#2: Empty outputs array

Your skill has "outputs": []. DocumentIntelligenceLayoutSkill requires at least one output to be declared. For oneToMany mode, you need text_sections at minimum.

"outputs": [
  {
    "name": "text_sections",
    "targetName": "text_sections"
  }
]
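The same change can be applied to a skill dict in code. This is a sketch that assumes `"outputFormat": "text"` as in the question (the output name depends on the output format). `ensure_output` is a hypothetical helper, and it does not check the name against the service.

```python
import copy
from typing import Optional


def ensure_output(skill: dict, name: str = "text_sections", target_name: Optional[str] = None) -> dict:
    """Return a copy of `skill` whose outputs include an entry named `name`."""
    updated = copy.deepcopy(skill)
    outputs = updated.get("outputs") or []  # treat missing or null outputs as empty
    if not any(o.get("name") == name for o in outputs):
        outputs.append({"name": name, "targetName": target_name or name})
    updated["outputs"] = outputs
    return updated


skill = {
    "@odata.type": "#Microsoft.Skills.Util.DocumentIntelligenceLayoutSkill",
    "context": "/document",
    "outputMode": "oneToMany",
    "outputFormat": "text",
    "outputs": [],
}

fixed = ensure_output(skill)
print(fixed["outputs"])  # [{'name': 'text_sections', 'targetName': 'text_sections'}]
```

Calling it again on an already-fixed skill is a no-op, so it is safe to run as part of a repeatable deployment script.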

Your indexer already has "allowSkillsetToReadFileData": true and "parsingMode": "default", which is the right setup for Blob Storage with PDFs. The debug session confirming BinaryFileReference is present means the data is being passed correctly. The issue is purely in how the skill definition was serialized.

After updating the skillset, reset your indexer to force a full re-run: call the Reset Indexer REST API followed by Run Indexer, or simply reset and rerun from the portal.
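A minimal sketch of the reset-then-run sequence using only the Python standard library is shown here. It builds `POST {endpoint}/indexers/{name}/reset` and `.../run` requests with an `api-key` header. The service name `my-service`, the admin key placeholder, and the `API_VERSION` value are assumptions; use the api-version your skillset actually targets. `indexer_request` is a hypothetical helper.

```python
import urllib.request

# Assumption: replace with the api-version your skillset was created with.
API_VERSION = "2024-11-01-preview"


def indexer_request(endpoint: str, indexer: str, action: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST for the Reset Indexer or Run Indexer REST operation."""
    if action not in ("reset", "run"):
        raise ValueError("action must be 'reset' or 'run'")
    url = f"{endpoint.rstrip('/')}/indexers/{indexer}/{action}?api-version={API_VERSION}"
    return urllib.request.Request(url, data=b"", method="POST", headers={"api-key": api_key})


# Reset first so change tracking is cleared, then run so every blob is reprocessed.
for step in ("reset", "run"):
    req = indexer_request("https://my-service.search.windows.net", "ai-test-indexer", step, "<admin-key>")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req)  # uncomment to actually send the request
```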

Please upvote and accept the answer if it helps!

Mitchell D. Bussey 0 Reputation points

2026-04-23T17:00:41.1433333+00:00

Hi @Divyesh Govaerdhanan,

Thank you for the response. I've tried your suggested fixes, but they have not resolved the issue.

In particular, removing the nested inputs field is not possible: the field is rehydrated after I remove it. I tried removing it both through the Azure skillset editor UI and through the REST API, without success.

I am still getting the same error as before.

I also tried a similar setup using the Document Extraction Skill and received the same “Missing file reference object” error, in case that provides any clues.
