Configure an enrichment cache

Important

This feature is in public preview under supplemental terms of use. Preview REST APIs support this feature.

This article explains how to add caching to a skillset pipeline so that you can modify downstream enrichment steps without a full rebuild every time. By default, a skillset is stateless, and changing any part of its composition requires a full rerun of the indexer. With an enrichment cache, the indexer determines which parts of the document tree must be refreshed based on skillset or indexer definition changes. Existing processed output is preserved and reused where possible.

Cached content is placed in Azure Storage using a connection string that you provide. The following objects are created when you run the indexer. Treat them as internal components managed by your search service, and don't modify them.

  • A container named ms-az-search-indexercache-<alpha-numeric-string>
  • Tables named MsAzSearchIndexerCacheIndex<alpha-numeric-string>
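
Because these objects must not be modified, it can help to recognize them in any script that cleans up or reorganizes the storage account. The following is a minimal sketch, assuming the suffix shown as <alpha-numeric-string> is a run of letters and digits. The function name and sample names are hypothetical.

```python
import re

# Patterns follow the naming shown above. Treating the suffix as any run of
# letters and digits is an assumption about "<alpha-numeric-string>".
CACHE_CONTAINER = re.compile(r"^ms-az-search-indexercache-[A-Za-z0-9]+$")
CACHE_TABLE = re.compile(r"^MsAzSearchIndexerCacheIndex[A-Za-z0-9]+$")


def is_enrichment_cache_object(name: str) -> bool:
    """Return True if the name looks like a service-managed cache object."""
    return bool(CACHE_CONTAINER.match(name) or CACHE_TABLE.match(name))


# Hypothetical container and table names found in a storage account.
names = [
    "ms-az-search-indexercache-abc123",
    "MsAzSearchIndexerCacheIndexabc123",
    "my-documents",
]

# Names that a cleanup script should leave untouched.
protected = [n for n in names if is_enrichment_cache_object(n)]
print(protected)
```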

Prerequisites

  • Azure Storage for storing cached enrichments. The storage account must be general purpose v2.

  • For blob indexing only, if you need synchronized document removal from both the cache and index when blobs are deleted from your data source, enable a deletion policy in the indexer. Without this policy, document deletion from the cache isn't supported.

You should be familiar with setting up indexers and skillsets. Start with indexer overview and then continue on to skillsets to learn about enrichment pipelines.

Limitations

Caution

If you're using the SharePoint indexer (Preview), avoid incremental enrichment. Under certain circumstances, the cache becomes invalid, and reloading it requires an indexer reset and a full rebuild.

Permissions

The Azure AI Search identity needs write access to Azure Storage, granted through the following roles:

  • Storage Blob Data Contributor
  • Storage Table Data Contributor

The connection string syntax determines whether a system-assigned or user-assigned identity is used. For more information, see Connect to Azure Storage using a managed identity.
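
As an illustration of a managed identity connection string, the sketch below builds the ResourceId form, which names the storage account by its Azure resource ID instead of an account key. The exact syntax for each identity type is covered in the linked article, so treat this format as an assumption to verify there. The helper name and the sample values are hypothetical.

```python
def storage_resource_id_connection_string(
    subscription_id: str, resource_group: str, account_name: str
) -> str:
    """Build a key-less connection string that points at a storage account.

    Assumption: the ResourceId=...; form described in the managed identity
    article. No account key or secret is included in the string.
    """
    return (
        f"ResourceId=/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name};"
    )


# Hypothetical values for illustration only.
conn = storage_resource_id_connection_string(
    "00000000-0000-0000-0000-000000000000", "my-resource-group", "mystorageaccount"
)
print(conn)
```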

Set the cache property

Use this procedure for both new and existing indexers.

In the indexer definition, set the cache property with:

  • (Required) storageConnectionString set to an Azure Storage connection string.
  • (Optional) enableReprocessing (true by default). Set it to false to suspend incremental enrichment temporarily, and switch it back to true later.
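
The two properties above can be sketched as a small helper that adds a cache section to an indexer definition before you send it to the service. This is an illustrative sketch, not a complete indexer: the helper name, the sample indexer values, and the placeholder connection string are assumptions.

```python
import copy
import json


def with_enrichment_cache(
    indexer: dict, storage_connection_string: str, enable_reprocessing: bool = True
) -> dict:
    """Return a copy of an indexer definition with the cache property set."""
    if not storage_connection_string:
        raise ValueError("storageConnectionString is required")
    updated = copy.deepcopy(indexer)  # leave the caller's definition untouched
    updated["cache"] = {
        "storageConnectionString": storage_connection_string,
        # True by default; set to False to suspend incremental enrichment.
        "enableReprocessing": enable_reprocessing,
    }
    return updated


# Hypothetical indexer definition; the names are placeholders.
indexer = {
    "name": "my-indexer",
    "dataSourceName": "my-datasource",
    "targetIndexName": "my-index",
    "skillsetName": "my-skillset",
}

body = with_enrichment_cache(indexer, "<your-storage-connection-string>")
print(json.dumps(body, indent=2))
```

Passing enable_reprocessing=False produces the same definition with incremental enrichment suspended, which you can switch back later by updating the indexer again.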

  1. In the Azure portal, select Indexers on the left.

  2. Select Add indexer to create a new indexer, or open an existing one in JSON edit mode.

  3. Enable incremental enrichment, set the enrichment cache storage account, and save the indexer.

  4. Reset the indexer if it already exists.

  5. Run the indexer. This one-time full rebuild seeds the cache. After the cache is loaded, incremental reuse applies on subsequent runs.
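
If you work through the REST API instead of the portal, the reset and run steps map to two POST requests on the indexer. The sketch below only builds the requests and doesn't send them. The service name, indexer name, key placeholder, and api-version value are assumptions; substitute the preview API version that your service supports.

```python
import urllib.request


def indexer_action_request(
    service_name: str, indexer_name: str, action: str, api_key: str, api_version: str
) -> urllib.request.Request:
    """Build a POST request that resets or runs an indexer."""
    if action not in ("reset", "run"):
        raise ValueError("action must be 'reset' or 'run'")
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexers/{indexer_name}/{action}?api-version={api_version}"
    )
    # Both operations take an empty request body.
    return urllib.request.Request(
        url, data=b"", method="POST", headers={"api-key": api_key}
    )


# Hypothetical values; the api-version is an assumed preview version.
API_VERSION = "2024-05-01-preview"
reset_req = indexer_action_request("my-service", "my-indexer", "reset", "<admin-key>", API_VERSION)
run_req = indexer_action_request("my-service", "my-indexer", "run", "<admin-key>", API_VERSION)

for req in (reset_req, run_req):
    print(req.get_method(), req.full_url)

# To send a request against a real service:
# urllib.request.urlopen(reset_req)
```

Send the reset request first when the indexer already exists, and then send the run request so that the full rebuild seeds the cache.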