Memory in Microsoft Foundry Agent Service (preview)

Important

Memory (preview) in Foundry Agent Service and the Memory Store API (preview) are licensed to you as part of your Azure subscription and are subject to terms applicable to "Previews" in the Microsoft Product Terms, the Microsoft Products and Services Data Protection Addendum, and the Supplemental Terms of Use for Microsoft Azure Previews.

The latest preview offers new capabilities and enhancements, including:

Memory item operations to create, read, update, list, and delete individual memory records.
Store-level default retention controls, including default TTL for newly created memory stores.
Direct remember-or-forget synchronized memory command behavior.

Memory in Microsoft Foundry Agent Service is a managed, long-term memory solution. It enables agent continuity across sessions, devices, and workflows. By creating and managing memory stores, you can build agents that retain user preferences, maintain conversation history, and deliver personalized experiences.

This article provides an overview of agent memory, including its concepts, use cases, and limitations. For usage instructions, see Create and use memory in Foundry Agent Service.

What is memory?

Memory is persistent knowledge retained by an agent across sessions. Generally, agent memory falls into two categories:

Short-term memory tracks the current session's conversation and maintains immediate context for ongoing interactions. Agent orchestration frameworks typically manage this memory as part of the session context.
Long-term memory retains distilled knowledge across sessions. The model can recall and build on previous user interactions over time. Long-term memory requires a persistent system that extracts, consolidates, and manages knowledge.

Memory in Foundry Agent Service is designed for long-term memory. It extracts meaningful information from conversations, consolidates it into durable knowledge, and makes it available across sessions.

How memory works

Behind the scenes, memories are stored as items in a managed memory store. The system might apply consolidation and conflict‑resolution logic where applicable, such as to merge duplicate or overlapping user profile information.

Note

Consolidation behavior can vary by memory type and might change during the preview. For the latest behavior, see Create and use memory in Foundry Agent Service.

Memory operates in the following phases:

Extraction: When a user interacts with an agent, the system actively extracts key information from the conversation, such as user preferences, facts, and relevant context. For example, preferences like "allergic to dairy" and summaries of recent activities are identified and stored.
Consolidation: Extracted memories are consolidated to keep the memory store efficient and relevant. The system uses LLMs to merge similar or duplicate topics so that the agent doesn't store redundant information. Conflicting facts, such as a new allergy, are resolved to maintain an accurate memory.
Retrieval: When the agent needs to recall information, it searches the memory store for the most relevant memories. This allows the agent to quickly surface the right context, making conversations feel natural and informed. For best results, retrieve stable user profile information early in the conversation so the agent can personalize responses.

Here's an example of how memory can improve and personalize interactions between a recipe agent and a user who previously expressed a food allergy:

Tip

Need help deciding when to use memory? Consider these guidelines:

Use memory to capture user-specific or agent context that is learned and persists through interactions over time.
Use a Foundry IQ knowledge base to ground your agent on curated organizational content.
Use the file search tool to search user-provided documents during an interaction.

Memory types

Memory in Foundry Agent Service extracts and stores three types of long-term memory:

Memory type	Description	Retrieval guidance	Configuration
User profile memory	Durable user preferences and personal context, such as language preference, product defaults, or accessibility needs.	Retrieve near the beginning of each conversation to establish stable personalization context.	Enabled by default. To configure and disable, see Create a memory store.
Chat summary memory	Distilled summaries of prior conversation topics and threads.	Retrieve per turn using current conversation messages to surface relevant continuity context.	Enabled by default. To configure and disable, see Create a memory store.
Procedural memory	Reusable how-to routines and operating patterns inferred from prior interactions.	Retrieve when the user asks for a recurring workflow or task the agent has handled before.	Enabled by default. To configure and disable, see Create a memory store.

Memory management and retention

Memory in Foundry Agent Service supports fine-grained management and retention controls for production scenarios:

Item-level memory CRUD: Create, read, update, list, and delete individual memory items.
Store-level default retention: Set a default TTL (seconds) for memory entries created in a store.
Direct memory commands: Apply immediate remember-or-forget behavior when users explicitly request memory changes.

With these controls, you can balance personalization quality with privacy, compliance, and data lifecycle requirements.

Working with memory

There are two ways to use memory for agent interactions:

Memory search tool: Attach the memory search tool to a prompt agent to enable reading from and writing to the memory store during conversations. This approach is ideal for most scenarios because it simplifies memory management. For more information, see Use memories via an agent tool.
Memory store APIs: Interact directly with the memory store using the low-level APIs. This approach provides more control and flexibility for advanced use cases, including direct control over individual memory records, retention behavior, and explicit memory lifecycle operations. For more information, see Use memories via APIs.

Use cases

The following examples illustrate how memory can enhance various types of agents.

A customer support agent that remembers your name, previous issues and resolutions, ticket numbers, and your preferred contact method (chat, email, or call back). This memory helps you avoid repeating information, so conversations are more efficient and satisfying.
A personal shopping assistant that remembers your size in specific brands, preferred colors, past returns, and recent purchases. The agent can suggest relevant items as soon as you start a session and avoid recommending products you already own.

Security risks

When you work with memory in Foundry Agent Service, the large language model (LLM) extracts and consolidates memories based on conversations. Protect memory against threats such as prompt injection and memory corruption. These risks arise when incorrect or harmful data is stored in the agent's memory, potentially influencing agent responses and actions.

To mitigate security risks, consider these actions:

Use Azure AI Content Safety and its prompt injection detection: Validate all prompts entering or leaving the memory system to prevent malicious content.
Perform attack and adversarial testing: Regularly stress-test your agent for injection vulnerabilities through controlled adversarial exercises.

Limitations and quotas

The following limitations and quotas apply to memory in Foundry Agent Service. For broader limitations and quotas, see Foundry Agent Service limits, quotas, and regional support.

Limitations

Memory currently requires compatible Azure OpenAI chat and embedding model deployments. For a list of supported models, see Foundry Models sold by Azure.
For low-level memory APIs, you must set scope explicitly in each request. Automatic scope resolution from the caller's identity is only supported when you use the memory search tool with scope set to {{$userId}}. For more information, see Understand scope.

Note

In the latest preview, some memory store default options (such as enabling procedural memory and setting a default TTL) are configured at store creation time. Verify support for post-create updates in your API version.

Quotas

Maximum scopes per memory store: 100
Maximum memories per scope: 10,000
Search memories: 1,000 requests per minute
Update memories: 1,000 requests per minute

Region availability

Memory is available in the following regions:

Australia East
Brazil South
Canada East
East US 2
France Central
Italy North
Japan East
Korea Central
North Central US
Norway East
South Africa North
South India
Sweden Central
Switzerland North
UAE North
UK South
West US
West US 2
West US 3

Pricing

Memory is currently in public preview. Pricing and billing for memory and the Memory Store API can change during preview.

You're billed for usage of the underlying chat and embedding models you configure. For current pricing details, see Foundry Agent Service pricing.

Follow the end-to-end setup: Create and use memory in Foundry Agent Service
Confirm model availability: Azure OpenAI models and regions for Foundry Agent Service
Build a complete agent: Microsoft Foundry quickstart

Feedback

Was this page helpful?

Last updated on 2026-06-02