Bot Framework Teams channel stopped delivering after upstream SDK ValidationError — how to recover?

Svetlana Maslenkova 0 Reputation points
2026-05-21T15:27:42.8366667+00:00

What I'm trying to do

Get my Python-based Teams bot (using microsoft-teams-apps 2.0.11, single-tenant, deployed on Azure Container Apps) to receive activities from Microsoft Teams again. Web Chat works perfectly; Teams stopped delivering and won't resume.

Setup

  • Azure Bot resource configured as single-tenant, msaAppType=SingleTenant, msaAppTenantId=<tenant>, messaging endpoint set to my Container App's /api/messages.
  • AAD app signInAudience=AzureADMyOrg, service principal accountEnabled=true.
  • Bot endpoint reachable from public internet (returns 200 on /health, 401 on unauthenticated POST /api/messages, as expected).
  • Org-published Teams app (custom LOB) — status Unblocked in Manage apps. Third-party apps + Custom apps allowed org-wide. Permission policy assigned to the user allows the app.
  • App is installed, visible under "Built for your org", chat opens, input box is enabled.

What's working

  • Test in Web Chat from the Azure Bot resource: activity arrives at the bot endpoint, my backend is called, bot replies in webchat. Container logs show POST /api/messages 200 OK with webchat.botframework.com relay URLs.

What's failing

  • Microsoft Teams: I send a message in Teams, no POST /api/messages ever arrives at the container (verified with live az containerapp logs show). No new entries appear on Bot resource → Channels → Microsoft Teams either — Microsoft is neither delivering nor reporting failures.
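One way to make the "no POST ever arrives" check less dependent on eyeballing a live log stream is to tally access-log lines by status code. The sketch below is only illustrative: the log line format is assumed from the excerpt quoted later in this post, and `count_message_posts` is a hypothetical helper name, not part of any SDK or CLI.

```python
import re
from collections import Counter

# Matches access-log lines shaped like the one quoted in this post:
#   INFO: POST /api/messages HTTP/1.1" 500 Internal Server Error
# The exact format is an assumption; adjust the pattern to your server's logs.
_MESSAGES_POST = re.compile(r'POST /api/messages HTTP/[\d.]+"?\s+(\d{3})')


def count_message_posts(lines):
    """Return a Counter mapping HTTP status code to number of POST /api/messages hits."""
    counts = Counter()
    for line in lines:
        match = _MESSAGES_POST.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Hypothetical sample lines, not real container output.
sample = [
    'INFO: POST /api/messages HTTP/1.1" 200 OK',
    'INFO: POST /api/messages HTTP/1.1" 500 Internal Server Error',
    'INFO: GET /health HTTP/1.1" 200 OK',
]
print(count_message_posts(sample))
```

Note that this counts hits from every channel together. Telling Web Chat traffic apart from Teams traffic would still need something channel-specific from the logs, such as the relay URLs mentioned above.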

What I think happened

Earlier today the bot was returning HTTP 500 on conversationUpdate and other system activities. The cause was an upstream SDK bug: microsoft-teams-apps 2.0.11 raises pydantic.ValidationError on activities whose entities[] contain newer types the SDK doesn't recognize:

  • ClientCapabilities
  • ProductInfo
  • QuotedReply
  • TargetedMessageInfo

Repro of the SDK error from container logs:


ERROR microsoft_teams.apps.http.http_server: 14 validation errors for tagged-union[...]
typing.entities.0.ClientInfoEntity.type
  Input should be 'clientInfo' [type=literal_error, input_value='ClientCapabilities', input_type=str]
... (similar errors for MentionEntity, MessageEntity, AIMessageEntity, StreamInfoEntity,
     CitationEntity, SensitiveUsageEntity, ProductInfoEntity, QuotedReplyEntity,
     TargetedMessageInfoEntity)
INFO: POST /api/messages HTTP/1.1" 500 Internal Server Error

The Bot resource's Channels page logged these as InternalServerError events for hours.

What I've done to mitigate

  1. Upgraded microsoft-teams-apps from 2.0.0a20 to stable 2.0.11. Did not help — the missing entity types are absent from 2.0.11 too.
  2. Deployed a FastAPI middleware that returns 200 {"status": "acknowledged"} whenever /api/messages would have returned 500 due to the SDK ValidationError. Verified working: container logs now show POST /api/messages 200 OK on every activity, and the Bot resource Channels page has no new error entries since the workaround was deployed (about 2h ago).
  3. Removed and re-added the Microsoft Teams channel on the Bot resource.
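For readers who want to try something like step 2, here is a minimal sketch of an acknowledge-on-error wrapper. It is not the actual middleware from this deployment: it is written as a plain ASGI class with no framework imports, the name `AcknowledgeOnError` is made up for illustration, and it acknowledges any 5xx or uncaught exception on the path rather than only the SDK's ValidationError.

```python
import asyncio
import json


class AcknowledgeOnError:
    """Plain ASGI wrapper: replace 5xx responses or uncaught exceptions
    on one path with 200 {"status": "acknowledged"}."""

    def __init__(self, app, path="/api/messages"):
        self.app = app
        self.path = path

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http" or scope.get("path") != self.path:
            await self.app(scope, receive, send)
            return

        buffered = []  # hold the downstream response until its status is known

        async def capture(message):
            buffered.append(message)

        try:
            await self.app(scope, receive, capture)
            failed = not buffered or buffered[0].get("status", 500) >= 500
        except Exception:
            # Assumption: any failure is swallowed here, not just ValidationError.
            failed = True

        if failed:
            body = json.dumps({"status": "acknowledged"}).encode("utf-8")
            buffered = [
                {
                    "type": "http.response.start",
                    "status": 200,
                    "headers": [
                        (b"content-type", b"application/json"),
                        (b"content-length", str(len(body)).encode("ascii")),
                    ],
                },
                {"type": "http.response.body", "body": body},
            ]
        for message in buffered:
            await send(message)


async def _demo():
    async def broken_app(scope, receive, send):
        raise ValueError("stand-in for the SDK's pydantic.ValidationError")

    sent = []

    async def send(message):
        sent.append(message)

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    wrapped = AcknowledgeOnError(broken_app)
    await wrapped({"type": "http", "path": "/api/messages"}, receive, send)
    return sent


demo_messages = asyncio.run(_demo())
print(demo_messages[0]["status"], demo_messages[1]["body"])
```

With FastAPI this kind of class can be registered via `app.add_middleware(AcknowledgeOnError)`. Because the sketch buffers the whole response before sending it, it is only suitable for small, non-streaming replies, and it hides real errors from the channel, so the original exception should still be logged somewhere.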

The actual problem now

Even though the bot is healthy and returning 200 to everything, Microsoft Teams is not attempting to deliver any activities to the bot endpoint. It's been about 2 hours since the mitigation went live, and Teams is silent — no successful delivery, no failure entry either.

My hypothesis: Microsoft's Teams channel routing put this bot in an exponential-backoff state due to the earlier 500s. Even with the bot now healthy, the backoff hasn't expired and won't until Microsoft's routing service re-tests the endpoint.

Questions

  1. Does Microsoft's Bot Framework Teams channel apply per-bot exponential backoff on 5xx responses, and if so, what does the cool-off window look like? I can't find anywhere this is documented.
  2. Is there a way to force Teams' channel routing to re-test the bot endpoint sooner, short of opening a paid support case? I already tried removing and re-adding the Teams channel on the Bot resource — didn't change anything.
  3. Is filing the SDK bug at https://github.com/microsoft/teams.py the right place for the underlying ValidationError issue, or is there a more appropriate channel for Bot Framework Python SDK schema issues?
  4. For anyone else who has hit this: did az bot msteams delete + az bot msteams create work as a recovery? Did you have to rotate the AAD app / Bot resource entirely? How long did Microsoft's automatic backoff take to expire in your case?

Any guidance appreciated.

Azure AI Bot Service

An Azure service that provides an integrated environment for bot development.


Answer recommended by moderator

Svetlana Maslenkova 0 Reputation points
2026-05-22T10:04:09.0833333+00:00

Hello,

Quick update: Teams delivery has resumed for this bot as of this morning. I'm not sure whether your team manually cleared the channel routing backoff state or whether it expired naturally - could you confirm?

Either way, the root cause was an upstream bug in the microsoft-teams-apps Python SDK: it raises pydantic.ValidationError on activities containing newer entity types like 'ClientCapabilities', which Microsoft's bot service has started attaching to both system and message activities. The 5xx responses tripped the channel's backoff.

I've deployed a FastAPI middleware workaround that strips unknown entities before the SDK validates, so the bot now returns 200 consistently. I've also filed the SDK bug upstream:

https://github.com/microsoft/teams.py/issues/433
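For anyone needing the same stopgap, the entity-stripping step can be sketched roughly as follows. This is an illustration rather than the exact middleware code: `strip_unsupported_entities` is a hypothetical helper, the deny-list contains only the four entity types named in the question, and wiring the cleaned body back into the request before the SDK reads it is left out.

```python
import json

# Entity types reported in the question as rejected by microsoft-teams-apps 2.0.11.
UNSUPPORTED_ENTITY_TYPES = {
    "ClientCapabilities",
    "ProductInfo",
    "QuotedReply",
    "TargetedMessageInfo",
}


def strip_unsupported_entities(raw_body: bytes) -> bytes:
    """Return the activity JSON with deny-listed entries removed from entities[]."""
    activity = json.loads(raw_body)
    if isinstance(activity, dict) and isinstance(activity.get("entities"), list):
        activity["entities"] = [
            entity
            for entity in activity["entities"]
            if not (
                isinstance(entity, dict)
                and entity.get("type") in UNSUPPORTED_ENTITY_TYPES
            )
        ]
    return json.dumps(activity).encode("utf-8")


# Hypothetical activity payload, trimmed to the fields that matter here.
incoming = json.dumps(
    {
        "type": "message",
        "text": "hello",
        "entities": [{"type": "clientInfo"}, {"type": "ClientCapabilities"}],
    }
).encode("utf-8")

cleaned = json.loads(strip_unsupported_entities(incoming))
print([entity["type"] for entity in cleaned["entities"]])
```

A deny-list keeps every entity the SDK already handles, but it has to be extended whenever the service starts sending another new type, so it is best treated as temporary until the SDK fix lands.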

For future similar cases, it would be valuable if Microsoft could:

  1. Document the Teams channel's exponential backoff behavior - exact thresholds, cool-off windows, and how customers can detect when their bot is in backoff. Right now it's invisible and very hard to diagnose.
  2. Provide a customer-facing way to reset the routing state (e.g. via the Azure CLI or a "Restart channel" button) without needing to open a support case.

You can close this case unless you have other observations to share.

Thank you for the quick response.

Best regards,

Svetlana



1 additional answer

  1. Alex Burlachenko 23,250 Reputation points MVP Volunteer Moderator
    2026-05-26T09:37:52.31+00:00

    Hi Svetlana, and thanks for joining me here on the Q&A portal :)

    Your hypothesis is probably correct. The Teams channel very likely marked the endpoint unhealthy after prolonged 500s, and the routing layer is currently backing off delivery attempts. The important part is that Web Chat still works and Teams no longer even attempts delivery, which means this is no longer an app crash problem but a channel health state problem.

    The microsoft-teams-apps ValidationError definitely looks real. Entity types like ClientCapabilities, QuotedReply, TargetedMessageInfo, and ProductInfo are newer Teams payload entities, and 2.0.11 clearly does not fully understand them yet. Filing this in teams.py is reasonable: https://github.com/microsoft/teams.py

    Returning HTTP 200 instead of 500 was the correct mitigation. The Teams channel really dislikes repeated 5xx responses on conversationUpdate and system activities. I would not rotate the AAD app or recreate the Bot resource yet; that is probably overkill right now. Instead, completely uninstall the Teams app for the user, remove the personal chat, wait about 10-15 minutes, reinstall the app, start a brand new conversation, and watch az containerapp logs show --follow live during the test.

    Sometimes Teams only retries routing cleanly after a fresh conversation bootstrap.

    Then send a proactive message from the bot if you already have a stored conversation reference. If proactive outbound works but inbound still does not arrive, that strongly points to inbound routing suppression.

    I would also temporarily bypass strict entity parsing for unknown entity types instead of strictly validating the whole entities[] union. Right now the SDK is effectively exploding because Teams evolved faster than the schema definitions. Classic distributed systems moment: one side deployed on Tuesday, the other side emotionally still lives in March lol.

    Bot Framework does not publicly document exact cooldown windows for Teams delivery after repeated 5xx. In practice I have seen recovery take anywhere from tens of minutes to several hours after the endpoint stabilizes.

    Removing and re-adding the Teams channel usually does not instantly clear internal health state. Same for restarting Container Apps. See https://learn.microsoft.com/en-us/microsoftteams/platform/resources/bot-v3/bot-conversations/bots-conversations

    If, after several more hours with a fresh install, a new conversation, a proactive test, and stable 200 responses, you still see absolutely zero POST attempts from Teams, then support probably needs to inspect the Bot Framework channel state on the backend side.

    rgds,

    Alex

    If my answer was helpful, please mark it. Additional thanks if you follow me on the Q&A portal.
    
