Ambient Audio Streaming gRPC API reference

The Ambient Audio Streaming (AAS) 2.0 gRPC API enables partners to stream ambient audio recordings in real time and submit them for downstream processing by Dragon Copilot.

The gRPC transport is one of two streaming options (alongside WebSocket). It exposes three RPCs:

RPC | Type | Description
RetrieveConfiguration | Unary | Returns the service configuration for a given product, partner, and customer, including supported locales and recording duration limits.
RecordAmbient | Bidirectional streaming | Streams an ambient audio recording to the server. The client opens a session, streams audio data in chunks, and closes the recording.
StartProcessing | Unary | Signals Dragon Copilot to begin processing a previously recorded ambient session.

Authentication

All gRPC RPCs require a bearer token in the authorization metadata key.

Supported token types:

  • S2S (Server-to-Server): Machine-to-machine token issued via MISE. After authentication, the service validates the calling application's identity against a configured allowlist.
  • Entra ID User Token: User-delegated token issued by Microsoft Entra ID.
  • EIS Bearer Token: JWT issued by the EHR Integration Service (EIS). See Token launch integration for details.

Required metadata

Key | Description
authorization | Bearer token in the form Bearer <token>.

Conditionally required metadata

Key | Condition | Description
user-guid or external-user-id | When using an S2S (machine-to-machine) token | At least one must be provided. If both are missing, the request is rejected with 403 Forbidden (typically surfaced to gRPC clients as PERMISSION_DENIED).

Optional metadata

Key | Description
x-ms-request-id | Correlation identifier for tracing.
customer-id | Customer or environment identifier, used for logging context.
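The metadata rules above can be collected into one helper. This is a minimal sketch, assuming a Python grpcio client, where per-call metadata is passed as a sequence of (key, value) tuples with lowercase keys; the function name build_metadata and its parameters are illustrative and are not part of the AAS API.

```python
from typing import Optional


def build_metadata(
    token: str,
    *,
    is_s2s: bool,
    user_guid: Optional[str] = None,
    external_user_id: Optional[str] = None,
    request_id: Optional[str] = None,
    customer_id: Optional[str] = None,
) -> list[tuple[str, str]]:
    """Build gRPC metadata as (key, value) pairs (hypothetical helper)."""
    if not token:
        raise ValueError("a bearer token is required")
    metadata = [("authorization", f"Bearer {token}")]
    # Documented rule: S2S calls need user-guid or external-user-id.
    if is_s2s and not (user_guid or external_user_id):
        raise ValueError("S2S calls require user-guid or external-user-id")
    # Conditionally required and optional keys are only sent when a value is set.
    optional = {
        "user-guid": user_guid,
        "external-user-id": external_user_id,
        "x-ms-request-id": request_id,
        "customer-id": customer_id,
    }
    metadata.extend((key, value) for key, value in optional.items() if value)
    return metadata
```

With a generated stub, the result would be passed per call, for example stub.RetrieveConfiguration(request, metadata=build_metadata(token, is_s2s=False)); the stub variable is hypothetical.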

Service definition

service AudioStreamingService {
  rpc RetrieveConfiguration(RetrieveConfigurationRequest) returns (RetrieveConfigurationResponse);
  rpc RecordAmbient(stream RecordAmbientRequest) returns (stream RecordAmbientResponse);
  rpc StartProcessing(StartProcessingRequest) returns (StartProcessingResponse);
}

RPCs

RetrieveConfiguration

Type: Unary RPC

Retrieves the service configuration for a given product, partner, and customer. The response includes supported audio locales and recording duration limits. Call this before starting a recording session to determine available languages and duration constraints.

Request

Field | Type | Required | Description
product_id | string | Yes | The Microsoft unique identifier of the product. Must be a valid GUID.
partner_id | string | Yes | The Microsoft unique identifier for the partner. Must be a valid GUID.
customer_id | string | Yes | The Microsoft unique identifier of the customer. Must be a valid GUID.
external_identifiers | ExternalIdentifier[] | No | List of external identifiers. Known type: "userId" (the partner identifier of the user).
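Because an invalid GUID in any required field fails with INVALID_ARGUMENT, a client can check these fields before calling. A minimal sketch, assuming the request is held as a dict keyed by the proto field names; Python's uuid.UUID parser also accepts forms (such as braced or hyphen-less GUIDs) that the service may reject, so this is a pre-flight check, not the service's own validation.

```python
import uuid

REQUIRED_GUID_FIELDS = ("product_id", "partner_id", "customer_id")


def invalid_guid_fields(request: dict) -> list[str]:
    """Return the required GUID fields that are missing or malformed (hypothetical helper)."""
    invalid = []
    for field in REQUIRED_GUID_FIELDS:
        value = request.get(field)
        try:
            # A missing value becomes the string "None", which fails to parse.
            uuid.UUID(str(value))
        except (ValueError, TypeError):
            invalid.append(field)
    return invalid
```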

Example request (C#):

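using Grpc.Core;
using Grpc.Net.Client;
// Also import the namespace of the C# types generated from the AAS .proto file.

// <service-endpoint> is a placeholder; accessToken holds a token of a supported type.
using var channel = GrpcChannel.ForAddress("https://<service-endpoint>");
var client = new AudioStreamingService.AudioStreamingServiceClient(channel);
var headers = new Metadata { { "authorization", $"Bearer {accessToken}" } };
var request = new RetrieveConfigurationRequest
{
    ProductId = "<product-guid>",
    PartnerId = "<partner-guid>",
    CustomerId = "<customer-guid>"
};
var response = await client.RetrieveConfigurationAsync(request, headers);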

Response

Field | Type | Description
encounter_warn_seconds | uint32 | Duration in seconds at which processing quality may degrade. Warn the user that the recording is approaching the maximum duration.
encounter_max_seconds | uint32 | Maximum duration of audio allowed, in seconds. Stop recording when this limit is reached.
supported_recording_locales | string[] | Locales accepted for audio recording input (IETF BCP 47, for example en-US or de-DE).
supported_encounter_report_locales | string[] | Locales available for encounter report output (IETF BCP 47).

Example response:

{
  "encounterWarnSeconds": 2700,
  "encounterMaxSeconds": 4500,
  "supportedRecordingLocales": ["en-US", "fr-FR"],
  "supportedEncounterReportLocales": ["en-US", "fr-FR"]
}
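The warn and max limits translate into a simple check run while recording. A sketch, assuming elapsed time is tracked in whole seconds on the client; recording_state and is_supported_locale are hypothetical helper names. Locale comparison is case-insensitive because BCP 47 tags are.

```python
def recording_state(elapsed_seconds: int, warn_seconds: int, max_seconds: int) -> str:
    """Classify elapsed time against encounter_warn_seconds and encounter_max_seconds."""
    if elapsed_seconds >= max_seconds:
        return "stop"  # the maximum duration has been reached: stop recording
    if elapsed_seconds >= warn_seconds:
        return "warn"  # warn the user that the maximum is approaching
    return "ok"


def is_supported_locale(locale: str, supported: list[str]) -> bool:
    """Check a BCP 47 tag against a supported list, ignoring case."""
    return locale.casefold() in (s.casefold() for s in supported)
```

With the example response above (2700 and 4500 seconds), a recording at 2700 seconds is in the "warn" state. When the state reaches "stop", closing with RECORDING_STOP_REASON_MAX_DURATION_EXCEEDED records why the recording ended.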

Errors

gRPC status code | Cause
INVALID_ARGUMENT | Request is null, or a required field contains an invalid GUID.
UNAUTHENTICATED | Missing or invalid bearer token.
INTERNAL | Unexpected server error or failure communicating with the downstream system.

RecordAmbient

Type: Bidirectional streaming

Streams an ambient audio recording to the server in three phases:

  1. Open - The client sends a RecordingOpenRequest to initialize the session.
  2. Stream - The client sends DataChunkRequest messages containing audio data. The server periodically sends DataStorageResponse messages indicating cumulative bytes stored.
  3. Close - The client sends a RecordingCloseRequest to end the recording. The server responds with a RecordingCloseResponse confirming total bytes stored.

rpc RecordAmbient(stream RecordAmbientRequest) returns (stream RecordAmbientResponse);
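The three phases impose an ordering that a client can enforce before anything is sent. The class below is a hypothetical client-side guard, assuming one recording per stream; it mirrors the documented INVALID_ARGUMENT and FAILED_PRECONDITION cases but is not the service's own state machine.

```python
class RecordAmbientSequencer:
    """Client-side guard for the open -> data chunks -> close message order."""

    def __init__(self) -> None:
        self.state = "new"  # transitions: new -> open -> closed

    def check(self, variant: str) -> None:
        """Validate the next request variant and advance the state."""
        if variant == "recording_open":
            if self.state != "new":
                raise RuntimeError("recording_open must be the first message")
            self.state = "open"
        elif variant in ("data_chunk", "recording_close"):
            if self.state != "open":
                raise RuntimeError(f"{variant} sent outside an open recording")
            if variant == "recording_close":
                self.state = "closed"
        else:
            raise ValueError(f"unknown variant: {variant}")
```

Calling check() with each variant name just before writing the message to the stream turns an ordering mistake into a local error instead of a failed RPC.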

Request messages

The client sends RecordAmbientRequest messages containing one of the following:

Variant | Type | Description
recording_open | RecordingOpenRequest | Starts a new recording session with session metadata. Must be sent first.
data_chunk | DataChunkRequest | Streams a chunk of audio data.
recording_close | RecordingCloseRequest | Signals the end of the recording.

RecordingOpenRequest

Field | Type | Required | Description
recording_id | string | Yes | A caller-defined unique identifier for the recording.
data_format | DataFormat | Yes | The audio encoding format. Supported: PCM (signed 16-bit LE), Ogg Opus, WebM Opus.
ambient_session_data | AmbientSession | Yes | Metadata for the ambient session (see AmbientSession).
actions | string[] | No | AI actions to perform. Use "generate-draft" for draft generation. If omitted, only a transcript is generated.
reason | RecordingStartReason | No | Why the recording was started. Values: RECORDING_START_REASON_UI, RECORDING_START_REASON_WAKE_WORD, RECORDING_START_REASON_SYSTEM_RESUME.
starting_offset | uint32 | No | Byte offset for resuming after interruption. Set to the last confirmed data_stored value.
previous_encounter_sessions | EncounterSession[] | No | Previous sessions that were part of the encounter. Critical for pause/resume scenarios.
output_form_ids | string[] | No | Template form identifiers for note generation.

Example:

{
  "recordingOpen": {
    "recordingId": "<recording-uuid>",
    "dataFormat": {
      "opus": {
        "sampleRateHz": 16000
      }
    },
    "ambientSessionData": {
      "productId": "<product-guid>",
      "partnerId": "<partner-guid>",
      "customerId": "<customer-guid>",
      "correlationId": "<correlation-guid>",
      "practitionerInfo": {
        "externalIdentifiers": [
          { "type": "fhirId", "identifier": "<practitioner-fhir-id>" }
        ],
        "name": {
          "givenName": "Jane",
          "familyName": "Smith",
          "suffix": "MD"
        }
      },
      "externalIdentifiers": [
        { "type": "userId", "identifier": "<external-user-id>" }
      ],
      "creationDate": "2026-02-20T14:30:00.000Z",
      "localeInfo": {
        "recordingLocales": ["en-US"],
        "encounterReportLocale": "en-US",
        "encounterUxLocale": "en-US"
      }
    },
    "actions": ["generate-draft"],
    "reason": "RECORDING_START_REASON_UI",
    "startingOffset": 0
  }
}

DataChunkRequest

Field | Type | Required | Description
data_start | uint32 | Yes | Byte offset where this chunk begins within the overall recording.
data | bytes | Yes | Raw audio bytes for this chunk.

Example:

{
  "dataChunk": {
    "dataStart": 0,
    "data": "<base64-encoded-audio-bytes>"
  }
}
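Splitting a buffer into DataChunkRequest messages comes down to keeping data_start as an absolute offset into the recording. A sketch, assuming the whole recording is available as a bytes object and messages are represented as dicts with the proto3 JSON field names used in the examples; data carries raw bytes here, since base64 appears only in the JSON rendering. The chunk size is the client's choice, because server-side storage chunks are independent of it.

```python
from typing import Iterator


def data_chunks(audio: bytes, chunk_size: int, start: int = 0) -> Iterator[dict]:
    """Yield dataChunk messages covering audio[start:] (hypothetical helper).

    data_start counts from the beginning of the whole recording, so a resumed
    stream continues from `start` instead of restarting at 0. Chunks are never
    empty, since an empty data chunk is rejected with INVALID_ARGUMENT.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(start, len(audio), chunk_size):
        yield {"dataChunk": {"dataStart": offset, "data": audio[offset:offset + chunk_size]}}
```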

RecordingCloseRequest

Field | Type | Required | Description
recording_id | string | Yes | Must match the recording_id from RecordingOpenRequest.
recording_length_seconds | uint32 | Yes | Total recording duration in seconds.
reason | RecordingStopReason | No | Why the recording was stopped. Values: RECORDING_STOP_REASON_UI, RECORDING_STOP_REASON_VOICE_COMMAND, RECORDING_STOP_REASON_BT_DISCONNECTED, RECORDING_STOP_REASON_EXTERNAL_INTERRUPTION, RECORDING_STOP_REASON_UNEXPECTED_ERROR, RECORDING_STOP_REASON_MAX_DURATION_EXCEEDED.

Example:

{
  "recordingClose": {
    "recordingId": "<recording-uuid>",
    "recordingLengthSeconds": 120,
    "reason": "RECORDING_STOP_REASON_UI"
  }
}
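For PCM input, recording_length_seconds can be derived from the number of bytes streamed, because the byte rate is fixed by sample_rate_hz, channels, and bitcount. A sketch under that assumption; rounding down to whole seconds is also an assumption, since this reference does not say how partial seconds are counted. For Ogg Opus and WebM Opus the byte count does not map directly to duration, so elapsed time would need to be tracked separately.

```python
def pcm_duration_seconds(
    num_bytes: int, sample_rate_hz: int, channels: int = 1, bits_per_sample: int = 16
) -> int:
    """Whole seconds of audio in a PCM byte stream (hypothetical helper)."""
    bytes_per_second = sample_rate_hz * channels * bits_per_sample // 8
    return num_bytes // bytes_per_second
```

At 16 kHz, mono, 16-bit, the byte rate is 32,000 bytes per second, so 3,840,000 bytes corresponds to the 120 seconds in the example above.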

Response messages

The server sends RecordAmbientResponse messages containing one of the following:

Variant | Type | Description
data_stored | DataStorageResponse | Acknowledges cumulative bytes stored. Sent approximately every 10 KB.
recording_closes | RecordingCloseResponse | Confirms the recording was closed and reports the total bytes stored.

DataStorageResponse example:

{
  "dataStored": {
    "dataStored": 32768
  }
}

RecordingCloseResponse example:

{
  "recordingCloses": {
    "dataStored": 65536
  }
}

Note

A DataStorageResponse is not returned for every DataChunkRequest. Data is stored in server-side chunks that are independent of the client-side chunk size. When streaming is resumed after an interruption, the first DataStorageResponse contains the byte position stored prior to the interruption.
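Because acknowledgements arrive on the server's schedule, a client only needs to remember the most recent cumulative value it has seen; that value is what starting_offset should be set to when resuming. A minimal sketch over responses in proto3 JSON form, using the field names from the examples above; last_confirmed_offset is a hypothetical helper.

```python
from typing import Iterable, Optional


def last_confirmed_offset(responses: Iterable[dict]) -> Optional[int]:
    """Return the latest cumulative byte count acknowledged by the server.

    Reads DataStorageResponse ("dataStored") and RecordingCloseResponse
    ("recordingCloses") messages; None means nothing has been confirmed yet.
    """
    confirmed = None
    for response in responses:
        for variant in ("dataStored", "recordingCloses"):
            if variant in response:
                # Values are cumulative, so the latest one supersedes earlier ones.
                confirmed = response[variant]["dataStored"]
    return confirmed
```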

Usage notes

  • Send RecordingOpenRequest as the first message. Sending data before opening a session results in INVALID_ARGUMENT.
  • data_start should reflect the byte offset from the beginning of the recording. Duplicate bytes (from reconnection) are silently discarded.
  • Set starting_offset when resuming after interruption to the last confirmed data_stored value.
  • Populate previous_encounter_sessions in pause/resume scenarios so the backend can connect the sessions of one encounter.
  • Audio formats supported: PCM (signed 16-bit LE), Ogg Opus, and WebM Opus.
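The duplicate-discard rule is what makes resending from the last confirmed offset safe: overlapping bytes are dropped rather than stored twice. The function below models that documented behavior, assuming the server holds one contiguous byte range starting at 0; it is not the service's implementation. How the service treats a gap (a chunk starting beyond the stored range) is not documented here, so the sketch simply rejects it.

```python
def accept_chunk(stored: int, data_start: int, data: bytes) -> bytes:
    """Return the part of a chunk that is not yet stored (hypothetical model).

    `stored` is the number of bytes already held, i.e. the range [0, stored).
    """
    if data_start > stored:
        raise ValueError("gap: chunk starts after the last stored byte")
    # Bytes before `stored` are duplicates from a reconnection and are dropped.
    return data[stored - data_start:]
```

For example, with 10 bytes stored, a chunk starting at offset 6 contributes only its last four bytes, and a chunk lying entirely below offset 10 contributes nothing.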

Errors

gRPC status code | Cause
INVALID_ARGUMENT | Null request, empty data chunk, invalid format, or data sent before RecordingOpenRequest.
FAILED_PRECONDITION | Messages sent in an order that is invalid for the current session state.
RESOURCE_EXHAUSTED | Unable to write to the internal data service. Retry the request.
UNAUTHENTICATED | Missing or invalid bearer token.
INTERNAL | Unexpected server error.

StartProcessing

Type: Unary RPC

Signals Dragon Copilot to begin processing a previously recorded ambient session. Processing happens asynchronously; the response confirms the request was accepted.

rpc StartProcessing(StartProcessingRequest) returns (StartProcessingResponse);

Request

Field | Type | Required | Description
ambient_session_data | AmbientSession | Yes | Full ambient session metadata. Must include product_id, partner_id, customer_id, and correlation_id.
actions | string[] | Yes | AI actions to perform. Must not be empty. Use "generate-draft" for draft generation.
request_time | google.protobuf.Timestamp | No | Time the user initiated the processing request.
recordings_to_process | string[] | No | Recording IDs to include in processing. If omitted, all recordings for the session are processed.

Example request:

{
  "ambientSessionData": {
    "productId": "<product-guid>",
    "partnerId": "<partner-guid>",
    "customerId": "<customer-guid>",
    "correlationId": "<correlation-guid>",
    "practitionerInfo": {
      "externalIdentifiers": [
        { "type": "fhirId", "identifier": "<practitioner-fhir-id>" }
      ],
      "name": {
        "givenName": "Jane",
        "familyName": "Smith",
        "suffix": "MD"
      }
    },
    "externalIdentifiers": [
      { "type": "userId", "identifier": "<external-user-id>" }
    ],
    "localeInfo": {
      "recordingLocales": ["en-US"],
      "encounterReportLocale": "en-US",
      "encounterUxLocale": "en-US"
    }
  },
  "requestTime": "2026-02-20T14:35:00.000Z",
  "actions": ["generate-draft"],
  "recordingsToProcess": ["<recording-uuid-1>", "<recording-uuid-2>"]
}
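The required-field rules for StartProcessing can be checked before sending. A sketch over the proto3 JSON shape shown in the example request; start_processing_problems is a hypothetical helper and covers only the requirements stated in the request table.

```python
REQUIRED_SESSION_FIELDS = ("productId", "partnerId", "customerId", "correlationId")


def start_processing_problems(request: dict) -> list[str]:
    """List client-side problems in a StartProcessing request (hypothetical helper)."""
    problems = []
    session = request.get("ambientSessionData") or {}
    for field in REQUIRED_SESSION_FIELDS:
        if not session.get(field):
            problems.append(f"ambientSessionData.{field} is required")
    # Documented: a null or empty actions list fails with INVALID_ARGUMENT.
    if not request.get("actions"):
        problems.append("actions must contain at least one action")
    return problems
```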

Response

Field | Type | Description
streaming_response.error_code | uint32 | 0 indicates success; any non-zero value indicates an error.
streaming_response.error_message | string | Human-readable message.
streaming_response.detailed_error_information | string | Additional diagnostic information.

Success response:

{
  "streamingResponse": {
    "errorCode": 0,
    "errorMessage": "SUCCESS",
    "detailedErrorInformation": "SUCCESS"
  }
}

Error response:

{
  "streamingResponse": {
    "errorCode": 1,
    "errorMessage": "Error processing StartProcessing request",
    "detailedErrorInformation": ""
  }
}

Usage notes

  • actions is required and must contain at least one action. An empty or null value results in INVALID_ARGUMENT.
  • recordings_to_process is optional. When omitted, all recordings for the session (identified by correlation_id) are processed.
  • Processing is asynchronous: a successful response confirms the request was accepted, not that processing is complete.
  • Downstream failures are returned in the response body as a non-zero error_code, not as gRPC status exceptions.
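Because downstream failures come back in the response body, an OK gRPC status alone is not enough to treat the request as accepted. A sketch, assuming the response is available in the proto3 JSON shape shown above; StartProcessingError and check_start_processing are hypothetical names. A missing errorCode is treated as 0, since proto3 JSON output may omit fields that hold default values.

```python
class StartProcessingError(Exception):
    """Raised when StartProcessing reports a non-zero error_code in its body."""


def check_start_processing(response: dict) -> None:
    """Raise if the response body reports a downstream failure."""
    body = response.get("streamingResponse", {})
    code = body.get("errorCode", 0)  # default value may be omitted in JSON
    if code != 0:
        message = body.get("errorMessage", "")
        detail = body.get("detailedErrorInformation") or ""
        raise StartProcessingError(f"error {code}: {message} {detail}".strip())
```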

Errors

gRPC status code | Cause
INVALID_ARGUMENT | The actions field is null or empty.
UNAUTHENTICATED | Missing or invalid bearer token.

Common types

AmbientSession

Session metadata provided when opening a recording or starting processing.

Field | Type | Required | Description
product_id | string | Yes | Microsoft unique identifier of the product.
partner_id | string | Yes | Microsoft unique identifier for the partner.
customer_id | string | Yes | Microsoft unique identifier of the customer.
correlation_id | string | Yes | Partner-assigned unique identifier of the session (GUID). Used to correlate results.
practitioner_info | PractitionerInfo | No | Practitioner metadata (identifiers, name, specialty).
ehr_instance_id | string | No | EHR instance identifier.
external_identifiers | ExternalIdentifier[] | No | External identifiers. Use type "userId" for the partner's user identifier.
creation_date | string (ISO 8601) | No | Session creation timestamp.
dst_offset_seconds | int32 | No | DST offset in seconds.
client_info | ClientInfo | No | Client application metadata (app ID, version, SDK version, device info).
locale_info | LocaleInfo | No | Locale preferences for recording and report generation.

DataFormat

Audio encoding format. Specify exactly one of the following:

Variant | Fields | Description
pcm | sample_rate_hz, bitcount, channels | Signed 16-bit little-endian PCM.
opus | sample_rate_hz | Ogg Opus encoding.
webm_opus | sample_rate_hz | WebM Opus encoding.
byte_stream | format_specifier | Opaque byte stream with a custom format specifier.
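The "exactly one" rule can be checked when a format is assembled from configuration. A sketch, assuming the format is a dict keyed by the variant names in the table; data_format_variant is a hypothetical helper. If DataFormat is a oneof in the .proto file, generated message classes already keep at most one variant set, so this check mainly matters before a message is built.

```python
DATA_FORMAT_VARIANTS = {"pcm", "opus", "webm_opus", "byte_stream"}


def data_format_variant(data_format: dict) -> str:
    """Return the single DataFormat variant that is set, enforcing 'exactly one'."""
    unknown = set(data_format) - DATA_FORMAT_VARIANTS
    if unknown:
        raise ValueError(f"unknown DataFormat variants: {sorted(unknown)}")
    present = [key for key in data_format if key in DATA_FORMAT_VARIANTS]
    if len(present) != 1:
        raise ValueError(f"exactly one DataFormat variant is required, got {present}")
    return present[0]
```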

ExternalIdentifier

Field | Type | Description
type | string | Identifier type (for example, "userId", "fhirId", "npi", "encounterId").
identifier | string | The identifier value.

Best practices

  1. Call RetrieveConfiguration before recording to confirm supported locales and duration limits.
  2. Implement reconnection logic with exponential backoff for RecordAmbient streaming.
  3. Track data_stored values to set starting_offset correctly when resuming after interruption.
  4. Include previous_encounter_sessions when splitting recordings across multiple sessions.
  5. Call StartProcessing after closing the recording stream. It is a separate unary call.
  6. Monitor gRPC status codes to distinguish transient errors (RESOURCE_EXHAUSTED) from permanent failures (INVALID_ARGUMENT).
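Items 2 and 6 combine into a retry loop that backs off only on transient codes. A minimal sketch, assuming a hypothetical RpcFailure exception that carries the status name as a string; with grpcio, the grpc.RpcError raised by a failed call typically also exposes the status through code(). The base delay, factor, and cap are illustrative values, not service guidance, and jitter is omitted for clarity.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# RESOURCE_EXHAUSTED is documented as retryable; INVALID_ARGUMENT is not.
TRANSIENT_CODES = {"RESOURCE_EXHAUSTED"}


class RpcFailure(Exception):
    """Hypothetical stand-in for a failed RPC carrying its status name."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


def backoff_delays(attempts: int, base: float = 0.5, factor: float = 2.0, cap: float = 30.0) -> list[float]:
    """Capped exponential backoff schedule, in seconds."""
    return [min(cap, base * factor ** n) for n in range(attempts)]


def call_with_retry(call: Callable[[], T], attempts: int = 5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying transient failures with backoff between attempts."""
    for delay in backoff_delays(attempts - 1) + [None]:
        try:
            return call()
        except RpcFailure as err:
            # Permanent errors, or running out of attempts, propagate immediately.
            if err.code not in TRANSIENT_CODES or delay is None:
                raise
            sleep(delay)
```

For RecordAmbient, a retry means reopening the stream with starting_offset set to the last confirmed data_stored value, rather than replaying the same call.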