Summary
Members | Descriptions |
---|---|
enum PropertyId | Defines speech property ids. Changed in version 1.4.0. |
enum OutputFormat | Output format. |
enum ProfanityOption | Removes profanity (swearing), or replaces letters of profane words with stars. Added in version 1.5.0. |
enum ResultReason | Specifies the possible reasons a recognition result might be generated. |
enum CancellationReason | Defines the possible reasons a recognition result might be canceled. |
enum CancellationErrorCode | Defines error code in case that CancellationReason is Error. Added in version 1.1.0. |
enum NoMatchReason | Defines the possible reasons a recognition result might not be recognized. |
enum ActivityJSONType | Defines the possible types for an activity json value. Added in version 1.5.0. |
enum SpeechSynthesisOutputFormat | Defines the possible speech synthesis output audio formats. Updated in version 1.19.0. |
enum StreamStatus | Defines the possible status of audio data stream. Added in version 1.4.0. |
enum ServicePropertyChannel | Defines channels used to pass property settings to service. Added in version 1.5.0. |
enum VoiceProfileType | Defines voice profile types. |
enum RecognitionFactorScope | Defines the scope that a Recognition Factor is applied to. |
enum PronunciationAssessmentGradingSystem | Defines the point system for pronunciation score calibration; default value is FivePoint. Added in version 1.14.0. |
enum PronunciationAssessmentGranularity | Defines the pronunciation evaluation granularity; default value is Phoneme. Added in version 1.14.0. |
enum SynthesisVoiceType | Defines the type of synthesis voices. Added in version 1.16.0. |
enum SynthesisVoiceGender | Defines the gender of synthesis voices. Added in version 1.17.0. |
enum SynthesisVoiceStatus | Defines the status of synthesis voices. |
enum SpeechSynthesisBoundaryType | Defines the boundary type of the speech synthesis boundary event. Added in version 1.21.0. |
enum SegmentationStrategy | The strategy used to determine when a spoken phrase has ended and a final Recognized result should be generated. Allowed values are "Default", "Time", and "Semantic". |
class AsyncRecognizer | AsyncRecognizer abstract base class. |
class AudioDataStream | Represents audio data stream used for operating audio data as a stream. Added in version 1.4.0. |
class AutoDetectSourceLanguageConfig | Class that defines auto detection source configuration. Updated in version 1.13.0. |
class AutoDetectSourceLanguageResult | Contains the auto detected source language result. Added in version 1.8.0. |
class BaseAsyncRecognizer | BaseAsyncRecognizer class. |
class CancellationDetails | Contains detailed information about why a result was canceled. |
class ClassLanguageModel | Represents a list of grammars for dynamic grammar scenarios. Added in version 1.7.0. |
class Connection | Connection is a proxy class for managing the connection to the speech service of the specified Recognizer. By default, a Recognizer autonomously manages the connection to the service when needed. The Connection class provides additional methods for users to explicitly open or close a connection and to subscribe to connection status changes. The use of Connection is optional. It is intended for scenarios where fine-tuning of application behavior based on connection status is needed. Users can optionally call Open() to manually initiate a service connection before starting recognition on the Recognizer associated with this Connection. After starting a recognition, calling Open() or Close() might fail. This will not impact the Recognizer or the ongoing recognition. The connection might drop for various reasons; the Recognizer will always try to reinstate the connection as required to guarantee ongoing operations. In all these cases, Connected/Disconnected events will indicate the change of the connection status. Updated in version 1.17.0. |
class ConnectionEventArgs | Provides data for the ConnectionEvent. Added in version 1.2.0. |
class ConnectionMessage | ConnectionMessage represents implementation specific messages sent to and received from the speech service. These messages are provided for debugging purposes and should not be used for production use cases with the Azure Cognitive Services Speech Service. Messages sent to and received from the Speech Service are subject to change without notice. This includes message contents, headers, payloads, ordering, etc. Added in version 1.10.0. |
class ConnectionMessageEventArgs | Provides data for the ConnectionMessageEvent. |
class EmbeddedSpeechConfig | Class that defines embedded (offline) speech configuration. |
class EventArgs | Base class for event arguments. |
class EventSignal | Clients can connect to the event signal to receive events, or disconnect from the event signal to stop receiving events. |
class EventSignalBase | Clients can connect to the event signal to receive events, or disconnect from the event signal to stop receiving events. |
class Grammar | Represents base class grammar for customizing speech recognition. Added in version 1.5.0. |
class GrammarList | Represents a list of grammars for dynamic grammar scenarios. Added in version 1.7.0. |
class GrammarPhrase | Represents a phrase that may be spoken by the user. Added in version 1.5.0. |
class HybridSpeechConfig | Class that defines hybrid (cloud and embedded) configurations for speech recognition or speech synthesis. |
class KeywordRecognitionEventArgs | Class for the events emitted by the KeywordRecognizer. |
class KeywordRecognitionModel | Represents keyword recognition model used with StartKeywordRecognitionAsync methods. |
class KeywordRecognitionResult | Class that defines the results emitted by the KeywordRecognizer. |
class KeywordRecognizer | Recognizer type that is specialized to only handle keyword activation. |
class NoMatchDetails | Contains detailed information for NoMatch recognition results. |
class PersonalVoiceSynthesisRequest | Class that defines the speech synthesis request for personal voice (aka.ms/azureai/personal-voice). This class is in preview and is subject to change. Added in version 1.39.0. |
class PhraseListGrammar | Represents a phrase list grammar for dynamic grammar scenarios. Added in version 1.5.0. |
class PronunciationAssessmentConfig | Class that defines pronunciation assessment configuration. Added in version 1.14.0. |
class PronunciationAssessmentResult | Class for pronunciation assessment results. |
class PronunciationContentAssessmentResult | Class for content assessment results. |
class PropertyCollection | Class to retrieve or set a property value from a property collection. |
class RecognitionEventArgs | Provides data for the RecognitionEvent. |
class RecognitionResult | Contains detailed information about result of a recognition operation. |
class Recognizer | Recognizer base class. |
class SessionEventArgs | Base class for session event arguments. |
class SmartHandle | Smart handle class. |
class SourceLanguageConfig | Class that defines source language configuration. Added in version 1.8.0. |
class SourceLanguageRecognizer | Class for source language recognizers. You can use this class for standalone language detection. Added in version 1.17.0. |
class SpeechConfig | Class that defines configurations for speech / intent recognition, or speech synthesis. |
class SpeechRecognitionCanceledEventArgs | Class for speech recognition canceled event arguments. |
class SpeechRecognitionEventArgs | Class for speech recognition event arguments. |
class SpeechRecognitionModel | Speech recognition model information. |
class SpeechRecognitionResult | Base class for speech recognition results. |
class SpeechRecognizer | Class for speech recognizers. |
class SpeechSynthesisBookmarkEventArgs | Class for speech synthesis bookmark event arguments. Added in version 1.16.0. |
class SpeechSynthesisCancellationDetails | Contains detailed information about why a result was canceled. Added in version 1.4.0. |
class SpeechSynthesisEventArgs | Class for speech synthesis event arguments. Added in version 1.4.0. |
class SpeechSynthesisRequest | Class that defines the speech synthesis request. This class is in preview and is subject to change. Added in version 1.37.0. |
class SpeechSynthesisResult | Contains information about result from text-to-speech synthesis. Added in version 1.4.0. |
class SpeechSynthesisVisemeEventArgs | Class for speech synthesis viseme event arguments. Added in version 1.16.0. |
class SpeechSynthesisWordBoundaryEventArgs | Class for speech synthesis word boundary event arguments. Added in version 1.7.0. |
class SpeechSynthesizer | Class for speech synthesizer. Updated in version 1.14.0. |
class SpeechTranslationModel | Speech translation model information. |
class SynthesisVoicesResult | Contains information about result from voices list of speech synthesizers. Added in version 1.16.0. |
class VoiceInfo | Contains information about a synthesis voice. Updated in version 1.17.0. |
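To illustrate how the core classes in this namespace fit together, the following minimal sketch creates a SpeechConfig, builds a SpeechRecognizer for the default microphone, and performs a single recognition. The subscription key and region strings are placeholders.

```cpp
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

int main()
{
    // Placeholder credentials; replace with your own subscription key and service region.
    auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

    // Create a recognizer that uses the default microphone as audio input.
    auto recognizer = SpeechRecognizer::FromConfig(config);

    // Recognize a single utterance and print the recognized text.
    auto result = recognizer->RecognizeOnceAsync().get();
    std::cout << "Text: " << result->Text << std::endl;
    return 0;
}
```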
Members
enum PropertyId
Values | Descriptions |
---|---|
SpeechServiceConnection_Key | The Cognitive Services Speech Service subscription key. If you are using an intent recognizer, you need to specify the LUIS endpoint key for your particular LUIS app. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::FromSubscription. |
SpeechServiceConnection_Endpoint | The Cognitive Services Speech Service endpoint (url). Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::FromEndpoint. NOTE: This endpoint is not the same as the endpoint used to obtain an access token. |
SpeechServiceConnection_Region | The Cognitive Services Speech Service region. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::FromSubscription, SpeechConfig::FromEndpoint, SpeechConfig::FromHost, SpeechConfig::FromAuthorizationToken. |
SpeechServiceAuthorization_Token | The Cognitive Services Speech Service authorization token (aka access token). Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::FromAuthorizationToken, SpeechRecognizer::SetAuthorizationToken, IntentRecognizer::SetAuthorizationToken, TranslationRecognizer::SetAuthorizationToken. |
SpeechServiceAuthorization_Type | The Cognitive Services Speech Service authorization type. Currently unused. |
SpeechServiceConnection_EndpointId | The Cognitive Services Custom Speech or Custom Voice Service endpoint id. Under normal circumstances, you shouldn't have to use this property directly. Instead use SpeechConfig::SetEndpointId. NOTE: The endpoint id is available in the Custom Speech Portal, listed under Endpoint Details. |
SpeechServiceConnection_Host | The Cognitive Services Speech Service host (url). Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::FromHost. |
SpeechServiceConnection_ProxyHostName | The host name of the proxy server used to connect to the Cognitive Services Speech Service. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::SetProxy. NOTE: This property id was added in version 1.1.0. |
SpeechServiceConnection_ProxyPort | The port of the proxy server used to connect to the Cognitive Services Speech Service. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::SetProxy. NOTE: This property id was added in version 1.1.0. |
SpeechServiceConnection_ProxyUserName | The user name of the proxy server used to connect to the Cognitive Services Speech Service. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::SetProxy. NOTE: This property id was added in version 1.1.0. |
SpeechServiceConnection_ProxyPassword | The password of the proxy server used to connect to the Cognitive Services Speech Service. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::SetProxy. NOTE: This property id was added in version 1.1.0. |
SpeechServiceConnection_Url | The URL string built from speech configuration. This property is intended to be read-only. The SDK is using it internally. NOTE: Added in version 1.5.0. |
SpeechServiceConnection_ProxyHostBypass | Specifies the list of hosts for which proxies should not be used. This setting overrides all other configurations. Hostnames are separated by commas and are matched in a case-insensitive manner. Wildcards are not supported. |
SpeechServiceConnection_TranslationToLanguages | The list of comma separated languages used as target translation languages. Under normal circumstances, you shouldn't have to use this property directly. Instead use SpeechTranslationConfig::AddTargetLanguage and SpeechTranslationConfig::GetTargetLanguages. |
SpeechServiceConnection_TranslationVoice | The name of the Cognitive Service Text to Speech Service voice. Under normal circumstances, you shouldn't have to use this property directly. Instead use SpeechTranslationConfig::SetVoiceName. NOTE: Valid voice names are listed in the Speech service voices documentation. |
SpeechServiceConnection_TranslationFeatures | Translation features. For internal use. |
SpeechServiceConnection_IntentRegion | The Language Understanding Service region. Under normal circumstances, you shouldn't have to use this property directly. Instead use LanguageUnderstandingModel. |
SpeechServiceConnection_RecoMode | The Cognitive Services Speech Service recognition mode. Can be "INTERACTIVE", "CONVERSATION", "DICTATION". This property is intended to be read-only. The SDK is using it internally. |
SpeechServiceConnection_RecoLanguage | The spoken language to be recognized (in BCP-47 format). Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::SetSpeechRecognitionLanguage. |
Speech_SessionId | The session id. This id is a universally unique identifier (aka UUID) representing a specific binding of an audio input stream and the underlying speech recognition instance to which it is bound. Under normal circumstances, you shouldn't have to use this property directly. Instead use SessionEventArgs::SessionId. |
SpeechServiceConnection_UserDefinedQueryParameters | The query parameters provided by users. They will be passed to service as URL query parameters. Added in version 1.5.0. |
SpeechServiceConnection_RecoBackend | The string to specify the backend to be used for speech recognition; allowed options are online and offline. Under normal circumstances, you shouldn't use this property directly. Currently the offline option is only valid when EmbeddedSpeechConfig is used. Added in version 1.19.0. |
SpeechServiceConnection_RecoModelName | The name of the model to be used for speech recognition. Under normal circumstances, you shouldn't use this property directly. Currently this is only valid when EmbeddedSpeechConfig is used. Added in version 1.19.0. |
SpeechServiceConnection_RecoModelKey | This property is deprecated. |
SpeechServiceConnection_RecoModelIniFile | The path to the ini file of the model to be used for speech recognition. Under normal circumstances, you shouldn't use this property directly. Currently this is only valid when EmbeddedSpeechConfig is used. Added in version 1.19.0. |
SpeechServiceConnection_SynthLanguage | The spoken language to be synthesized (e.g. en-US). Added in version 1.4.0. |
SpeechServiceConnection_SynthVoice | The name of the TTS voice to be used for speech synthesis. Added in version 1.4.0. |
SpeechServiceConnection_SynthOutputFormat | The string to specify the TTS output audio format. Added in version 1.4.0. |
SpeechServiceConnection_SynthEnableCompressedAudioTransmission | Indicates whether to use a compressed audio format for speech synthesis audio transmission. This property only takes effect when SpeechServiceConnection_SynthOutputFormat is set to a PCM format. If this property is not set and GStreamer is available, the SDK uses a compressed format for synthesized audio transmission and decodes it. You can set this property to "false" to use raw PCM format for transmission on the wire. Added in version 1.16.0. |
SpeechServiceConnection_SynthBackend | The string to specify TTS backend; valid options are online and offline. Under normal circumstances, you shouldn't have to use this property directly. Instead, use EmbeddedSpeechConfig::FromPath or EmbeddedSpeechConfig::FromPaths to set the synthesis backend to offline. Added in version 1.19.0. |
SpeechServiceConnection_SynthOfflineDataPath | The data file path(s) for offline synthesis engine; only valid when synthesis backend is offline. Under normal circumstances, you shouldn't have to use this property directly. Instead, use EmbeddedSpeechConfig::FromPath or EmbeddedSpeechConfig::FromPaths. Added in version 1.19.0. |
SpeechServiceConnection_SynthOfflineVoice | The name of the offline TTS voice to be used for speech synthesis. Under normal circumstances, you shouldn't use this property directly. Instead, use EmbeddedSpeechConfig::SetSpeechSynthesisVoice and EmbeddedSpeechConfig::GetSpeechSynthesisVoiceName. Added in version 1.19.0. |
SpeechServiceConnection_SynthModelKey | This property is deprecated. |
SpeechServiceConnection_VoicesListEndpoint | The Cognitive Services Speech Service voices list API endpoint (URL). Under normal circumstances, you don't need to specify this property; the SDK constructs it based on the region/host/endpoint of SpeechConfig. Added in version 1.16.0. |
SpeechServiceConnection_InitialSilenceTimeoutMs | The initial silence timeout value (in milliseconds) used by the service. Added in version 1.5.0. |
SpeechServiceConnection_EndSilenceTimeoutMs | The end silence timeout value (in milliseconds) used by the service. Added in version 1.5.0. |
SpeechServiceConnection_EnableAudioLogging | A boolean value specifying whether audio logging is enabled in the service or not. Audio and content logs are stored either in Microsoft-owned storage, or in your own storage account linked to your Cognitive Services subscription (Bring Your Own Storage (BYOS) enabled Speech resource). Added in version 1.5.0. |
SpeechServiceConnection_LanguageIdMode | The speech service connection language identifier mode. Can be "AtStart" (the default), or "Continuous". See Language Identification document. Added in 1.25.0. |
SpeechServiceConnection_TranslationCategoryId | The speech service connection translation categoryId. |
SpeechServiceConnection_AutoDetectSourceLanguages | The auto detect source languages. Added in version 1.8.0. |
SpeechServiceConnection_AutoDetectSourceLanguageResult | The auto detect source language result. Added in version 1.8.0. |
SpeechServiceResponse_RequestDetailedResultTrueFalse | The requested Cognitive Services Speech Service response output format (simple or detailed). Under normal circumstances, you shouldn't have to use this property directly. Instead use SpeechConfig::SetOutputFormat. |
SpeechServiceResponse_RequestProfanityFilterTrueFalse | The requested Cognitive Services Speech Service response output profanity level. Currently unused. |
SpeechServiceResponse_ProfanityOption | The requested Cognitive Services Speech Service response output profanity setting. Allowed values are "masked", "removed", and "raw". Added in version 1.5.0. |
SpeechServiceResponse_PostProcessingOption | A string value specifying which post processing option should be used by the service. The allowed value is "TrueText". Added in version 1.5.0. |
SpeechServiceResponse_RequestWordLevelTimestamps | A boolean value specifying whether to include word-level timestamps in the response result. Added in version 1.5.0. |
SpeechServiceResponse_StablePartialResultThreshold | The number of times a word has to be in partial results to be returned. Added in version 1.5.0. |
SpeechServiceResponse_OutputFormatOption | A string value specifying the output format option in the response result. Internal use only. Added in version 1.5.0. |
SpeechServiceResponse_RequestSnr | A boolean value specifying whether to include SNR (signal to noise ratio) in the response result. Added in version 1.18.0. |
SpeechServiceResponse_TranslationRequestStablePartialResult | A boolean value to request for stabilizing translation partial results by omitting words in the end. Added in version 1.5.0. |
SpeechServiceResponse_RequestWordBoundary | A boolean value specifying whether to request WordBoundary events. Added in version 1.21.0. |
SpeechServiceResponse_RequestPunctuationBoundary | A boolean value specifying whether to request punctuation boundary in WordBoundary Events. Default is true. Added in version 1.21.0. |
SpeechServiceResponse_RequestSentenceBoundary | A boolean value specifying whether to request sentence boundary in WordBoundary Events. Default is false. Added in version 1.21.0. |
SpeechServiceResponse_SynthesisEventsSyncToAudio | A boolean value specifying whether the SDK should synchronize synthesis metadata events, (e.g. word boundary, viseme, etc.) to the audio playback. This only takes effect when the audio is played through the SDK. Default is true. If set to false, the SDK will fire the events as they come from the service, which may be out of sync with the audio playback. Added in version 1.31.0. |
SpeechServiceResponse_JsonResult | The Cognitive Services Speech Service response output (in JSON format). This property is available on recognition result objects only. |
SpeechServiceResponse_JsonErrorDetails | The Cognitive Services Speech Service error details (in JSON format). Under normal circumstances, you shouldn't have to use this property directly. Instead, use CancellationDetails::ErrorDetails. |
SpeechServiceResponse_RecognitionLatencyMs | The recognition latency in milliseconds. Read-only, available on final speech/translation/intent results. This measures the latency between when an audio input is received by the SDK, and the moment the final result is received from the service. The SDK computes the time difference between the last audio fragment from the audio input that is contributing to the final result, and the time the final result is received from the speech service. Added in version 1.3.0. |
SpeechServiceResponse_RecognitionBackend | The recognition backend. Read-only, available on speech recognition results. This indicates whether cloud (online) or embedded (offline) recognition was used to produce the result. |
SpeechServiceResponse_SynthesisFirstByteLatencyMs | The speech synthesis first byte latency in milliseconds. Read-only, available on final speech synthesis results. This measures the latency between when the synthesis is started to be processed, and the moment the first byte audio is available. Added in version 1.17.0. |
SpeechServiceResponse_SynthesisFinishLatencyMs | The speech synthesis all bytes latency in milliseconds. Read-only, available on final speech synthesis results. This measures the latency between when the synthesis is started to be processed, and the moment the whole audio is synthesized. Added in version 1.17.0. |
SpeechServiceResponse_SynthesisUnderrunTimeMs | The underrun time for speech synthesis in milliseconds. Read-only, available on results in SynthesisCompleted events. This measures the total underrun time from when the playback buffer (see PropertyId::AudioConfig_PlaybackBufferLengthInMs) is filled until synthesis is completed. Added in version 1.17.0. |
SpeechServiceResponse_SynthesisConnectionLatencyMs | The speech synthesis connection latency in milliseconds. Read-only, available on final speech synthesis results. This measures the latency between when the synthesis is started to be processed, and the moment the HTTP/WebSocket connection is established. Added in version 1.26.0. |
SpeechServiceResponse_SynthesisNetworkLatencyMs | The speech synthesis network latency in milliseconds. Read-only, available on final speech synthesis results. This measures the network round trip time. Added in version 1.26.0. |
SpeechServiceResponse_SynthesisServiceLatencyMs | The speech synthesis service latency in milliseconds. Read-only, available on final speech synthesis results. This measures the service processing time to synthesize the first byte of audio. Added in version 1.26.0. |
SpeechServiceResponse_SynthesisBackend | Indicates which backend the synthesis is finished by. Read-only, available on speech synthesis results, except for the result in the SynthesisStarted event. Added in version 1.17.0. |
SpeechServiceResponse_DiarizeIntermediateResults | Determines if intermediate results contain speaker identification. |
CancellationDetails_Reason | The cancellation reason. Currently unused. |
CancellationDetails_ReasonText | The cancellation text. Currently unused. |
CancellationDetails_ReasonDetailedText | The cancellation detailed text. Currently unused. |
LanguageUnderstandingServiceResponse_JsonResult | The Language Understanding Service response output (in JSON format). Available via IntentRecognitionResult.Properties. |
AudioConfig_DeviceNameForCapture | The device name for audio capture. Under normal circumstances, you shouldn't have to use this property directly. Instead, use AudioConfig::FromMicrophoneInput. NOTE: This property id was added in version 1.3.0. |
AudioConfig_NumberOfChannelsForCapture | The number of channels for audio capture. Internal use only. NOTE: This property id was added in version 1.3.0. |
AudioConfig_SampleRateForCapture | The sample rate (in Hz) for audio capture. Internal use only. NOTE: This property id was added in version 1.3.0. |
AudioConfig_BitsPerSampleForCapture | The number of bits of each sample for audio capture. Internal use only. NOTE: This property id was added in version 1.3.0. |
AudioConfig_AudioSource | The audio source. Allowed values are "Microphones", "File", and "Stream". Added in version 1.3.0. |
AudioConfig_DeviceNameForRender | The device name for audio render. Under normal circumstances, you shouldn't have to use this property directly. Instead, use AudioConfig::FromSpeakerOutput. Added in version 1.14.0. |
AudioConfig_PlaybackBufferLengthInMs | Playback buffer length in milliseconds, default is 50 milliseconds. |
AudioConfig_AudioProcessingOptions | Audio processing options in JSON format. |
Speech_LogFilename | The file name to write logs. Added in version 1.4.0. |
Speech_SegmentationSilenceTimeoutMs | A duration of detected silence, measured in milliseconds, after which speech-to-text will determine a spoken phrase has ended and generate a final Recognized result. Configuring this timeout may be helpful in situations where spoken input is significantly faster or slower than usual and default segmentation behavior consistently yields results that are too long or too short. Segmentation timeout values that are inappropriately high or low can negatively affect speech-to-text accuracy; this property should be carefully configured and the resulting behavior should be thoroughly validated as intended. The value must be in the range [100, 5000] milliseconds. |
Speech_SegmentationMaximumTimeMs | The maximum length of a spoken phrase when using the "Time" segmentation strategy. As the length of a spoken phrase approaches this value, Speech_SegmentationSilenceTimeoutMs is progressively reduced until either the phrase silence timeout is reached or the phrase reaches the maximum length. The value must be in the range [20000, 70000] milliseconds. |
Speech_SegmentationStrategy | The strategy used to determine when a spoken phrase has ended and a final Recognized result should be generated. Allowed values are "Default", "Time", and "Semantic". |
Conversation_ApplicationId | Identifier used to connect to the backend service. Added in version 1.5.0. |
Conversation_DialogType | Type of dialog backend to connect to. Added in version 1.7.0. |
Conversation_Initial_Silence_Timeout | Silence timeout for listening. Added in version 1.5.0. |
Conversation_From_Id | From id to be used on speech recognition activities. Added in version 1.5.0. |
Conversation_Conversation_Id | ConversationId for the session. Added in version 1.8.0. |
Conversation_Custom_Voice_Deployment_Ids | Comma separated list of custom voice deployment ids. Added in version 1.8.0. |
Conversation_Speech_Activity_Template | Speech activity template; properties in the template are stamped on the activity generated by the service for speech. Added in version 1.10.0. |
Conversation_ParticipantId | Your participant identifier in the current conversation. Added in version 1.13.0. |
Conversation_Request_Bot_Status_Messages | |
Conversation_Connection_Id | |
DataBuffer_TimeStamp | The time stamp associated with the data buffer written by the client when using Pull/Push audio input streams. The time stamp is a 64-bit value with a resolution of 90 kHz. It is the same as the presentation timestamp in an MPEG transport stream. See https://en.wikipedia.org/wiki/Presentation_timestamp Added in version 1.5.0. |
DataBuffer_UserId | The user id associated with the data buffer written by the client when using Pull/Push audio input streams. Added in version 1.5.0. |
PronunciationAssessment_ReferenceText | The reference text of the audio for pronunciation evaluation. For this and the following pronunciation assessment parameters, see the table Pronunciation assessment parameters. Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::Create or PronunciationAssessmentConfig::SetReferenceText. Added in version 1.14.0. |
PronunciationAssessment_GradingSystem | The point system for pronunciation score calibration (FivePoint or HundredMark). Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::Create. Added in version 1.14.0. |
PronunciationAssessment_Granularity | The pronunciation evaluation granularity (Phoneme, Word, or FullText). Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::Create. Added in version 1.14.0. |
PronunciationAssessment_EnableMiscue | Defines whether to enable miscue calculation. With this enabled, the pronounced words are compared to the reference text and are marked with omission/insertion based on the comparison. The default setting is False. Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::Create. Added in version 1.14.0. |
PronunciationAssessment_PhonemeAlphabet | The pronunciation evaluation phoneme alphabet. The valid values are "SAPI" (default) and "IPA". Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::SetPhonemeAlphabet. Added in version 1.20.0. |
PronunciationAssessment_NBestPhonemeCount | The pronunciation evaluation nbest phoneme count. Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::SetNBestPhonemeCount. Added in version 1.20.0. |
PronunciationAssessment_EnableProsodyAssessment | Whether to enable prosody assessment. Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::EnableProsodyAssessment. Added in version 1.33.0. |
PronunciationAssessment_Json | The JSON string of pronunciation assessment parameters. Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::Create. Added in version 1.14.0. |
PronunciationAssessment_Params | Pronunciation assessment parameters. This property is intended to be read-only. The SDK is using it internally. Added in version 1.14.0. |
PronunciationAssessment_ContentTopic | The content topic of the pronunciation assessment. Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::EnableContentAssessmentWithTopic. Added in version 1.33.0. |
SpeakerRecognition_Api_Version | Speaker Recognition backend API version. This property is added to allow testing and use of previous versions of Speaker Recognition APIs, where applicable. Added in version 1.18.0. |
SpeechTranslation_ModelName | The name of a model to be used for speech translation. Do not use this property directly. Currently this is only valid when EmbeddedSpeechConfig is used. |
SpeechTranslation_ModelKey | This property is deprecated. |
KeywordRecognition_ModelName | The name of a model to be used for keyword recognition. Do not use this property directly. Currently this is only valid when EmbeddedSpeechConfig is used. |
KeywordRecognition_ModelKey | This property is deprecated. |
EmbeddedSpeech_EnablePerformanceMetrics | Enable the collection of embedded speech performance metrics which can be used to evaluate the capability of a device to use embedded speech. The collected data is included in results from specific scenarios like speech recognition. The default setting is "false". Note that metrics may not be available from all embedded speech scenarios. |
SpeechSynthesisRequest_Pitch | The pitch of the synthesized speech. |
SpeechSynthesisRequest_Rate | The rate of the synthesized speech. |
SpeechSynthesisRequest_Volume | The volume of the synthesized speech. |
SpeechSynthesisRequest_Style | The style of the synthesized speech. |
SpeechSynthesisRequest_Temperature | The temperature of the synthesized speech. The temperature parameter only takes effect when the voice is an HD voice. |
SpeechSynthesis_FrameTimeoutInterval | The timeout interval in milliseconds between synthesized speech audio frames. The greater of this and 10 seconds is used as a hard frame timeout. A speech synthesis timeout occurs if a) the time passed since the latest frame exceeds this timeout interval and the Real-Time Factor (RTF) exceeds its maximum value, or b) the time passed since the latest frame exceeds the hard frame timeout. |
SpeechSynthesis_RtfTimeoutThreshold | The maximum Real-Time Factor (RTF) for speech synthesis. The RTF is calculated as RTF = f(d)/d where f(d) is the time taken to synthesize speech audio of duration d. |
Defines speech property ids. Changed in version 1.4.0.
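As a sketch of how these property ids are typically used, a value can be set on a SpeechConfig before creating a recognizer and read back from a result's property collection; the credentials below are placeholders.

```cpp
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void PropertyIdExample()
{
    auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

    // Set a property by id before creating the recognizer, e.g. request word-level timestamps.
    config->SetProperty(PropertyId::SpeechServiceResponse_RequestWordLevelTimestamps, "true");

    auto recognizer = SpeechRecognizer::FromConfig(config);
    auto result = recognizer->RecognizeOnceAsync().get();

    // Read the raw service response JSON from the result's property collection.
    auto json = result->Properties.GetProperty(PropertyId::SpeechServiceResponse_JsonResult);
    std::cout << json << std::endl;
}
```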
enum OutputFormat
Values | Descriptions |
---|---|
Simple | |
Detailed | |
Output format.
enum ProfanityOption
Values | Descriptions |
---|---|
Masked | Replaces letters in profane words with star characters. |
Removed | Removes profane words. |
Raw | Does nothing to profane words. |
Removes profanity (swearing), or replaces letters of profane words with stars. Added in version 1.5.0.
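For example, the profanity handling for recognition results can be selected on a SpeechConfig; a minimal sketch (placeholder credentials) is shown below.

```cpp
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void ProfanityExample()
{
    auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

    // Replace letters in profane words with star characters in recognition results.
    config->SetProfanity(ProfanityOption::Masked);
}
```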
enum ResultReason
Values | Descriptions |
---|---|
NoMatch | Indicates speech could not be recognized. More details can be found in the NoMatchDetails object. |
Canceled | Indicates that the recognition was canceled. More details can be found using the CancellationDetails object. |
RecognizingSpeech | Indicates the speech result contains hypothesis text. |
RecognizedSpeech | Indicates the speech result contains final text that has been recognized. Speech Recognition is now complete for this phrase. |
RecognizingIntent | Indicates the intent result contains hypothesis text and intent. |
RecognizedIntent | Indicates the intent result contains final text and intent. Speech Recognition and Intent determination are now complete for this phrase. |
TranslatingSpeech | Indicates the translation result contains hypothesis text and its translation(s). |
TranslatedSpeech | Indicates the translation result contains final text and corresponding translation(s). Speech Recognition and Translation are now complete for this phrase. |
SynthesizingAudio | Indicates the synthesized audio result contains a non-zero amount of audio data. |
SynthesizingAudioCompleted | Indicates the synthesized audio is now complete for this phrase. |
RecognizingKeyword | Indicates the speech result contains (unverified) keyword text. Added in version 1.3.0. |
RecognizedKeyword | Indicates that keyword recognition completed recognizing the given keyword. Added in version 1.3.0. |
SynthesizingAudioStarted | Indicates the speech synthesis is now started Added in version 1.4.0. |
TranslatingParticipantSpeech | Indicates the transcription result contains hypothesis text and its translation(s) for other participants in the conversation. Added in version 1.8.0. |
TranslatedParticipantSpeech | Indicates the transcription result contains final text and corresponding translation(s) for other participants in the conversation. Speech Recognition and Translation are now complete for this phrase. Added in version 1.8.0. |
TranslatedInstantMessage | Indicates the transcription result contains the instant message and corresponding translation(s). Added in version 1.8.0. |
TranslatedParticipantInstantMessage | Indicates the transcription result contains the instant message for other participants in the conversation and corresponding translation(s). Added in version 1.8.0. |
EnrollingVoiceProfile | Indicates the voice profile is being enrolled and customers need to send more audio to create a voice profile. Added in version 1.12.0. |
EnrolledVoiceProfile | The voice profile has been enrolled. Added in version 1.12.0. |
RecognizedSpeakers | Indicates successful identification of some speakers. Added in version 1.12.0. |
RecognizedSpeaker | Indicates successfully verified one speaker. Added in version 1.12.0. |
ResetVoiceProfile | Indicates a voice profile has been reset successfully. Added in version 1.12.0. |
DeletedVoiceProfile | Indicates a voice profile has been deleted successfully. Added in version 1.12.0. |
VoicesListRetrieved | Indicates the voices list has been retrieved successfully. Added in version 1.16.0. |
Specifies the possible reasons a recognition result might be generated.
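A typical pattern is to branch on the result reason after a recognition call; the sketch below assumes recognizer is an existing SpeechRecognizer.

```cpp
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void HandleResult(std::shared_ptr<SpeechRecognizer> recognizer)
{
    auto result = recognizer->RecognizeOnceAsync().get();
    switch (result->Reason)
    {
    case ResultReason::RecognizedSpeech:
        std::cout << "Recognized: " << result->Text << std::endl;
        break;
    case ResultReason::NoMatch:
        std::cout << "No speech could be recognized." << std::endl;
        break;
    case ResultReason::Canceled:
        std::cout << "Recognition was canceled." << std::endl;
        break;
    default:
        break;
    }
}
```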
enum CancellationReason
Values | Descriptions |
---|---|
Error | Indicates that an error occurred during speech recognition. |
EndOfStream | Indicates that the end of the audio stream was reached. |
CancelledByUser | Indicates that the request was cancelled by the user. Added in version 1.14.0. |
Defines the possible reasons a recognition result might be canceled.
enum CancellationErrorCode
Values | Descriptions |
---|---|
NoError | No error. If CancellationReason is EndOfStream, CancellationErrorCode is set to NoError. |
AuthenticationFailure | Indicates an authentication error. An authentication error occurs if subscription key or authorization token is invalid, expired, or does not match the region being used. |
BadRequest | Indicates that one or more recognition parameters are invalid or the audio format is not supported. |
TooManyRequests | Indicates that the number of parallel requests exceeded the number of allowed concurrent transcriptions for the subscription. |
Forbidden | Indicates that the free subscription used by the request ran out of quota. |
ConnectionFailure | Indicates a connection error. |
ServiceTimeout | Indicates a time-out error when waiting for response from service. |
ServiceError | Indicates that an error is returned by the service. |
ServiceUnavailable | Indicates that the service is currently unavailable. |
RuntimeError | Indicates an unexpected runtime error. |
ServiceRedirectTemporary | Indicates the Speech Service is temporarily requesting a reconnect to a different endpoint. |
ServiceRedirectPermanent | Indicates the Speech Service is permanently requesting a reconnect to a different endpoint. |
EmbeddedModelError | Indicates the embedded speech (SR or TTS) model is not available or corrupted. |
Defines error code in case that CancellationReason is Error. Added in version 1.1.0.
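The sketch below shows how CancellationReason and CancellationErrorCode are typically inspected together through CancellationDetails; result is assumed to be an existing SpeechRecognitionResult.

```cpp
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void InspectCancellation(std::shared_ptr<SpeechRecognitionResult> result)
{
    if (result->Reason == ResultReason::Canceled)
    {
        auto cancellation = CancellationDetails::FromResult(result);
        if (cancellation->Reason == CancellationReason::Error)
        {
            // The error code and details are only meaningful when the reason is Error.
            std::cout << "ErrorCode: " << (int)cancellation->ErrorCode << std::endl;
            std::cout << "ErrorDetails: " << cancellation->ErrorDetails << std::endl;
        }
    }
}
```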
enum NoMatchReason
Values | Descriptions |
---|---|
NotRecognized | Indicates that speech was detected, but not recognized. |
InitialSilenceTimeout | Indicates that the start of the audio stream contained only silence, and the service timed out waiting for speech. |
InitialBabbleTimeout | Indicates that the start of the audio stream contained only noise, and the service timed out waiting for speech. |
KeywordNotRecognized | Indicates that the spotted keyword has been rejected by the keyword verification service. Added in version 1.5.0. |
EndSilenceTimeout | Indicates that the audio stream contained only silence after the last recognized phrase. |
Defines the possible reasons a recognition result might not be recognized.
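The sketch below shows how a NoMatch result is typically examined via NoMatchDetails; result is assumed to be an existing SpeechRecognitionResult.

```cpp
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void InspectNoMatch(std::shared_ptr<SpeechRecognitionResult> result)
{
    if (result->Reason == ResultReason::NoMatch)
    {
        auto noMatch = NoMatchDetails::FromResult(result);
        if (noMatch->Reason == NoMatchReason::InitialSilenceTimeout)
        {
            std::cout << "Only silence was detected at the start of the audio." << std::endl;
        }
    }
}
```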
enum ActivityJSONType
Values | Descriptions |
---|---|
Null | |
Object | |
Array | |
String | |
Double | |
UInt | |
Int | |
Boolean | |
Defines the possible types for an activity json value. Added in version 1.5.0.
enum SpeechSynthesisOutputFormat
Values | Descriptions |
---|---|
Raw8Khz8BitMonoMULaw | raw-8khz-8bit-mono-mulaw |
Riff16Khz16KbpsMonoSiren | riff-16khz-16kbps-mono-siren Unsupported by the service. Do not use this value. |
Audio16Khz16KbpsMonoSiren | audio-16khz-16kbps-mono-siren Unsupported by the service. Do not use this value. |
Audio16Khz32KBitRateMonoMp3 | audio-16khz-32kbitrate-mono-mp3 |
Audio16Khz128KBitRateMonoMp3 | audio-16khz-128kbitrate-mono-mp3 |
Audio16Khz64KBitRateMonoMp3 | audio-16khz-64kbitrate-mono-mp3 |
Audio24Khz48KBitRateMonoMp3 | audio-24khz-48kbitrate-mono-mp3 |
Audio24Khz96KBitRateMonoMp3 | audio-24khz-96kbitrate-mono-mp3 |
Audio24Khz160KBitRateMonoMp3 | audio-24khz-160kbitrate-mono-mp3 |
Raw16Khz16BitMonoTrueSilk | raw-16khz-16bit-mono-truesilk |
Riff16Khz16BitMonoPcm | riff-16khz-16bit-mono-pcm |
Riff8Khz16BitMonoPcm | riff-8khz-16bit-mono-pcm |
Riff24Khz16BitMonoPcm | riff-24khz-16bit-mono-pcm |
Riff8Khz8BitMonoMULaw | riff-8khz-8bit-mono-mulaw |
Raw16Khz16BitMonoPcm | raw-16khz-16bit-mono-pcm |
Raw24Khz16BitMonoPcm | raw-24khz-16bit-mono-pcm |
Raw8Khz16BitMonoPcm | raw-8khz-16bit-mono-pcm |
Ogg16Khz16BitMonoOpus | ogg-16khz-16bit-mono-opus |
Ogg24Khz16BitMonoOpus | ogg-24khz-16bit-mono-opus |
Raw48Khz16BitMonoPcm | raw-48khz-16bit-mono-pcm |
Riff48Khz16BitMonoPcm | riff-48khz-16bit-mono-pcm |
Audio48Khz96KBitRateMonoMp3 | audio-48khz-96kbitrate-mono-mp3 |
Audio48Khz192KBitRateMonoMp3 | audio-48khz-192kbitrate-mono-mp3 |
Ogg48Khz16BitMonoOpus | ogg-48khz-16bit-mono-opus Added in version 1.16.0 |
Webm16Khz16BitMonoOpus | webm-16khz-16bit-mono-opus Added in version 1.16.0 |
Webm24Khz16BitMonoOpus | webm-24khz-16bit-mono-opus Added in version 1.16.0 |
Raw24Khz16BitMonoTrueSilk | raw-24khz-16bit-mono-truesilk Added in version 1.17.0 |
Raw8Khz8BitMonoALaw | raw-8khz-8bit-mono-alaw Added in version 1.17.0 |
Riff8Khz8BitMonoALaw | riff-8khz-8bit-mono-alaw Added in version 1.17.0 |
Webm24Khz16Bit24KbpsMonoOpus | webm-24khz-16bit-24kbps-mono-opus Audio compressed by OPUS codec in a WebM container, with bitrate of 24kbps, optimized for IoT scenario. (Added in 1.19.0) |
Audio16Khz16Bit32KbpsMonoOpus | audio-16khz-16bit-32kbps-mono-opus Audio compressed by OPUS codec without container, with bitrate of 32kbps. (Added in 1.20.0) |
Audio24Khz16Bit48KbpsMonoOpus | audio-24khz-16bit-48kbps-mono-opus Audio compressed by OPUS codec without container, with bitrate of 48kbps. (Added in 1.20.0) |
Audio24Khz16Bit24KbpsMonoOpus | audio-24khz-16bit-24kbps-mono-opus Audio compressed by OPUS codec without container, with bitrate of 24kbps. (Added in 1.20.0) |
Raw22050Hz16BitMonoPcm | raw-22050hz-16bit-mono-pcm Raw PCM audio at 22050Hz sampling rate and 16-bit depth. (Added in 1.22.0) |
Riff22050Hz16BitMonoPcm | riff-22050hz-16bit-mono-pcm PCM audio at 22050Hz sampling rate and 16-bit depth, with RIFF header. (Added in 1.22.0) |
Raw44100Hz16BitMonoPcm | raw-44100hz-16bit-mono-pcm Raw PCM audio at 44100Hz sampling rate and 16-bit depth. (Added in 1.22.0) |
Riff44100Hz16BitMonoPcm | riff-44100hz-16bit-mono-pcm PCM audio at 44100Hz sampling rate and 16-bit depth, with RIFF header. (Added in 1.22.0) |
AmrWb16000Hz | amr-wb-16000hz AMR-WB audio at 16kHz sampling rate. (Added in 1.24.0) |
G72216Khz64Kbps | g722-16khz-64kbps G.722 audio at 16kHz sampling rate and 64kbps bitrate. (Added in 1.38.0) |
Defines the possible speech synthesis output audio formats. Updated in version 1.19.0.
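To request a specific output format, set it on the SpeechConfig before creating the synthesizer; the sketch below uses placeholder credentials.

```cpp
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void SynthesisFormatExample()
{
    auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

    // Request MP3 output instead of the default PCM format.
    config->SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat::Audio16Khz32KBitRateMonoMp3);

    auto synthesizer = SpeechSynthesizer::FromConfig(config);
    auto result = synthesizer->SpeakTextAsync("Hello world").get();
}
```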
enum StreamStatus
Values | Descriptions |
---|---|
Unknown | The audio data stream status is unknown. |
NoData | The audio data stream contains no data. |
PartialData | The audio data stream contains partial data of a speak request. |
AllData | The audio data stream contains all data of a speak request. |
Canceled | The audio data stream was canceled. |
Defines the possible status of audio data stream. Added in version 1.4.0.
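The stream status is typically checked on an AudioDataStream created from a synthesis result, as sketched below; synthesizer is assumed to be an existing SpeechSynthesizer.

```cpp
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void SaveSynthesizedAudio(std::shared_ptr<SpeechSynthesizer> synthesizer)
{
    auto result = synthesizer->SpeakTextAsync("Hello world").get();
    auto stream = AudioDataStream::FromResult(result);

    // AllData indicates the stream contains the complete audio of the speak request.
    if (stream->GetStatus() == StreamStatus::AllData)
    {
        stream->SaveToWavFile("output.wav");
    }
}
```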
enum ServicePropertyChannel
Values | Descriptions |
---|---|
UriQueryParameter | Uses URI query parameter to pass property settings to service. |
HttpHeader | Uses HttpHeader to set a key/value in an HTTP header. |
Defines channels used to pass property settings to service. Added in version 1.5.0.
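The channel is passed to SpeechConfig::SetServiceProperty together with the property name and value; the parameter name and value below are illustrative only.

```cpp
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void ServicePropertyExample()
{
    auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

    // Send an extra service setting as a URL query parameter on the connection.
    config->SetServiceProperty("punctuation", "explicit", ServicePropertyChannel::UriQueryParameter);
}
```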
enum VoiceProfileType
Values | Descriptions |
---|---|
TextIndependentIdentification | Text independent speaker identification. |
TextDependentVerification | Text dependent speaker verification. |
TextIndependentVerification | Text independent verification. |
Defines voice profile types.
enum RecognitionFactorScope
Values | Descriptions |
---|---|
PartialPhrase | A Recognition Factor will apply to grammars that can be referenced as individual partial phrases. |
Defines the scope that a Recognition Factor is applied to.
enum PronunciationAssessmentGradingSystem
Values | Descriptions |
---|---|
FivePoint | Five point calibration. |
HundredMark | Hundred mark. |
Defines the point system for pronunciation score calibration; default value is FivePoint. Added in version 1.14.0.
enum PronunciationAssessmentGranularity
Values | Descriptions |
---|---|
Phoneme | Shows the score on the full text, word and phoneme level. |
Word | Shows the score on the full text and word level. |
FullText | Shows the score on the full text level only. |
Defines the pronunciation evaluation granularity; default value is Phoneme. Added in version 1.14.0.
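Both the grading system and the granularity are normally supplied when creating a PronunciationAssessmentConfig, as sketched below; recognizer is assumed to be an existing SpeechRecognizer and the reference text is a placeholder.

```cpp
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void PronunciationAssessmentExample(std::shared_ptr<SpeechRecognizer> recognizer)
{
    auto pronConfig = PronunciationAssessmentConfig::Create(
        "good morning",                                     // reference text (placeholder)
        PronunciationAssessmentGradingSystem::HundredMark,  // point system
        PronunciationAssessmentGranularity::Phoneme,        // evaluation granularity
        true);                                              // enable miscue calculation

    pronConfig->ApplyTo(recognizer);

    auto result = recognizer->RecognizeOnceAsync().get();
    auto pronResult = PronunciationAssessmentResult::FromResult(result);
    std::cout << "Accuracy score: " << pronResult->AccuracyScore << std::endl;
}
```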
enum SynthesisVoiceType
Values | Descriptions |
---|---|
OnlineNeural | Online neural voice. |
OnlineStandard | Online standard voice. |
OfflineNeural | Offline neural voice. |
OfflineStandard | Offline standard voice. |
Defines the type of synthesis voices. Added in version 1.16.0.
enum SynthesisVoiceGender
Values | Descriptions |
---|---|
Unknown | Gender unknown. |
Female | Female voice. |
Male | Male voice. |
Neutral | Neutral voice. |
Defines the gender of synthesis voices. Added in version 1.17.0.
enum SynthesisVoiceStatus
Values | Descriptions |
---|---|
Unknown | Voice status unknown. |
GeneralAvailability | Voice is generally available. |
Preview | Voice is in preview. |
Deprecated | Voice is deprecated, do not use. |
Defines the status of synthesis voices.
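Voice type, gender, and status describe entries returned by the voices list; the sketch below retrieves the list with SpeechSynthesizer::GetVoicesAsync and filters on type and gender. synthesizer is assumed to be an existing SpeechSynthesizer.

```cpp
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void ListNeuralFemaleVoices(std::shared_ptr<SpeechSynthesizer> synthesizer)
{
    auto voicesResult = synthesizer->GetVoicesAsync().get();
    if (voicesResult->Reason == ResultReason::VoicesListRetrieved)
    {
        for (const auto& voice : voicesResult->Voices)
        {
            // Print only online neural, female voices.
            if (voice->VoiceType == SynthesisVoiceType::OnlineNeural &&
                voice->Gender == SynthesisVoiceGender::Female)
            {
                std::cout << voice->Name << " (" << voice->Locale << ")" << std::endl;
            }
        }
    }
}
```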
enum SpeechSynthesisBoundaryType
Values | Descriptions |
---|---|
Word | Word boundary. |
Punctuation | Punctuation boundary. |
Sentence | Sentence boundary. |
Defines the boundary type of the speech synthesis boundary event. Added in version 1.21.0.
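The boundary type is typically read from WordBoundary event arguments during synthesis; punctuation and sentence boundaries are only delivered when requested via the corresponding properties. The sketch below uses placeholder credentials.

```cpp
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void BoundaryEventExample()
{
    auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

    // Also request punctuation and sentence boundaries in WordBoundary events.
    config->SetProperty(PropertyId::SpeechServiceResponse_RequestPunctuationBoundary, "true");
    config->SetProperty(PropertyId::SpeechServiceResponse_RequestSentenceBoundary, "true");

    auto synthesizer = SpeechSynthesizer::FromConfig(config);

    // Subscribe to boundary events and inspect the boundary type of each one.
    synthesizer->WordBoundary += [](const SpeechSynthesisWordBoundaryEventArgs& e)
    {
        if (e.BoundaryType == SpeechSynthesisBoundaryType::Word)
        {
            std::cout << "Word boundary: " << e.Text << std::endl;
        }
    };

    auto result = synthesizer->SpeakTextAsync("Hello, world.").get();
}
```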
enum SegmentationStrategy
Values | Descriptions |
---|---|
Default | Use the default strategy and settings as determined by the Speech Service. Use in most situations. |
Time | Uses a time based strategy where the amount of silence between speech is used to determine when to generate a final result. |
Semantic | Uses an AI model to determine the end of a spoken phrase based on the content of the phrase. |
The strategy used to determine when a spoken phrase has ended and a final Recognized result should be generated. Allowed values are "Default", "Time", and "Semantic".
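The strategy is selected through the Speech_SegmentationStrategy property on a SpeechConfig, as in the sketch below (placeholder credentials).

```cpp
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void SegmentationStrategyExample()
{
    auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

    // Ask the service to use semantic segmentation to decide when a phrase has ended.
    config->SetProperty(PropertyId::Speech_SegmentationStrategy, "Semantic");

    auto recognizer = SpeechRecognizer::FromConfig(config);
}
```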