Summary
Members | Descriptions |
---|---|
enum PropertyId | Defines speech property ids. Changed in version 1.4.0. |
enum OutputFormat | Output format. |
enum ProfanityOption | Removes profanity (swearing), or replaces letters of profane words with stars. Added in version 1.5.0. |
enum ResultReason | Specifies the possible reasons a recognition result might be generated. |
enum CancellationReason | Defines the possible reasons a recognition result might be canceled. |
enum CancellationErrorCode | Defines error code in case that CancellationReason is Error. Added in version 1.1.0. |
enum NoMatchReason | Defines the possible reasons a recognition result might not be recognized. |
enum ActivityJSONType | Defines the possible types for an activity json value. Added in version 1.5.0. |
enum SpeechSynthesisOutputFormat | Defines the possible speech synthesis output audio formats. Updated in version 1.19.0. |
enum StreamStatus | Defines the possible status of audio data stream. Added in version 1.4.0. |
enum ServicePropertyChannel | Defines channels used to pass property settings to service. Added in version 1.5.0. |
enum VoiceProfileType | Defines voice profile types. |
enum RecognitionFactorScope | Defines the scope that a Recognition Factor is applied to. |
enum PronunciationAssessmentGradingSystem | Defines the point system for pronunciation score calibration; default value is FivePoint. Added in version 1.14.0. |
enum PronunciationAssessmentGranularity | Defines the pronunciation evaluation granularity; default value is Phoneme. Added in version 1.14.0. |
enum SynthesisVoiceType | Defines the type of synthesis voices. Added in version 1.16.0. |
enum SynthesisVoiceGender | Defines the gender of synthesis voices. Added in version 1.17.0. |
enum SynthesisVoiceStatus | Defines the status of synthesis voices. |
enum SpeechSynthesisBoundaryType | Defines the boundary type of the speech synthesis boundary event. Added in version 1.21.0. |
enum SegmentationStrategy | The strategy used to determine when a spoken phrase has ended and a final Recognized result should be generated. Allowed values are "Default", "Time", and "Semantic". |
class AsyncRecognizer | AsyncRecognizer abstract base class. |
class AudioDataStream | Represents audio data stream used for operating audio data as a stream. Added in version 1.4.0. |
class AutoDetectSourceLanguageConfig | Class that defines auto detection source configuration. Updated in version 1.13.0. |
class AutoDetectSourceLanguageResult | Contains the auto detected source language result. Added in version 1.8.0. |
class BaseAsyncRecognizer | BaseAsyncRecognizer class. |
class CancellationDetails | Contains detailed information about why a result was canceled. |
class ClassLanguageModel | Represents a list of grammars for dynamic grammar scenarios. Added in version 1.7.0. |
class Connection | Connection is a proxy class for managing the connection to the speech service of the specified Recognizer. By default, a Recognizer autonomously manages the connection to the service when needed. The Connection class provides additional methods for users to explicitly open or close a connection and to subscribe to connection status changes. The use of Connection is optional. It is intended for scenarios where fine-tuning of application behavior based on connection status is needed. Users can optionally call Open() to manually initiate a service connection before starting recognition on the Recognizer associated with this Connection. After starting a recognition, calling Open() or Close() might fail. This will not impact the Recognizer or the ongoing recognition. The connection might drop for various reasons; the Recognizer will always try to reinstate the connection as required to guarantee ongoing operations. In all these cases, Connected/Disconnected events will indicate the change of the connection status. Updated in version 1.17.0. |
class ConnectionEventArgs | Provides data for the ConnectionEvent. Added in version 1.2.0. |
class ConnectionMessage | ConnectionMessage represents implementation specific messages sent to and received from the speech service. These messages are provided for debugging purposes and should not be used for production use cases with the Azure Cognitive Services Speech Service. Messages sent to and received from the Speech Service are subject to change without notice. This includes message contents, headers, payloads, ordering, etc. Added in version 1.10.0. |
class ConnectionMessageEventArgs | Provides data for the ConnectionMessageEvent. |
class EmbeddedSpeechConfig | Class that defines embedded (offline) speech configuration. |
class EventArgs | Base class for event arguments. |
class EventSignal | Clients can connect to the event signal to receive events, or disconnect from the event signal to stop receiving events. |
class EventSignalBase | Clients can connect to the event signal to receive events, or disconnect from the event signal to stop receiving events. |
class Grammar | Represents base class grammar for customizing speech recognition. Added in version 1.5.0. |
class GrammarList | Represents a list of grammars for dynamic grammar scenarios. Added in version 1.7.0. |
class GrammarPhrase | Represents a phrase that may be spoken by the user. Added in version 1.5.0. |
class HybridSpeechConfig | Class that defines hybrid (cloud and embedded) configurations for speech recognition or speech synthesis. |
class KeywordRecognitionEventArgs | Class for the events emitted by the KeywordRecognizer. |
class KeywordRecognitionModel | Represents keyword recognition model used with StartKeywordRecognitionAsync methods. |
class KeywordRecognitionResult | Class that defines the results emitted by the KeywordRecognizer. |
class KeywordRecognizer | Recognizer type that is specialized to only handle keyword activation. |
class NoMatchDetails | Contains detailed information for NoMatch recognition results. |
class PersonalVoiceSynthesisRequest | Class that defines the speech synthesis request for personal voice (aka.ms/azureai/personal-voice). This class is in preview and is subject to change. Added in version 1.39.0. |
class PhraseListGrammar | Represents a phrase list grammar for dynamic grammar scenarios. Added in version 1.5.0. |
class PronunciationAssessmentConfig | Class that defines pronunciation assessment configuration. Added in version 1.14.0. |
class PronunciationAssessmentResult | Class for pronunciation assessment results. |
class PronunciationContentAssessmentResult | Class for content assessment results. |
class PropertyCollection | Class to retrieve or set a property value from a property collection. |
class RecognitionEventArgs | Provides data for the RecognitionEvent. |
class RecognitionResult | Contains detailed information about result of a recognition operation. |
class Recognizer | Recognizer base class. |
class SessionEventArgs | Base class for session event arguments. |
class SmartHandle | Smart handle class. |
class SourceLanguageConfig | Class that defines source language configuration. Added in version 1.8.0. |
class SourceLanguageRecognizer | Class for source language recognizers. You can use this class for standalone language detection. Added in version 1.17.0. |
class SpeechConfig | Class that defines configurations for speech / intent recognition, or speech synthesis. |
class SpeechRecognitionCanceledEventArgs | Class for speech recognition canceled event arguments. |
class SpeechRecognitionEventArgs | Class for speech recognition event arguments. |
class SpeechRecognitionModel | Speech recognition model information. |
class SpeechRecognitionResult | Base class for speech recognition results. |
class SpeechRecognizer | Class for speech recognizers. |
class SpeechSynthesisBookmarkEventArgs | Class for speech synthesis bookmark event arguments. Added in version 1.16.0. |
class SpeechSynthesisCancellationDetails | Contains detailed information about why a result was canceled. Added in version 1.4.0. |
class SpeechSynthesisEventArgs | Class for speech synthesis event arguments. Added in version 1.4.0. |
class SpeechSynthesisRequest | Class that defines the speech synthesis request. This class is in preview and is subject to change. Added in version 1.37.0. |
class SpeechSynthesisResult | Contains information about result from text-to-speech synthesis. Added in version 1.4.0. |
class SpeechSynthesisVisemeEventArgs | Class for speech synthesis viseme event arguments. Added in version 1.16.0. |
class SpeechSynthesisWordBoundaryEventArgs | Class for speech synthesis word boundary event arguments. Added in version 1.7.0. |
class SpeechSynthesizer | Class for speech synthesizer. Updated in version 1.14.0. |
class SpeechTranslationModel | Speech translation model information. |
class SynthesisVoicesResult | Contains information about result from voices list of speech synthesizers. Added in version 1.16.0. |
class VoiceInfo | Contains information about a synthesis voice. Updated in version 1.17.0. |
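To illustrate how the core classes in this namespace fit together, the following minimal sketch creates a SpeechConfig, builds a SpeechRecognizer for the default microphone, and performs a single recognition. The subscription key and region strings are placeholders.

```cpp
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

int main()
{
    // Placeholder credentials; replace with your own subscription key and service region.
    auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

    // Create a recognizer that uses the default microphone as audio input.
    auto recognizer = SpeechRecognizer::FromConfig(config);

    // Recognize a single utterance and print the recognized text.
    auto result = recognizer->RecognizeOnceAsync().get();
    std::cout << "Text: " << result->Text << std::endl;
    return 0;
}
```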
Members
enum PropertyId
Values | Descriptions |
---|---|
SpeechServiceConnection_Key | The Cognitive Services Speech Service subscription key. If you are using an intent recognizer, you need to specify the LUIS endpoint key for your particular LUIS app. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::FromSubscription. |
SpeechServiceConnection_Endpoint | The Cognitive Services Speech Service endpoint (url). Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::FromEndpoint. NOTE: This endpoint is not the same as the endpoint used to obtain an access token. |
SpeechServiceConnection_Region | The Cognitive Services Speech Service region. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::FromSubscription, SpeechConfig::FromEndpoint, SpeechConfig::FromHost, SpeechConfig::FromAuthorizationToken. |
SpeechServiceAuthorization_Token | The Cognitive Services Speech Service authorization token (aka access token). Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::FromAuthorizationToken, SpeechRecognizer::SetAuthorizationToken, IntentRecognizer::SetAuthorizationToken, TranslationRecognizer::SetAuthorizationToken. |
SpeechServiceAuthorization_Type | The Cognitive Services Speech Service authorization type. Currently unused. |
SpeechServiceConnection_EndpointId | The Cognitive Services Custom Speech or Custom Voice Service endpoint id. Under normal circumstances, you shouldn't have to use this property directly. Instead use SpeechConfig::SetEndpointId. NOTE: The endpoint id is available in the Custom Speech Portal, listed under Endpoint Details. |
SpeechServiceConnection_Host | The Cognitive Services Speech Service host (url). Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::FromHost. |
SpeechServiceConnection_ProxyHostName | The host name of the proxy server used to connect to the Cognitive Services Speech Service. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::SetProxy. NOTE: This property id was added in version 1.1.0. |
SpeechServiceConnection_ProxyPort | The port of the proxy server used to connect to the Cognitive Services Speech Service. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::SetProxy. NOTE: This property id was added in version 1.1.0. |
SpeechServiceConnection_ProxyUserName | The user name of the proxy server used to connect to the Cognitive Services Speech Service. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::SetProxy. NOTE: This property id was added in version 1.1.0. |
SpeechServiceConnection_ProxyPassword | The password of the proxy server used to connect to the Cognitive Services Speech Service. Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::SetProxy. NOTE: This property id was added in version 1.1.0. |
SpeechServiceConnection_Url | The URL string built from speech configuration. This property is intended to be read-only. The SDK is using it internally. NOTE: Added in version 1.5.0. |
SpeechServiceConnection_ProxyHostBypass | Specifies the list of hosts for which proxies should not be used. This setting overrides all other configurations. Hostnames are separated by commas and are matched in a case-insensitive manner. Wildcards are not supported. |
SpeechServiceConnection_TranslationToLanguages | The list of comma separated languages used as target translation languages. Under normal circumstances, you shouldn't have to use this property directly. Instead use SpeechTranslationConfig::AddTargetLanguage and SpeechTranslationConfig::GetTargetLanguages. |
SpeechServiceConnection_TranslationVoice | The name of the Cognitive Service Text to Speech Service voice. Under normal circumstances, you shouldn't have to use this property directly. Instead use SpeechTranslationConfig::SetVoiceName. NOTE: Valid voice names are listed in the Speech service voices documentation. |
SpeechServiceConnection_TranslationFeatures | Translation features. For internal use. |
SpeechServiceConnection_IntentRegion | The Language Understanding Service region. Under normal circumstances, you shouldn't have to use this property directly. Instead use LanguageUnderstandingModel. |
SpeechServiceConnection_RecoMode | The Cognitive Services Speech Service recognition mode. Can be "INTERACTIVE", "CONVERSATION", "DICTATION". This property is intended to be read-only. The SDK is using it internally. |
SpeechServiceConnection_RecoLanguage | The spoken language to be recognized (in BCP-47 format). Under normal circumstances, you shouldn't have to use this property directly. Instead, use SpeechConfig::SetSpeechRecognitionLanguage. |
Speech_SessionId | The session id. This id is a universally unique identifier (aka UUID) representing a specific binding of an audio input stream and the underlying speech recognition instance to which it is bound. Under normal circumstances, you shouldn't have to use this property directly. Instead use SessionEventArgs::SessionId. |
SpeechServiceConnection_UserDefinedQueryParameters | The query parameters provided by users. They will be passed to service as URL query parameters. Added in version 1.5.0. |
SpeechServiceConnection_RecoBackend | The string to specify the backend to be used for speech recognition; allowed options are online and offline. Under normal circumstances, you shouldn't use this property directly. Currently the offline option is only valid when EmbeddedSpeechConfig is used. Added in version 1.19.0. |
SpeechServiceConnection_RecoModelName | The name of the model to be used for speech recognition. Under normal circumstances, you shouldn't use this property directly. Currently this is only valid when EmbeddedSpeechConfig is used. Added in version 1.19.0. |
SpeechServiceConnection_RecoModelKey | This property is deprecated. |
SpeechServiceConnection_RecoModelIniFile | The path to the ini file of the model to be used for speech recognition. Under normal circumstances, you shouldn't use this property directly. Currently this is only valid when EmbeddedSpeechConfig is used. Added in version 1.19.0. |
SpeechServiceConnection_SynthLanguage | The spoken language to be synthesized (e.g. en-US). Added in version 1.4.0. |
SpeechServiceConnection_SynthVoice | The name of the TTS voice to be used for speech synthesis. Added in version 1.4.0. |
SpeechServiceConnection_SynthOutputFormat | The string to specify the TTS output audio format. Added in version 1.4.0. |
SpeechServiceConnection_SynthEnableCompressedAudioTransmission | Indicates whether to use a compressed audio format for speech synthesis audio transmission. This property only takes effect when SpeechServiceConnection_SynthOutputFormat is set to a PCM format. If this property is not set and GStreamer is available, the SDK uses a compressed format for synthesized audio transmission and decodes it. You can set this property to "false" to use raw PCM format for transmission on the wire. Added in version 1.16.0. |
SpeechServiceConnection_SynthBackend | The string to specify TTS backend; valid options are online and offline. Under normal circumstances, you shouldn't have to use this property directly. Instead, use EmbeddedSpeechConfig::FromPath or EmbeddedSpeechConfig::FromPaths to set the synthesis backend to offline. Added in version 1.19.0. |
SpeechServiceConnection_SynthOfflineDataPath | The data file path(s) for offline synthesis engine; only valid when synthesis backend is offline. Under normal circumstances, you shouldn't have to use this property directly. Instead, use EmbeddedSpeechConfig::FromPath or EmbeddedSpeechConfig::FromPaths. Added in version 1.19.0. |
SpeechServiceConnection_SynthOfflineVoice | The name of the offline TTS voice to be used for speech synthesis. Under normal circumstances, you shouldn't use this property directly. Instead, use EmbeddedSpeechConfig::SetSpeechSynthesisVoice and EmbeddedSpeechConfig::GetSpeechSynthesisVoiceName. Added in version 1.19.0. |
SpeechServiceConnection_SynthModelKey | This property is deprecated. |
SpeechServiceConnection_VoicesListEndpoint | The Cognitive Services Speech Service voices list API endpoint (URL). Under normal circumstances, you don't need to specify this property; the SDK constructs it based on the region/host/endpoint of SpeechConfig. Added in version 1.16.0. |
SpeechServiceConnection_InitialSilenceTimeoutMs | The initial silence timeout value (in milliseconds) used by the service. Added in version 1.5.0. |
SpeechServiceConnection_EndSilenceTimeoutMs | The end silence timeout value (in milliseconds) used by the service. Added in version 1.5.0. |
SpeechServiceConnection_EnableAudioLogging | A boolean value specifying whether audio logging is enabled in the service or not. Audio and content logs are stored either in Microsoft-owned storage, or in your own storage account linked to your Cognitive Services subscription (Bring Your Own Storage (BYOS) enabled Speech resource). Added in version 1.5.0. |
SpeechServiceConnection_LanguageIdMode | The speech service connection language identifier mode. Can be "AtStart" (the default), or "Continuous". See Language Identification document. Added in 1.25.0. |
SpeechServiceConnection_TranslationCategoryId | The speech service connection translation categoryId. |
SpeechServiceConnection_AutoDetectSourceLanguages | The auto detect source languages. Added in version 1.8.0. |
SpeechServiceConnection_AutoDetectSourceLanguageResult | The auto detect source language result. Added in version 1.8.0. |
SpeechServiceResponse_RequestDetailedResultTrueFalse | The requested Cognitive Services Speech Service response output format (simple or detailed). Under normal circumstances, you shouldn't have to use this property directly. Instead use SpeechConfig::SetOutputFormat. |
SpeechServiceResponse_RequestProfanityFilterTrueFalse | The requested Cognitive Services Speech Service response output profanity level. Currently unused. |
SpeechServiceResponse_ProfanityOption | The requested Cognitive Services Speech Service response output profanity setting. Allowed values are "masked", "removed", and "raw". Added in version 1.5.0. |
SpeechServiceResponse_PostProcessingOption | A string value specifying which post processing option should be used by the service. The allowed value is "TrueText". Added in version 1.5.0. |
SpeechServiceResponse_RequestWordLevelTimestamps | A boolean value specifying whether to include word-level timestamps in the response result. Added in version 1.5.0. |
SpeechServiceResponse_StablePartialResultThreshold | The number of times a word has to be in partial results to be returned. Added in version 1.5.0. |
SpeechServiceResponse_OutputFormatOption | A string value specifying the output format option in the response result. Internal use only. Added in version 1.5.0. |
SpeechServiceResponse_RequestSnr | A boolean value specifying whether to include SNR (signal to noise ratio) in the response result. Added in version 1.18.0. |
SpeechServiceResponse_TranslationRequestStablePartialResult | A boolean value to request for stabilizing translation partial results by omitting words in the end. Added in version 1.5.0. |
SpeechServiceResponse_RequestWordBoundary | A boolean value specifying whether to request WordBoundary events. Added in version 1.21.0. |
SpeechServiceResponse_RequestPunctuationBoundary | A boolean value specifying whether to request punctuation boundary in WordBoundary Events. Default is true. Added in version 1.21.0. |
SpeechServiceResponse_RequestSentenceBoundary | A boolean value specifying whether to request sentence boundary in WordBoundary Events. Default is false. Added in version 1.21.0. |
SpeechServiceResponse_SynthesisEventsSyncToAudio | A boolean value specifying whether the SDK should synchronize synthesis metadata events, (e.g. word boundary, viseme, etc.) to the audio playback. This only takes effect when the audio is played through the SDK. Default is true. If set to false, the SDK will fire the events as they come from the service, which may be out of sync with the audio playback. Added in version 1.31.0. |
SpeechServiceResponse_JsonResult | The Cognitive Services Speech Service response output (in JSON format). This property is available on recognition result objects only. |
SpeechServiceResponse_JsonErrorDetails | The Cognitive Services Speech Service error details (in JSON format). Under normal circumstances, you shouldn't have to use this property directly. Instead, use CancellationDetails::ErrorDetails. |
SpeechServiceResponse_RecognitionLatencyMs | The recognition latency in milliseconds. Read-only, available on final speech/translation/intent results. This measures the latency between when an audio input is received by the SDK, and the moment the final result is received from the service. The SDK computes the time difference between the last audio fragment from the audio input that is contributing to the final result, and the time the final result is received from the speech service. Added in version 1.3.0. |
SpeechServiceResponse_RecognitionBackend | The recognition backend. Read-only, available on speech recognition results. This indicates whether cloud (online) or embedded (offline) recognition was used to produce the result. |
SpeechServiceResponse_SynthesisFirstByteLatencyMs | The speech synthesis first byte latency in milliseconds. Read-only, available on final speech synthesis results. This measures the latency between when the synthesis is started to be processed, and the moment the first byte audio is available. Added in version 1.17.0. |
SpeechServiceResponse_SynthesisFinishLatencyMs | The speech synthesis all bytes latency in milliseconds. Read-only, available on final speech synthesis results. This measures the latency between when the synthesis is started to be processed, and the moment the whole audio is synthesized. Added in version 1.17.0. |
SpeechServiceResponse_SynthesisUnderrunTimeMs | The underrun time for speech synthesis in milliseconds. Read-only, available on results in SynthesisCompleted events. This measures the total underrun time from when the playback buffer (see PropertyId::AudioConfig_PlaybackBufferLengthInMs) is filled until synthesis is completed. Added in version 1.17.0. |
SpeechServiceResponse_SynthesisConnectionLatencyMs | The speech synthesis connection latency in milliseconds. Read-only, available on final speech synthesis results. This measures the latency between when the synthesis is started to be processed, and the moment the HTTP/WebSocket connection is established. Added in version 1.26.0. |
SpeechServiceResponse_SynthesisNetworkLatencyMs | The speech synthesis network latency in milliseconds. Read-only, available on final speech synthesis results. This measures the network round trip time. Added in version 1.26.0. |
SpeechServiceResponse_SynthesisServiceLatencyMs | The speech synthesis service latency in milliseconds. Read-only, available on final speech synthesis results. This measures the service processing time to synthesize the first byte of audio. Added in version 1.26.0. |
SpeechServiceResponse_SynthesisBackend | Indicates which backend the synthesis is finished by. Read-only, available on speech synthesis results, except for the result in the SynthesisStarted event. Added in version 1.17.0. |
SpeechServiceResponse_DiarizeIntermediateResults | Determines if intermediate results contain speaker identification. |
CancellationDetails_Reason | The cancellation reason. Currently unused. |
CancellationDetails_ReasonText | The cancellation text. Currently unused. |
CancellationDetails_ReasonDetailedText | The cancellation detailed text. Currently unused. |
LanguageUnderstandingServiceResponse_JsonResult | The Language Understanding Service response output (in JSON format). Available via IntentRecognitionResult.Properties. |
AudioConfig_DeviceNameForCapture | The device name for audio capture. Under normal circumstances, you shouldn't have to use this property directly. Instead, use AudioConfig::FromMicrophoneInput. NOTE: This property id was added in version 1.3.0. |
AudioConfig_NumberOfChannelsForCapture | The number of channels for audio capture. Internal use only. NOTE: This property id was added in version 1.3.0. |
AudioConfig_SampleRateForCapture | The sample rate (in Hz) for audio capture. Internal use only. NOTE: This property id was added in version 1.3.0. |
AudioConfig_BitsPerSampleForCapture | The number of bits of each sample for audio capture. Internal use only. NOTE: This property id was added in version 1.3.0. |
AudioConfig_AudioSource | The audio source. Allowed values are "Microphones", "File", and "Stream". Added in version 1.3.0. |
AudioConfig_DeviceNameForRender | The device name for audio render. Under normal circumstances, you shouldn't have to use this property directly. Instead, use AudioConfig::FromSpeakerOutput. Added in version 1.14.0. |
AudioConfig_PlaybackBufferLengthInMs | Playback buffer length in milliseconds, default is 50 milliseconds. |
AudioConfig_AudioProcessingOptions | Audio processing options in JSON format. |
Speech_LogFilename | The file name to write logs. Added in version 1.4.0. |
Speech_SegmentationSilenceTimeoutMs | A duration of detected silence, measured in milliseconds, after which speech-to-text will determine a spoken phrase has ended and generate a final Recognized result. Configuring this timeout may be helpful in situations where spoken input is significantly faster or slower than usual and default segmentation behavior consistently yields results that are too long or too short. Segmentation timeout values that are inappropriately high or low can negatively affect speech-to-text accuracy; this property should be carefully configured and the resulting behavior should be thoroughly validated as intended. The value must be in the range [100, 5000] milliseconds. |
Speech_SegmentationMaximumTimeMs | The maximum length of a spoken phrase when using the "Time" segmentation strategy. As the length of a spoken phrase approaches this value, Speech_SegmentationSilenceTimeoutMs is progressively reduced until either the phrase silence timeout is reached or the phrase reaches the maximum length. The value must be in the range [20000, 70000] milliseconds. |
Speech_SegmentationStrategy | The strategy used to determine when a spoken phrase has ended and a final Recognized result should be generated. Allowed values are "Default", "Time", and "Semantic". |
Conversation_ApplicationId | Identifier used to connect to the backend service. Added in version 1.5.0. |
Conversation_DialogType | Type of dialog backend to connect to. Added in version 1.7.0. |
Conversation_Initial_Silence_Timeout | Silence timeout for listening. Added in version 1.5.0. |
Conversation_From_Id | From id to be used on speech recognition activities. Added in version 1.5.0. |
Conversation_Conversation_Id | ConversationId for the session. Added in version 1.8.0. |
Conversation_Custom_Voice_Deployment_Ids | Comma separated list of custom voice deployment ids. Added in version 1.8.0. |
Conversation_Speech_Activity_Template | Speech activity template; properties in the template are stamped on the activity generated by the service for speech. Added in version 1.10.0. |
Conversation_ParticipantId | Your participant identifier in the current conversation. Added in version 1.13.0. |
Conversation_Request_Bot_Status_Messages | |
Conversation_Connection_Id | |
DataBuffer_TimeStamp | The time stamp associated with the data buffer written by the client when using Pull/Push audio input streams. The time stamp is a 64-bit value with a resolution of 90 kHz. It is the same as the presentation timestamp in an MPEG transport stream. See https://en.wikipedia.org/wiki/Presentation_timestamp Added in version 1.5.0. |
DataBuffer_UserId | The user id associated with the data buffer written by the client when using Pull/Push audio input streams. Added in version 1.5.0. |
PronunciationAssessment_ReferenceText | The reference text of the audio for pronunciation evaluation. For this and the following pronunciation assessment parameters, see the table Pronunciation assessment parameters. Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::Create or PronunciationAssessmentConfig::SetReferenceText. Added in version 1.14.0. |
PronunciationAssessment_GradingSystem | The point system for pronunciation score calibration (FivePoint or HundredMark). Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::Create. Added in version 1.14.0. |
PronunciationAssessment_Granularity | The pronunciation evaluation granularity (Phoneme, Word, or FullText). Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::Create. Added in version 1.14.0. |
PronunciationAssessment_EnableMiscue | Defines whether to enable miscue calculation. With this enabled, the pronounced words are compared to the reference text and are marked with omission/insertion based on the comparison. The default setting is False. Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::Create. Added in version 1.14.0. |
PronunciationAssessment_PhonemeAlphabet | The pronunciation evaluation phoneme alphabet. The valid values are "SAPI" (default) and "IPA". Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::SetPhonemeAlphabet. Added in version 1.20.0. |
PronunciationAssessment_NBestPhonemeCount | The pronunciation evaluation nbest phoneme count. Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::SetNBestPhonemeCount. Added in version 1.20.0. |
PronunciationAssessment_EnableProsodyAssessment | Whether to enable prosody assessment. Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::EnableProsodyAssessment. Added in version 1.33.0. |
PronunciationAssessment_Json | The JSON string of pronunciation assessment parameters. Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::Create. Added in version 1.14.0. |
PronunciationAssessment_Params | Pronunciation assessment parameters. This property is intended to be read-only. The SDK is using it internally. Added in version 1.14.0. |
PronunciationAssessment_ContentTopic | The content topic of the pronunciation assessment. Under normal circumstances, you shouldn't have to use this property directly. Instead, use PronunciationAssessmentConfig::EnableContentAssessmentWithTopic. Added in version 1.33.0. |
SpeakerRecognition_Api_Version | Speaker Recognition backend API version. This property is added to allow testing and use of previous versions of Speaker Recognition APIs, where applicable. Added in version 1.18.0. |
SpeechTranslation_ModelName | The name of a model to be used for speech translation. Do not use this property directly. Currently this is only valid when EmbeddedSpeechConfig is used. |
SpeechTranslation_ModelKey | This property is deprecated. |
KeywordRecognition_ModelName | The name of a model to be used for keyword recognition. Do not use this property directly. Currently this is only valid when EmbeddedSpeechConfig is used. |
KeywordRecognition_ModelKey | This property is deprecated. |
EmbeddedSpeech_EnablePerformanceMetrics | Enable the collection of embedded speech performance metrics which can be used to evaluate the capability of a device to use embedded speech. The collected data is included in results from specific scenarios like speech recognition. The default setting is "false". Note that metrics may not be available from all embedded speech scenarios. |
SpeechSynthesisRequest_Pitch | The pitch of the synthesized speech. |
SpeechSynthesisRequest_Rate | The rate of the synthesized speech. |
SpeechSynthesisRequest_Volume | The volume of the synthesized speech. |
SpeechSynthesisRequest_Style | The style of the synthesized speech. |
SpeechSynthesisRequest_Temperature | The temperature of the synthesized speech. The temperature parameter only takes effect when the voice is an HD voice. |
SpeechSynthesis_FrameTimeoutInterval | The timeout interval in milliseconds between synthesized speech audio frames. The greater of this and 10 seconds is used as a hard frame timeout. A speech synthesis timeout occurs if a) the time passed since the latest frame exceeds this timeout interval and the Real-Time Factor (RTF) exceeds its maximum value, or b) the time passed since the latest frame exceeds the hard frame timeout. |
SpeechSynthesis_RtfTimeoutThreshold | The maximum Real-Time Factor (RTF) for speech synthesis. The RTF is calculated as RTF = f(d)/d where f(d) is the time taken to synthesize speech audio of duration d. |
Defines speech property ids. Changed in version 1.4.0.
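As a sketch of how these property ids are typically used, a value can be set on a SpeechConfig before creating a recognizer and read back from a result's property collection; the credentials below are placeholders.

```cpp
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void PropertyIdExample()
{
    auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

    // Set a property by id before creating the recognizer, e.g. request word-level timestamps.
    config->SetProperty(PropertyId::SpeechServiceResponse_RequestWordLevelTimestamps, "true");

    auto recognizer = SpeechRecognizer::FromConfig(config);
    auto result = recognizer->RecognizeOnceAsync().get();

    // Read the raw service response JSON from the result's property collection.
    auto json = result->Properties.GetProperty(PropertyId::SpeechServiceResponse_JsonResult);
    std::cout << json << std::endl;
}
```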
enum OutputFormat
Values | Descriptions |
---|---|
Simple | |
Detailed | |
Output format.
enum ProfanityOption
Values | Descriptions |
---|---|
Masked | Replaces letters in profane words with star characters. |
Removed | Removes profane words. |
Raw | Does nothing to profane words. |
Removes profanity (swearing), or replaces letters of profane words with stars. Added in version 1.5.0.
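For example, the profanity handling for recognition results can be selected on a SpeechConfig; a minimal sketch (placeholder credentials) is shown below.

```cpp
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void ProfanityExample()
{
    auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

    // Replace letters in profane words with star characters in recognition results.
    config->SetProfanity(ProfanityOption::Masked);
}
```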
enum ResultReason
Values | Descriptions |
---|---|
NoMatch | Indicates speech could not be recognized. More details can be found in the NoMatchDetails object. |
Canceled | Indicates that the recognition was canceled. More details can be found using the CancellationDetails object. |
RecognizingSpeech | Indicates the speech result contains hypothesis text. |
RecognizedSpeech | Indicates the speech result contains final text that has been recognized. Speech Recognition is now complete for this phrase. |
RecognizingIntent | Indicates the intent result contains hypothesis text and intent. |
RecognizedIntent | Indicates the intent result contains final text and intent. Speech Recognition and Intent determination are now complete for this phrase. |
TranslatingSpeech | Indicates the translation result contains hypothesis text and its translation(s). |
TranslatedSpeech | Indicates the translation result contains final text and corresponding translation(s). Speech Recognition and Translation are now complete for this phrase. |
SynthesizingAudio | Indicates the synthesized audio result contains a non-zero amount of audio data. |
SynthesizingAudioCompleted | Indicates the synthesized audio is now complete for this phrase. |
RecognizingKeyword | Indicates the speech result contains (unverified) keyword text. Added in version 1.3.0. |
RecognizedKeyword | Indicates that keyword recognition completed recognizing the given keyword. Added in version 1.3.0. |
SynthesizingAudioStarted | Indicates the speech synthesis is now started Added in version 1.4.0. |
TranslatingParticipantSpeech | Indicates the transcription result contains hypothesis text and its translation(s) for other participants in the conversation. Added in version 1.8.0. |
TranslatedParticipantSpeech | Indicates the transcription result contains final text and corresponding translation(s) for other participants in the conversation. Speech Recognition and Translation are now complete for this phrase. Added in version 1.8.0. |
TranslatedInstantMessage | Indicates the transcription result contains the instant message and corresponding translation(s). Added in version 1.8.0. |
TranslatedParticipantInstantMessage | Indicates the transcription result contains the instant message for other participants in the conversation and corresponding translation(s). Added in version 1.8.0. |
EnrollingVoiceProfile | Indicates the voice profile is being enrolled and customers need to send more audio to create a voice profile. Added in version 1.12.0. |
EnrolledVoiceProfile | The voice profile has been enrolled. Added in version 1.12.0. |
RecognizedSpeakers | Indicates successful identification of some speakers. Added in version 1.12.0. |
RecognizedSpeaker | Indicates successfully verified one speaker. Added in version 1.12.0. |
ResetVoiceProfile | Indicates a voice profile has been reset successfully. Added in version 1.12.0. |
DeletedVoiceProfile | Indicates a voice profile has been deleted successfully. Added in version 1.12.0. |
VoicesListRetrieved | Indicates the voices list has been retrieved successfully. Added in version 1.16.0. |
Specifies the possible reasons a recognition result might be generated.
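A typical pattern is to branch on the result reason after a recognition call; the sketch below assumes recognizer is an existing SpeechRecognizer.

```cpp
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void HandleResult(std::shared_ptr<SpeechRecognizer> recognizer)
{
    auto result = recognizer->RecognizeOnceAsync().get();
    switch (result->Reason)
    {
    case ResultReason::RecognizedSpeech:
        std::cout << "Recognized: " << result->Text << std::endl;
        break;
    case ResultReason::NoMatch:
        std::cout << "No speech could be recognized." << std::endl;
        break;
    case ResultReason::Canceled:
        std::cout << "Recognition was canceled." << std::endl;
        break;
    default:
        break;
    }
}
```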
enum CancellationReason
Values | Descriptions |
---|---|
Error | Indicates that an error occurred during speech recognition. |
EndOfStream | Indicates that the end of the audio stream was reached. |
CancelledByUser | Indicates that the request was cancelled by the user. Added in version 1.14.0. |
Defines the possible reasons a recognition result might be canceled.
enum CancellationErrorCode
Values | Descriptions |
---|---|
NoError | No error. If CancellationReason is EndOfStream, CancellationErrorCode is set to NoError. |
AuthenticationFailure | Indicates an authentication error. An authentication error occurs if subscription key or authorization token is invalid, expired, or does not match the region being used. |
BadRequest | Indicates that one or more recognition parameters are invalid or the audio format is not supported. |
TooManyRequests | Indicates that the number of parallel requests exceeded the number of allowed concurrent transcriptions for the subscription. |
Forbidden | Indicates that the free subscription used by the request ran out of quota. |
ConnectionFailure | Indicates a connection error. |
ServiceTimeout | Indicates a time-out error when waiting for response from service. |
ServiceError | Indicates that an error is returned by the service. |
ServiceUnavailable | Indicates that the service is currently unavailable. |
RuntimeError | Indicates an unexpected runtime error. |
ServiceRedirectTemporary | Indicates the Speech Service is temporarily requesting a reconnect to a different endpoint. |
ServiceRedirectPermanent | Indicates the Speech Service is permanently requesting a reconnect to a different endpoint. |
EmbeddedModelError | Indicates the embedded speech (SR or TTS) model is not available or corrupted. |
Defines error code in case that CancellationReason is Error. Added in version 1.1.0.
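The sketch below shows how CancellationReason and CancellationErrorCode are typically inspected together through CancellationDetails; result is assumed to be an existing SpeechRecognitionResult.

```cpp
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void InspectCancellation(std::shared_ptr<SpeechRecognitionResult> result)
{
    if (result->Reason == ResultReason::Canceled)
    {
        auto cancellation = CancellationDetails::FromResult(result);
        if (cancellation->Reason == CancellationReason::Error)
        {
            // The error code and details are only meaningful when the reason is Error.
            std::cout << "ErrorCode: " << (int)cancellation->ErrorCode << std::endl;
            std::cout << "ErrorDetails: " << cancellation->ErrorDetails << std::endl;
        }
    }
}
```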
enum NoMatchReason
Values | Descriptions |
---|---|
NotRecognized | Indicates that speech was detected, but not recognized. |
InitialSilenceTimeout | Indicates that the start of the audio stream contained only silence, and the service timed out waiting for speech. |
InitialBabbleTimeout | Indicates that the start of the audio stream contained only noise, and the service timed out waiting for speech. |
KeywordNotRecognized | Indicates that the spotted keyword has been rejected by the keyword verification service. Added in version 1.5.0. |
EndSilenceTimeout | Indicates that the audio stream contained only silence after the last recognized phrase. |
Defines the possible reasons a recognition result might not be recognized.
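The sketch below shows how a NoMatch result is typically examined via NoMatchDetails; result is assumed to be an existing SpeechRecognitionResult.

```cpp
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void InspectNoMatch(std::shared_ptr<SpeechRecognitionResult> result)
{
    if (result->Reason == ResultReason::NoMatch)
    {
        auto noMatch = NoMatchDetails::FromResult(result);
        if (noMatch->Reason == NoMatchReason::InitialSilenceTimeout)
        {
            std::cout << "Only silence was detected at the start of the audio." << std::endl;
        }
    }
}
```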
enum ActivityJSONType
Values | Descriptions |
---|---|
Null | |
Object | |
Array | |
String | |
Double | |
UInt | |
Int | |
Boolean | |
Defines the possible types for an activity json value. Added in version 1.5.0.
enum SpeechSynthesisOutputFormat
Values | Descriptions |
---|---|
Raw8Khz8BitMonoMULaw | raw-8khz-8bit-mono-mulaw |
Riff16Khz16KbpsMonoSiren | riff-16khz-16kbps-mono-siren Unsupported by the service. Do not use this value. |
Audio16Khz16KbpsMonoSiren | audio-16khz-16kbps-mono-siren Unsupported by the service. Do not use this value. |
Audio16Khz32KBitRateMonoMp3 | audio-16khz-32kbitrate-mono-mp3 |
Audio16Khz128KBitRateMonoMp3 | audio-16khz-128kbitrate-mono-mp3 |
Audio16Khz64KBitRateMonoMp3 | audio-16khz-64kbitrate-mono-mp3 |
Audio24Khz48KBitRateMonoMp3 | audio-24khz-48kbitrate-mono-mp3 |
Audio24Khz96KBitRateMonoMp3 | audio-24khz-96kbitrate-mono-mp3 |
Audio24Khz160KBitRateMonoMp3 | audio-24khz-160kbitrate-mono-mp3 |
Raw16Khz16BitMonoTrueSilk | raw-16khz-16bit-mono-truesilk |
Riff16Khz16BitMonoPcm | riff-16khz-16bit-mono-pcm |
Riff8Khz16BitMonoPcm | riff-8khz-16bit-mono-pcm |
Riff24Khz16BitMonoPcm | riff-24khz-16bit-mono-pcm |
Riff8Khz8BitMonoMULaw | riff-8khz-8bit-mono-mulaw |
Raw16Khz16BitMonoPcm | raw-16khz-16bit-mono-pcm |
Raw24Khz16BitMonoPcm | raw-24khz-16bit-mono-pcm |
Raw8Khz16BitMonoPcm | raw-8khz-16bit-mono-pcm |
Ogg16Khz16BitMonoOpus | ogg-16khz-16bit-mono-opus |
Ogg24Khz16BitMonoOpus | ogg-24khz-16bit-mono-opus |
Raw48Khz16BitMonoPcm | raw-48khz-16bit-mono-pcm |
Riff48Khz16BitMonoPcm | riff-48khz-16bit-mono-pcm |
Audio48Khz96KBitRateMonoMp3 | audio-48khz-96kbitrate-mono-mp3 |
Audio48Khz192KBitRateMonoMp3 | audio-48khz-192kbitrate-mono-mp3 |
Ogg48Khz16BitMonoOpus | ogg-48khz-16bit-mono-opus Added in version 1.16.0 |
Webm16Khz16BitMonoOpus | webm-16khz-16bit-mono-opus Added in version 1.16.0 |
Webm24Khz16BitMonoOpus | webm-24khz-16bit-mono-opus Added in version 1.16.0 |
Raw24Khz16BitMonoTrueSilk | raw-24khz-16bit-mono-truesilk Added in version 1.17.0 |
Raw8Khz8BitMonoALaw | raw-8khz-8bit-mono-alaw Added in version 1.17.0 |
Riff8Khz8BitMonoALaw | riff-8khz-8bit-mono-alaw Added in version 1.17.0 |
Webm24Khz16Bit24KbpsMonoOpus | webm-24khz-16bit-24kbps-mono-opus Audio compressed by OPUS codec in a WebM container, with bitrate of 24kbps, optimized for IoT scenario. (Added in 1.19.0) |
Audio16Khz16Bit32KbpsMonoOpus | audio-16khz-16bit-32kbps-mono-opus Audio compressed by OPUS codec without container, with bitrate of 32kbps. (Added in 1.20.0) |
Audio24Khz16Bit48KbpsMonoOpus | audio-24khz-16bit-48kbps-mono-opus Audio compressed by OPUS codec without container, with bitrate of 48kbps. (Added in 1.20.0) |
Audio24Khz16Bit24KbpsMonoOpus | audio-24khz-16bit-24kbps-mono-opus Audio compressed by OPUS codec without container, with bitrate of 24kbps. (Added in 1.20.0) |
Raw22050Hz16BitMonoPcm | raw-22050hz-16bit-mono-pcm Raw PCM audio at 22050Hz sampling rate and 16-bit depth. (Added in 1.22.0) |
Riff22050Hz16BitMonoPcm | riff-22050hz-16bit-mono-pcm PCM audio at 22050Hz sampling rate and 16-bit depth, with RIFF header. (Added in 1.22.0) |
Raw44100Hz16BitMonoPcm | raw-44100hz-16bit-mono-pcm Raw PCM audio at 44100Hz sampling rate and 16-bit depth. (Added in 1.22.0) |
Riff44100Hz16BitMonoPcm | riff-44100hz-16bit-mono-pcm PCM audio at 44100Hz sampling rate and 16-bit depth, with RIFF header. (Added in 1.22.0) |
AmrWb16000Hz | amr-wb-16000hz AMR-WB audio at 16kHz sampling rate. (Added in 1.24.0) |
G72216Khz64Kbps | g722-16khz-64kbps G.722 audio at 16kHz sampling rate and 64kbps bitrate. (Added in 1.38.0) |
Defines the possible speech synthesis output audio formats. Updated in version 1.19.0.
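To request a specific output format, set it on the SpeechConfig before creating the synthesizer; the sketch below uses placeholder credentials.

```cpp
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void SynthesisFormatExample()
{
    auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

    // Request MP3 output instead of the default PCM format.
    config->SetSpeechSynthesisOutputFormat(SpeechSynthesisOutputFormat::Audio16Khz32KBitRateMonoMp3);

    auto synthesizer = SpeechSynthesizer::FromConfig(config);
    auto result = synthesizer->SpeakTextAsync("Hello world").get();
}
```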
enum StreamStatus
Values | Descriptions |
---|---|
Unknown | The audio data stream status is unknown. |
NoData | The audio data stream contains no data. |
PartialData | The audio data stream contains partial data of a speak request. |
AllData | The audio data stream contains all data of a speak request. |
Canceled | The audio data stream was canceled. |
Defines the possible status of audio data stream. Added in version 1.4.0.
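The stream status is typically checked on an AudioDataStream created from a synthesis result, as sketched below; synthesizer is assumed to be an existing SpeechSynthesizer.

```cpp
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void SaveSynthesizedAudio(std::shared_ptr<SpeechSynthesizer> synthesizer)
{
    auto result = synthesizer->SpeakTextAsync("Hello world").get();
    auto stream = AudioDataStream::FromResult(result);

    // AllData indicates the stream contains the complete audio of the speak request.
    if (stream->GetStatus() == StreamStatus::AllData)
    {
        stream->SaveToWavFile("output.wav");
    }
}
```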
enum ServicePropertyChannel
Values | Descriptions |
---|---|
UriQueryParameter | Uses URI query parameter to pass property settings to service. |
HttpHeader | Uses HttpHeader to set a key/value in an HTTP header. |
Defines channels used to pass property settings to service. Added in version 1.5.0.
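The channel is passed to SpeechConfig::SetServiceProperty together with the property name and value; the parameter name and value below are illustrative only.

```cpp
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void ServicePropertyExample()
{
    auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

    // Send an extra service setting as a URL query parameter on the connection.
    config->SetServiceProperty("punctuation", "explicit", ServicePropertyChannel::UriQueryParameter);
}
```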
enum VoiceProfileType
Values | Descriptions |
---|---|
TextIndependentIdentification | Text independent speaker identification. |
TextDependentVerification | Text dependent speaker verification. |
TextIndependentVerification | Text independent verification. |
Defines voice profile types.
enum RecognitionFactorScope
Values | Descriptions |
---|---|
PartialPhrase | A Recognition Factor will apply to grammars that can be referenced as individual partial phrases. |
Defines the scope that a Recognition Factor is applied to.
enum PronunciationAssessmentGradingSystem
Values | Descriptions |
---|---|
FivePoint | Five point calibration. |
HundredMark | Hundred mark. |
Defines the point system for pronunciation score calibration; default value is FivePoint. Added in version 1.14.0.
enum PronunciationAssessmentGranularity
Values | Descriptions |
---|---|
Phoneme | Shows the score on the full text, word and phoneme level. |
Word | Shows the score on the full text and word level. |
FullText | Shows the score on the full text level only. |
Defines the pronunciation evaluation granularity; default value is Phoneme. Added in version 1.14.0.
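Both the grading system and the granularity are normally supplied when creating a PronunciationAssessmentConfig, as sketched below; recognizer is assumed to be an existing SpeechRecognizer and the reference text is a placeholder.

```cpp
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void PronunciationAssessmentExample(std::shared_ptr<SpeechRecognizer> recognizer)
{
    auto pronConfig = PronunciationAssessmentConfig::Create(
        "good morning",                                     // reference text (placeholder)
        PronunciationAssessmentGradingSystem::HundredMark,  // point system
        PronunciationAssessmentGranularity::Phoneme,        // evaluation granularity
        true);                                              // enable miscue calculation

    pronConfig->ApplyTo(recognizer);

    auto result = recognizer->RecognizeOnceAsync().get();
    auto pronResult = PronunciationAssessmentResult::FromResult(result);
    std::cout << "Accuracy score: " << pronResult->AccuracyScore << std::endl;
}
```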
enum SynthesisVoiceType
Values | Descriptions |
---|---|
OnlineNeural | Online neural voice. |
OnlineStandard | Online standard voice. |
OfflineNeural | Offline neural voice. |
OfflineStandard | Offline standard voice. |
Defines the type of synthesis voices. Added in version 1.16.0.
enum SynthesisVoiceGender
Values | Descriptions |
---|---|
Unknown | Gender unknown. |
Female | Female voice. |
Male | Male voice. |
Neutral | Neutral voice. |
Defines the gender of synthesis voices. Added in version 1.17.0.
enum SynthesisVoiceStatus
Values | Descriptions |
---|---|
Unknown | Voice status unknown. |
GeneralAvailability | Voice is generally available. |
Preview | Voice is in preview. |
Deprecated | Voice is deprecated, do not use. |
Defines the status of synthesis voices.
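Voice type, gender, and status describe entries returned by the voices list; the sketch below retrieves the list with SpeechSynthesizer::GetVoicesAsync and filters on type and gender. synthesizer is assumed to be an existing SpeechSynthesizer.

```cpp
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void ListNeuralFemaleVoices(std::shared_ptr<SpeechSynthesizer> synthesizer)
{
    auto voicesResult = synthesizer->GetVoicesAsync().get();
    if (voicesResult->Reason == ResultReason::VoicesListRetrieved)
    {
        for (const auto& voice : voicesResult->Voices)
        {
            // Print only online neural, female voices.
            if (voice->VoiceType == SynthesisVoiceType::OnlineNeural &&
                voice->Gender == SynthesisVoiceGender::Female)
            {
                std::cout << voice->Name << " (" << voice->Locale << ")" << std::endl;
            }
        }
    }
}
```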
enum SpeechSynthesisBoundaryType
Values | Descriptions |
---|---|
Word | Word boundary. |
Punctuation | Punctuation boundary. |
Sentence | Sentence boundary. |
Defines the boundary type of the speech synthesis boundary event. Added in version 1.21.0.
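The boundary type is typically read from WordBoundary event arguments during synthesis; punctuation and sentence boundaries are only delivered when requested via the corresponding properties. The sketch below uses placeholder credentials.

```cpp
#include <iostream>
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void BoundaryEventExample()
{
    auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

    // Also request punctuation and sentence boundaries in WordBoundary events.
    config->SetProperty(PropertyId::SpeechServiceResponse_RequestPunctuationBoundary, "true");
    config->SetProperty(PropertyId::SpeechServiceResponse_RequestSentenceBoundary, "true");

    auto synthesizer = SpeechSynthesizer::FromConfig(config);

    // Subscribe to boundary events and inspect the boundary type of each one.
    synthesizer->WordBoundary += [](const SpeechSynthesisWordBoundaryEventArgs& e)
    {
        if (e.BoundaryType == SpeechSynthesisBoundaryType::Word)
        {
            std::cout << "Word boundary: " << e.Text << std::endl;
        }
    };

    auto result = synthesizer->SpeakTextAsync("Hello, world.").get();
}
```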
enum SegmentationStrategy
Values | Descriptions |
---|---|
Default | Use the default strategy and settings as determined by the Speech Service. Use in most situations. |
Time | Uses a time based strategy where the amount of silence between speech is used to determine when to generate a final result. |
Semantic | Uses an AI model to determine the end of a spoken phrase based on the content of the phrase. |
The strategy used to determine when a spoken phrase has ended and a final Recognized result should be generated. Allowed values are "Default", "Time", and "Semantic".
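The strategy is selected through the Speech_SegmentationStrategy property on a SpeechConfig, as in the sketch below (placeholder credentials).

```cpp
#include <speechapi_cxx.h>

using namespace Microsoft::CognitiveServices::Speech;

void SegmentationStrategyExample()
{
    auto config = SpeechConfig::FromSubscription("YourSubscriptionKey", "YourServiceRegion");

    // Ask the service to use semantic segmentation to decide when a phrase has ended.
    config->SetProperty(PropertyId::Speech_SegmentationStrategy, "Semantic");

    auto recognizer = SpeechRecognizer::FromConfig(config);
}
```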