Azure Text to speech (Preview)

Azure Text-to-speech allows you to build apps and services that speak naturally with more than 400 voices across 140 languages and dialects.

This connector is available in the following products and regions:

Service         Class     Regions
Logic Apps      Standard  All Logic Apps regions except Azure China regions
Power Automate  Premium   All Power Automate regions except China Cloud operated by 21Vianet
Power Apps      Premium   All Power Apps regions except China Cloud operated by 21Vianet
Contact
Name            Speech Service Power Platform Team
URL             https://docs.microsoft.com/azure/cognitive-services/speech-service/support
Email           [email protected]

Connector Metadata
Publisher       Microsoft
Website         https://docs.microsoft.com/azure/cognitive-services/speech-service/
Privacy policy  https://privacy.microsoft.com
Categories      AI;Website

The Speech service allows you to convert text into synthesized speech and get a list of supported voices for a region by using a REST API.

Pre-requisites

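To proceed, you need a Speech service resource, together with the values required by your chosen authentication type: the account key and region for Api Key, or the resource ID and custom subdomain for Microsoft Entra ID Integrated.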

Creating a connection

The connector supports the following authentication types:

Name                           Regions      Sharing        Description
Api Key                        All regions  Shareable      ApiKey
Microsoft Entra ID Integrated  All regions  Not shareable  Use Microsoft Entra ID to access your Speech service.
Default [DEPRECATED]           All regions  Not shareable  This option is only for older connections without an explicit authentication type, and is provided only for backward compatibility.

Api Key

Auth ID: keyBasedAuth

Applicable: All regions

ApiKey

This is a shareable connection. If the power app is shared with another user, the connection is shared as well. For more information, see "Connectors overview for canvas apps - Power Apps" on Microsoft Docs.

Name         Type          Required  Description
Account Key  securestring  True      Speech service key
Region       string        True      Speech service region (Example: eastus)
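
To illustrate how the Account Key and Region values are typically used, the following Python sketch builds (but does not send) a text-to-speech request. The function name build_tts_request is hypothetical. The URL pattern and header names are assumptions taken from the public Speech service REST API, not from this connector, whose internal behavior may differ.

```python
from urllib.request import Request


def build_tts_request(account_key: str, region: str, ssml: str,
                      output_format: str = "riff-24khz-16bit-mono-pcm") -> Request:
    """Build a POST request for the Speech service text-to-speech endpoint.

    Assumption: endpoint and headers follow the public Speech REST API.
    Nothing is sent over the network here.
    """
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": account_key,  # the connection's Account Key
        "Content-Type": "application/ssml+xml",    # the request body is SSML
        "X-Microsoft-OutputFormat": output_format,  # same default as the connector
        "User-Agent": "tts-sketch",                # hypothetical client name
    }
    return Request(url, data=ssml.encode("utf-8"), headers=headers, method="POST")
```

Passing the returned object to urllib.request.urlopen would perform the call; keeping construction separate makes the mapping from connection parameters to the request easy to inspect.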

Microsoft Entra ID Integrated

Auth ID: tokenBasedAuth

Applicable: All regions

Use Microsoft Entra ID to access your speech service.

This is not a shareable connection. If the power app is shared with another user, that user will be prompted to create a new connection explicitly.

Name              Type    Required  Description
Resource ID       string  True      The Cognitive Services resource ID (Example: /subscriptions//resourceGroups//providers/Microsoft.CognitiveServices/accounts/)
Custom Subdomain  string  True      Custom subdomain endpoint URL (Example: contoso)
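
Because a malformed Resource ID is an easy mistake to make, a quick shape check before creating the connection can help. The Python sketch below is one way to do that. It assumes the empty segments in the example above stand for the subscription ID, the resource group name, and the account name, in that order; the function and group names are hypothetical.

```python
import re

# Assumption: the three variable segments are subscription ID,
# resource group name, and Cognitive Services account name.
_RESOURCE_ID = re.compile(
    r"^/subscriptions/(?P<subscription>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.CognitiveServices"
    r"/accounts/(?P<account>[^/]+)$",
    re.IGNORECASE,
)


def parse_resource_id(resource_id: str) -> dict:
    """Split a resource ID into its named parts, or raise ValueError."""
    match = _RESOURCE_ID.match(resource_id.strip())
    if match is None:
        raise ValueError(f"not a Cognitive Services resource ID: {resource_id!r}")
    return match.groupdict()
```

An ID with any empty segment is rejected, so a copied template with unfilled placeholders fails early instead of at connection time.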

Default [DEPRECATED]

Applicable: All regions

This option is only for older connections without an explicit authentication type, and is only provided for backward compatibility.

This is not a shareable connection. If the power app is shared with another user, that user will be prompted to create a new connection explicitly.

Name         Type          Required  Description
Account Key  securestring  True      Azure Cognitive Services for Neural Text-to-speech Account Key
Region       string        True      Speech service region (Example: eastus)

Throttling Limits

Name                      Calls  Renewal Period
API calls per connection  100    60 seconds
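
A caller that issues many requests may want to stay under this limit on its own side. The Python sketch below is a minimal client-side limiter using a sliding window. This page does not say how the connector measures the 60-second period, so the sliding window is an assumption, and the class name is hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `period`-second window."""

    def __init__(self, max_calls: int = 100, period: float = 60.0,
                 clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock      # injectable so the limiter can be tested
        self._calls = deque()   # timestamps of recently accepted calls

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the window, else False."""
        now = self.clock()
        # Forget calls that are at least one full period old.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False
```

A flow would call try_acquire() before each connector call and wait or queue the work when it returns False.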

Actions

Convert text to speech            Convert a single piece of text to speech.
Convert text to speech with SSML  Convert text to speech by using Speech Synthesis Markup Language (SSML).
Get list of voices                Get a full list of voices for a specific region or endpoint.

Convert text to speech

Convert a single piece of text to speech.

Parameters

Name                 Key              Required  Type    Description
Voice Name           voiceName        True      string  The name of the voice used for the speech output. For example: en-US-JennyNeural.
Locale               locale           True      string  The locale of the contained data. For example: en-US.
Synthesized Text     synthesizedText  True      string  The text to convert to speech.
Output Audio Format  outputFormat               string  The non-streaming audio format. Default: riff-24khz-16bit-mono-pcm.
Style                style                      string  The expressive speaking style. For example: cheerful.
Speaking Rate        speakingRate               string  The speaking rate. For example: -40.00%.
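
To make the relationship between these parameters and SSML concrete, the Python sketch below assembles an SSML document from the same inputs. It is an illustration only: this page does not describe how the connector builds its request, the function name build_ssml is hypothetical, and the element names and namespaces are assumptions based on the public Speech service SSML documentation.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(voice_name: str, locale: str, text: str,
               style: str = None, speaking_rate: str = None) -> str:
    """Assemble an SSML document from connector-style parameters."""
    body = escape(text)  # escape &, < and > in the spoken text
    if speaking_rate:
        # Optional speaking rate, for example "-40.00%".
        body = f"<prosody rate={quoteattr(speaking_rate)}>{body}</prosody>"
    if style:
        # Optional speaking style, for example "cheerful".
        body = f"<mstts:express-as style={quoteattr(style)}>{body}</mstts:express-as>"
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(locale)}>"
        f"<voice name={quoteattr(voice_name)}>{body}</voice>"
        "</speak>"
    )
```

The optional parameters add wrapper elements only when they are supplied, which mirrors how Style and Speaking Rate are not required by the action.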

Convert text to speech with SSML

Convert text to speech by using Speech Synthesis Markup Language (SSML).

Parameters

Name                 Key           Required  Type    Description
SSML Text            ssmlText      True      string  The text in SSML format (for example, SSML markup that wraps the text "power connector").
Output Audio Format  outputFormat            string  The non-streaming audio format. Default: riff-24khz-16bit-mono-pcm.
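
Since the SSML Text value is passed as a plain string, it can be worth checking it before calling the action. The Python sketch below performs two basic checks with the standard library. The function name check_ssml and the sample document are hypothetical, and the rules it applies (a speak root in the SSML namespace with a voice element under it) are assumptions based on the public SSML documentation, not a full validation.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def check_ssml(ssml_text: str) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    try:
        root = ET.fromstring(ssml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    if root.tag != f"{{{SSML_NS}}}speak":
        problems.append("root element is not <speak> in the SSML namespace")
    if root.find(f"{{{SSML_NS}}}voice") is None:
        problems.append("no <voice> element directly under <speak>")
    return problems


# Hypothetical sample input for the SSML Text parameter.
SAMPLE_SSML = (
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
    '<voice name="en-US-JennyNeural">power connector</voice>'
    "</speak>"
)
```

Calling check_ssml(SAMPLE_SSML) returns an empty list, while an unclosed tag or a missing namespace produces one or more messages.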

Get list of voices

Get a full list of voices for a specific region or endpoint.

Returns

The action returns an array of objects. Each element of the array (items) is an object.
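
Because the result is a plain array of objects, a common next step is to filter it. The Python sketch below picks the voices for one locale. This page does not document the fields of each object, so the property names ShortName and Locale are assumptions based on the public Speech service voices list; the function name and any sample data passed to it are hypothetical.

```python
def voices_for_locale(voices: list, locale: str) -> list:
    """Return the sorted short names of voices that match a locale.

    Assumption: each voice object carries "ShortName" and "Locale" fields.
    Objects missing either field are skipped.
    """
    wanted = locale.lower()
    return sorted(
        v["ShortName"]
        for v in voices
        if str(v.get("Locale", "")).lower() == wanted and "ShortName" in v
    )
```

The returned names can be used as the Voice Name parameter of the Convert text to speech action, provided the field assumption holds for the actual response.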