Text To Speech (ElevenLabs)

[Image: Text To Speech interface]

The Text to Speech input lets you turn written text into natural, spoken audio. Instead of recording a voice yourself, you simply enter the text you want to hear, and Composer generates realistic speech automatically using ElevenLabs — a high-quality voice generation service known for clear, expressive voices.

It is suited to narration, voice prompts, and spoken messages.

What you can use it for

Text to Speech is useful in many situations, for example:

  • Creating voice narration for videos or presentations
  • Adding spoken feedback or messages to applications
  • Improving accessibility by offering audio versions of text
  • Generating placeholder or prototype voice content quickly

Because the voice is generated from text, you can easily update or reuse it without recording anything again.

Requirements

To use the Text to Speech input, you need:

  • An active ElevenLabs subscription (required for commercial use).
  • A valid ElevenLabs API key.

Responsibility

We provide a technical integration with ElevenLabs. Users supply their own API key and are responsible for complying with ElevenLabs' licensing and use policies.

See the ElevenLabs Terms of Use.

💡 ElevenLabs credits

Usage is based on a credit system, which limits how much speech you can generate. The available credits and limits depend on the plan you choose.

For more information, please visit ElevenLabs.

Troubleshooting

Known ElevenLabs issues may occasionally affect Text to Speech output. These can include audio glitches, sharp breaths between paragraphs, pronunciation issues, unexpected audio artifacts, or small variations in voice quality. These behaviors originate from ElevenLabs' voice generation system and are outside of Composer's control.

If audio sounds distorted or unexpected, regenerating the affected or previous paragraph usually resolves the issue.

For more details, known limitations, and recommended workarounds, refer to ElevenLabs' official troubleshooting documentation.

Configuration

To allow Composer to use ElevenLabs, you must provide your ElevenLabs API key as a system environment variable. Composer reads this key automatically when it starts.

Step 1: Get your ElevenLabs API key

  • Sign in to your ElevenLabs account.
  • Go to your account or profile settings.
  • Create and copy your API key. Keep this key private.

Step 2: Set API Key Permission

  • In your ElevenLabs account, make sure to give your API key permission to read Text to Speech.

API Key Permission setting

Step 3: Set the environment variable

Create a system environment variable:

  • Name: COMPOSER_ELEVENLABS_APIKEY
  • Value: your ElevenLabs API key

Windows

  1. Open Start and search for Environment Variables.
  2. Select Edit the system environment variables.
  3. Click Environment Variables.
  4. Under User variables, click New.
  5. Set:
    • Name: COMPOSER_ELEVENLABS_APIKEY
    • Value: your ElevenLabs API key
  6. Click OK to save.
  7. Reboot your system.

Linux

To make the environment variable permanent, you need to add it to your shell configuration file. This ensures it is available every time you log in.

  1. Open a terminal.
  2. Open your shell configuration file (for most systems, this is ~/.bashrc or ~/.zshrc).
  3. Add the following line at the end of the file:
    export COMPOSER_ELEVENLABS_APIKEY=your_api_key_here
    
  4. Save the file.
  5. Restart your terminal, or log out and log back in.
  6. Restart Composer.

The API key is now set automatically in every new shell session.
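
Before restarting Composer, it can help to confirm that the variable is visible to newly started processes. The following Python sketch is not part of Composer. It only looks up the COMPOSER_ELEVENLABS_APIKEY variable named above and returns a masked summary, so the key itself never appears on screen. The helper name describe_api_key is made up for this illustration.

```python
import os

ENV_NAME = "COMPOSER_ELEVENLABS_APIKEY"  # variable name from the steps above


def describe_api_key(env=None):
    """Return a short, masked status string for the API key variable.

    `env` defaults to the current process environment; passing a dict
    makes the helper easy to try out without touching real settings.
    """
    env = os.environ if env is None else env
    value = env.get(ENV_NAME, "").strip()
    if not value:
        return f"{ENV_NAME} is not set"
    # Show only the last four characters so the key stays private.
    return f"{ENV_NAME} is set (ends with ...{value[-4:]}, {len(value)} characters)"


if __name__ == "__main__":
    print(describe_api_key())
```

Run it from a newly opened terminal. If it reports that the variable is not set, a program started from that same session will not see the key either.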

💡 Important notes

  • Composer only reads environment variables when it starts.
  • If you change the API key, you must restart your computer.
  • Keep your API key private and do not share it.

See also

  • Speech To Text — the inverse direction: transcribe spoken audio into text in real time using Whisper, with optional on-screen subtitles and Script Engine triggers.
  • Crystal Speech — AI-powered denoiser for live mic input; useful when capturing the human side of a Text To Speech-driven dialogue.
  • LLM (Ollama) — pair with Text To Speech to build an AI presenter pipeline: an LLM generates the script, ElevenLabs speaks it.

Text To Speech (ElevenLabs) - Settings

General

| Property | Description |
| --- | --- |
| Start when loaded | Whether to start playback automatically when the project loads. [default=true]. Saves a manual click when the project is loaded fresh; turn off if you want to trigger speech only on demand from a script or Play command. |
| Show advanced options | Whether to reveal advanced configuration in the editor. [default=false]. |

Configuration

Configuration — pick the ElevenLabs voice, model, and language.

| Property | Description |
| --- | --- |
| Voice ID | ElevenLabs voice ID that generates the speech. Must match a voice your ElevenLabs account has access to. Browse the ElevenLabs Voice Library to find IDs for cloned voices, voices you've designed, or pre-made voices. |
| Model ID | ElevenLabs model ID used for generating speech. Different models trade off quality, latency, multilingual support, and credit cost. Pick a "flash" model for low-latency live use, or a higher-quality model for prerecorded content. See the ElevenLabs models documentation for the current options. |
| Language (ISO 639-1) | Language code (ISO 639-1) for the generated speech (for example "en", "sv", "es"). Leave empty to use the model's default language. Useful when the same model is used for multiple languages and you want to force a specific one. |
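
To make these three properties concrete, here is a rough Python sketch of how they line up with the public ElevenLabs text-to-speech REST endpoint. How Composer builds its own request is not documented on this page, so treat the mapping as an assumption. The function name build_tts_request and the example IDs are placeholders. The sketch only assembles the request and does not send it.

```python
import json

API_BASE = "https://api.elevenlabs.io/v1/text-to-speech"


def build_tts_request(api_key, voice_id, model_id, text, language=""):
    """Assemble URL, headers, and JSON body for one speech request."""
    payload = {"text": text, "model_id": model_id}
    # An empty language mirrors "leave empty to use the model's default".
    if language:
        payload["language_code"] = language.lower()
    headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
    return f"{API_BASE}/{voice_id}", headers, json.dumps(payload)


# Placeholder values; substitute a real voice ID and model ID.
url, headers, body = build_tts_request("KEY", "VOICE_ID", "MODEL_ID", "Hello", "sv")
print(url)
print(body)
```

Leaving the language argument empty omits language_code from the body entirely, which is the closest equivalent to leaving the Language property blank.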

Text to Speech

Text to Speech — the text that will be read aloud on the next Play.

| Property | Description |
| --- | --- |
| Text to Speak | The text the model will read aloud on the next Play command. Set from a script for fully automated speech, or type into the field for interactive use. Long text blocks generate longer audio clips and use more ElevenLabs credits. |

Voice Settings

Voice Settings — fine-tune the character of the generated voice.

| Property | Description |
| --- | --- |
| Stability | How stable and consistent the voice sounds. [min=0, max=1, default=1]. Lower values give more emotional, varied delivery — good for character voices and performances. Higher values sound calmer and more monotone — good for narration and announcements where consistency matters. |
| Speaker Boost | Whether to boost similarity to the original speaker. [default=true]. Helps the synthesised voice stay closer to the cloned source. Slightly increases generation time and latency. |
| Similarity Boost | How closely the synthesised voice matches the original. [min=0, max=1, default=0.75]. Higher values stick more tightly to the source voice. Lower values let the model improvise more, which can sound more natural but less recognisable. |
| Style | How much the speaker's vocal style is exaggerated. [min=0, max=1, default=0]. Higher values amplify the speaker's natural style — useful for character voices. May increase generation latency. Stick close to 0 for neutral narration. |
| Speed | Speech playback speed. [min=0.7, max=1.2, default=1.0]. Values below 1.0 slow the speech down; above 1.0 speed it up. Useful for matching timing to a specific cue or making content easier to follow. |
| Reset | Reset voice settings to their defaults (stability, speaker boost, similarity, style, speed). |
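
When voice settings are set from a script rather than the property panel, it is easy to pass a value outside the ranges listed above. The following Python sketch clamps requested values to those ranges and fills in the listed defaults. The dictionary keys follow the field names of the ElevenLabs voice_settings object, which is an assumption about how the properties map. The helper clamp_voice_settings is hypothetical and not a Composer function.

```python
# Ranges and defaults copied from the Voice Settings table above:
# name -> (minimum, maximum, default)
RANGES = {
    "stability": (0.0, 1.0, 1.0),
    "similarity_boost": (0.0, 1.0, 0.75),
    "style": (0.0, 1.0, 0.0),
    "speed": (0.7, 1.2, 1.0),
}


def clamp_voice_settings(requested=None, use_speaker_boost=True):
    """Return a full settings dict with every value inside its range."""
    requested = requested or {}
    settings = {"use_speaker_boost": bool(use_speaker_boost)}
    for name, (low, high, default) in RANGES.items():
        value = float(requested.get(name, default))
        settings[name] = min(max(value, low), high)
    return settings


print(clamp_voice_settings({"stability": 0.35, "speed": 1.5}))
```

In this example the requested speed of 1.5 comes back as 1.2, the documented maximum, while the unspecified settings fall back to their defaults.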

Commands

Commands — Play, Stop, and the playback state.

| Property | Description |
| --- | --- |
| Playback State | Current playback state of the input — Running or Stopped (read-only). |
| Play | Send the current Text to ElevenLabs and play the resulting speech audio. If caching is on and the same combination of text, voice, model, and language has been generated before, audio is played from disk instead of making a new request. |
| Stop | Stop the current speech playback and cancel any in-flight ElevenLabs request. |

Status

Status — what the input is doing right now and how long the latest request took.

| Property | Description |
| --- | --- |
| Status | Current state of the request (read-only). Reports whether the input is standing by, sending, receiving audio, speaking, playing from cache, finished, or in an error state. Useful from a script to wait for speech to finish before triggering the next cue. |
| Response | Latest HTTP response message from the ElevenLabs API (read-only). Shows the status code and any error text. Useful for diagnosing failures (invalid API key, exhausted credits, voice not available) without checking the log. |
| Response Time (ms) | Time taken until the first audio chunk arrived from ElevenLabs, in milliseconds (read-only). Useful for monitoring service responsiveness. Zero before the first chunk and when playing from cache. |
| Speech Duration | Length of the generated speech, formatted as seconds or mm:ss (read-only). Populated once the full response has been received (or loaded from cache). |
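
The Status property is described above as a way to wait for speech to finish before the next cue. The Script Engine API is not covered on this page, so the following Python sketch shows only the general polling pattern. The read_status argument stands in for whatever call returns the Status value, and the status strings "Finished" and "Error" are assumed placeholders rather than confirmed values.

```python
import time


def wait_for_speech(read_status, timeout_s=30.0, poll_s=0.1,
                    done=("Finished",), failed=("Error",)):
    """Poll read_status() until it reports a done or failed state.

    Returns True when a done state is seen, False on a failed state,
    and raises TimeoutError if neither appears within timeout_s.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = read_status()
        if status in done:
            return True
        if status in failed:
            return False
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still '{status}' after {timeout_s} s")
        time.sleep(poll_s)


# Simulated status sequence standing in for a real input.
states = iter(["Sending", "Speaking", "Speaking", "Finished"])
print(wait_for_speech(lambda: next(states), poll_s=0.01))
```

A timeout is included so that a stalled request cannot block the rest of a show indefinitely. On a False result, the Response property is the natural place to look for the error text.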

Cache Settings

Cache Settings — cache generated audio on disk to avoid repeated ElevenLabs charges.

| Property | Description |
| --- | --- |
| Enable Cache | Whether to cache generated audio so repeated playback doesn't hit ElevenLabs again. [default=true]. On is recommended for any workflow that replays the same lines (announcements, jingles, scripted Q&A) — you only pay the API cost once. The cache key is text\|voice\|model\|language, so changing any of those generates fresh audio. |
| Keep Cache | Time unit used together with Cache Expiration (CacheRetentionDuration) to compute cache expiry. Pick Forever to keep cached audio indefinitely. Pick Minutes/Hours/Days/Months to auto-expire entries older than the chosen interval. |
| Cache Expiration | How long cached files are kept before they expire, in units of Keep Cache (CacheRetentionUnit). [min=1, max=10000]. Ignored when Keep Cache is Forever. |
| Cache Folder | Folder where cached audio files are written (read-only). Resolved automatically based on the project location and the Composer media directory. |
| Total in cache | Number of cached audio files currently on disk (read-only). Useful for monitoring cache growth. |
| Clear Cache | Delete all cached audio files. Asks for confirmation first; cannot be undone. Useful when voice settings change and you want to force fresh audio everywhere. |
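
The cache rules above can be summarised in a small Python sketch. Only the key layout (text, voice, model, language joined with "|") and the unit names come from the table. Hashing the key into a file name, the ".audio" extension, and treating a month as 30 days are assumptions made for this illustration, and Composer's real file naming and expiry arithmetic may differ.

```python
import hashlib
from datetime import datetime, timedelta

# "Months" is approximated as 30 days (assumption).
UNIT_LENGTH = {
    "Minutes": timedelta(minutes=1),
    "Hours": timedelta(hours=1),
    "Days": timedelta(days=1),
    "Months": timedelta(days=30),
}


def cache_key(text, voice_id, model_id, language=""):
    """Join the four fields that decide whether cached audio is reused."""
    return "|".join([text, voice_id, model_id, language])


def cache_file_name(key):
    """Hypothetical file name: a SHA-256 digest of the key."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest() + ".audio"


def is_expired(created, now, keep_cache="Forever", expiration=1):
    """True when an entry is older than expiration x keep_cache units."""
    if keep_cache == "Forever":
        return False  # the expiration value is ignored
    return now - created > expiration * UNIT_LENGTH[keep_cache]


key = cache_key("Welcome back!", "VOICE_ID", "MODEL_ID", "en")
created = datetime(2024, 1, 1, 12, 0)
print(key)
print(is_expired(created, created + timedelta(hours=3), "Hours", 2))
```

Voice settings are not part of the key described in the table, so changing only Stability or Speed would still match an existing entry. That is consistent with the advice to use Clear Cache after changing voice settings.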

Inherits from: AbstractInput, AbstractAudioProcessing, AbstractAudioMetering.

See also: Text To Speech (ElevenLabs) in Script Engine Objects.

Shared input properties

Every input — regardless of source type — exposes the following property groups. They are surfaced in the property panel only when Show advanced options is enabled on the input.

Icon

  • Icon text — short text shown on the input's icon in the Inputs list. Useful as a quick visual label (channel number, mic name, camera position) to tell otherwise-similar inputs apart at a glance. Empty by default; has no effect on rendering or routing.

Audio mixer

  • Hide in audio mixer — when on, hides the input from the audio mixer view without disabling its audio. Useful for de-cluttering the mixer while keeping the audio routed (e.g. fixed background music, ambient beds, pre-aligned playout). [default=false]

Render Options

  • Invisible (Do not render in scene) — when on, the input is skipped during rendering and produces no picture on any layer or scene. Audio routing is unaffected. Toggle from a script for cued-in / cued-out behaviour during a show. [default=false]
  • Do not render input — disables the input's internal render entirely (no decode or capture work is done). Stronger than Invisible: that one renders but doesn't display; this one stops the input from doing any work at all. Useful for reducing CPU / network load on heavy sources (e.g. high-bitrate RTMP / SRT streams, large media files) when the input is temporarily not needed. Audio meters are cleared while disabled. [default=false]
  • Do not render input controller — chooses what drives the Do not render input flag. Let Composer decide (the default) hands control to the project-level Render Tuning optimiser, which automatically pauses inputs that aren't used by any active scene. Manual Configuration ignores Render Tuning and lets the Do not render input toggle control the flag directly — use this to keep a network source warm even when it's currently off-air, or to take a heavy input down by hand regardless of scene activity. [default=Let Composer decide]
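
The interaction between the controller setting and the Do not render input toggle can be read as a simple decision rule. The Python sketch below restates that rule using the option names from the list above. The function input_is_paused and its arguments are illustrative only, and the real Render Tuning optimiser may weigh more than whether an active scene uses the input.

```python
LET_COMPOSER_DECIDE = "Let Composer decide"
MANUAL = "Manual Configuration"


def input_is_paused(controller, do_not_render, used_by_active_scene):
    """Return True when the input's internal render would be skipped."""
    if controller == MANUAL:
        # The Do not render input toggle controls the flag directly.
        return do_not_render
    if controller == LET_COMPOSER_DECIDE:
        # Render Tuning pauses inputs that no active scene is using.
        return not used_by_active_scene
    raise ValueError(f"unknown controller: {controller!r}")


# Keep an off-air network source warm by taking manual control.
print(input_is_paused(MANUAL, do_not_render=False, used_by_active_scene=False))
```

The example prints False: under Manual Configuration the input keeps running even though no active scene uses it, which is the "keep a network source warm" case described above.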

Optional TAGS

  • TAGS — one or more free-form tag words used to classify this input (typically space- or comma-separated). Picked up by Composer's Smart Search to filter or find inputs by category — e.g. camera, music, interview, sponsor. Has no effect on rendering.
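
Because tags may be separated by spaces, commas, or a mix of both, a tool that reads them needs a tolerant split. The Python sketch below shows one way to do that. How Smart Search actually parses and matches tags is not described here, so the case-insensitive comparison and the helper names parse_tags and has_tag are assumptions.

```python
import re


def parse_tags(raw):
    """Split a TAGS string on commas and/or whitespace, dropping blanks."""
    return [tag for tag in re.split(r"[,\s]+", raw.strip()) if tag]


def has_tag(raw, wanted):
    """Case-insensitive membership test, as a search filter might do."""
    return wanted.lower() in (tag.lower() for tag in parse_tags(raw))


print(parse_tags("camera, music  interview,sponsor"))
print(has_tag("camera, music  interview,sponsor", "Interview"))
```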

Audio configuration and processing options

For inputs capable of processing audio, additional audio configuration and processing options are available through the audio mixer and the Channel Strip Inspector.

  • Audio mixer — monitor levels, adjust gain and pan, mute / solo inputs, and configure auxiliary sends to Audio Channel Strip submix buses, all from a centralised mixer-style interface.
  • Channel Strip Inspector — advanced per-strip audio processing for the selected input:
    • Input trim, stereo remapping, and audio delay
    • Channel mapping (8-channel mode unlocks the full MAPPING tab)
    • Gate
    • Low-cut filter
    • Equaliser (5-band parametric)
    • Compressor
    • Sidechain ducking (a second compressor whose gain reduction is driven by another input's level — e.g. dipping music under a voice-over)
    • Limiter

For the full audio signal flow, see Audio processing workflow.