Crystal Speech

The Crystal Speech Operator is an advanced AI-powered denoiser that enhances live audio streams by eliminating unwanted background noise in real time. Whether you're in a noisy environment or dealing with microphone interference, this feature ensures that only clear, high-quality speech comes through. By isolating voices and filtering out distractions, Crystal Speech delivers crisp, uninterrupted audio for a seamless listening experience.

Note!

Crystal Speech introduces a few frames of latency.

For optimal synchronization between audio and video, consider using the Video Delay operator in conjunction with Crystal Speech. This ensures that any audio processing delay introduced by Crystal Speech is matched by a corresponding video delay, keeping your audio and video in sync during live streams or recordings.
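As a rough illustration of matching the two delays, the sketch below converts an audio latency in milliseconds into the nearest whole number of video frames. It is standalone Python, not the operator's scripting API. The function name is hypothetical, and it assumes the Video Delay operator is set in whole video frames. In practice the latency value would be read from the Audio Latency (ms) property described under Latency.

```python
import math


def video_delay_frames(audio_latency_ms: float, video_fps: float) -> int:
    """Return the number of video frames closest to the given audio latency.

    Assumption: the video delay is configured in whole frames, so the
    result is rounded to the nearest frame (halves round up).
    """
    if audio_latency_ms < 0:
        raise ValueError("audio_latency_ms must be non-negative")
    if video_fps <= 0:
        raise ValueError("video_fps must be positive")
    exact_frames = audio_latency_ms * video_fps / 1000.0
    return math.floor(exact_frames + 0.5)


if __name__ == "__main__":
    # Example values only; read the real latency from the operator.
    print(video_delay_frames(40.0, 50.0))
```

Rounding to the nearest frame leaves a residual offset of at most half a video frame, which is the best a whole-frame delay can achieve.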

Audio Output and Mono Processing

To achieve the best noise reduction in real time, Crystal Speech processes audio in mono. By working with a single channel, it ensures clearer, more consistent speech while effectively eliminating unwanted background noise. This approach is standard in many high-performance denoisers, enabling efficient and precise noise filtering.
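To make the idea of single-channel processing concrete, here is a minimal sketch of a stereo-to-mono downmix. The source does not say how Crystal Speech derives its mono signal, so averaging the two channels is an assumption (it is simply a common choice), and the function name is hypothetical.

```python
def downmix_to_mono(left: list[float], right: list[float]) -> list[float]:
    """Average two channels sample by sample into one mono channel.

    Assumption: a plain (L + R) / 2 downmix. The operator's actual
    method is not documented here.
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    return [0.5 * (l + r) for l, r in zip(left, right)]


if __name__ == "__main__":
    print(downmix_to_mono([1.0, 0.0, -0.5], [0.0, 0.0, -0.5]))
```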

Crystal Speech overview

Note

This operator displays the audio input level before processing and the output level after the operator has been applied, allowing you to monitor how the adjustments affect the signal.

A miniature audio meter (VU meter) in the header indicates incoming audio, so you can quickly verify that the operator is receiving audio even when it is collapsed.

Tip

Use the Audio tab in Settings to adjust how long the signal overload indicator stays active, and Project Options to change the maximum peak level displayed in all audio meters to your preference.

  • Speech To Text — pair with Crystal Speech to feed the cleaner, denoised audio into Whisper for more accurate real-time transcription and subtitles.
  • Text To Speech (ElevenLabs) — the inverse direction: turn scripted text into spoken audio in real time, useful for AI-driven announcements and presenter pipelines.
  • Video Delay — compensate for the few frames of audio latency Crystal Speech introduces, keeping audio and video in sync.

Crystal Speech - Settings

State

State — current status of the noise-cancellation engine.

Property | Description
State | Current operator state (read-only). Idle: model loaded, ready to process. Processing: actively cleaning audio. Stopped: manually stopped, audio passes through unchanged. ModelError: the noise-cancellation model failed to load (check the log for the file path the operator tried). RunError: a runtime failure occurred during processing. Useful from a script to react to state changes (e.g. send an alert if the state flips to RunError mid-show).
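Reacting to state changes from a script could look roughly like the sketch below. It is standalone Python rather than the product's Script Engine API. The five state names come from the table above, while the enum, the function, and the alert callback are hypothetical names introduced for illustration.

```python
from enum import Enum
from typing import Callable


class OperatorState(Enum):
    IDLE = "Idle"
    PROCESSING = "Processing"
    STOPPED = "Stopped"
    MODEL_ERROR = "ModelError"
    RUN_ERROR = "RunError"


ERROR_STATES = {OperatorState.MODEL_ERROR, OperatorState.RUN_ERROR}


def on_state_change(previous: OperatorState,
                    current: OperatorState,
                    alert: Callable[[str], None]) -> bool:
    """Send an alert when the operator newly enters an error state.

    Returns True if an alert was sent. How the state is actually read
    from the operator is outside this sketch.
    """
    if current in ERROR_STATES and current != previous:
        alert(f"Crystal Speech entered {current.value} (was {previous.value})")
        return True
    return False


if __name__ == "__main__":
    on_state_change(OperatorState.PROCESSING, OperatorState.RUN_ERROR, print)
```

Passing the alert as a callback keeps the check independent of how the notification is delivered (log line, on-screen warning, message to an operator).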

Action

Action — start/stop the operator.

Property | Description
Auto-Start when loaded | When true, the operator starts processing automatically once the project loads. [default=true]. This is the recommended setting for most projects, because noise cancellation is on the moment the show begins. Disable it to control activation manually (via StartCommand from a script or button), for example to enable cleanup only during certain segments.
StartCommand | Start the noise-cancellation engine. Switches OperatorState from Idle or Stopped to Processing.
StopCommand | Stop the noise-cancellation engine. The input audio passes through unchanged while stopped.
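The start/stop behaviour described above can be summarised as a small state table. The sketch below is a simplified model in standalone Python, not the operator's implementation. It assumes the model loaded successfully, and it leaves the state unchanged for any case the documentation does not describe (for example StopCommand while Idle, or any command in an error state). The function names are hypothetical.

```python
def initial_state(auto_start: bool) -> str:
    """State right after the project loads, assuming the model loaded."""
    return "Processing" if auto_start else "Idle"


def apply_command(state: str, command: str) -> str:
    """Return the next state for a StartCommand or StopCommand.

    Documented transitions:
      StartCommand: Idle or Stopped -> Processing
      StopCommand:  Processing -> Stopped
    Anything else is left unchanged (an assumption of this sketch).
    """
    if command == "StartCommand" and state in ("Idle", "Stopped"):
        return "Processing"
    if command == "StopCommand" and state == "Processing":
        return "Stopped"
    return state


if __name__ == "__main__":
    state = initial_state(auto_start=False)
    for cmd in ("StartCommand", "StopCommand", "StartCommand"):
        state = apply_command(state, cmd)
        print(cmd, "->", state)
```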

Dry/Wet Level

Dry/Wet Level — blend between the original and the cleaned signal.

Property | Description
Level (%) | Mix between the original audio (dry) and the noise-cancelled audio (wet), in percent. [min=0, max=100, default=85]. 0 outputs the original unchanged. 100 outputs only the cleaned signal. Mid values blend the two, which is useful when full denoising sounds too "processed", strips natural room ambience, or makes a voice feel disconnected from the scene. Uses an equal-power crossfade so perceived loudness stays roughly constant across the slider.
Bypass | When true, outputs the original audio unchanged and the cleaned signal is ignored. [default=false]. Quick A/B comparison toggle: turn it on to hear the raw source, off to hear the cleaned result. The engine keeps running in the background either way, so the switch is instant and there's no glitch when toggling on air.
ResetCommand | Reset Dry/Wet level and Bypass to their defaults (85%, off).
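For readers unfamiliar with equal-power crossfades, the sketch below shows one common formulation, in which the dry and wet gains follow a cosine and sine curve so that their squares always sum to 1. The exact curve Crystal Speech uses is not documented here, so treat the sine/cosine law, the function names, and the way Bypass is modelled as assumptions.

```python
import math


def equal_power_gains(level_percent: float, bypass: bool = False) -> tuple[float, float]:
    """Return (dry_gain, wet_gain) for a Dry/Wet level in percent.

    Assumption: a sine/cosine equal-power law, where
    dry_gain**2 + wet_gain**2 == 1 at every slider position.
    """
    if bypass:
        # Bypass outputs the original audio only.
        return 1.0, 0.0
    if not 0.0 <= level_percent <= 100.0:
        raise ValueError("level_percent must be between 0 and 100")
    theta = (level_percent / 100.0) * (math.pi / 2.0)
    return math.cos(theta), math.sin(theta)


def mix_sample(dry: float, wet: float, level_percent: float, bypass: bool = False) -> float:
    """Blend one dry sample with one wet sample."""
    dry_gain, wet_gain = equal_power_gains(level_percent, bypass)
    return dry_gain * dry + wet_gain * wet


if __name__ == "__main__":
    for level in (0, 50, 85, 100):
        dry_gain, wet_gain = equal_power_gains(level)
        print(level, round(dry_gain, 3), round(wet_gain, 3))
```

With this law, a linear 50/50 mix is avoided: at the midpoint both gains are about 0.707 rather than 0.5, which is what keeps the perceived loudness roughly constant when the two signals are largely uncorrelated.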

Latency

Latency — extra delay introduced by the noise-cancellation processing.

Property | Description
Audio Latency (Frames) | Processing delay measured in audio frames (read-only). Reflects how many frames the engine buffers internally before output is available. To keep other audio paths time-aligned with the cleaned signal, apply a matching delay (using the Delay operator) on the parallel branches.
Audio Latency (ms) | Processing delay in milliseconds (read-only). Same delay as AudioLatencyFrames, expressed as time. Measured once on first startup and reused on subsequent stop/start cycles, so the value stays stable throughout a session.
Total Resyncs | Number of times the audio buffer has had to resync since processing started (read-only). A resync happens when the engine falls behind and the buffer overflows, typically because of a load spike on the host or a debugger pause. The occasional resync is harmless; frequent ones indicate the host is overloaded, in which case reduce other work on the machine or move the cleanup elsewhere. Useful from a script to alert when the count climbs unexpectedly.
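One way to distinguish an occasional resync from a sustained problem is to look at how much the counter grows within a sliding time window. The sketch below is standalone Python with hypothetical names. The window length and threshold are arbitrary example values, and how the Total Resyncs value is polled from the operator is left out.

```python
from collections import deque


class ResyncMonitor:
    """Flag when Total Resyncs climbs too fast within a time window."""

    def __init__(self, window_s: float = 60.0, max_resyncs: int = 3):
        self.window_s = window_s
        self.max_resyncs = max_resyncs
        self._samples: deque[tuple[float, int]] = deque()  # (time, count)

    def update(self, now_s: float, total_resyncs: int) -> bool:
        """Record a polled counter value; return True if an alert is due."""
        # The counter restarts when processing restarts, so a drop in
        # value means the old history no longer applies.
        if self._samples and total_resyncs < self._samples[-1][1]:
            self._samples.clear()
        self._samples.append((now_s, total_resyncs))
        # Discard samples that have fallen out of the window.
        while now_s - self._samples[0][0] > self.window_s:
            self._samples.popleft()
        growth = total_resyncs - self._samples[0][1]
        return growth > self.max_resyncs


if __name__ == "__main__":
    monitor = ResyncMonitor(window_s=60.0, max_resyncs=3)
    for t, count in [(0, 0), (10, 1), (20, 5), (100, 5)]:
        print(t, count, monitor.update(t, count))
```

Comparing against the oldest sample in the window, rather than the absolute count, means a long-running show with a handful of scattered resyncs does not trigger an alert.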

Log

Log — recent component messages and warnings.

Log
Property Description
ComponentLog

Inherits from: AbstractAudioOperator, AbstractOperator, AbstractAudioMetering.

See also: Crystal Speech in Script Engine Objects.