Speech To Text

Speech To Text operator properties for Script Engine. The operator listens to the layer's audio and transcribes it to text in real time, optionally drawing the result on screen as live captions. It is useful for live captioning, accessibility overlays, post-show transcripts, automated content moderation, foreign-language productions where subtitles must appear instantly, and any workflow where speech needs to be captured as text. Optional find-and-replace rules can fix recurring misrecognitions or filter profanity. Accuracy depends on the loaded model and the quality of the incoming audio: a clear voice with low background noise gives the best results.

Property	Type	Access	Description
`ShowAdvancedOptions`	`bool`	`get/set`	Whether to reveal advanced configuration in the editor. [default=false]. Toggle on to show options like the script callback fields.
`ModelSourceUrl`	`Uri`	`get/set`	Path to the speech recognition model file (.bin) used for transcription. Larger models give more accurate recognition but cost more processing time per second of audio. Loading a new model reinitialises the operator.
`AutoStart`	`bool`	`get/set`	Whether to start listening automatically once the model finishes loading. [default=true]. Saves a manual click when the project is loaded fresh; turn off if you want to start transcription only on demand.
`OperatorState`	`OperatorState`	`get`	Current state of the operator (read-only). Reports whether the model is loading, ready, running, stopped, or in an error state.
`StartCommand`	`Command`	`get`	Begin transcribing the layer's audio. Available once a valid model is loaded.
`StopCommand`	`Command`	`get`	Stop transcribing audio.
`ClearCommand`	`Command`	`get`	Clear the on-screen captions and reset the audio context so previous speech doesn't bleed back in.
`ConfidenceLevel`	`int`	`get/set`	Minimum confidence (in percent) a recognised word must score to appear on screen. [min=10, max=100, default=70]. Raise to suppress weak guesses — fewer words appear, but the ones that do are more likely correct. Lower to surface more text, at the cost of occasional misheard words.
`AudioBuffer`	`int`	`get/set`	How much audio is collected before each transcription pass, in milliseconds. [min=100, max=2000, default=300]. Lower values give snappier captions at the cost of accuracy — there's less context for the model to reason from. Higher values give better accuracy but captions appear with more delay.
`PauseThreshold`	`int`	`get/set`	How long the speaker must be silent before a new subtitle card is started, in milliseconds. [min=0, max=5000, default=750]. Shorter values break the captions into smaller chunks more often. Longer values let long sentences flow as one block but cards stay on screen longer between sentences.
`NoSpeechThreshold`	`int`	`get/set`	How aggressively to filter audio segments that probably contain no speech, as a percentage. [min=0, max=100, default=60]. Lower values filter more aggressively — good for noisy environments where the model hears phantom words during silence. Raise if quiet speech is being missed.
`SubtitlesScreenTimout`	`int`	`get/set`	How long captions stay on screen after the speaker stops talking, in milliseconds. [min=100, max=10000, default=5000]. Longer values give the audience more reading time. Shorter values keep the screen uncluttered between sentences.
`ResetThresholdValuesCommand`	`Command`	`get`	Reset all threshold values to their defaults (confidence, audio buffer, pause detection, no-speech, screen timeout).
`ShowSubtitlesCheckBox`	`bool`	`get/set`	Whether to draw captions on the output image. [default=true]. Turn off if you only want to use the recognised text from a script (via `RecentText` and `ScriptCallbackFunction`) without on-screen captions.
`SubtitlesPosX`	`int`	`get/set`	Horizontal position of the caption block, in pixels from the left edge. [min=0, max=4096].
`SubtitlesPosY`	`int`	`get/set`	Vertical position of the caption block, in pixels from the top edge. [min=0, max=4096].
`ResetTextPositionCommand`	`Command`	`get`	Reset caption position to the default location.
`RecentText`	`FormattedMessage`	`get`	Most recently transcribed text (read-only). Updates every time the speech recognition engine produces a new result. Read this from a script to forward the live transcript to chat overlays, captions widgets, or any external system.
`IsSubtitleActive`	`bool`	`get`	True while a caption is currently being shown (read-only). Resets to false when the on-screen timeout (`SubtitlesScreenTimout`) expires or the operator is cleared. Useful for scripts that need to react when speech starts or ends.
`SubtitleStartTime`	`string`	`get`	UTC timestamp when the current subtitle segment started (read-only). Updated on each new speech segment. Useful for tagging subtitles with absolute time when feeding them to external systems.
`SubtitleEndTime`	`string`	`get`	UTC timestamp when the current subtitle segment ended (read-only). Empty while the speaker is still talking; populated once the segment closes.
`SubtitleStartPts`	`long`	`get`	Video stream timestamp marking when the current subtitle segment started (read-only). Zero if the input does not provide a presentation timestamp. Useful for matching captions to specific frames when post-processing recordings.
`SubtitleEndPts`	`long`	`get`	Video stream timestamp marking when the current subtitle segment ended (read-only). Zero while speech is still active, or if the input does not provide a presentation timestamp.
`FontSize`	`int`	`get/set`	Caption font size, in pixels. [min=11, max=60, default=32].
`FontColorR`	`int`	`get/set`	Red component of the caption text colour. [min=0, max=255, default=255].
`FontColorG`	`int`	`get/set`	Green component of the caption text colour. [min=0, max=255, default=255].
`FontColorB`	`int`	`get/set`	Blue component of the caption text colour. [min=0, max=255, default=255].
`FontAlpha`	`int`	`get/set`	Caption text opacity. [min=0, max=255, default=255]. 0 is fully transparent, 255 is fully solid.
`SubtitleBackgroundAlpha`	`int`	`get/set`	Caption background opacity. [min=0, max=255, default=90]. 0 hides the background, 255 makes it fully solid. A subtle dark background helps readability over busy footage.
`ResetTextAppearanceCommand`	`Command`	`get`	Reset all text appearance settings (font size, colour, alpha, background) to their defaults.
`MaxLineLimit`	`int`	`get/set`	Maximum number of caption lines on screen at once. [min=1, max=10]. When the limit is reached the oldest line scrolls away. Lower values keep the screen uncluttered; higher values give more reading time across longer monologues.
`MaxCharPerLineLimit`	`int`	`get/set`	Maximum characters per caption line before wrapping. [min=1, max=200].
`SmallLettersOnly`	`bool`	`get/set`	Whether all captions are forced to lower-case. [default=false]. "I" and a few common contractions are still kept capitalised for readability.
`ResetTextSettingsCommand`	`Command`	`get`	Reset text settings (max lines, max chars per line, lower-case mode) to their defaults.
`EnableTextReplacement`	`bool`	`get/set`	Whether to apply find-and-replace rules to the recognised text. [default=false]. Useful for fixing recurring misheard words ("Hugh" → "you"), expanding domain abbreviations, or filtering profanity. Pair with a rules file via `TextReplacementFileUrl`.
`TextReplacementFileUrl`	`Uri`	`get/set`	Path to a JSON file containing find-and-replace rules. Each rule is a key/value pair where the key is the pattern to find and the value is the replacement. Supports exact matches and wildcard patterns (`*` for any characters, `?` for one character). Reload Media re-reads the file if it changes on disk.
`RulesLoadedCount`	`int`	`get`	Number of rules successfully loaded from the rules file (read-only).
`ReplacementsAppliedCount`	`int`	`get`	Total number of replacements applied since the rules file was loaded (read-only).
`EnableReplacementStats`	`bool`	`get/set`	Whether to log replacement statistics to a file in the Reports folder. [default=false]. The log records which rules fired and how often; it is written next to the executable and updated periodically while running. Useful for tuning the rules file by seeing what is actually being matched.
`ReplacementStatsFileName`	`string`	`get`	Path of the current replacement-stats file (read-only).
`ReplacementStatsStatus`	`string`	`get`	Status of the replacement-stats logger (read-only).
`EnableTranscription`	`bool`	`get/set`	Whether to write all recognised speech to a transcript file in the Reports folder. [default=false]. Each speech segment is written as a timestamped line. Useful for archiving live shows and for accessibility post-production. The project must be saved first, since the transcript filename is based on the project name.
`TranscriptFileName`	`string`	`get`	Path of the current transcript file (read-only).
`ScriptCallbackFunction`	`string`	`get/set`	Name of a Script Engine function to call each time the transcription updates. Receives a JSON payload with the recognised text and PTS timing. Leave empty to disable. Useful for forwarding live captions to chat overlays, third-party caption systems, or moderation pipelines.
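
`ScriptCallbackFunction` receives a JSON payload with the recognised text and PTS timing, but this reference does not list the payload's field names. The sketch below is therefore only illustrative: the keys `text`, `startPts` and `endPts` are hypothetical placeholders, the function name `on_transcription` is made up, and Python stands in for whatever language the Script Engine actually runs. It also assumes the payload's end PTS behaves like `SubtitleEndPts`, that is, zero while the segment is still open.

```python
import json


def on_transcription(payload_json):
    """Parse one callback payload into a normalised caption record.

    ASSUMPTION: the keys "text", "startPts" and "endPts" are hypothetical;
    check the real payload before relying on them.
    """
    payload = json.loads(payload_json)
    text = str(payload.get("text", "")).strip()
    start_pts = int(payload.get("startPts", 0))
    end_pts = int(payload.get("endPts", 0))
    return {
        "text": text,
        "start_pts": start_pts,
        "end_pts": end_pts,
        # ASSUMPTION: an end PTS of zero means the segment is still open
        # (or the input provides no PTS), mirroring SubtitleEndPts.
        "is_open": end_pts == 0,
    }


# Example payload, built by hand using the hypothetical keys.
sample = json.dumps({"text": " hello world ", "startPts": 90000, "endPts": 0})
record = on_transcription(sample)
print(record)
```

A handler shaped like this could forward `record` to a chat overlay or moderation pipeline once the real field names are confirmed.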

Inherits from: AbstractOperator, AbstractAudioMetering.
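
The rules file described under `TextReplacementFileUrl` maps patterns to replacements, with `*` standing for any characters and `?` for one character. The following Python sketch shows one way such rules could be interpreted. It is not the operator's implementation: it assumes the file is a flat JSON object, that matching is whole-word and case-insensitive, and that wildcards only span word characters. The function names and sample rules are invented for illustration.

```python
import json
import re


def compile_rules(rules_json):
    """Turn a {pattern: replacement} JSON object into compiled regex rules."""
    compiled = []
    for pattern, replacement in json.loads(rules_json).items():
        parts = []
        for ch in pattern:
            if ch == "*":
                parts.append(r"\w*")   # ASSUMPTION: '*' stays within one word
            elif ch == "?":
                parts.append(r"\w")    # exactly one word character
            else:
                parts.append(re.escape(ch))
        # ASSUMPTION: whole-word, case-insensitive matching.
        regex = re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)
        compiled.append((regex, replacement))
    return compiled


def apply_rules(text, compiled):
    """Apply every rule in order; return the new text and a replacement count."""
    applied = 0
    for regex, replacement in compiled:
        # A function replacement keeps the value literal (no backslash escapes).
        text, count = regex.subn(lambda match, r=replacement: r, text)
        applied += count
    return text, applied


rules_json = json.dumps({"Hugh": "you", "darn*": "****", "gr?y": "grey"})
rules = compile_rules(rules_json)
fixed, applied = apply_rules("Thank Hugh for the darned gray sky", rules)
print(fixed)    # Thank you for the **** grey sky
print(applied)  # 3
```

In this sketch, `len(rules)` and `applied` play roles loosely analogous to `RulesLoadedCount` and `ReplacementsAppliedCount`.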

See also: Speech To Text in Operators — user-facing introduction, screenshots, and section summaries.
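
`MaxCharPerLineLimit` and `MaxLineLimit` together control how much caption text is visible: lines wrap at the character limit, and once the line limit is reached the oldest line scrolls away. The Python sketch below illustrates that behaviour under simple assumptions (wrapping at word boundaries, one rolling buffer of lines). The function name `layout_captions` and the sample text are hypothetical, and the operator's actual layout logic may differ.

```python
import textwrap
from collections import deque


def layout_captions(segments, max_chars_per_line, max_lines):
    """Wrap each segment and keep only the newest `max_lines` lines."""
    # A bounded deque drops the oldest line automatically when full.
    visible = deque(maxlen=max_lines)
    for segment in segments:
        for line in textwrap.wrap(segment, width=max_chars_per_line):
            visible.append(line)
    return list(visible)


lines = layout_captions(
    ["good evening and welcome to the show", "tonight we have three guests"],
    max_chars_per_line=20,
    max_lines=3,
)
for line in lines:
    print(line)
# welcome to the show
# tonight we have
# three guests
```

With a limit of three lines, the first wrapped line ("good evening and") has already scrolled away by the time the second segment is fully laid out.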