Speech To Text

Property Type Access Description
ShowAdvancedOptions bool get/set Whether to reveal advanced configuration in the editor. [default=false]. Toggle on to show options like the script callback fields.
ModelSourceUrl Uri get/set Path to the speech recognition model file (.bin) used for transcription. Larger models give more accurate recognition but cost more processing time per second of audio. Loading a new model reinitialises the operator.
AutoStart bool get/set Whether to start listening automatically once the model finishes loading. [default=true]. Saves a manual click when the project is loaded fresh; turn off if you want to start transcription only on demand.
OperatorState OperatorState get Current state of the operator (read-only). Reports whether the model is loading, ready, running, stopped, or in an error state.
StartCommand Command get Begin transcribing the layer's audio. Available once a valid model is loaded.
StopCommand Command get Stop transcribing audio.
ClearCommand Command get Clear the on-screen captions and reset the audio context so previous speech doesn't bleed back in.
ConfidenceLevel int get/set Minimum confidence (in percent) a recognised word must score to appear on screen. [min=10, max=100, default=70]. Raise to suppress weak guesses — fewer words appear, but the ones that do are more likely correct. Lower to surface more text, at the cost of occasional misheard words.
AudioBuffer int get/set How much audio is collected before each transcription pass, in milliseconds. [min=100, max=2000, default=300]. Lower values give snappier captions at the cost of accuracy — there's less context for the model to reason from. Higher values give better accuracy but captions appear with more delay.
PauseThreshold int get/set How long the speaker must be silent before a new subtitle card is started, in milliseconds. [min=0, max=5000, default=750]. Shorter values break the captions into smaller chunks more often. Longer values let long sentences flow as one block but cards stay on screen longer between sentences.
NoSpeechThreshold int get/set How aggressively to filter audio segments that probably contain no speech, as a percentage. [min=0, max=100, default=60]. Lower values filter more aggressively — good for noisy environments where the model hears phantom words during silence. Raise if quiet speech is being missed.
SubtitlesScreenTimout int get/set How long captions stay on screen after the speaker stops talking, in milliseconds. [min=100, max=10000, default=5000]. Longer values give the audience more reading time. Shorter values keep the screen uncluttered between sentences.
ResetThresholdValuesCommand Command get Reset all threshold values to their defaults (confidence, audio buffer, pause detection, no-speech, screen timeout).
ShowSubtitlesCheckBox bool get/set Whether to draw captions on the output image. [default=true]. Turn off if you only want to use the recognised text from a script (via RecentText and ScriptCallbackFunction) without on-screen captions.
SubtitlesPosX int get/set Horizontal position of the caption block, in pixels from the left edge. [min=0, max=4096].
SubtitlesPosY int get/set Vertical position of the caption block, in pixels from the top edge. [min=0, max=4096].
ResetTextPositionCommand Command get Reset caption position to the default location.
RecentText FormattedMessage get Most recently transcribed text (read-only). Updates every time the speech recognition engine produces a new result. Read this from a script to forward the live transcript to chat overlays, caption widgets, or any external system.
IsSubtitleActive bool get True while a caption is being shown (read-only). Returns to false when the SubtitlesScreenTimout period expires or the operator is cleared. Useful for scripts that need to react when speech starts or ends.
SubtitleStartTime string get UTC timestamp when the current subtitle segment started (read-only). Updated on each new speech segment. Useful for tagging subtitles with absolute time when feeding them to external systems.
SubtitleEndTime string get UTC timestamp when the current subtitle segment ended (read-only). Empty while the speaker is still talking; populated once the segment closes.
SubtitleStartPts long get Video stream timestamp marking when the current subtitle segment started (read-only). Zero if the input does not provide a presentation timestamp. Useful for matching captions to specific frames when post-processing recordings.
SubtitleEndPts long get Video stream timestamp marking when the current subtitle segment ended (read-only). Zero while speech is still active, or if the input does not provide a presentation timestamp.
FontSize int get/set Caption font size, in pixels. [min=11, max=60, default=32].
FontColorR int get/set Red component of the caption text colour. [min=0, max=255, default=255].
FontColorG int get/set Green component of the caption text colour. [min=0, max=255, default=255].
FontColorB int get/set Blue component of the caption text colour. [min=0, max=255, default=255].
FontAlpha int get/set Caption text opacity. [min=0, max=255, default=255]. 0 is fully transparent, 255 is fully solid.
SubtitleBackgroundAlpha int get/set Caption background opacity. [min=0, max=255, default=90]. 0 hides the background, 255 makes it fully solid. A subtle dark background helps readability over busy footage.
ResetTextAppearanceCommand Command get Reset all text appearance settings (font size, colour, alpha, background) to their defaults.
MaxLineLimit int get/set Maximum number of caption lines on screen at once. [min=1, max=10]. When the limit is reached the oldest line scrolls away. Lower values keep the screen uncluttered; higher values give more reading time across longer monologues.
MaxCharPerLineLimit int get/set Maximum characters per caption line before wrapping. [min=1, max=200].
SmallLettersOnly bool get/set Whether all captions are forced to lower-case. [default=false]. "I" and a few common contractions are still kept capitalised for readability.
ResetTextSettingsCommand Command get Reset text settings (max lines, max chars per line, lower-case mode) to their defaults.
EnableTextReplacement bool get/set Whether to apply find-and-replace rules to the recognised text. [default=false]. Useful for fixing recurring misheard words ("Hugh" → "you"), expanding domain abbreviations, or filtering profanity. Pair with a rules file via TextReplacementFileUrl.
TextReplacementFileUrl Uri get/set Path to a JSON file containing find-and-replace rules. Each rule is a key/value pair where the key is the pattern to find and the value is the replacement. Supports exact matches and wildcard patterns (* for any characters, ? for one character). Reload Media re-reads the file if it changes on disk.
RulesLoadedCount int get Number of rules successfully loaded from the rules file (read-only).
ReplacementsAppliedCount int get Total number of replacements applied since the rules file was loaded (read-only).
EnableReplacementStats bool get/set Whether to log replacement statistics to a file. [default=false]. Records which rules fired and how often. The file is written to the Reports folder next to the executable and updated periodically while running; ReplacementStatsFileName reports its path. Useful for tuning the rules file by seeing what's actually being matched.
ReplacementStatsFileName string get Path of the current replacement-stats file (read-only).
ReplacementStatsStatus string get Status of the replacement-stats logger (read-only).
EnableTranscription bool get/set Whether to write all recognised speech to a transcript file in the Reports folder. [default=false]. Each speech segment is written as a timestamped line. Useful for archiving live shows and for accessibility post-production. The project must be saved first, since the transcript filename is based on the project name.
TranscriptFileName string get Path of the current transcript file (read-only).
ScriptCallbackFunction string get/set Name of a Script Engine function to call each time the transcription updates. Receives a JSON payload with the recognised text and PTS timing. Leave empty to disable. Useful for forwarding live captions to chat overlays, third-party caption systems, or moderation pipelines.
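
As an illustration of the ScriptCallbackFunction row above, the following is a minimal sketch of a handler that unpacks the JSON payload. It is written in Python purely for illustration; the Script Engine's own language may differ. The function name on_transcription and the payload field names text, startPts, and endPts are hypothetical, since this reference does not list the payload schema. The sketch also assumes that an end PTS of zero means the segment is still open, mirroring the behaviour described for SubtitleEndPts.

```python
import json


def on_transcription(payload_json: str) -> dict:
    """Unpack one transcription update.

    The field names "text", "startPts" and "endPts" are assumptions made
    for this sketch; check the actual payload before relying on them.
    """
    payload = json.loads(payload_json)
    text = str(payload.get("text", "")).strip()
    start_pts = int(payload.get("startPts", 0))
    end_pts = int(payload.get("endPts", 0))
    return {
        "text": text,
        "start_pts": start_pts,
        "end_pts": end_pts,
        # Assumed convention: zero end PTS means speech is still active
        # (or the input carries no presentation timestamps).
        "segment_closed": end_pts != 0,
    }
```

A forwarding script would typically call such a handler once per update and pass the returned text on to the overlay or moderation system, skipping updates whose text is empty.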

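The rule format described for TextReplacementFileUrl (JSON key/value pairs, with * for any characters and ? for one character) can be sketched as follows. This is not the operator's implementation: the sample rules, the helper names wildcard_to_regex and apply_rules, the case-insensitive whole-word matching, and the choice to keep * and ? inside a single word are all assumptions made for the example.

```python
import json
import re

# Hypothetical rules file content: key = pattern to find, value = replacement.
RULES = json.loads('{"Hugh": "you", "gr?y": "gray", "darn*": "[bleep]"}')


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard pattern into a whole-word regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\w*")   # assumption: * stays within one word
        elif ch == "?":
            parts.append(r"\w")    # exactly one word character
        else:
            parts.append(re.escape(ch))
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)


def apply_rules(text: str, rules: dict) -> tuple:
    """Apply every rule in order; return the new text and the match count."""
    applied = 0
    for pattern, replacement in rules.items():
        # A function replacement avoids backslash escapes being interpreted.
        text, n = wildcard_to_regex(pattern).subn(lambda m: replacement, text)
        applied += n
    return text, applied
```

Calling apply_rules("Hugh said the grey cat is darned fast", RULES) returns the rewritten sentence together with a count of replacements, which plays the same role as ReplacementsAppliedCount in the table above.
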
See also: Speech To Text in Operators — user-facing introduction, screenshots, and section summaries.