LLM (Ollama)
LLM (Ollama) input properties for Script Engine. Talks to a locally hosted Ollama server and produces text generated by a large language model. Useful for AI hosts and assistants in broadcasts, automatic summarisation of chat or news feeds, on-the-fly translation, scripted Q&A bots, content moderation, and any workflow that needs natural-language text generated by AI on demand. Includes optional chat persistence so conversations are saved to disk and can be resumed later. Response speed and quality depend entirely on the chosen model and the server hardware.
| Property | Type | Access | Description |
|---|---|---|---|
| ShowAdvancedOptions | bool | get/set | Whether to reveal advanced configuration in the editor. [default=false]. |
| AutoStart | bool | get/set | Whether to connect to the Ollama server automatically when the project loads. [default=true]. Saves a manual click when the project is loaded fresh; turn off if you want to connect only on demand from a script or button press. |
| ServerUrl | string | get/set | URL of the Ollama server to talk to (e.g. http://localhost:11434). Use http:// for direct connections: a local ollama serve only serves plain HTTP. Use https:// for Ollama Cloud (https://ollama.com) or for a self-hosted Ollama sitting behind a reverse proxy that terminates TLS. Vanilla ollama serve has no built-in authentication and accepts every request regardless of any Authorization header. For endpoints that do require a Bearer token (Ollama Cloud, a reverse-proxied Ollama with auth configured, or another Ollama-API-compatible service), set the COMPOSER_OLLAMA_APIKEY environment variable on the Composer machine. |
| LlmStatus | LlmStatus | get | Current state of the LLM input (read-only). Reports whether the input is disconnected, connecting, connected, receiving a response, thinking, or in an error state. |
| ConnectCommand | Command | get | Connect to the Ollama server using the current ServerUrl. |
| DisconnectCommand | Command | get | Disconnect from the Ollama server and cancel any in-flight response. |
| AvailableModels | StringCollectionEnum | get/set | List of every model installed on the connected Ollama server. Populated after connecting. Pick the one you want to chat with. Use a text model for conversations; embedding-only and image-only models won't accept prompts, and the Send button is disabled while one is selected. Changing the selection resets the current chat history. |
| ContextSize | ContextWindowSize | get/set | Maximum tokens the model can attend over per request (Ollama's num_ctx). Set to ModelDefault to use the Modelfile value (typically 2048–4096); any other member forces the corresponding override. Larger values let longer chats fit but increase VRAM usage. |
| EnableThinking | bool | get/set | Enable the reasoning phase on thinking-capable models. Ignored on models without a thinking capability. |
| Temperature | float | get/set | Controls how creative or predictable the model's responses are [min=0.0, max=2.0]. Lower values make the model pick the most expected words (best for code, math, and factual answers). Higher values make it take more chances and pick less obvious words (best for creative writing and brainstorming). 0.0 always picks the single most likely word (predictable but boring). |
| TopP | float | get/set | Probability threshold for filtering word choices [min=0.0, max=1.0]. At each step, the model gathers the most-likely words until their combined probability reaches Top P; those words become the eligible pool. The pool size adapts to the model's confidence: fewer words when one choice dominates, more when probabilities spread evenly. Note that a lower Top P keeps fewer words, not more (Top P = 0.5 may end up with just one word). Useful range: 0.9–1.0; below 0.5 is effectively greedy. Lower it if responses contain odd or surprising words. |
| TopK | int | get/set | Hard cap on how many word choices the model considers at each step [min=0, max=200]. 0 = disabled (no Top K filter; only Top P narrows the pool); 1 = always picks the single most likely word (fully predictable, equivalent to Temperature 0.0); 200 = effectively no cap. Applies before Top P in the sampling pipeline: Top K caps the candidate count first, then Top P narrows further within those K words. The smaller pool always wins: a low Top K can prevent Top P from gathering as many words as it would like. |
| Seed | int | get/set | Controls the randomness used when picking words [default=-1]. -1 picks a fresh random number every request, so the same prompt produces a different response each time (normal chat behaviour). Setting a specific number (e.g. 42) makes the output reproducible: same seed + same prompt + same options gives the exact same response every time. Useful for regression tests ("did my prompt change actually improve things, or was the difference just random?"), debugging weird responses (reproduce them to investigate), or demos that need consistent output. |
| MaxOutputTokens | int | get/set | Maximum length of the model's response, measured in tokens (~0.75 words per token, so 100 tokens ≈ 75 words ≈ a short paragraph) [default=-1, unlimited]. -1 = no limit; the model stops when it is naturally done. Set a positive value (e.g. 100) to enforce a hard cap on response length, which is useful for bounding latency or cost in automated pipelines. This is a guillotine, not a polite request: the model is unaware of the cap and will be cut off mid-sentence. For "respond briefly" behaviour, instruct the model in the System Prompt instead, and use this only as a safety ceiling. |
| StopSequences | string | get/set | Custom stop sequences, one per line. The model halts immediately when it produces any of these strings (the matched string is excluded from the response). Empty lines are ignored. The model's own chat-template end-of-turn tokens (e.g. `<\|eot_id\|>`, `<end_of_turn>`) are applied automatically and are not shown here, so you don't need to add them. Use this field for your own stops, e.g. "User:" to prevent the model from roleplaying both sides of a conversation, or a marker like "---" to halt at a specific boundary. |
| ResetTuningCommand | Command | get | Resets the six Response Tuning options to the selected model's effective defaults (Modelfile values where present, Ollama floor otherwise). Requires an active connection; disabled otherwise. |
| SystemPrompt | string | get/set | Optional system prompt that frames every request the model receives. Leave empty to use whatever system prompt the model ships with. Set it to give the model a persona, restrict it to a topic, or override the default. Changes apply immediately to the next prompt. Useful for "act as a sports commentator", "always reply in JSON", or strict moderation rules. |
| UserPrompt | string | get/set | The user message to send to the model on the next Send Prompt. Set this from a script for fully automated chat flows, or type into the field for interactive use. Cleared on each successful send. |
| SendPromptCommand | Command | get | Send the current UserPrompt to the connected model. Requires an active connection and a model that supports text completion. Replies are streamed into LastResponseText and surfaced through the script callback if one is configured. |
| ClearChatCommand | Command | get | Start a new chat: cancels any in-flight response and resets chat history, token counters, and the last response. |
| LastPrompt | string | get | The user prompt from the most recent exchange (read-only). Mirrors what was sent to the model so a script can correlate prompt and response. |
| LastResponseText | string | get | The full text of the most recent response from the model (read-only). Updated as the response streams in. Read this from a script to forward AI-generated text to overlays, captions, or external systems. |
| LastResponseTime | int | get | Time taken to receive the most recent response, in milliseconds (read-only). Useful for monitoring server load and detecting slow responses. |
| ChatMessageCount | int | get | Number of prompts sent in the current chat session (read-only). Resets on New Chat, model change, or disconnect. |
| ScriptCallbackFunction | string | get/set | Name of a Script Engine function to call each time a response is received. The function receives an object with prompt, response, model, and messageCount fields. Leave empty to disable. Useful for forwarding AI replies to chat overlays, triggering scene changes, or feeding generated text into other components. |
| EnableChatPersistence | bool | get/set | Whether to auto-save each exchange to a chat file under Documents/Composer/LLM Chats/. [default=false]. When on, conversations are preserved across sessions so you can resume them later. Files are compacted automatically so size stays bounded by the selected model's context window. |
| AvailableChats | StringCollectionEnum | get/set | List of saved chats found in the chats folder. Pick an entry to load it immediately; the active chat is pre-selected. Refreshed when chat history is enabled and after each auto-save. The top entry is empty; picking it starts a fresh chat. |
| LastSaved | string | get | Timestamp of the last disk write for the current chat (read-only). Empty until the first auto-save or load. |
| OpenChatFolderCommand | Command | get | Open the chats folder in the operating system's file manager. Useful for backing up, renaming, or deleting chat files outside of Composer. |
| RefreshChatListCommand | Command | get | Rescan the chats folder. Use after renaming or deleting chat files outside of Composer to refresh AvailableChats. |
| PlaybackState | PlaybackState | get | Connection state; drives the enable/disable state of commands (read-only). |
| ModelVram | string | get | Memory consumed by the model currently loaded in Ollama (read-only). |
| ModelProcessor | string | get | How much of the model is running on the GPU vs the CPU (read-only). Format like "100% GPU" or "60% GPU/40% CPU". Lower GPU percentages mean slower responses; Ollama spills layers to CPU when the model doesn't fit in VRAM. |
| ModelContextLength | int | get | How much text (in tokens) the model can hold in context (read-only). Populated after the first prompt is sent. Compare with TokensUsed to gauge how full the chat is. |
| TokensUsed | int | get | Tokens consumed by the chat history on the most recent request (read-only). Compare with ModelContextLength to see how full the chat history is. Click New Chat to reset when getting close to the limit. Reset by New Chat, model change, or disconnect. |
| TruncationCount | int | get | Number of times Ollama has silently truncated the chat history this session (read-only). Non-zero means the model has lost earlier context and responses may be degraded (shorter or less detailed). Click New Chat to reset, or raise Context Size to give the model more room (this can also be configured server-side in Ollama). |
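
The interaction between TopK and TopP described in the table can be easier to follow with a small worked example. The sketch below is illustrative only: it is not Ollama's or Composer's sampler, the names `Candidate` and `filterCandidates` are hypothetical, and it skips the renormalisation a real sampler would typically apply after the Top K cut. It is written in TypeScript purely for readability.

```typescript
// Illustrative sketch, not the real sampler: applies a Top K cap first,
// then a Top P threshold, as the TopK and TopP descriptions outline.

interface Candidate {
  token: string;
  prob: number; // probability in [0, 1]; the full list sums to 1
}

function filterCandidates(candidates: Candidate[], topK: number, topP: number): Candidate[] {
  // Sort a copy from most likely to least likely.
  const sorted = candidates.slice().sort((a, b) => b.prob - a.prob);

  // Top K: keep at most K candidates; 0 disables the cap.
  const capped = topK > 0 ? sorted.slice(0, topK) : sorted;

  // Top P: gather candidates until their combined probability reaches topP.
  // (Simplification: probabilities are not renormalised after the Top K cut.)
  const pool: Candidate[] = [];
  let cumulative = 0;
  for (const c of capped) {
    pool.push(c);
    cumulative += c.prob;
    if (cumulative >= topP) break;
  }
  return pool;
}

// Made-up distribution for the next word.
const example: Candidate[] = [
  { token: "the", prob: 0.6 },
  { token: "a", prob: 0.25 },
  { token: "one", prob: 0.1 },
  { token: "zebra", prob: 0.05 },
];

// Top P = 0.5 with no Top K: a single dominant word already fills the pool.
const narrowPool = filterCandidates(example, 0, 0.5).map((c) => c.token);
// Top P = 0.9 with no Top K: three words are needed to reach the threshold.
const widePool = filterCandidates(example, 0, 0.9).map((c) => c.token);
// Top K = 2 with Top P = 0.9: the cap wins and Top P never reaches 0.9.
const cappedPool = filterCandidates(example, 2, 0.9).map((c) => c.token);
```

With this distribution, `narrowPool` holds one word, `widePool` holds three, and `cappedPool` holds two, which matches the "smaller pool always wins" behaviour in the TopK description.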
Inherits from: AbstractInput, AbstractAudioProcessing, AbstractAudioMetering.
See also: LLM (Ollama) in Inputs — user-facing introduction, screenshots, and section summaries.
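
As a rough illustration of the ScriptCallbackFunction property, the sketch below shows one way a callback might turn a reply into a short caption. Only the four field names (prompt, response, model, messageCount) come from the table above. The field types, the function names `onLlmResponse` and `toCaption`, and the 40-character limit are assumptions made for the example, and the code is written in TypeScript for clarity, so it may need adapting to the syntax the Script Engine actually accepts.

```typescript
// Assumed shape of the callback argument. Field names are taken from the
// ScriptCallbackFunction description; the types are guesses.
interface LlmResponsePayload {
  prompt: string;
  response: string;
  model: string;
  messageCount: number;
}

// Hypothetical helper: collapse whitespace and shorten text to fit a caption,
// cutting on a word boundary where possible.
function toCaption(text: string, maxChars: number): string {
  const clean = text.replace(/\s+/g, " ").trim();
  if (clean.length <= maxChars) return clean;
  const cut = clean.slice(0, maxChars);
  const lastSpace = cut.lastIndexOf(" ");
  return (lastSpace > 0 ? cut.slice(0, lastSpace) : cut) + "...";
}

// Hypothetical callback. Its name is what would be entered in
// ScriptCallbackFunction; what is done with the returned string
// (overlay, caption, log) is left to the surrounding script.
function onLlmResponse(payload: LlmResponsePayload): string {
  const label = "[" + payload.model + " #" + payload.messageCount + "] ";
  return label + toCaption(payload.response, 40);
}

// Example payload with made-up values.
const caption = onLlmResponse({
  prompt: "Greet the viewers",
  response: "  Hello   there, welcome to the evening broadcast from the studio.  ",
  model: "llama3",
  messageCount: 2,
});
// caption is "[llama3 #2] Hello there, welcome to the evening..."
```

Keeping the formatting in a pure helper like `toCaption` makes the callback easy to test outside Composer before wiring it to a live input.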