Streaming & Parsing
How Daydreams processes LLM response streams in real-time.
Large Language Models (LLMs) generate responses token by token. To enable responsive and interactive agents, Daydreams processes this output as it arrives (streaming) and parses the structured information (like action calls or outputs) embedded within the stream. The framework relies on an XML-based structure in the LLM's response for reliable parsing.
Expected LLM Response Format
As detailed in the Prompting section, the framework expects the LLM to wrap its response in a <response> tag and use specific child tags like <reasoning>, <action_call>, and <output>.
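As a rough illustration, a response in this shape might look like the string below. Only the tag names come from the text above; the contents of each tag (including the JSON inside <action_call>) are made-up placeholders, and any attributes the real format requires are left out here because they are defined in the Prompting section.

```typescript
// Hypothetical sample of a raw LLM response using the tags named above.
// The contents of each tag are illustrative placeholders, not real output.
const sampleResponse: string = [
  "<response>",
  "<reasoning>The user asked for the weather, so an action is needed.</reasoning>",
  '<action_call>{"city": "Paris"}</action_call>',
  "<output>Checking the weather for Paris now.</output>",
  "</response>",
].join("\n");

// Collect the child tags in the order they appear in the response.
const childTags: string[] = [];
const tagPattern = /<(reasoning|action_call|output)>/g;
let match: RegExpExecArray | null;
while ((match = tagPattern.exec(sampleResponse)) !== null) {
  childTags.push(match[1]);
}
console.log(childTags); // [ 'reasoning', 'action_call', 'output' ]
```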
The Parsing Pipeline
The processing of the LLM's raw text stream involves several components working together:
- LLM Stream: The raw AsyncIterable<string> coming from the LLM provider (e.g., via streamText from the AI SDK).
- xmlStreamParser (xml.ts): This generator function is the low-level parser.
  - Input: Consumes chunks of text from the LLM stream.
  - Logic: It looks for potential XML tag boundaries (<, >). Based on a provided set of parseTags (e.g., {"reasoning", "action_call", "output"}) and a shouldParse function (which determines whether a specific tag occurrence should be treated as structure or just text), it identifies the start and end of relevant XML elements.
  - Output: Yields XMLToken objects:
    - { type: "start", name: "...", attributes: {...} }
    - { type: "end", name: "..." }
    - { type: "text", content: "..." }
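The tokenizing step can be sketched roughly as follows. This is a simplified, synchronous stand-in and not the code in xml.ts: the function name sketchTokenize, the single-argument shouldParse signature, and the type="text" attribute in the demo input are all assumptions made for illustration. It does show the key idea of buffering a tag that is split across two chunks.

```typescript
// Token shapes mirror the three XMLToken variants listed above.
type XMLToken =
  | { type: "start"; name: string; attributes: Record<string, string> }
  | { type: "end"; name: string }
  | { type: "text"; content: string };

// Hypothetical, simplified tokenizer (not the implementation in xml.ts).
// It reads chunks synchronously and buffers incomplete tags across chunks.
function* sketchTokenize(
  chunks: Iterable<string>,
  parseTags: Set<string>,
  shouldParse: (tagName: string) => boolean = () => true // assumed signature
): Generator<XMLToken> {
  let buffer = "";
  for (const chunk of chunks) {
    buffer += chunk;
    while (buffer.length > 0) {
      const lt = buffer.indexOf("<");
      if (lt === -1) {
        // No tag boundary in sight: everything buffered is plain text.
        yield { type: "text", content: buffer };
        buffer = "";
        break;
      }
      if (lt > 0) {
        yield { type: "text", content: buffer.slice(0, lt) };
        buffer = buffer.slice(lt);
      }
      const gt = buffer.indexOf(">");
      if (gt === -1) break; // the tag may be split across chunks; wait for more
      const raw = buffer.slice(1, gt);
      buffer = buffer.slice(gt + 1);
      const isEnd = raw.startsWith("/");
      const body = isEnd ? raw.slice(1) : raw;
      const space = body.search(/\s/);
      const tagName = space === -1 ? body : body.slice(0, space);
      if (!parseTags.has(tagName) || !shouldParse(tagName)) {
        // Unknown or rejected tags are passed through as ordinary text.
        yield { type: "text", content: `<${raw}>` };
      } else if (isEnd) {
        yield { type: "end", name: tagName };
      } else {
        const attributes: Record<string, string> = {};
        const attrText = space === -1 ? "" : body.slice(space + 1);
        const attrPattern = /([\w-]+)="([^"]*)"/g;
        let m: RegExpExecArray | null;
        while ((m = attrPattern.exec(attrText)) !== null) attributes[m[1]] = m[2];
        yield { type: "start", name: tagName, attributes };
      }
    }
  }
  // Flush whatever is left (for example an unterminated "<").
  if (buffer.length > 0) yield { type: "text", content: buffer };
}

// The closing </reasoning> tag is deliberately split across two chunks.
const demoChunks = ["<reasoning>think", "ing</reason", 'ing><output type="text">hi</output>'];
const demoTokens = Array.from(sketchTokenize(demoChunks, new Set(["reasoning", "output"])));
console.log(demoTokens.map((t) => t.type).join(" "));
// start text text end start text end
```

The sketch treats every "<" as a possible tag start, so a stray "<" in ordinary text stays buffered until a ">" arrives or the stream ends. A production parser needs to handle that case more carefully.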
- handleStream (streaming.ts): This function orchestrates the parsing.
  - Input: Takes the LLM stream, the set of parseTags, the shouldParse function, and a handler callback.
  - Logic: It iterates through the XMLTokens yielded by xmlStreamParser. It maintains a stack to handle nested elements and reconstructs logical StackElement objects. A StackElement represents a parsed XML tag, accumulating its content as text tokens arrive between the start and end tokens.
  - Output: Calls the provided handler callback whenever a StackElement is created or its content is updated and, importantly, when it is considered complete (done: true, set when the corresponding end tag is parsed).
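The stack-based reconstruction described for handleStream can be sketched as below. This is a minimal illustration, not the code in streaming.ts: it takes already-parsed tokens rather than the raw stream, the name sketchHandleTokens is hypothetical, and the StackElement fields beyond content and done are assumed.

```typescript
type XMLToken =
  | { type: "start"; name: string; attributes: Record<string, string> }
  | { type: "end"; name: string }
  | { type: "text"; content: string };

// Assumed shape: the text above only says a StackElement accumulates content,
// is identified by an index, and is flagged done. Other fields are illustrative.
type StackElement = {
  index: number;
  tag: string;
  attributes: Record<string, string>;
  content: string;
  done: boolean;
};

// Hypothetical sketch: turn a token sequence into handler notifications.
function sketchHandleTokens(
  tokens: Iterable<XMLToken>,
  handler: (el: StackElement) => void
): void {
  const stack: StackElement[] = [];
  let nextIndex = 0;
  for (const token of tokens) {
    const current = stack[stack.length - 1];
    if (token.type === "start") {
      // A start token opens a new element on top of the stack.
      const el: StackElement = {
        index: nextIndex++,
        tag: token.name,
        attributes: token.attributes,
        content: "",
        done: false,
      };
      stack.push(el);
      handler(el);
    } else if (token.type === "text") {
      // Text is appended to the innermost open element, if there is one.
      if (current !== undefined) {
        current.content += token.content;
        handler(current);
      }
    } else if (current !== undefined && current.tag === token.name) {
      // A matching end token completes the element and pops it.
      current.done = true;
      handler(current);
      stack.pop();
    }
  }
}

const sampleTokens: XMLToken[] = [
  { type: "start", name: "reasoning", attributes: {} },
  { type: "text", content: "think" },
  { type: "text", content: "ing" },
  { type: "end", name: "reasoning" },
];
const events: string[] = [];
sketchHandleTokens(sampleTokens, (el) => events.push(`${el.index}:${el.content}:${el.done}`));
console.log(events);
// [ '0::false', '0:think:false', '0:thinking:false', '0:thinking:true' ]
```

The handler receives the same object on every call, so a consumer that needs history should copy the fields it cares about, as the demo does by formatting a string.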
- createContextStreamHandler (streaming.ts): This function, called during agent.run, sets up the run-specific state and provides the actual handler callback function to handleStream.
  - The handler Callback: This function bridges the parsed elements to the framework's log objects.
    - It uses getOrCreateRef to associate each StackElement (identified by its index in the stream) with a specific Log object (Thought, ActionCall, OutputRef).
    - As text content arrives for a StackElement, it updates the content of the corresponding Log object.
    - When handleStream signals that a StackElement is complete (el.done), this handler calls handlePushLog.
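A minimal sketch of such a handler is shown below, assuming a per-run Map keyed by element index. The Log shape, the ref string values, the tag-to-ref mapping, and the createSketchHandler name are all assumptions for illustration; only getOrCreateRef and the overall flow are taken from the description above, and the real getOrCreateRef may have a different signature.

```typescript
// Assumed minimal shapes; the real types carry more fields.
type StackElement = { index: number; tag: string; content: string; done: boolean };
type LogRef = "thought" | "action_call" | "output"; // assumed ref values
type Log = { ref: LogRef; content: string };

// Assumed mapping from XML tag name to the kind of log it produces.
const tagToRef: Record<string, LogRef | undefined> = {
  reasoning: "thought",
  action_call: "action_call",
  output: "output",
};

// Hypothetical factory: builds the per-run state and returns the handler.
function createSketchHandler(pushLog: (log: Log) => void) {
  const refs = new Map<number, Log>(); // run-specific state

  // Return the log already tied to this element index, or create one.
  function getOrCreateRef(index: number, ref: LogRef): Log {
    let log = refs.get(index);
    if (log === undefined) {
      log = { ref, content: "" };
      refs.set(index, log);
    }
    return log;
  }

  return function handler(el: StackElement): void {
    const ref = tagToRef[el.tag];
    if (ref === undefined) return; // not a tag this sketch tracks
    const log = getOrCreateRef(el.index, ref);
    log.content = el.content; // mirror the streamed content so far
    if (el.done) pushLog(log); // hand completed logs to the next stage
  };
}

const pushed: Log[] = [];
const handler = createSketchHandler((log) => pushed.push(log));
handler({ index: 0, tag: "reasoning", content: "think", done: false });
handler({ index: 0, tag: "reasoning", content: "thinking", done: true });
console.log(pushed); // [ { ref: 'thought', content: 'thinking' } ]
```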
- handlePushLog (streaming.ts): This function acts on the completed Log objects derived from the parsed stream.
  - Input: Receives a complete Log object (Thought, ActionCall, OutputRef, etc.).
  - Logic: Based on the log.ref type, it dispatches the log to the appropriate processing function:
    - Thought: Logs the reasoning.
    - ActionCall: Triggers handleActionCallStream -> handleActionCall for argument parsing, template resolution, and task execution.
    - OutputRef: Triggers handleOutputStream -> handleOutput for schema validation and executing the output handler.
  - Output: Updates WorkingMemory, notifies subscribers, and potentially triggers further asynchronous operations (like action execution).
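The dispatch on log.ref can be pictured as a switch over a discriminated union, as in the sketch below. Everything here is a simplified assumption: the ref string values, the Dispatch callbacks standing in for the real processing functions, the plain array standing in for WorkingMemory, and the order of the memory update and subscriber notification. The real pipeline may also run action execution asynchronously, which this synchronous sketch does not model.

```typescript
// Assumed minimal log shapes, discriminated by the ref field.
type Thought = { ref: "thought"; content: string };
type ActionCall = { ref: "action_call"; content: string };
type OutputRef = { ref: "output"; content: string };
type Log = Thought | ActionCall | OutputRef;

// Hypothetical stand-ins for the processing functions named above.
type Dispatch = {
  onThought: (log: Thought) => void;
  onActionCall: (log: ActionCall) => void;
  onOutput: (log: OutputRef) => void;
};

// Hypothetical sketch of dispatching one completed log.
function sketchPushLog(
  log: Log,
  memory: Log[], // stand-in for WorkingMemory
  dispatch: Dispatch,
  subscribers: Array<(log: Log) => void>
): void {
  memory.push(log);
  switch (log.ref) {
    case "thought":
      dispatch.onThought(log);
      break;
    case "action_call":
      dispatch.onActionCall(log);
      break;
    case "output":
      dispatch.onOutput(log);
      break;
  }
  for (const notify of subscribers) notify(log);
}

const trace: string[] = [];
const memory: Log[] = [];
const dispatch: Dispatch = {
  onThought: (log) => trace.push(`thought: ${log.content}`),
  onActionCall: (log) => trace.push(`action: ${log.content}`),
  onOutput: (log) => trace.push(`output: ${log.content}`),
};
const subscribers = [(log: Log) => trace.push(`notified: ${log.ref}`)];

sketchPushLog({ ref: "thought", content: "plan" }, memory, dispatch, subscribers);
sketchPushLog({ ref: "action_call", content: "{}" }, memory, dispatch, subscribers);
console.log(trace);
// [ 'thought: plan', 'notified: thought', 'action: {}', 'notified: action_call' ]
```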
Summary Flow
This streaming and parsing pipeline allows Daydreams to react to the LLM's output incrementally, enabling more interactive agent behavior and efficient handling of structured commands like action calls and outputs, even before the entire LLM response is finished.