Reasoning Replay Cache
v3.8.1Last updated: 2026-05-13
Was this page helpful?
Loading OmniRoute...
Source of truth: ,
Last updated: 2026-05-13 β v3.8.0
produced by thinking-mode models and replays it transparently on multi-turn requests when the upstream provider requires it. This eliminates the HTTP 400 errors that strict providers raise when a client's conversation history is missing the prior turn's reasoning.
previous assistant message includes the original . The upstream returns 400 with messages like:
from the history they replay. OmniRoute restores it from a server-side cache so the request the upstream sees is consistent. Issue #1628 introduced the hybrid memory/SQLite persistence so the cache survives process restarts.
(two sites, around lines 4093 and 4380). Replay happens in after schema coercion but before dispatch.
(LRU-by-creation) backed by a SQLite table for crash recovery and dashboard visibility.
in |
||
table () |
Defaults:
)
- (
)
- first
CREATE TABLE IF NOT EXISTS reasoning_cache (
tool_call_id TEXT PRIMARY KEY,
provider TEXT NOT NULL,
model TEXT NOT NULL,
reasoning TEXT NOT NULL,
char_count INTEGER NOT NULL DEFAULT 0,
created_at TEXT NOT NULL DEFAULT (datetime('now')),
expires_at INTEGER NOT NULL
);
, , , . is stored as Unix epoch seconds; the SELECT layer normalizes legacy text values via .
returns . The function checks two lists in .
Provider IDs (exact match, case-insensitive):
Model regex patterns (case-insensitive):
. Both require management authentication ( from ).
clamped to ) |
||
GET response shape:
{
"stats": {
"memoryEntries": 12,
"dbEntries": 47,
"totalEntries": 47,
"totalChars": 138291,
"hits": 84,
"misses": 6,
"replays": 81,
"replayRate": "90.0%",
"byProvider": { "deepseek": { "entries": 32, "chars": 98412 } },
"byModel": { "deepseek-reasoner": { "entries": 32, "chars": 98412 } },
"oldestEntry": "2026-05-13T10:00:00.000Z",
"newestEntry": "2026-05-13T11:42:11.000Z"
},
"entries": [
{
"toolCallId": "call_abc",
"provider": "deepseek",
"model": "deepseek-reasoner",
"reasoning": "...",
"charCount": 3128,
"createdAt": "...",
"expiresAt": "..."
}
]
}
purges expired memory entries and runs . Health-check workers call this periodically.
- Crash recovery: After a restart, memory is empty but the DB still holds unexpired entries. The first lookup for a given
is a DB hit; subsequent lookups are memory hits.
- No reasoning, no cache:
returns when the assistant message has no / field, so non-thinking responses cost nothing.
- Non-strict providers: When
is and the target format is OpenAI, the translator strips any field from outgoing messages β OpenAI Chat Completions does not accept it.
,