E2E Performance Optimization Plan
Analysis of konsole→Claude→kontask latency with improvement opportunities
Konsole input → Claude processing → Kontask card output
Current Request Flow
Browser →(fetch)→ konui:8666 →(SSE proxy)→ konsole:8020 →(spawn proc)→ Claude subprocess →(stream-json)→ MCP tools →(API call)→ kontask card
1. Claude Process Spawn Latency (HIGH IMPACT)
Problem: Every prompt spawns a new `claude --print` process, and startup alone takes 1-3 seconds.
| Option | Approach |
|---|---|
| Option A | Long-running daemon: keep Claude running and pipe prompts via stdin. Eliminates the cold start. |
| Option B | Process pool: pre-spawn 2-3 Claude processes and assign them round-robin. Reduces wait. |
| Option C | Warm session cache: after the first prompt, keep the process alive for 5 minutes and reuse it via `--resume`. |
Estimated savings: 1-2 seconds per request
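A minimal sketch of Option C, assuming konsole is a Node/TypeScript service that spawns the claude CLI via `child_process`. The `--resume` flag, the `session_id` field on the first stream-json event, and the per-client keying are assumptions about the CLI's session behaviour, not confirmed details.

```typescript
// Warm session cache sketch: reuse a recent session instead of starting cold.
import { spawn } from "node:child_process";

interface WarmSession {
  sessionId: string; // session id reported by the first run (field name assumed)
  lastUsed: number;  // epoch ms, drives the 5-minute eviction window
}

const WARM_TTL_MS = 5 * 60 * 1000;
const warmSessions = new Map<string, WarmSession>(); // keyed per console client

export function runPrompt(clientId: string, prompt: string) {
  const cached = warmSessions.get(clientId);
  const stale = !cached || Date.now() - cached.lastUsed > WARM_TTL_MS;

  // Reuse the cached session when it is still warm; otherwise start fresh.
  const args = ["--print", "--output-format", "stream-json"];
  if (!stale && cached) args.push("--resume", cached.sessionId);

  const child = spawn("claude", [...args, prompt]);
  child.stdout.on("data", (chunk) => {
    // The stream-json output is newline-delimited JSON; remember the session id
    // so the next prompt from this client can resume it.
    for (const line of chunk.toString().split("\n").filter(Boolean)) {
      try {
        const event = JSON.parse(line);
        if (event.session_id) {
          warmSessions.set(clientId, { sessionId: event.session_id, lastUsed: Date.now() });
        }
      } catch {
        // ignore partial lines split across chunks
      }
    }
  });
  return child;
}
```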
2. Model Selection (MEDIUM IMPACT)
Problem: Opus 4.5 is currently used for all requests, which is overkill for simple queries.
| Option | Approach |
|---|---|
| Option A | Haiku for simple queries: route "show me X" style requests to Haiku, complex tasks to Opus. |
| Option B | User choice: add a model selector (Haiku/Sonnet/Opus toggle) to the console UI. |
Estimated savings: 2-5 seconds for simple queries (Haiku is ~3x faster)
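A rough sketch of Option A's routing heuristic; the patterns, length cut-off, and model aliases are illustrative and would need tuning against real konsole traffic.

```typescript
// Route short "show/list/status" prompts to a faster model, everything else to Opus.
const SIMPLE_PATTERNS = [/^show\s/i, /^list\s/i, /\bstatus\b/i];

export function pickModel(prompt: string): string {
  const isSimple =
    prompt.length < 200 && SIMPLE_PATTERNS.some((re) => re.test(prompt));
  return isSimple ? "haiku" : "opus";
}

// Used when building the CLI invocation, e.g.
//   spawn("claude", ["--print", "--model", pickModel(prompt), prompt]);
```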
3. Kontask Creation Timing (MEDIUM IMPACT)
Problem: The kontask is created only after Claude finishes, so the user waits for the full response before seeing a card.
| Option | Approach |
|---|---|
| Option A | Streaming output: render partial HTML as chunks arrive and show a skeleton card immediately. |
| Option B | Optimistic card: create a placeholder kontask when the request starts and update it on completion. |
Estimated savings: Perceived latency reduced by showing progress earlier
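A sketch of Option B, assuming kontask exposes a simple HTTP API for creating and updating cards; the base URL, endpoint paths, and payload fields below are hypothetical.

```typescript
// Optimistic card sketch: show a skeleton immediately, patch it when Claude finishes.
const KONTASK_URL = "http://localhost:8030"; // hypothetical kontask base URL

export async function withOptimisticCard(
  title: string,
  run: () => Promise<string>, // resolves to the final HTML output
): Promise<void> {
  // 1. Create the placeholder card up front so the user sees progress right away.
  const res = await fetch(`${KONTASK_URL}/tasks`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ title, status: "in_progress", output: "<p>Working…</p>" }),
  });
  const { id } = await res.json();

  // 2. Fill in the real output, or mark the card failed, once the run ends.
  try {
    const output = await run();
    await fetch(`${KONTASK_URL}/tasks/${id}`, {
      method: "PATCH",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ status: "completed", output }),
    });
  } catch (err) {
    await fetch(`${KONTASK_URL}/tasks/${id}`, {
      method: "PATCH",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ status: "failed", output: String(err) }),
    });
  }
}
```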
4. SSE Proxy Overhead (LOW-MEDIUM IMPACT)
Problem: The Browser → konui → konsole path adds an extra hop, and every event is JSON-parsed twice.
| Option | Approach |
|---|---|
| Option A | Direct connection: let the browser connect to konsole:8020 directly (requires CORS configuration). |
| Option B | Binary protocol: use WebSocket with MessagePack instead of SSE/JSON. |
Estimated savings: 50-100ms per event, more noticeable with many tool calls
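A sketch of Option A, assuming konsole is an Express app; the route path is illustrative, and the allowed origin should match wherever konui is actually served.

```typescript
// Let the browser open the SSE stream against konsole:8020 directly,
// skipping the konui proxy hop.
import express from "express";
import cors from "cors";

const app = express();

// Allow only the konui origin so the SSE endpoint is not wide open.
app.use(cors({ origin: "http://localhost:8666" }));

app.get("/events/:requestId", (req, res) => {
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");
  // ...pipe Claude's stream-json events for req.params.requestId straight to
  // the browser here, with no intermediate parse/re-serialize in konui.
});

app.listen(8020);

// Browser side: new EventSource("http://localhost:8020/events/" + requestId)
```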
5. Prompt Wrapping Overhead (LOW IMPACT)
Problem: `wrapPromptWithInstructions()` adds ~500 characters of kontask instructions to every prompt.
| Option | Approach |
|---|---|
| Option A | System prompt: move the kontask instructions into the system prompt, sent once per session. |
| Option B | Conditional wrapping: only add the instructions when the response is likely to need a kontask. |
Estimated savings: ~100ms token processing
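A sketch of Option A, assuming the installed claude CLI supports an `--append-system-prompt` flag and that the kontask instructions live in a standalone file; both are assumptions to illustrate the shape of the change, not confirmed behaviour.

```typescript
// Pass the kontask instructions as a system prompt instead of wrapping
// every user prompt with wrapPromptWithInstructions().
import { spawn } from "node:child_process";
import { readFileSync } from "node:fs";

// Load the instructions once at startup rather than per request.
const KONTASK_INSTRUCTIONS = readFileSync("kontask-instructions.md", "utf8"); // hypothetical path

export function spawnClaude(prompt: string) {
  return spawn("claude", [
    "--print",
    "--output-format", "stream-json",
    "--append-system-prompt", KONTASK_INSTRUCTIONS, // flag assumed to exist
    prompt, // the raw user prompt, no per-request wrapping
  ]);
}
```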
6. Tool Call Latency (VARIABLE)
Problem: Some requests make 5-10 tool calls (Bash, Read, etc.), each adding ~200ms.
| Option | Approach |
|---|---|
| Option A | Parallel tools: Claude already parallelizes calls, but ensure the stream parsing does not serialize them. |
| Option B | Caching: cache frequent lookups (wf-list, suburbs.json) in memory. |
Estimated savings: 500ms-2s for data-heavy requests
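A sketch of Option B's cache: a small in-memory TTL wrapper that any tool handler could call. `fetchWfList` is a hypothetical stand-in for whatever command or API the real wf-list lookup uses.

```typescript
// Tiny TTL cache for frequent, slow lookups such as wf-list output.
interface CacheEntry<T> {
  value: T;
  expiresAt: number; // epoch ms
}

const cache = new Map<string, CacheEntry<unknown>>();

export async function cached<T>(
  key: string,
  ttlMs: number,
  load: () => Promise<T>,
): Promise<T> {
  const hit = cache.get(key) as CacheEntry<T> | undefined;
  if (hit && hit.expiresAt > Date.now()) return hit.value; // fresh hit
  const value = await load();                               // miss: load and store
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}

// Usage: serve wf-list from memory for 60 seconds.
// const list = await cached("wf-list", 60_000, fetchWfList);
```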
Recommended Priority Order
1. Process warm cache - biggest win, medium complexity. Keep Claude alive between prompts.
2. Model selection - easy win. Add a `--model haiku` option for simple queries.
3. Optimistic kontask card - UX improvement. Show a skeleton card immediately.
4. Data caching - cache wf-list results in memory for 60 seconds.
5. Direct konsole connection - removes the proxy hop; requires CORS setup.
Current vs Target Latency
| Request Type | Current | Target | Savings |
|---|---|---|---|
| Simple query (e.g., "Show status") | 6-10s | 2-4s | ~60% |
| Data lookup (e.g., "Tell me about Broadbeach") | 8-15s | 4-8s | ~50% |
| Complex task (e.g., "Run all tests") | 30-60s | 25-50s | ~20% |