Leontes: Self-Hosted AI Agent for Windows | Open Source

An AI agent that thinks in stages, acts before you ask, and writes its own tools. Self-hosted. Open source. Windows.

Not another chatbot wrapper. A neuroscience-inspired cognitive system that perceives, remembers, plans, acts, and learns.

How Leontes thinks

Most agents are a while-loop around a chat API. Leontes runs a 5-stage pipeline inspired by Global Workspace Theory and Kahneman's dual-process model.

1. Perceive

Extract entities, classify intent, detect urgency. No LLM needed. Fast pattern matching only.

2. Enrich

Search four memory types. Resolve "Sarah" to a real person via the knowledge graph.

3. Plan

LLM picks tools and strategy. Can pause here to ask you a question, then resume.

4. Execute

Stream the response. Call tools. If the server crashes, it picks up from the last checkpoint.

5. Reflect

Store what worked, who you mentioned, what you prefer. Next time you ask about Sarah, it already knows she's on the Alpha team.

Each stage checkpoints its state. If the server crashes mid-pipeline, it picks up where it left off. Every decision is traced. Ask "why did you do that?" and get a real answer. Built on Microsoft Agent Framework Workflows.

Two brains, one agent

Inspired by Kahneman's System 1 / System 2 model. Most OS events are handled by fast reflexes. The "conscious mind" only activates when something surprising happens.

System 1: Sentinel

Fast, local, free. Watches file downloads, clipboard, calendar, and active windows. Applies heuristic filters: regex, frequency analysis, time rules. No LLM calls. Handles most events by reflex.

🧠

System 2: Thinking Pipeline

Slow, deliberate, powerful. The full 5-stage cognitive pipeline. Only triggered when System 1 detects something it can't handle alone. Your agent notices when you copy an IBAN and asks if you want to find the matching invoice.
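The split between reflex and deliberate handling could look roughly like the Python sketch below. It is an illustration under assumptions, not Leontes code: the `Sentinel` class, the `on_clipboard` method, the 60-second window, and the simplified IBAN pattern (shape only, no checksum validation) are all invented for the example.

```python
import re

# IBAN-like shape: two letters, two digits, then 2 to 7 groups of four
# uppercase alphanumerics and an optional tail of 1 to 3 characters, with an
# optional single space between groups. No checksum validation is performed.
IBAN_PATTERN = re.compile(
    r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b"
)

class Sentinel:
    """System 1 sketch: cheap local checks decide whether to wake System 2."""

    def __init__(self, window_seconds: float = 60.0):
        self.window_seconds = window_seconds
        self._seen: dict[str, float] = {}  # clipboard text -> time of escalation

    def on_clipboard(self, text: str, now: float) -> str:
        # Time rule: forget escalations older than the window.
        self._seen = {
            seen_text: ts for seen_text, ts in self._seen.items()
            if now - ts <= self.window_seconds
        }
        # Regex filter: most clipboard content is uninteresting.
        if not IBAN_PATTERN.search(text):
            return "handled"
        # Frequency filter: do not escalate the same content twice in a row.
        if text in self._seen:
            return "handled"
        self._seen[text] = now
        return "escalate"  # hand off to the Thinking Pipeline
```

Only the `"escalate"` result would trigger the 5-stage pipeline; everything else is absorbed without any LLM call.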

What it can do

Modules that work together. Not features bolted on top of an LLM.

🧬

Hierarchical Memory

Ask about a meeting from two weeks ago and it remembers. Mention Sarah and it knows she's on the Alpha team. Four memory types (working, episodic, semantic, procedural) in PostgreSQL with pgvector.

🔗

Synapse Graph

Knowledge graph linking people, files, and projects. "Send the report to Sarah" finds the right person, file, and channel. Graph-augmented retrieval, not flat vector search.
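A small Python sketch can show why following graph edges narrows a request that plain name matching would leave ambiguous. This is an in-memory toy, not the Leontes implementation: the `SynapseGraph` class, the `plan_send` helper, the node IDs, and the relation names are all hypothetical.

```python
from collections import defaultdict

class SynapseGraph:
    """Toy undirected graph of typed nodes (person, project, file, channel)."""

    def __init__(self):
        self.nodes: dict[str, dict] = {}
        self.edges: dict[str, list] = defaultdict(list)

    def add_node(self, node_id: str, node_type: str, name: str) -> None:
        self.nodes[node_id] = {"type": node_type, "name": name}

    def add_edge(self, source: str, relation: str, target: str) -> None:
        # Stored in both directions so traversal works from either end.
        self.edges[source].append((relation, target))
        self.edges[target].append((relation, source))

    def resolve(self, mention: str, node_type: str) -> list[str]:
        """Return IDs of nodes of the given type whose name contains the mention."""
        return [
            node_id for node_id, node in self.nodes.items()
            if node["type"] == node_type and mention.lower() in node["name"].lower()
        ]

    def neighbors(self, node_id: str, node_type: str) -> list[str]:
        """Return directly connected nodes of the given type."""
        return [t for _, t in self.edges[node_id] if self.nodes[t]["type"] == node_type]

def plan_send(graph: SynapseGraph, person_mention: str, file_mention: str):
    """Resolve 'send <file> to <person>' by walking person -> project -> file."""
    people = graph.resolve(person_mention, "person")
    if len(people) != 1:
        return None  # unknown or ambiguous person: better to ask the user
    person = people[0]
    files = {
        file_id
        for project in graph.neighbors(person, "project")
        for file_id in graph.neighbors(project, "file")
        if file_mention.lower() in graph.nodes[file_id]["name"].lower()
    }
    channels = graph.neighbors(person, "channel")
    if len(files) != 1 or not channels:
        return None
    return {"person": person, "file": files.pop(), "channel": channels[0]}
```

If two files both contain "report" in their names but only one belongs to a project Sarah is linked to, `plan_send` returns that one. A flat name or vector match alone would surface both.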

🪟

Structural Vision

"What error is showing in that dialog?" It reads the UI tree via Windows UI Automation and answers from structure, not screenshots. Password fields and excluded apps are never captured.

⚙️

Tool Forge

The agent writes, compiles (Roslyn), tests, and registers new tools at runtime. You approve before anything runs. Unused tools are pruned automatically.

💬

Proactive Communication

The agent can send notifications, ask mid-task questions, request permissions, and stream progress updates. Not just reactive chat.

📊

Confidence Scoring

Every response has a confidence score (0 to 1). When the agent is unsure, it asks. When it's confident, it acts. Ask "why did you do that?" and see the full decision trace.
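The ask-or-act rule can be sketched as a simple threshold check that also records its reasoning. The 0.8 threshold, the `Decision` record, and the `decide` function are assumptions made for this Python illustration; the source only states that scores range from 0 to 1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Decision:
    action: str        # "act" or "ask"
    confidence: float  # score in [0, 1]
    reason: str        # human-readable entry for the decision trace

def decide(confidence: float, act_threshold: float = 0.8) -> Decision:
    """Act when confidence meets the threshold, otherwise ask the user."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= act_threshold:
        return Decision("act", confidence,
                        f"confidence {confidence:.2f} >= threshold {act_threshold:.2f}")
    return Decision("ask", confidence,
                    f"confidence {confidence:.2f} < threshold {act_threshold:.2f}")
```

Keeping the reason next to the action is what makes a later "why did you do that?" answerable from stored records rather than reconstructed after the fact.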

💰

Cost Aware

Token budgets per feature. Automatic model routing: small model for simple tasks, large for complex reasoning. Background tasks throttle first; your chat never silently blocks.
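One way to picture budget-aware routing is the Python sketch below. The `Budget` class, the `route` function, the 20% pressure cutoff, and the tier names are hypothetical; the sketch only mirrors the behaviour described above, where background work is throttled first and interactive chat is downgraded but never deferred.

```python
from dataclasses import dataclass

@dataclass
class Budget:
    limit: int      # token allowance for one feature
    used: int = 0   # tokens spent so far

    def remaining_fraction(self) -> float:
        return max(0.0, 1.0 - self.used / self.limit)

def route(complexity: str, budget: Budget, background: bool,
          pressure_below: float = 0.2) -> str:
    """Return "large", "small", or "defer" for a task.

    complexity is "simple" or "complex". Under budget pressure, background
    tasks are deferred and interactive tasks fall back to the small model.
    """
    under_pressure = budget.remaining_fraction() < pressure_below
    if background and under_pressure:
        return "defer"
    if complexity == "complex" and not under_pressure:
        return "large"
    return "small"
```

Because `"defer"` is only reachable when `background` is true, an interactive request always gets a model, even when the budget is nearly exhausted.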

🛡️

Resilient

LLM goes down? Sentinel heuristics, local tools, and memory retrieval keep working. Bounded queues with backpressure. Each pipeline stage degrades independently instead of failing the whole request.
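A bounded queue with backpressure can be sketched in a few lines of Python using the standard library's `queue.Queue`. The `EventBuffer` class and its reject-when-full policy are assumptions for illustration; the source does not say which overflow policy Leontes uses.

```python
import queue

class EventBuffer:
    """Fixed-capacity buffer that rejects new events instead of growing."""

    def __init__(self, capacity: int):
        self._queue: queue.Queue = queue.Queue(maxsize=capacity)
        self.rejected = 0  # count of events refused because the buffer was full

    def offer(self, event) -> bool:
        """Try to enqueue without blocking; return False when full."""
        try:
            self._queue.put_nowait(event)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def drain(self) -> list:
        """Remove and return everything currently buffered, oldest first."""
        items = []
        while True:
            try:
                items.append(self._queue.get_nowait())
            except queue.Empty:
                return items
```

A `False` return from `offer` is the backpressure signal: the producer can slow down, coalesce events, or drop low-priority ones, and memory use stays bounded while the consumer is slow or the LLM is unavailable.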

📡

Multi-Channel

CLI, Signal (E2E encrypted), Telegram (Bot API). Same brain, same memory. Talk from your terminal or message from your phone.

🌐

Open Protocols

AG-UI for web frontends (CopilotKit compatible). MCP to connect external tool servers. A2A for agent-to-agent delegation. All three are open, publicly specified protocols.

🔍

Observable

Why did it suggest that? Open the trace. Per-stage timing, decision records, token usage, confidence scores. Replay any interaction and see exactly what it considered before choosing.

🎭

Agent Persona

Personality, tone, and boundaries defined in a plain Markdown file. Two model tiers: large for deep reasoning, small for fast summaries. Per-stage temperature. Budget pressure automatically routes tasks to the cheaper tier.

Three hosts, one brain

Built on .NET 10, PostgreSQL 17 + pgvector, and the Microsoft Agent Framework. Clean architecture with dependencies flowing inward only.

Leontes.Api

The brain. Thinking Pipeline, HTTP endpoints, SSE streaming, auto-migration, rate limiting. Handles chat from CLI, Signal, and Telegram.

Leontes.Worker

The senses. Windows Service running Sentinel monitors and messaging bridges. Watches your OS and forwards events to the brain.

Leontes.Cli

The voice. dotnet tool installed as leontes. Chat, setup wizard, privacy controls, budget dashboard, telemetry viewer.

Inspired by: Global Workspace Theory (Dehaene), Dual-Process Theory (Kahneman), Generative Agents (Park et al.), Voyager (Wang et al.), Free Energy Principle (Friston).

Active development

17 features, 15 built, 2 specified

The 5-stage Thinking Pipeline, Hierarchical Memory, Sentinel, Structural Vision, Persona, Resilience, Observability, and Cost Control are implemented end-to-end against a real LLM. Tool Forge and the AG-UI / MCP / A2A protocol layer remain specified and are next on the roadmap.

Follow progress on GitHub

Want to build on this?

The spec is public and PRs are welcome. Found a gap? Have a use case? Reach out or open an issue.