How Ren works

Ren is a small, local-first FastAPI service that turns plain intent into action through Claude and a pluggable tool system — with your memory and identity kept on the device it runs on.

From a sentence to an action

STEP 1

A request arrives

You talk to Ren over HTTP (POST /chat, or SSE /chat/stream) or the /ws/audio voice socket. It’s a small local FastAPI service — by default bound to 127.0.0.1.

STEP 2

The agent assembles context

It loads recent turns from on-device memory and a cached system prompt (a stable base framework + your persona), then calls Claude at the requested tier.

STEP 3

Bounded multi-round tools

Claude can request tools; the registry dispatches them (async) and feeds results back, up to a round budget. If the budget is hit, one final call forces a written answer.

STEP 4

Persist & reply

The turn is written to the local SQLite store (WAL), older context is compacted into a rolling summary, and the reply streams back to you. Nothing leaves the device but the Claude call itself.

curl -s localhost:8000/chat \
  -H 'content-type: application/json' \
  -d '{"message": "dim the living room to movie night"}'
# → Ren picks a tier, runs the home tools, persists the turn, and replies.

Small parts, clean seams

Each component does one thing and stays replaceable — the design is built to grow without rewrites.

FastAPI surface

The HTTP/WebSocket service — /health, /chat, /memory, /threads, /identity, /ws/audio — with lifespan-managed state.

Agent loop

History + system prompt → Claude → bounded multi-round tool calls → persist. The heart of every turn.

Tool registry

A plugin registry with a dangerous gate. Register a tool (name, JSON schema, sync/async handler) and nothing else changes.

Local memory

Async SQLite (WAL, migrations) on your device — conversation turns and durable notes. Gitignored; never synced.

Identity

A local ed25519 keypair; attest() returns a self-signed identity card — the seam for a future trust network.

Model tiers

Code asks for fast / default / hard; one place maps intent → Claude model. A tiny heuristic, the router seam.

Home engine

A provider abstraction over Home Assistant, a generic webhook, and the capability-model connectors — rooms, scenes, automations, a state watcher.

Voice pipeline

Optional, CPU-only: Whisper STT, Piper TTS, Silero VAD, and wake-word — loaded only when you enable it.


The principles it holds to

Private by default

Memory is a SQLite file on your machine. Nothing about Ren requires a cloud account beyond the Claude API call itself.

Hardware-agnostic now, sovereign later

Ren runs anywhere Python runs. Only identity/ knows about hardware — today a key file; later, silicon-rooted, with no caller changes.

Speak in intent, not model ids

Callers ask for fast/default/hard. The mapping lives in one place — the seam where a real router will later live.

Configurable persona

The system prompt is a stable base framework plus your name + free-text overlay, composed once at startup as a prompt-cache anchor.

Under the hood: the tool loop is bounded (a max number of rounds, then a final call forces a written answer), and the prompt uses two cache breakpoints — the static system prefix and the latest message — so repeated tool rounds and shared context read from cache instead of being reprocessed.

Go deeper