Troubleshooting the Dev Environment

Start every triage with the same three commands:

kilter status    # pod states — look for CrashLoopBackOff, ImagePullBackOff
kilter logs <service>
kilter env       # verify connection strings and ports

Symptom → fix

Symptom	Likely cause	Fix
Cluster dead after reboot	KIND containers stop on host reboot	`kilter up` — it detects the dead cluster and recreates it
Pod in `CrashLoopBackOff` (postgres)	Data directory corruption	`kilter db reset`
Pod in `CrashLoopBackOff` (app)	Missing env vars	Check `kilter env` and your `.env` file
Pod in `CrashLoopBackOff` (app, schema errors in logs)	Schema drift — code expects tables that don't exist	`kilter db migrate`
`ImagePullBackOff`	KIND can't pull private images	Use a public image — check `kilter catalog` for defaults
Port already in use	Another process holds a kilter-allocated port	Find the port in `kilter env`, find the process holding it (e.g. `lsof -i :<port>`), kill it, then `kilter down && kilter up`
"Connection refused" app → service	Service pod down, or app using localhost instead of cluster DNS	`kilter status` + `kilter logs <service>`; in-cluster URLs are `<service>.<namespace>.svc:<port>`
Code edits don't reach the pod	Tilt `live_update` sync rules miss your directory	See below
Login/Keto 500s after DB reset	Stale Ory `nid` cache	See below
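
For the "app using localhost" case, one quick check is to scan `.env` for values that still point at the local machine. This is a minimal sketch that assumes a plain `KEY=value` file. `find_localhost_urls` is a hypothetical helper, not a kilter command:

```bash
# Hypothetical helper: print the keys in an env file whose values point at
# localhost or 127.0.0.1. Inside a pod, localhost is the pod itself, so such
# values usually need the <service>.<namespace>.svc:<port> form instead.
find_localhost_urls() {
  local envfile=${1:-.env}
  # Drop comments and blank lines, keep KEY=value lines mentioning localhost,
  # then print only the key.
  grep -Ev '^[[:space:]]*(#|$)' "$envfile" |
    grep -E '=.*(localhost|127\.0\.0\.1)' |
    cut -d= -f1
}
```

Running `find_localhost_urls .env` prints the offending keys one per line. Rewrite each one to the in-cluster form.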

Edits not syncing into the pod

This fails silently: Tilt runs fine and the pod stays up, but edits in an unsynced directory never propagate. Files that are in the sync list (like package.json) still hot-reload, which masks the bug.

Diagnose by comparing mtimes: edit a file, then stat it on the host and inside the pod (kubectl exec ... -- stat /app/<file>, using the project kubeconfig at ~/.cache/kilter/<name>/kubeconfig). A recent mtime on the host paired with a stale mtime in the pod confirms the diagnosis.
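
The same comparison can be wrapped as a reusable shell function. This is a sketch. `check_sync` is a hypothetical name, and it assumes the pod mirrors the repo under /app, lives in the `<name>-dev` namespace used elsewhere on this page, and ships a `stat` that accepts `-c %Y` (GNU coreutils does):

```bash
# Hypothetical helper: compare a file's mtime on the host with the copy in the
# pod. Assumes the repo is mirrored under /app in the container, the namespace
# is <name>-dev, and the image's stat accepts -c %Y (GNU coreutils does).
# Exit status: 0 in sync, 1 stale, 2 could not read one side.
check_sync() {
  local name=$1 pod=$2 file=$3
  local kubeconfig="$HOME/.cache/kilter/${name}/kubeconfig"
  local host_mtime pod_mtime

  # Host side: GNU stat first, BSD/macOS stat as a fallback.
  host_mtime=$(stat -c %Y "$file" 2>/dev/null || stat -f %m "$file") || return 2
  # Pod side: the same relative path under /app.
  pod_mtime=$(kubectl --kubeconfig "$kubeconfig" -n "${name}-dev" \
    exec "$pod" -- stat -c %Y "/app/${file}") || return 2

  if [ "$pod_mtime" -lt "$host_mtime" ]; then
    echo "STALE: $file (host=$host_mtime pod=$pod_mtime)"
    return 1
  fi
  echo "in sync: $file"
}
```

For example, `check_sync myproj deploy/web src/index.ts` prints a STALE line and exits 1 when the pod copy is older than the host file. `kubectl exec` accepts `deploy/<name>` as well as a pod name.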

Fix: re-run kilter render so the generator emits sync rules matching your layout. If you've ejected the Tiltfile, splice the missing sync() lines in yourself, and confirm every top-level source directory appears in live_update.
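
The last check can be made mechanical with a grep over the Tiltfile. This is a rough sketch, not a parser. It only recognizes literal paths such as `sync('./src', '/app/src')`, and it skips a few directories that are assumed not to need syncing. `missing_syncs` is a hypothetical helper:

```bash
# Hypothetical helper: print top-level directories that no sync() call in the
# Tiltfile mentions. Heuristic only: paths built from variables are not seen.
# Exit status is 1 if anything was printed.
missing_syncs() {
  local tiltfile=${1:-Tiltfile} dir status=0
  for dir in */; do
    dir=${dir%/}
    [ -d "$dir" ] || continue               # glob matched nothing
    case "$dir" in
      node_modules|dist|build) continue ;;  # assumed not to need syncing
    esac
    # Matches sync('./dir', sync("dir", sync('dir/ and similar forms.
    if ! grep -Eq "sync\([[:space:]]*['\"](\./)?${dir}['\"/]" "$tiltfile"; then
      echo "$dir"
      status=1
    fi
  done
  return "$status"
}
```

Run it from the repo root. Any directory it prints needs a matching `sync()` rule, or a deliberate decision that it does not.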

Ory 500s after postgres restart or `kilter db reset`

Ory services (Kratos, Keto, Hydra) cache a network ID (nid) in memory. A DB reset invalidates it and every Ory call starts returning 500 (foreign-key errors on nid in the logs).

Liveness probes auto-restart the pods within about 90 seconds. For immediate recovery:

export KUBECONFIG=~/.cache/kilter/<name>/kubeconfig
kubectl rollout restart deployment ory-kratos ory-keto ory-hydra -n <name>-dev
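
`kubectl rollout restart` returns as soon as the restart is requested, before the new pods are ready. A small wrapper that also waits on `kubectl rollout status` can help. This is a sketch with a hypothetical name, `ory_restart`, and an arbitrary 120-second timeout:

```bash
# Hypothetical helper: restart the three Ory deployments named above and wait
# for each rollout to finish. The 120s timeout is an arbitrary choice.
ory_restart() {
  local name=$1 d
  local kc="$HOME/.cache/kilter/${name}/kubeconfig" ns="${name}-dev"
  kubectl --kubeconfig "$kc" -n "$ns" rollout restart \
    deployment ory-kratos ory-keto ory-hydra || return 1
  # rollout status blocks until the new ReplicaSet is available or times out.
  for d in ory-kratos ory-keto ory-hydra; do
    kubectl --kubeconfig "$kc" -n "$ns" rollout status \
      "deployment/$d" --timeout=120s || return 1
  done
}
```

`ory_restart myproj` returns non-zero if any of the three rollouts does not finish in time.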

Restart vs stop vs destroy

Goal	Command	What survives
One stuck pod	`kilter restart <service>`	Everything else
Pause Tilt briefly	`kilter down`	Cluster and pods keep running
End of day, free RAM	`kilter stop`	State preserved on disk; `kilter up` resumes in ~30s
Corrupted state, start over	`kilter destroy && kilter up`	Nothing — full recreate

Destroy is the last resort

kilter destroy loses all cluster state including the database. Try kilter restart, kilter db reset, or the Ory rollout-restart first — most "everything is broken" states have a narrower fix.

Disk filling up

Tilt loads a fresh app image into KIND on every rebuild, so node disk usage keeps growing until it can fill the host disk. Reclaim space without tearing down:

kilter prune            # prunes kilter-built images, keeps recent builds
kilter prune --dry-run  # preview first
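
To decide when to prune, a quick disk check helps. This is a minimal sketch that assumes you point it at the filesystem holding Docker's data: on Linux, `docker info --format '{{.DockerRootDir}}'` prints that path. Docker Desktop keeps the data inside its VM, so host `df` does not reflect it there. `disk_pct` and `maybe_prune` are hypothetical helpers, and the 85% default is arbitrary:

```bash
# Hypothetical helpers: report how full the filesystem holding a path is, and
# print a reminder to prune once it crosses a threshold (default 85%).
disk_pct() {
  # POSIX df: 5th column of the second line is capacity, e.g. "83%".
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

maybe_prune() {
  local threshold=${1:-85} path=${2:-/} pct
  pct=$(disk_pct "$path") || return 2
  if [ "$pct" -ge "$threshold" ]; then
    echo "disk at ${pct}% (>= ${threshold}%): run 'kilter prune --dry-run'"
    return 0
  fi
  return 1   # below threshold, nothing to do
}
```

On Linux, `maybe_prune 85 "$(docker info --format '{{.DockerRootDir}}')"` prints the reminder only when usage has crossed the threshold.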

Still stuck: go direct

export KUBECONFIG=~/.cache/kilter/<name>/kubeconfig
kubectl describe pod <pod> -n <name>-dev          # events, restart reasons
kubectl get events -n <name>-dev --sort-by=.lastTimestamp
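
When asking for help, one file is easier to share than a scrollback. This sketch bundles both commands above for every pod in the namespace. `collect_diag` and its default output filename are hypothetical:

```bash
# Hypothetical helper: write `describe` output for every pod, followed by the
# namespace's events sorted by time, into a single file. Uses the same
# kubeconfig path and <name>-dev namespace as the commands above.
collect_diag() {
  local name=$1 out=${2:-kilter-diag.txt} pod
  local kc="$HOME/.cache/kilter/${name}/kubeconfig" ns="${name}-dev"
  {
    # `get pods -o name` prints pod/<name>, which describe accepts directly.
    for pod in $(kubectl --kubeconfig "$kc" -n "$ns" get pods -o name); do
      echo "=== $pod"
      kubectl --kubeconfig "$kc" -n "$ns" describe "$pod"
    done
    echo "=== events"
    kubectl --kubeconfig "$kc" -n "$ns" get events --sort-by=.lastTimestamp
  } > "$out"
  echo "wrote $out"
}
```

`collect_diag myproj` writes `kilter-diag.txt` to the current directory.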