Troubleshooting the Dev Environment
Start every triage with the same three commands:
kilter status # pod states — look for CrashLoopBackOff, ImagePullBackOff
kilter logs <service>
kilter env # verify connection strings and ports
Symptom → fix
| Symptom | Likely cause | Fix |
|---|---|---|
| Cluster dead after reboot | KIND containers stop on host reboot | kilter up — it detects the dead cluster and recreates it |
| Pod in CrashLoopBackOff (postgres) | Data directory corruption | kilter db reset |
| Pod in CrashLoopBackOff (app) | Missing env vars | Check kilter env and your .env file |
| Pod in CrashLoopBackOff (app, schema errors in logs) | Schema drift — code expects tables that don't exist | kilter db migrate |
| ImagePullBackOff | KIND can't pull private images | Use a public image — check kilter catalog for defaults |
| Port already in use | Another process holds a kilter-allocated port | Find it via kilter env, kill the process, then kilter down && kilter up |
| "Connection refused" app → service | Service pod down, or app using localhost instead of cluster DNS | kilter status + kilter logs <service>; in-cluster URLs are <service>.<namespace>.svc:<port> |
| Code edits don't reach the pod | Tilt live_update sync rules miss your directory | See below |
| Login/Keto 500s after DB reset | Stale Ory nid cache | See below |
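For the "Connection refused" row, a small helper can make the in-cluster address format concrete. This is only a sketch: the function name `cluster_addr` is hypothetical, and the `<name>-dev` namespace is assumed from the kubectl commands later in this guide.

```shell
# Build an in-cluster address of the form <service>.<namespace>.svc:<port>.
# cluster_addr is a hypothetical helper, not part of kilter.
cluster_addr() {
  service=$1
  namespace=$2
  port=$3
  echo "${service}.${namespace}.svc:${port}"
}
```

For example, `cluster_addr postgres myproj-dev 5432` prints the address an app pod should use instead of `localhost:5432` (the service, project name, and port here are placeholders).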
Edits not syncing into the pod
This fails silently: Tilt runs fine, the pod stays up, but edits in an unsynced directory never propagate — while files that are in the sync list (like package.json) still hot-reload, masking the bug.
Diagnose by comparing mtimes: edit a file, then stat it on the host and inside the pod (kubectl exec ... -- stat /app/<file> using the project kubeconfig at ~/.cache/kilter/<name>/kubeconfig). Host recent + pod stale confirms it.
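The mtime comparison can be scripted roughly as follows. This is a sketch under assumptions: `stat -c %Y` is the GNU form (macOS hosts need `stat -f %m` instead), `POD` and `NS` are placeholders you set yourself, and the 5-second tolerance is arbitrary.

```shell
# Classify two epoch-second mtimes: the pod copy counts as stale when it is
# older than the host copy by more than the tolerance (default 5 seconds).
sync_verdict() {
  host_mtime=$1
  pod_mtime=$2
  tolerance=${3:-5}
  if [ $((host_mtime - pod_mtime)) -gt "$tolerance" ]; then
    echo stale
  else
    echo synced
  fi
}

# Fetch both mtimes and print the verdict.
# Assumes KUBECONFIG is already exported and that the repo root maps to /app.
check_sync() {
  file=$1
  host=$(stat -c %Y "$file") || return 1
  pod=$(kubectl exec "$POD" -n "$NS" -- stat -c %Y "/app/$file") || return 1
  sync_verdict "$host" "$pod"
}
```

Edit a file, wait a few seconds, then run `check_sync <file>`; `stale` suggests the directory is missing from the sync rules.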
Fix: re-run kilter render so the generator emits sync rules matching your layout. If you've ejected the Tiltfile, splice the missing sync() lines in yourself, and confirm every top-level source directory appears in live_update.
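To check an ejected Tiltfile for missing directories, a grep-based heuristic along these lines may help. It assumes sync rules are written as `sync('./dir', ...)` or `sync("dir", ...)` with a literal path; rules built from variables will not be detected, and `missing_sync_dirs` is a hypothetical helper name.

```shell
# Print each directory that has no literal sync('<dir>' ...) rule in the Tiltfile.
# Heuristic only: directory names containing regex metacharacters are not escaped.
missing_sync_dirs() {
  tiltfile=$1
  shift
  for dir in "$@"; do
    grep -Eq "sync\(['\"](\./)?${dir}/?['\"]" "$tiltfile" || echo "$dir"
  done
}
```

Call it with the Tiltfile path followed by your top-level source directories, for example `missing_sync_dirs Tiltfile src lib scripts` (directory names are placeholders). Any name it prints needs a sync() line.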
Ory 500s after postgres restart or kilter db reset
Ory services (Kratos, Keto, Hydra) cache a network ID (nid) in memory. A DB reset invalidates it and every Ory call starts returning 500 (foreign-key errors on nid in the logs).
Liveness probes auto-restart the pods within about 90 seconds. For immediate recovery:
export KUBECONFIG=~/.cache/kilter/<name>/kubeconfig
kubectl rollout restart deployment ory-kratos ory-keto ory-hydra -n <name>-dev
Restart vs stop vs destroy
| Goal | Command | What survives |
|---|---|---|
| One stuck pod | kilter restart <service> | Everything else |
| Pause Tilt briefly | kilter down | Cluster and pods keep running |
| End of day, free RAM | kilter stop | State preserved on disk; kilter up resumes in ~30s |
| Corrupted state, start over | kilter destroy && kilter up | Nothing — full recreate |
kilter destroy loses all cluster state including the database. Try kilter restart, kilter db reset, or the Ory rollout-restart first — most "everything is broken" states have a narrower fix.
Disk filling up
Tilt loads a fresh app image into KIND on every rebuild; node disk grows until it fills the host. Reclaim without tearing down:
kilter prune # prunes kilter-built images, keeps recent builds
kilter prune --dry-run # preview first
Still stuck: go direct
export KUBECONFIG=~/.cache/kilter/<name>/kubeconfig
kubectl describe pod <pod> -n <name>-dev # events, restart reasons
kubectl get events -n <name>-dev --sort-by=.lastTimestamp
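When the event list is noisy, filtering to warnings can narrow it down. The wrapper below is a sketch: `warn_events` is a hypothetical name, and it assumes the kubeconfig path and `<name>-dev` namespace follow the pattern shown above.

```shell
# Show only Warning events for a project, sorted by last timestamp.
# $1 is the project name (the <name> placeholder used in this guide).
warn_events() {
  name=$1
  KUBECONFIG="$HOME/.cache/kilter/${name}/kubeconfig" \
    kubectl get events -n "${name}-dev" \
      --field-selector type=Warning \
      --sort-by=.lastTimestamp
}
```

Run it as `warn_events <name>`; failed probes, image pull errors, and OOM kills typically surface as Warning events.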