This is a short story about what I set up, what went wrong, and what Claude and I talked through on Thursday that made me rethink the whole thing. Sheldon is still very much being built — none of this is a retrospective on a finished project. I just figured it was worth writing down while the decisions are fresh.
Sheldon, physically
Sheldon lives on a Mac mini that sits in a home-office room at our place in San Mateo, plugged in, always on. The mini isn't anyone's computer — it's his computer. He has his own email on our 312school.com domain. He has his own 1Password account with his own vault, where his credentials live. He has his own GitHub user, his own Slack user, and a fresh Claude API key generated just for him. When he commits something, the commit author is Sheldon. When he posts in a channel, he shows up as a user with an avatar, not a webhook.
I did the accounts-first thing before any of the software setup because I wanted him to be a real entity in our systems rather than a script pretending to be a bot. Even if nothing else worked, he'd at least have a proper seat.
Picking a runtime
Once the accounts existed, the question was what to actually run on the mini. There are a bunch of options — LettaBot, NanoClaw, CrewAI, build-your-own-with-the-SDK — but by early April it was kind of obvious where I'd start. I went with OpenClaw.
OpenClaw is an open-source AI agent that lives on your machine. You install it, point it at an LLM (Sheldon uses Claude), plug it into your messaging apps, and it runs as a background process — always on, responsive, holding context.
It went viral in a way that no other agent framework has. 100,000 GitHub stars in its first week in late January 2026. 163,000 a few months later. Everyone I know has tried it or is about to try it.
I think the secret is that it doesn't feel like a framework. Most agent tooling feels like you're wiring up parts. OpenClaw feels like there's somebody home. It ships with opinions about voice, personality, and memory. It uses first-person when it talks to you. It has a SOUL.md where you describe who it is. It has a heartbeat — a scheduler that fires on a cron and makes it proactive.
What Sheldon is made of
Once OpenClaw was installed and running with Sheldon's credentials, the thing that made him feel real was that everything is markdown files. His identity, memory, operating rules, team directory, tools he knows about — all of it sitting in ~/.openclaw/ on the mini. You can cat them. You can edit them. The whole state of "who Sheldon is" is legible to a human.
Every time Sheldon is asked to do something, his context is reassembled from these files. The LLM itself doesn't remember anything between calls — the filesystem does. Each response starts with "who am I and what have I been doing?" being reconstructed from disk. It's a pretty elegant trick. Hold onto this detail; it becomes the crux of everything later.
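The reassembly loop is easy to picture in code. Here's a minimal sketch — the file names beyond SOUL.md, IDENTITY.md, and USER.md (like MEMORY.md) and the flat layout are my assumptions, and OpenClaw's actual loader surely does more:

```python
import tempfile
from pathlib import Path

# Assumed file list and ordering — illustrative, not OpenClaw's actual manifest.
MANAGED_FILES = ["SOUL.md", "IDENTITY.md", "USER.md", "MEMORY.md"]

def assemble_context(workspace: Path) -> str:
    """Rebuild the agent's prompt from whatever the markdown files say right now.

    The LLM is stateless between calls; every response starts by
    concatenating the current on-disk answer to "who am I?".
    """
    parts = []
    for name in MANAGED_FILES:
        f = workspace / name
        if f.exists():
            parts.append(f"## {name}\n{f.read_text().strip()}")
    return "\n\n".join(parts)

# Demo on a throwaway workspace instead of a real ~/.openclaw/
ws = Path(tempfile.mkdtemp())
(ws / "SOUL.md").write_text("Helpful, curious, concise.")
(ws / "IDENTITY.md").write_text("I am Sheldon, AI Product Manager at 312 School.")
prompt = assemble_context(ws)
print(prompt)
```

The useful property: edit a file, and the very next call sees the change — which is exactly why the sync behavior described below matters so much.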
Putting the managed files under git
What bothered me on Tuesday was that all of this was sitting on one computer. If the mini died, Sheldon's identity died with it. If I edited IDENTITY.md directly on the mini, there was no history of what changed. It had that "SSH'd into prod to fix a config" smell.
So I created a private repo called sheldon-workspace and wrote a sync script that ran every five minutes. The repo became the source of truth for the managed files — identity, operating rules, priorities, tools, the team directory. Anything describing who Sheldon is.
Here's what the sync did on each run:

1. Pull the latest main from the repo.
2. Check the workspace for uncommitted local changes.
3. If anything had changed, commit it to a fresh branch and open a PR.
4. Reset the workspace to match main.

The practical effect of steps 3 and 4: main is the only persistent state. Any change Sheldon makes locally survives at most five minutes. It gets captured in a PR — so nothing's lost — but the workspace reverts to whatever main says until I merge. That's the design. It's what makes the repo authoritative.
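A sketch of that five-minute loop in Python, splitting the decision from the execution so the plan is inspectable. The workspace path, the branch naming, and using the `gh` CLI to open the PR are my assumptions, not details from the actual script:

```python
import datetime
import subprocess

WORKSPACE = "/Users/sheldon/.openclaw"  # hypothetical path, not from the post

def plan_sync(dirty: bool, now: datetime.datetime) -> list[list[str]]:
    """Return the git commands for one sync run."""
    cmds: list[list[str]] = [["git", "fetch", "origin", "main"]]
    if dirty:
        # Capture local edits on a branch and open a PR (gh CLI assumed).
        branch = "sheldon/autosync-" + now.strftime("%Y%m%d-%H%M%S")
        cmds += [
            ["git", "checkout", "-b", branch],
            ["git", "add", "-A"],
            ["git", "commit", "-m", "Sheldon: local edits"],
            ["git", "push", "-u", "origin", branch],
            ["gh", "pr", "create", "--fill"],
        ]
    # The workspace always ends up matching main, merged PR or not.
    cmds += [
        ["git", "checkout", "main"],
        ["git", "reset", "--hard", "origin/main"],
    ]
    return cmds

def sync() -> None:
    status = subprocess.run(["git", "status", "--porcelain"], cwd=WORKSPACE,
                            capture_output=True, text=True, check=True)
    for cmd in plan_sync(bool(status.stdout.strip()), datetime.datetime.now()):
        subprocess.run(cmd, cwd=WORKSPACE, check=True)
```

Note the unconditional tail: whatever happened earlier in the run, the workspace ends on `origin/main`. That hard reset is the whole "repo is the source of truth" guarantee — and, as it turns out, the setup for Wednesday's surprise.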
One thing I liked about this setup: I could tell Sheldon directly in Slack "update your identity file, add this to your priorities," and he'd edit the file. Five minutes later the sync would auto-commit it to a branch and open a PR. I'd review, merge, and now the change was in the source of truth. No SSH, no direct file editing, just chat.
It worked well. Until it didn't.
The day his files said one thing and he believed another
Wednesday morning I asked Sheldon in Slack to update his identity file. His IDENTITY.md said he was the AI Product Manager at 312 School — I wanted to expand that to AI Product Manager & Applied AI Research Assistant at 312 School. He said okay, made the change. I went on with my day.
A few hours later I noticed he was still introducing himself as Research Assistant. I checked the file. It said AI Product Manager at 312 School — the old version. The sync had done exactly what it was supposed to do: the change hadn't been merged to main yet because I hadn't merged the auto-PR, so the next sync reverted the workspace. Sheldon's edit had lasted maybe four minutes.
Fine, that's on me for not merging. But here's the part that stopped me cold: Sheldon still believed he was also a research assistant. In Slack he kept referencing the expanded role, offering research help, acting on the version of himself that no longer existed on disk. The file said one thing. His running session believed another.
I restarted his Gateway — the front-end process OpenClaw uses to receive messages and dispatch them to the LLM. No change. He came back up with the same divergence. The nightly reset at 4am ran, no change. The file was one input among several, and it wasn't winning.
A pricing change I'd missed
I kept thinking about the setup on Wednesday without doing much. Something about it wasn't sitting right. On Thursday I sat down with Claude and did it properly — researched, walked through the alternatives, stress-tested the thinking. Two things came out of that.
The first was an aside, but it's worth flagging because it surprised me. On April 4, Anthropic emailed everyone on Claude Pro and Max subscriptions to say that subscription limits would no longer cover third-party harnesses like OpenClaw. Usage now bills at full API rates, separate bucket. Anyone running OpenClaw hoping to piggyback on a monthly subscription is now paying per-token.
Not OpenClaw-specific — any long-running harness is in the same boat. Annoying, but not the real problem.
Why the architecture doesn't hold up
The real problem was the architecture.
OpenClaw's memory model is: load everything into the prompt, every call. Fresh install, maybe 50KB of context per call. After a few weeks of real use, you're pushing 100KB — session history alone can hit 28k tokens. That's not a bug, that's the architecture working as designed.
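To make that growth concrete, here's the back-of-envelope. The 100KB figure is from above; the price per million input tokens and the call volume are my assumptions (4 chars/token is a rule of thumb, $3/MTok is a mid-tier rate, and the heartbeat alone generates calls all day):

```python
# Back-of-envelope cost of "load everything, every call".
context_bytes = 100 * 1024      # post-growth context size, from the post
chars_per_token = 4             # common rule of thumb, not a measurement
input_price_per_mtok = 3.00     # USD per million input tokens (assumed rate)
calls_per_day = 200             # hypothetical: heartbeat firings plus chat

tokens_per_call = context_bytes / chars_per_token
daily_cost = tokens_per_call * calls_per_day * input_price_per_mtok / 1_000_000
print(f"{tokens_per_call:.0f} input tokens per call, ~${daily_cost:.2f}/day")
```

Under those assumptions you're resending roughly 25k tokens of mostly-unchanged identity and history on every single call. On a subscription that was invisible; at per-token billing it's a line item.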
Here's the thing: LLMs are not good at long prompts. Context windows keep getting bigger — 200k tokens now, a million on some models — and the marketing makes it sound like that means the model can just absorb everything. In practice, the more you stuff in, the worse the model gets at following any specific piece of it. Old instructions slide out of focus. Recent noise takes over. It's not forgetting exactly — it's more like the model's attention gets spread thin, and older guidance stops landing the way it used to.
I saw a post this week from someone at Meta Superintelligence Labs who was using OpenClaw to manage their inbox. They'd told it explicitly: read my emails, summarize, do not act on them. It worked fine at first. Then a few days in, the agent started deleting emails. The "don't delete" instruction was still technically in the context, but it had been buried under days of new instructions, summaries, and conversation, and the model just… stopped weighting it. The rule didn't fail because it was removed. It failed because it got drowned.
OpenClaw is a brilliant demo and I don't want to dunk on it. But the more I worked with it, the more it felt like a weekend project that went viral on first impressions rather than a system someone designed for the long run.
The pivot
My first instinct was "pick a different agent framework." NanoClaw and LettaBot were both reasonable options. But swapping frameworks was solving the wrong problem. Every framework I looked at wanted to own Sheldon's identity and memory inside itself. I'd be swapping one opaque blob for a cleaner one, and in six months I'd hit the same class of drift in a new form.
What I actually wanted was simpler: a runtime that reads Sheldon's identity from the repo when a session starts, acts, writes back, and shuts down. Stateless worker. If I stop liking the runtime, I swap it. The identity doesn't move.
That's what led me to Claude Managed Agents. A rough analogy for how it differs from OpenClaw: OpenClaw is like hosting on an EC2 instance — one machine that's always on, holding everything in its head. Managed Agents is more like Lambda — spin up on demand, do the work, shut down. Same job, completely different shape.
The new stack is Managed Agents as Sheldon's brain, a Slack Bolt listener on the Mac mini to bridge Slack into the API, Claude Cowork for heavier computer-use tasks, and the same repo sitting underneath all of it. The one thing I got right on Tuesday is the one thing that makes Thursday's pivot cheap.
The OpenClaw files — IDENTITY.md, USER.md, all of them — stay. They're good files. It's just that now they get read by Managed Agents instead of OpenClaw, and I'm building my own mechanism around the repo to handle the rest: task management, and a nightly job that has Sheldon save important memories by opening a PR from a new branch. I get to review and approve any identity or memory changes before they land in main — which means the team and I can clean things up, add context, or reject anything that drifted. Human in the loop on who Sheldon thinks he is. That feels right.
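The stateless-worker lifecycle is simple enough to sketch end to end. Everything here is my shape, not Managed Agents' API — the function signature, the log file name, and the stand-in `llm` callable (which would wrap a real Messages API call) are all hypothetical:

```python
import tempfile
from pathlib import Path
from typing import Callable

def run_session(repo: Path, task: str, llm: Callable[[str, str], str]) -> str:
    """Start cold, act, write back, shut down. Nothing survives in the worker."""
    # 1. Identity comes from the repo at session start, not from a warm process.
    system = "\n\n".join(
        (repo / name).read_text()
        for name in ("IDENTITY.md", "USER.md")
        if (repo / name).exists()
    )
    # 2. Act: one model call against that freshly loaded identity.
    reply = llm(system, task)
    # 3. Write back: append to a log the nightly job could turn into a PR.
    with (repo / "SESSION_LOG.md").open("a") as log:
        log.write(f"## Task\n{task}\n\n## Reply\n{reply}\n\n")
    return reply

# Demo with a stand-in model so the lifecycle is visible without API calls.
repo = Path(tempfile.mkdtemp())
(repo / "IDENTITY.md").write_text("Sheldon, AI Product Manager at 312 School.")
fake_llm = lambda system, task: f"[{system.split(',')[0]}] ack: {task}"
out = run_session(repo, "summarize yesterday's standup", fake_llm)
print(out)
```

The point of the shape: there is no running session for the files to diverge from. If the repo says one thing, the next session believes that thing, because the session is nothing but the repo plus one task.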
I'll get into the specifics of how all of this connects in the next post. For now, I'm starting the rebuild.