Building a Protocol in Public

[Case Study] Building a Protocol in Public: 100 Builds, 7 Days, and What Actually Works
A follow-up to "OGP: Federation Belongs at the Gateway, Not the Agent"
David Proctor · Apr 06, 2026
Update (April 6, 2026): Versions 0.2.28–0.2.31 have shipped since this was written, adding alias support (BUILD-114), removal notifications when peers disconnect (BUILD-113), agent-specific notification routing (BUILD-115), and a race condition fix for peer storage (BUILD-116). The status table below reflects the current state as of 0.2.31.
From Paper to Production
Two weeks ago, I published a framework for what I thought OGP should be. The architecture was clean on paper: cryptographic identity, bilateral trust, signed intents, controlled boundaries. Stanislav Huseletov connected from Spain. A message crossed the internet. The demo worked.
Then I tried to make it actually work for real users.
Since that article, I've pushed 100+ builds to the OGP repository. Not polish. Not features. Fixes for things that broke the moment the protocol left my local machine. This is an honest accounting of what I got wrong, what held up, and where federation between AI agents actually stands today.
What Looked Right on Paper
The original design had four components I still believe in:
1. Cryptographic identity per gateway: Ed25519 keypairs, public keys published at /.well-known/ogp
2. Bilateral trust establishment: federation requests with human approval
3. Signed intent messages: every interaction cryptographically verifiable
4. Controlled information boundaries: the gateway filters what reaches the agent
The theory was that if you got those four right, everything else was implementation detail. I was half right. Those four are necessary. They are not sufficient.
What Broke Immediately
Port-Based Peer IDs
The first OGP builds used hostname:port as the peer identifier, for example username.gw.clawporate.example.com:3001. It seemed reasonable: each gateway has a unique address.
The problem surfaced within hours of testing with Clawporate. Cloud gateways run behind load balancers. The internal port (3001) isn't the external port (443). Tunnel services like ngrok rotate URLs on every restart. A peer that was something.ngrok-free.app:18790 yesterday is different.ngrok-free.app:18790 today. Same gateway, different identity string. Federation broke constantly.
The fix (BUILD-111): Peer identity is now derived from the first 16 characters of the Ed25519 public key. 302a300506032b65. This is stable across tunnel rotations, port changes, load balancer reconfigurations. The gateway URL becomes just an address. The public key is the identity.
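A minimal sketch of that derivation, assuming the public key is handled as a hex string (the encoding is not stated here). derivePeerId, PeerRecord, and makePeer are illustrative names, not taken from the OGP codebase. One thing worth checking against the real implementation: 302a300506032b65 is also how the fixed DER/SPKI header of every Ed25519 public key begins in hex, so if the 16 characters are taken from that encoding, the prefix alone may not distinguish two peers.

```typescript
// Hypothetical helper: the article only states that the ID is the first
// 16 characters of the public key. The name and length check are assumptions.
function derivePeerId(publicKey: string): string {
  if (publicKey.length < 16) {
    throw new Error("public key too short to derive a peer ID");
  }
  return publicKey.substring(0, 16);
}

// Identity (id) is kept separate from address (gatewayUrl).
interface PeerRecord {
  id: string;
  publicKey: string;
  gatewayUrl: string;
}

function makePeer(publicKey: string, gatewayUrl: string): PeerRecord {
  return { id: derivePeerId(publicKey), publicKey, gatewayUrl };
}

// Illustrative key material only: an SPKI-style hex header plus filler bytes.
const sampleKey = "302a300506032b6570032100" + "ab".repeat(32);

// Same key behind two different tunnel URLs yields the same identity.
const yesterday = makePeer(sampleKey, "https://something.ngrok-free.app");
const today = makePeer(sampleKey, "https://different.ngrok-free.app");
console.log(yesterday.id === today.id);
```

The URL can change freely in this shape; only a change of key changes the ID.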
This seems obvious in retrospect. It wasn't obvious until the failures were live.
One important nuance: public-key identity solves the who problem, not the where problem. A peer with key 302a300506032b65 is always 302a300506032b65 — message signing and verification work regardless of URL. But the daemon still needs a current gatewayUrl to route outbound messages. Today, if a peer's ngrok URL rotates, the stored URL goes stale and messages fail to deliver until the peer re-federates with its new address. The fix for this is either persistent public URLs (Cloudflare Named Tunnels, paid ngrok) or a future re-announcement mechanism. Worth knowing if you're testing across machines with free tunnels.
There's also a migration shim still in federation.ts that falls back to ${hostname}:${port} as the peer ID if no public key is present. It exists to handle legacy peers that haven't upgraded. In practice any peer running a current OGP build will have a keypair — but if you're connecting to something old, the shim can create an ID inconsistency. It'll be removed once legacy peer support is formally dropped.
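The fallback behaviour can be sketched roughly as follows. This is not the code in federation.ts; PeerAnnouncement and resolvePeerId are hypothetical names, and the only facts taken from the article are the 16-character key prefix and the ${hostname}:${port} fallback.

```typescript
// Hypothetical shape of what a peer announces about itself.
interface PeerAnnouncement {
  publicKey?: string; // assumed to be absent on legacy peers
  hostname: string;
  port: number;
}

function resolvePeerId(peer: PeerAnnouncement): string {
  if (peer.publicKey !== undefined && peer.publicKey.length >= 16) {
    // Current behaviour: identity comes from the key.
    return peer.publicKey.substring(0, 16);
  }
  // Legacy shim: fall back to the address-based ID.
  return `${peer.hostname}:${peer.port}`;
}

// The same gateway resolves to two different IDs depending on
// whether it presents a key, which is the inconsistency described above.
const upgraded = resolvePeerId({
  publicKey: "302a300506032b6570032100" + "ab".repeat(32),
  hostname: "ogp.domain.com",
  port: 18790,
});
const legacy = resolvePeerId({ hostname: "ogp.domain.com", port: 18790 });
console.log(upgraded, legacy);
```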
The addPeer() Bug
Here's a bug that almost shipped to production: the federation/request endpoint was creating peer objects, sending notifications, returning HTTP 200... and never actually persisting the peer to disk.
The code path looked like:
Receive request → Validate signature → Create peer object → Send notification → Return success
The persistence step, addPeer(), was missing from this path. Every request appeared successful. No peers were ever saved. I found this during testing with the 0.2.26 release candidate. The federation handshake reported success on both sides. Neither side could actually message the other because neither had stored the peer.
The fix (0.2.26): One line. addPeer(peerData). This is the kind of bug you only catch when two independent systems try to connect, not when you're testing locally with one daemon talking to itself.
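A sketch of the corrected code path, under stated assumptions: PeerStore is an in-memory stand-in for the daemon's on-disk storage, signature verification is stubbed as a callback, the 401 rejection code is a guess, and the exact position of addPeer() relative to the notification is not specified in the article. Only the addPeer(peerData) call itself and the HTTP 200 come from the text.

```typescript
interface Peer {
  id: string;
  publicKey: string;
  gatewayUrl: string;
  status: "pending" | "approved";
}

// In-memory stand-in for persistent peer storage (assumption).
class PeerStore {
  private peers = new Map<string, Peer>();
  addPeer(peer: Peer): void {
    this.peers.set(peer.id, peer);
  }
  getPeer(id: string): Peer | undefined {
    return this.peers.get(id);
  }
}

interface FederationRequestBody {
  publicKey: string;
  gatewayUrl: string;
}

function handleFederationRequest(
  store: PeerStore,
  body: FederationRequestBody,
  verifySignature: (body: FederationRequestBody) => boolean,
  notify: (message: string) => void
): { status: number } {
  if (!verifySignature(body)) {
    return { status: 401 }; // assumed rejection code
  }
  const peerData: Peer = {
    id: body.publicKey.substring(0, 16),
    publicKey: body.publicKey,
    gatewayUrl: body.gatewayUrl,
    status: "pending",
  };
  store.addPeer(peerData); // the step that was missing
  notify(`Federation request from ${peerData.id}`);
  return { status: 200 };
}

const store = new PeerStore();
const response = handleFederationRequest(
  store,
  { publicKey: "302a300506032b6570032100" + "ab".repeat(32), gatewayUrl: "https://peer.example.com" },
  () => true, // signature check stubbed out for the sketch
  (message) => console.log(message)
);
console.log(response.status, store.getPeer("302a300506032b65") !== undefined);
```

A check like the last line, asserting that the peer is retrievable after a successful response, is the kind of test that would have caught the original bug with a single daemon.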
Identity Normalization
Even after switching to public-key-based IDs, there was a deeper issue. When receiving a federation request, the daemon was trusting the sender's peer.id field instead of deriving the ID from the public key. This meant if a legacy gateway sent ogp.domain.com:18790 as its ID, the receiving gateway stored it that way, even though the receiving gateway expected 302a300506032b65.
The result: federation appeared to work, but agent-comms failed with "Unknown peer" because the sender and receiver had different IDs for the same peer.
The fix (0.2.27): The daemon now completely ignores the sender's peer.id. It always derives the peer ID from publicKey.substring(0,16). If a peer already exists with that public key, it returns already-pending-or-approved. Identity is normalized at the point of receipt, not at the point of sending.
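Normalization at the point of receipt might look like the sketch below. The type and function names are hypothetical and the peer map stands in for real storage; what comes from the article is that the sender's peer.id is ignored, the ID is publicKey.substring(0,16), and a repeat request for a known public key yields already-pending-or-approved.

```typescript
type PeerStatus = "pending" | "approved";

interface StoredPeer {
  id: string;
  publicKey: string;
  gatewayUrl: string;
  status: PeerStatus;
}

interface IncomingPeer {
  id?: string; // sender-supplied, deliberately ignored
  publicKey: string;
  gatewayUrl: string;
}

type RequestOutcome = "pending" | "already-pending-or-approved";

function receiveFederationRequest(
  peers: Map<string, StoredPeer>,
  incoming: IncomingPeer
): { outcome: RequestOutcome; peerId: string } {
  // Look for an existing record by public key, never by the claimed ID.
  const existing = Array.from(peers.values()).find(
    (peer) => peer.publicKey === incoming.publicKey
  );
  if (existing !== undefined) {
    return { outcome: "already-pending-or-approved", peerId: existing.id };
  }
  const peerId = incoming.publicKey.substring(0, 16);
  peers.set(peerId, {
    id: peerId,
    publicKey: incoming.publicKey,
    gatewayUrl: incoming.gatewayUrl,
    status: "pending",
  });
  return { outcome: "pending", peerId };
}

const peers = new Map<string, StoredPeer>();
const legacyStyleRequest: IncomingPeer = {
  id: "ogp.domain.com:18790", // what a legacy gateway might send
  publicKey: "302a300506032b6570032100" + "cd".repeat(32),
  gatewayUrl: "https://ogp.domain.com",
};
const first = receiveFederationRequest(peers, legacyStyleRequest);
const second = receiveFederationRequest(peers, legacyStyleRequest);
console.log(first.peerId, first.outcome, second.outcome);
```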
What Actually Works Now
As of OGP 0.2.31:
Request/Approve flow
Intent negotiation
Message intent
Agent-comms
Project collaboration
Activity logging
Removal notifications
Peer aliases
The federation between my personal gateway and the Clawporate cloud deployment is live and functional. I can send messages. I can create shared projects. The handshake completes without manual peer injection.
What's Still Rough
The Approval UX
Today, approving a federation request requires running a CLI command:
ogp federation approve 302a300506032b65
This should be a notification in your chat channel: "Alice's gateway wants to connect. Approve?" with a button. The infrastructure exists (OGP fires OpenClaw system events), but the UI layer isn't polished.
The Doorman Layer
When a federated peer has the agent-comms scope, messages need to be filtered before they reach your agent. The Doorman (Layer 3 of OGP's scope model) handles this at runtime.
It's fully built and wired in. Every incoming message hits checkAccess() in doorman.ts before processing. Scope violations return 403s before the agent ever sees them. Rate limit breaches return 429s with a Retry-After header. The rate limiting is sliding-window per peer per intent, configurable via --rate <requests>/<seconds> (default: 100 req/hour).
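To make the rate-limiting behaviour concrete, here is a small sliding-window sketch keyed per peer per intent. It is not the code in doorman.ts: parseRate, SlidingWindowLimiter, and the result shape are assumptions, and timestamps are passed in explicitly so the example is deterministic. The <requests>/<seconds> format, the 429 status, and the Retry-After idea are from the article.

```typescript
interface RateLimit {
  requests: number;
  windowSeconds: number;
}

// Parses a "<requests>/<seconds>" spec such as "100/3600" (100 req/hour).
function parseRate(spec: string): RateLimit {
  const match = /^(\d+)\/(\d+)$/.exec(spec);
  if (match === null) {
    throw new Error(`invalid rate spec: ${spec}`);
  }
  const requests = Number(match[1]);
  const windowSeconds = Number(match[2]);
  if (requests <= 0 || windowSeconds <= 0) {
    throw new Error(`rate values must be positive: ${spec}`);
  }
  return { requests, windowSeconds };
}

type RateResult =
  | { status: 200 }
  | { status: 429; retryAfterSeconds: number };

class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();
  private limit: RateLimit;

  constructor(limit: RateLimit) {
    this.limit = limit;
  }

  check(peerId: string, intent: string, nowMs: number): RateResult {
    const key = `${peerId}|${intent}`;
    const windowMs = this.limit.windowSeconds * 1000;
    // Keep only the hits that still fall inside the window.
    const recent = (this.hits.get(key) ?? []).filter((t) => nowMs - t < windowMs);
    if (recent.length >= this.limit.requests) {
      this.hits.set(key, recent);
      const oldest = recent[0] ?? nowMs;
      // The caller may retry once the oldest hit leaves the window.
      const retryAfterSeconds = Math.ceil((oldest + windowMs - nowMs) / 1000);
      return { status: 429, retryAfterSeconds };
    }
    recent.push(nowMs);
    this.hits.set(key, recent);
    return { status: 200 };
  }
}

const limiter = new SlidingWindowLimiter(parseRate("2/10"));
console.log(limiter.check("302a300506032b65", "message", 0));
console.log(limiter.check("302a300506032b65", "message", 1000));
console.log(limiter.check("302a300506032b65", "message", 2000));
```

Because the key combines peer and intent, one peer exhausting its budget on a single intent does not affect other peers or other intents.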
For agent-comms specifically, topic-level policies let you configure per-peer per-topic delivery: off (block), summary (compressed), or full (deliver everything). So a peer spamming the general topic can be throttled or silenced without touching the federation itself.
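The per-peer per-topic policy lookup could be sketched like this. The three policy values are from the article; TopicPolicyTable, deliver, the default policy, and especially the summarize placeholder (simple truncation, standing in for whatever compression OGP actually applies) are assumptions.

```typescript
type TopicPolicy = "off" | "summary" | "full";

class TopicPolicyTable {
  private policies = new Map<string, TopicPolicy>();
  private defaultPolicy: TopicPolicy;

  constructor(defaultPolicy: TopicPolicy) {
    this.defaultPolicy = defaultPolicy;
  }

  set(peerId: string, topic: string, policy: TopicPolicy): void {
    this.policies.set(`${peerId}|${topic}`, policy);
  }

  get(peerId: string, topic: string): TopicPolicy {
    return this.policies.get(`${peerId}|${topic}`) ?? this.defaultPolicy;
  }
}

// Placeholder for "compressed" delivery: truncate to 80 characters.
function summarize(body: string): string {
  return body.length <= 80 ? body : body.substring(0, 77) + "...";
}

// Returns what the agent should see, or null when the topic is blocked.
function deliver(
  table: TopicPolicyTable,
  peerId: string,
  topic: string,
  body: string
): string | null {
  const policy = table.get(peerId, topic);
  if (policy === "off") {
    return null;
  }
  if (policy === "summary") {
    return summarize(body);
  }
  return body;
}

const table = new TopicPolicyTable("full"); // assumed default
table.set("302a300506032b65", "general", "off");
table.set("302a300506032b65", "status", "summary");

console.log(deliver(table, "302a300506032b65", "general", "hello"));
console.log(deliver(table, "302a300506032b65", "status", "x".repeat(200)));
console.log(deliver(table, "302a300506032b65", "project", "build passed"));
```

Silencing the general topic here touches only one table entry; the peer record and its other topics are unaffected.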
This is the three-layer scope model in practice: Layer 1 declares what your gateway supports, Layer 2 negotiates what a peer gets, Layer 3 enforces it at runtime on every request.
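As a rough illustration of how the three layers relate, the sketch below models Layer 1 as a set of supported intents, Layer 2 as the intersection of what a peer requests and what the gateway supports, and Layer 3 as a per-request check. The intent names other than agent-comms, the function names, and the choice to return 403 for an unknown peer are assumptions, and the real negotiation presumably also involves the operator's approval.

```typescript
// Layer 1: what this gateway declares it supports (illustrative names).
const supportedIntents = new Set<string>(["message", "agent-comms", "project"]);

// Layer 2: a peer is granted at most what it asked for and the gateway supports.
function negotiateScopes(supported: Set<string>, requested: string[]): Set<string> {
  return new Set<string>(requested.filter((intent) => supported.has(intent)));
}

// Layer 3: enforced on every request, before the agent sees anything.
function enforceScope(
  grants: Map<string, Set<string>>,
  peerId: string,
  intent: string
): 200 | 403 {
  const granted = grants.get(peerId);
  if (granted === undefined || !granted.has(intent)) {
    return 403;
  }
  return 200;
}

const grants = new Map<string, Set<string>>();
grants.set(
  "302a300506032b65",
  negotiateScopes(supportedIntents, ["message", "calendar"])
);

console.log(enforceScope(grants, "302a300506032b65", "message"));
console.log(enforceScope(grants, "302a300506032b65", "agent-comms"));
console.log(enforceScope(grants, "302a300506032b65", "calendar"));
```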
Persistent Public URLs
Free ngrok tunnels rotate URLs on every restart. This breaks peer records. The realistic options, with their actual tradeoffs:
Cloudflare Named Tunnels: free and persistent, but they require a domain you own and have on Cloudflare. Once set up they're the best option, but the setup is not trivial.
Free ngrok: rotates URLs on every restart, so it is not actually a solution for persistence. It also requires account registration before you can use the CLI at all. Paid ngrok gets you a static domain, but that's a recurring cost.
Cloudflare Quick Tunnels (cloudflared tunnel --url): on-demand, no domain required, no registration. But the URLs rotate on every restart just like free ngrok, and they've been flaky in testing.
None of these are zero-config. The Named Tunnel path is the most reliable, but "free" doesn't mean "easy" here. This is real friction for anyone trying to test federation across machines without a VPS.
What Comes Next
Clear Engineering
Granular scope enforcement — "you can read calendar, but only free/busy, not event titles"
Reply chains — multi-turn agent conversations, not just one-shot intents
DNS-based discovery — _ogp.example.com TXT records so email addresses map to gateways
Approval UX — federation request notifications with one-tap approve in your chat channel
Genuinely Hard
Intent negotiation — agreeing on payload schemas before the first message
Cross-gateway context sharing — how much of my agent's memory should a peer see?
Enterprise policy layers — who in my organization can federate with whom
The Core Thesis Still Holds
Despite the 100 builds and the bugs and the pivots, the original framework was directionally correct:
OGP is not a skill
It doesn't add capabilities to your agent. It creates the conditions where two agents can safely interact across trust boundaries.
The gateway is the right layer
Not the agent (too much exposure). Not the API (requires pre-existing integration). The gateway sits between your agent and the world, controlling what gets in and what goes out.
Federation is bilateral, not centralized
No central registry. No intermediary. Two gateways decide to trust each other, establish that trust through cryptographic signatures, and maintain that trust relationship independently.
What I got wrong was the implementation details. What held up was the architecture.
How to Try It
OGP is open source and installable today:
npm install -g @dp-pcs/ogp@latest
ogp setup
ogp start --background
ogp status # shows your federation URL
Protocol Specification
The protocol specification lives at github.com/dp-pcs/openclaw-federation.
Implementation
The implementation is at github.com/dp-pcs/ogp.
If you install it and it breaks, file an issue. That's how this gets better.
About the Author
David Proctor
David Proctor is VP of AI at Trilogy. He writes about AI infrastructure, agent protocols, and what actually works in production.