
Wallet-side vs server-side metering for agentic spend

Neul Labs
protocol, wallets

When an LLM agent calls a paid MCP tool, two parties have a strong interest in knowing how much it cost: the wallet paying the bill, and the server collecting it. They keep their own books. The interesting question — the one mcp-pay had to take a position on in v0.1 — is whether the protocol should privilege one of them as the source of truth, or treat both as legitimate.

The short answer is both. The long answer is more interesting, because the design of the manifest, the 402 flow, and the per-rail accounting all push toward a specific split: the wallet counts spend, the server counts usage, and the rail counts settlement. Each party meters what it can verify, and nobody is forced to trust anyone else.

What “metering” even means in this context

Three different numbers get called “metering” in conversations about paid agentic tools, and they are not interchangeable.

  1. Spend metering. How much money has my agent spent today, this hour, on this plan? Owned by the wallet.
  2. Usage metering. How many times was this tool called, with what arguments, by which caller? Owned by the server.
  3. Settlement metering. Which payments cleared, which got refunded, what’s the on-chain or off-chain state? Owned by the rail.

If you collapse these, you get conflicts of interest. A server that meters spend has an incentive to overcount. A wallet that meters usage has no incentive to count failed calls. A rail that meters either of the above will get it wrong, because it sees only the payments, not the calls behind them or the budgets they were drawn from. The metering question is really three metering questions, and v0.1 keeps them separate by giving each party its own surface.
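One way to keep the three meters from collapsing into each other is to give each one its own record type, owned by exactly one party. The sketch below is illustrative only. The class and field names are hypothetical and are not part of the mcp-pay schema. That includes the `call_id` that later lets you correlate records across parties.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class SpendRecord:
    """Wallet-owned: what the agent committed to pay, at the manifest price."""
    call_id: str            # hypothetical correlation ID shared across the three meters
    server: str
    tool: str
    amount: Decimal
    currency: str
    dispatched_at: datetime


@dataclass(frozen=True)
class UsageRecord:
    """Server-owned: what was called, by whom, and how it went."""
    call_id: str
    tool: str
    caller: Optional[str]   # None when the server cannot identify the caller
    latency_ms: float
    succeeded: bool


@dataclass(frozen=True)
class SettlementRecord:
    """Rail-owned: whether a payment cleared or was refunded."""
    call_id: str
    rail: str               # e.g. "x402", "lightning", "card"
    settled: bool
    refunded: bool = False
```

Frozen dataclasses make each record immutable once written. The more important property is that no type carries another party's fields.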

What the manifest gives the wallet

The wallet’s job, when it sees a pay.json, is to decide whether the agent it represents can afford a planned call sequence. To do that it needs prices it can trust.

The manifest provides them. Every tool, resource, or prompt with a per_call rule has an amount and a currency. The wallet does not have to call the server to discover the price. It does not have to trust a 402 mid-flight to discover the price either, because v0.1 requires the 402 to match the manifest. If they don’t match, the wallet refuses.
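The "402 must match the manifest" rule can be sketched as a small check on the wallet side. This sketch assumes a simplified manifest layout, `{"tools": {name: {"per_call": {"amount": ..., "currency": ...}}}}`. That layout is hypothetical, so adapt it to the real pay.json shape. Amounts are kept as strings and compared as `Decimal` to avoid float rounding.

```python
from decimal import Decimal


class PriceMismatch(Exception):
    """Raised when a 402 quote disagrees with the cached manifest price."""


def check_quote(manifest: dict, tool: str, quote_amount: str, quote_currency: str) -> Decimal:
    """Return the agreed per-call price, or refuse if the 402 quote differs from the manifest."""
    rule = manifest["tools"][tool]["per_call"]   # hypothetical manifest layout
    expected = Decimal(rule["amount"])
    if Decimal(quote_amount) != expected or quote_currency != rule["currency"]:
        raise PriceMismatch(
            f"{tool}: 402 quoted {quote_amount} {quote_currency}, "
            f"manifest says {rule['amount']} {rule['currency']}"
        )
    return expected
```

The wallet calls this when a 402 arrives and refuses to sign anything if it raises.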

This pushes the spend metering responsibility cleanly to the wallet. Before the agent dispatches a call, the wallet has read the manifest, knows the price, and either:

  • Reserves the amount against the agent’s plan budget, or
  • Vetoes the call because the budget is exhausted, or
  • Routes the call to a different server with an equivalent tool at a lower price.

The wallet meters spend per call, per plan, per day, per agent identity. None of this is in the mcp-pay schema, because it is wallet implementation, not protocol. The protocol just makes spend metering possible by publishing prices the wallet can read ahead of time.
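Because budgets are wallet implementation, any policy here is a choice, not protocol. The sketch below shows one possible ordering of the three outcomes (reserve, veto, route) against a single per-plan daily budget. `PlanBudget`, `decide`, and the "route if strictly cheaper" rule are all hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum
from typing import Optional


class Decision(Enum):
    RESERVE = "reserve"
    VETO = "veto"
    ROUTE = "route"


@dataclass
class PlanBudget:
    """Hypothetical per-plan daily budget kept entirely inside the wallet."""
    daily_limit: Decimal
    reserved: Decimal = Decimal("0")

    def remaining(self) -> Decimal:
        return self.daily_limit - self.reserved


def decide(budget: PlanBudget, price: Decimal,
           cheapest_alternative: Optional[Decimal] = None) -> Decision:
    # Route if an equivalent tool elsewhere is strictly cheaper and affordable;
    # the reservation then happens when the alternative call is decided.
    if (cheapest_alternative is not None
            and cheapest_alternative < price
            and cheapest_alternative <= budget.remaining()):
        return Decision.ROUTE
    if price <= budget.remaining():
        budget.reserved += price   # hold the amount before dispatch
        return Decision.RESERVE
    return Decision.VETO
```

A real wallet would keep one such budget per plan, per day, and per agent identity, and release a reservation if the call is never dispatched.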

A useful corollary: if your wallet does not pre-fetch manifests, it cannot meter spend without trusting the server’s 402 quotes. v0.1 puts the manifest at /.well-known/mcp/pay.json specifically so that a one-time fetch per server is the only requirement. Wallets that do this once and cache it correctly get accurate spend metering for free.
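The "fetch once per server" requirement amounts to a per-origin cache keyed on the well-known path. This sketch injects the HTTP fetch as a callable, so the transport stays swappable. In production it might wrap `urllib.request.urlopen` plus `json.load`. `ManifestCache` and `fetch_json` are hypothetical names.

```python
from typing import Callable, Dict
from urllib.parse import urljoin

MANIFEST_PATH = "/.well-known/mcp/pay.json"


class ManifestCache:
    """Fetch each server's pay.json once and serve later lookups from memory."""

    def __init__(self, fetch_json: Callable[[str], dict]) -> None:
        self._fetch_json = fetch_json          # injected transport: URL -> parsed JSON
        self._by_origin: Dict[str, dict] = {}

    def get(self, origin: str) -> dict:
        if origin not in self._by_origin:
            # An absolute path replaces any path on the origin URL.
            url = urljoin(origin, MANIFEST_PATH)
            self._by_origin[origin] = self._fetch_json(url)
        return self._by_origin[origin]
```

This sketch caches forever. A wallet that wants to notice price changes would also honor the response's HTTP cache headers.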

What the schema gives the server

The server’s job is usage metering. Which tool was called. When. By which caller, if it can be identified. How long the response took. Whether the result was successful. None of this is in the manifest because none of it is the wallet’s business — but the server needs it for capacity planning, for fraud detection, and for the optional stats field that v0.1 leaves available for servers that want to publish aggregate usage.

The schema explicitly separates pricing from stats. Pricing is static and cacheable; it changes when the operator decides to change it. Stats are dynamic; they reflect actual call volume. The reasoning is that anything cacheable should be cached aggressively, and anything dynamic should be queried only when needed. Wallets fetch pricing once. Aggregators fetch stats often. Nobody is forced to choose between freshness and bandwidth.

The server, then, runs its own meter on usage. It can publish a digest of that meter through the optional stats field if it wants to. Most production servers will not, because usage volumes are competitive information. That’s fine. The schema doesn’t require it.
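A server's usage meter and the digest it might publish can be kept apart like this. The sketch records every call, paid or not, and exposes only per-tool counts. The digest shape is a hypothetical stand-in for whatever the optional stats field actually contains.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class UsageEvent:
    tool: str
    caller: Optional[str]   # None when the caller cannot be identified
    latency_ms: float
    ok: bool
    paid: bool


@dataclass
class UsageLog:
    """Server-side usage meter: records every call, paid or not."""
    events: List[UsageEvent] = field(default_factory=list)

    def record(self, event: UsageEvent) -> None:
        self.events.append(event)

    def digest(self) -> dict:
        """Per-tool aggregates for an optional stats field; no callers, no latencies."""
        calls = Counter(e.tool for e in self.events)
        paid = Counter(e.tool for e in self.events if e.paid)
        return {tool: {"calls": calls[tool], "paid_calls": paid[tool]} for tool in calls}
```

The full `events` list stays private to the server. Only `digest()` would ever be published, and only if the operator chooses to.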

What the rail tells you

Settlement is the part neither side can fake. Once a payment is signed and broadcast, the rail says yes or no. For x402, that means a facilitator returns { valid: true }. For Lightning, the payment either settles or it doesn’t. For card rails, an issuer either approves the charge or it doesn’t.
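Each rail answers the same yes-or-no question in its own vocabulary. A wallet or server can collapse those answers behind one function. The response shapes below are deliberately simplified and hypothetical. Only the x402 `valid` flag comes from the description in this post. The Lightning `state` and card `status` fields are placeholders, not real client APIs.

```python
from typing import Callable, Dict


# Hypothetical, simplified response shapes; real rails return richer objects.
def x402_settled(resp: dict) -> bool:
    return resp.get("valid") is True          # the facilitator's verdict


def lightning_settled(resp: dict) -> bool:
    return resp.get("state") == "SETTLED"     # the payment settles or it doesn't


def card_settled(resp: dict) -> bool:
    return resp.get("status") == "approved"   # the issuer approves or declines


RAIL_CHECKS: Dict[str, Callable[[dict], bool]] = {
    "x402": x402_settled,
    "lightning": lightning_settled,
    "card": card_settled,
}


def is_settled(rail: str, resp: dict) -> bool:
    """Reduce each rail's answer to the yes/no the rest of the system needs."""
    return RAIL_CHECKS[rail](resp)
```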

This is the source of truth for “what did the agent actually pay for.” It is the most expensive of the three numbers to consult, because consulting it usually means a network call to a third party. But it is also the most authoritative.

mcp-pay v0.1 keeps the rail outside the manifest on purpose. The manifest tells you which rails the server takes — accepts: [...] — but it does not try to summarize settlement status. Doing so would make the manifest dynamic, which would break caching, which would make the wallet’s life harder. The right shape is: read the manifest once, settle through the rail as needed, ask the rail for confirmation.

The reference server demonstrates this. The X-PAYMENT-RESPONSE header carries enough information for the wallet to check that the rail confirmed the payment. If the wallet wants stronger evidence, it asks the rail directly. If it doesn't, it trusts the server. The choice is the wallet's, not the protocol's.
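The wallet's choice between trusting the server's relayed receipt and asking the rail can be a single optional parameter. This sketch assumes the header value is base64-encoded JSON with a `success` field, following the x402 convention. Check that against the server you are talking to. `ask_rail` is a hypothetical hook for a direct rail query.

```python
import base64
import json
from typing import Callable, Optional


def decode_payment_response(header_value: str) -> dict:
    """Assumes base64-encoded JSON in the X-PAYMENT-RESPONSE header."""
    return json.loads(base64.b64decode(header_value))


def confirm_settlement(header_value: str,
                       ask_rail: Optional[Callable[[dict], bool]] = None) -> bool:
    receipt = decode_payment_response(header_value)
    if ask_rail is None:
        # Weaker evidence: trust the server's relay of the rail's answer.
        return bool(receipt.get("success"))
    # Stronger evidence: re-check the receipt with the rail itself.
    return ask_rail(receipt)
```

When `ask_rail` is supplied, the rail's answer overrides whatever the server relayed, which matches "if the two disagree, the rail wins."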

Why this split scales

Centralizing metering would have been simpler to write down. “The server keeps the books, the wallet trusts them, the rail provides settlement receipts.” Every spec writer’s first draft includes some version of that. We had it in ours.

The reason we removed it is concentration. Centralized metering means an agent’s spend is whatever the server says it is. That works fine when one party owns both sides, like a SaaS company and its own AI features. It does not work when an agent is calling fifty independent MCP servers in a day, each with its own incentive to mismeasure.

Distributed metering (wallet for spend, server for usage, rail for settlement) has a different shape. No party can lie unilaterally and have the lie stick. If the server claims an agent made 500 paid calls but the wallet recorded 450, that is a useful discrepancy: the wallet investigates, and if the rail confirms 450 settlements, the server is wrong. If the rail confirms 500, the wallet's accounting is off. Either way, the truth is recoverable.
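Counts alone tell you that the meters disagree. Per-call identifiers tell you which calls disagree. The sketch below assumes every party can key its records by a shared call ID, which is a hypothetical convention. It uses the rail's settled set as the reference, as in the 450-versus-500 example.

```python
from typing import Dict, Set


def reconcile(wallet_ids: Set[str], server_ids: Set[str], rail_ids: Set[str]) -> Dict[str, Set[str]]:
    """Compare wallet and server claims about paid calls against the rail's settled set."""
    return {
        # Server claims payment for calls the rail never settled: the server is wrong.
        "server_overclaims": server_ids - rail_ids,
        # Rail settled calls the wallet never logged: the wallet's accounting is off.
        "wallet_missing": rail_ids - wallet_ids,
        # Wallet logged calls the rail never settled: pending, failed, or refunded.
        "wallet_unsettled": wallet_ids - rail_ids,
    }
```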

You only get that property if each meter is independent. mcp-pay’s job is to keep them independent.

Implications for wallet builders

If you are building a wallet for an agentic system, three concrete things follow from this split.

Fetch manifests aggressively. A wallet that does not pre-fetch /.well-known/mcp/pay.json for every MCP server in scope is missing the spend-metering primitive the protocol is offering. The fetch is cheap. The manifest is cacheable. The wallet should treat it as a hard requirement.

Reconcile against the rail, not the server. Settlement confirmation should come from the rail’s authoritative endpoint — the facilitator for x402, the LN node for Lightning, the card processor for cards. The server is an interested party. The rail is not. If the two disagree, the rail wins.

Track your own usage independently. The server’s usage metering is for the server’s purposes; it is not your spend log. The wallet should record every dispatched call locally, with the manifest price at the time of dispatch, and reconcile against the rail later. That gives the wallet a tamper-evident spend history that does not depend on any server’s honesty.
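One common way to make a local log tamper-evident is to hash-chain it, with each entry committing to the previous entry's hash. This is a technique chosen for illustration, not something mcp-pay specifies. `SpendLog` and its fields are hypothetical. Each entry stores the manifest price at dispatch time, ready for later reconciliation against the rail.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List

GENESIS = "0" * 64


def _hash(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class SpendLog:
    """Append-only wallet log; each entry commits to the previous entry's hash."""
    entries: List[dict] = field(default_factory=list)

    def append(self, call_id: str, server: str, tool: str, amount: str, currency: str) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"call_id": call_id, "server": server, "tool": tool,
                "amount": amount, "currency": currency, "prev": prev}
        entry = {**body, "hash": _hash(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; editing any entry, or removing one mid-chain, breaks it."""
        prev = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or _hash(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

Truncating the tail of the log is not detectable by `verify()` alone. Catching that requires anchoring the latest hash somewhere the wallet does not control.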

Implications for server builders

The mirror image applies. Servers should keep usage metering for their own purposes — capacity planning, fraud signals — and not assume agents trust it for billing. The billing source of truth is the rail. Servers that try to invoice from their own logs will find wallets insisting on rail reconciliation, and the wallets will be correct to insist.

The cleanest server architecture in our reference implementation has three layers: the request handler that returns 402 or 200; the usage logger that records every call regardless of payment state; and the settlement adapter that talks to the facilitator. None of the three needs to know what the other is doing for the system to be correct. That is the whole reason for the split.
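The three layers can be sketched as independent pieces that only the handler composes. The usage logger never consults the settlement adapter, and the adapter never writes usage. These class names are hypothetical and do not come from the reference implementation. The facilitator is stubbed as an injected callable.

```python
from typing import Callable, List, Optional, Tuple


class UsageLogger:
    """Records every call regardless of payment state."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, bool]] = []

    def record(self, tool: str, paid: bool) -> None:
        self.calls.append((tool, paid))


class SettlementAdapter:
    """Talks to the rail; the facilitator check is an injected callable here."""

    def __init__(self, verify: Callable[[str], bool]) -> None:
        self._verify = verify

    def settled(self, payment: str) -> bool:
        return self._verify(payment)


def handle(tool: str, payment: Optional[str],
           usage: UsageLogger, settlement: SettlementAdapter) -> int:
    """Request handler: returns an HTTP-style status, 200 if paid, else 402."""
    paid = payment is not None and settlement.settled(payment)
    usage.record(tool, paid)   # logged whether or not payment succeeded
    return 200 if paid else 402
```

Swapping the rail means replacing only `SettlementAdapter`, and swapping the analytics backend means replacing only `UsageLogger`.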

What this looks like for v0.2

A short wishlist for the next draft: a stats shape that wallets can crawl without leaking per-call detail, and a way for wallets to publish a budget policy that servers can consult before pricing escalates. Neither changes the split. Wallets meter spend. Servers meter usage. Rails meter settlement. v0.1 sets up the surfaces; v0.2 sharpens them.