Architecture
System overview
MetaMCP sits between the LLM and N child MCP servers, exposing 4 meta-tools instead of the combined tool set of all child servers. For a visual overview, see What is MetaMCP?.
The LLM communicates with MetaMCP over stdio. MetaMCP communicates with child servers over stdio (local processes), Streamable HTTP, or SSE (remote servers). Each local child server runs as a separate process spawned and managed by MetaMCP. Remote servers are accessed over the network with optional OAuth authorization.
All interactions flow through the 4 meta-tools: mcp_discover for search, mcp_provision for setup, mcp_call for invocation, and mcp_execute for multi-step code workflows.
Connection state machine
Each child server connection follows a 5-state lifecycle. The state machine governs when connections can be used, retried, or discarded.
| State | Value | Description | Valid transitions |
|---|---|---|---|
| IDLE | 'idle' | Ready for use. | CONNECTING, ACTIVE, CLOSED |
| CONNECTING | 'connecting' | Handshake in progress. | ACTIVE, FAILED, CLOSED |
| ACTIVE | 'active' | Processing a request. | IDLE, FAILED, CLOSED |
| FAILED | 'failed' | Circuit breaker tripped. | CONNECTING, CLOSED |
| CLOSED | 'closed' | Deallocated. Terminal state. | None (no transitions out) |
A connection starts in IDLE or CONNECTING (for lazy-spawned servers). On a successful handshake, it transitions to ACTIVE when processing a request, then back to IDLE when done.
If a request fails, the connection transitions to FAILED. After the circuit breaker cooldown expires, it can attempt to reconnect (FAILED to CONNECTING). If the server is shut down or evicted, the connection moves to CLOSED, which is terminal.
The CLOSED state is irreversible. Once a connection is closed, MetaMCP will spawn a new process if that server is needed again.
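The transition rules in the table can be written down as a small lookup. This is a minimal sketch, not MetaMCP's actual source: the type name `ConnState` and the helpers `canTransition` and `transition` are hypothetical, and only the state values and allowed transitions come from the table above.

```typescript
// Connection states, using the string values from the state table.
type ConnState = "idle" | "connecting" | "active" | "failed" | "closed";

// Allowed transitions, copied from the "Valid transitions" column.
const TRANSITIONS: Record<ConnState, readonly ConnState[]> = {
  idle: ["connecting", "active", "closed"],
  connecting: ["active", "failed", "closed"],
  active: ["idle", "failed", "closed"],
  failed: ["connecting", "closed"],
  closed: [], // terminal: no transitions out
};

// Hypothetical helper: true if moving from `from` to `to` is allowed.
function canTransition(from: ConnState, to: ConnState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Hypothetical helper: returns the new state or throws on an illegal move.
function transition(from: ConnState, to: ConnState): ConnState {
  if (!canTransition(from, to)) {
    throw new Error(`illegal transition: ${from} -> ${to}`);
  }
  return to;
}
```

Under this sketch, `transition("failed", "connecting")` succeeds (the reconnect path after cooldown), while any transition out of `"closed"` throws, which matches the rule that a closed connection is replaced rather than revived.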
Pool architecture
MetaMCP manages child server connections through a bounded connection pool. The pool prevents unbounded resource usage and provides backpressure when many servers are in use.
Pool configuration defaults
| Parameter | Default | Description |
|---|---|---|
| poolSize | 20 | Maximum concurrent child server connections. |
| resPoolSize | 0 | Reserve connections above poolSize. |
| minPoolSize | 0 | Minimum connections to keep alive. |
| resPoolTimeout | 5000 | Milliseconds before reserve pool activates. |
| idleTimeoutMs | 300000 | Idle connection timeout (5 minutes). |
| failureThreshold | 5 | Failures before circuit breaker trips. |
| cooldownMs | 30000 | Circuit breaker cooldown (30 seconds). |
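For reference, the defaults above can be captured as a typed object. This is an illustrative sketch only: the parameter names and values come from the table, but the `PoolConfig` interface and the `resolvePoolConfig` helper are hypothetical and do not describe how MetaMCP actually reads its configuration.

```typescript
// Parameter names and defaults taken from the pool configuration table.
interface PoolConfig {
  poolSize: number;
  resPoolSize: number;
  minPoolSize: number;
  resPoolTimeout: number; // milliseconds
  idleTimeoutMs: number; // milliseconds
  failureThreshold: number;
  cooldownMs: number; // milliseconds
}

const DEFAULT_POOL_CONFIG: PoolConfig = {
  poolSize: 20,
  resPoolSize: 0,
  minPoolSize: 0,
  resPoolTimeout: 5000,
  idleTimeoutMs: 300000, // 5 minutes
  failureThreshold: 5,
  cooldownMs: 30000, // 30 seconds
};

// Hypothetical helper: overlays user overrides on the documented defaults.
function resolvePoolConfig(overrides: Partial<PoolConfig> = {}): PoolConfig {
  return { ...DEFAULT_POOL_CONFIG, ...overrides };
}
```

For example, `resolvePoolConfig({ poolSize: 8 })` yields a configuration with a pool of 8 and every other parameter left at its default.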
Bounded pool
The poolSize parameter sets the hard limit on concurrent connections. When the pool is full and a new connection is needed, MetaMCP evicts the least recently used idle connection.
Reserve pool
The resPoolSize parameter adds capacity above the main pool. Reserve connections activate only after resPoolTimeout milliseconds of waiting, providing a buffer for burst traffic without permanently increasing the pool size.
With the default resPoolSize of 0, the reserve pool is disabled.
Minimum connections
The minPoolSize parameter keeps a minimum number of connections alive. These connections are not subject to idle timeout eviction. This is useful for critical servers that must respond quickly.
With the default minPoolSize of 0, all idle connections are eligible for eviction.
LIFO idle list
Idle connections are stored in a LIFO (last-in, first-out) stack. When a connection is returned to the pool, it goes to the top. When a connection is needed, the most recently used one is taken from the top.
LIFO ordering keeps recently active connections warm, reducing the chance of using a stale connection that may have timed out at the OS level.
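The LIFO behavior described above amounts to a stack. A minimal sketch, assuming a hypothetical `IdleStack` class (the name and methods are illustrative, not MetaMCP's API):

```typescript
// Hypothetical LIFO idle list: the most recently released connection
// is the first one handed out again.
class IdleStack<T> {
  private items: T[] = [];

  // Return a connection to the pool; it goes on top of the stack.
  release(conn: T): void {
    this.items.push(conn);
  }

  // Take the most recently released connection, if any.
  acquire(): T | undefined {
    return this.items.pop();
  }

  get size(): number {
    return this.items.length;
  }
}
```

Releasing `a`, then `b`, then `c` and calling `acquire()` returns `c`. The connections at the bottom of the stack are the ones that stay idle longest, so they are the natural candidates for the idle timeout sweep.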
Idle timeout sweep
A periodic sweep checks for connections that have been idle longer than idleTimeoutMs. These connections are closed and their server processes terminated. The sweep respects minPoolSize, keeping at least that many connections alive.
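One way to picture the sweep is as a pure selection function. The sketch below is an assumption-laden illustration: `IdleEntry` and `selectForClosing` are hypothetical names, the minimum is counted against the idle entries passed in (the source does not say whether active connections count toward minPoolSize), and closing the longest-idle connections first is a choice made for this example.

```typescript
// Hypothetical record for one idle connection.
interface IdleEntry {
  name: string;
  idleSinceMs: number; // timestamp at which the connection became idle
}

// Picks the connections a sweep would close: those idle longer than
// idleTimeoutMs, oldest first, while leaving at least minPoolSize entries.
function selectForClosing(
  entries: readonly IdleEntry[],
  nowMs: number,
  idleTimeoutMs: number,
  minPoolSize: number
): IdleEntry[] {
  const maxClosable = Math.max(0, entries.length - minPoolSize);
  return entries
    .filter((e) => nowMs - e.idleSinceMs > idleTimeoutMs)
    .sort((a, b) => a.idleSinceMs - b.idleSinceMs)
    .slice(0, maxClosable);
}
```

With three idle connections and two of them past the timeout, a minPoolSize of 0 closes both expired connections, while a minPoolSize of 2 closes only the longest-idle one.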
Catalog building
MetaMCP builds a tool catalog on first use, not at startup.
When mcp_discover is called for the first time (or when a server is first accessed), MetaMCP connects to the server, performs the MCP handshake, and calls tools/list to retrieve the server's tool definitions. The response is cached in memory.
Subsequent mcp_discover calls search the cached catalog without reconnecting to child servers. The catalog is indexed for hybrid search: both semantic similarity and keyword matching are used to rank results.
If a server's connection is evicted and later re-established, MetaMCP refreshes the catalog for that server.
Schema caching
To accelerate cold starts, MetaMCP persists tool schemas to disk at ~/.metamcp/cache/<server-name>/schema.json. When a server is spawned, the cached schema is loaded immediately, so tool discovery is available before the MCP handshake completes.
After the live connection establishes and the fresh catalog arrives, MetaMCP compares it against the cached version. If the tool set has changed (different tool names or descriptions), the cache is updated transparently. If the cache matches, no write occurs.
Schema caching is automatic and requires no configuration. Cached schemas are never treated as authoritative: they are a performance optimization, not a source of truth.
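The cache comparison can be sketched as a fingerprint check over tool names and descriptions, the two fields the text above names as triggering an update. The names `ToolSummary`, `fingerprint`, and `needsCacheWrite` are hypothetical, and treating tool order as irrelevant is an assumption of this example.

```typescript
// Hypothetical reduced view of a tool definition: only the fields
// the comparison looks at.
interface ToolSummary {
  name: string;
  description: string;
}

// Order-independent fingerprint of a tool set.
function fingerprint(tools: readonly ToolSummary[]): string {
  return tools
    .map((t) => JSON.stringify([t.name, t.description]))
    .sort()
    .join("\n");
}

// True if the live catalog differs from the cached one, meaning the
// cache file should be rewritten. False means no write is needed.
function needsCacheWrite(
  cached: readonly ToolSummary[],
  live: readonly ToolSummary[]
): boolean {
  return fingerprint(cached) !== fingerprint(live);
}
```

Under this sketch, a live catalog that lists the same tools in a different order does not trigger a write, while a renamed tool or an edited description does.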
For details on how discovery and search work, see Discovery.
Error classification
Not all errors are equal. A 401 Unauthorized from a misconfigured API key is fundamentally different from a transient network timeout. MetaMCP classifies connection errors into five categories:
| Category | Examples | Circuit breaker impact |
|---|---|---|
| auth | 401 Unauthorized, 403 Forbidden, OAuth token expired | Never trips. Auth errors are logged but do not increment the failure counter. |
| offline | ECONNREFUSED, ETIMEDOUT, ENOTFOUND, DNS failures, spawn ENOENT | Trips after threshold. These are transient and may resolve on retry. |
| http | 500 Internal Server Error, 502 Bad Gateway, 503 Service Unavailable | Trips after threshold. Server-side errors that may recover. |
| stdio-exit | Child process exited with code 1, killed by SIGSEGV | Trips after threshold. Process crash that warrants backoff. |
| other | Unrecognized errors | Trips after threshold. Conservative default. |
This classification prevents permanent errors (like wrong credentials) from tripping circuit breakers. Without classification, a single misconfigured server could cycle through breaker trips and cooldowns indefinitely, never recovering because the root cause is not transient.
The error classifier operates transparently inside callTool(). No configuration is needed.
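To make the table concrete, here is a rough sketch of a classifier and a failure counter that ignores auth errors. It is not the logic inside callTool(): the `ErrorInfo` shape, the `classify` function, the `FailureCounter` class, and the exact matching rules are assumptions built from the examples in the table.

```typescript
type ErrorCategory = "auth" | "offline" | "http" | "stdio-exit" | "other";

// Hypothetical normalized error shape; real errors would need mapping into it.
interface ErrorInfo {
  httpStatus?: number;
  code?: string; // e.g. "ECONNREFUSED"
  exitCode?: number | null; // child process exit code
  signal?: string | null; // e.g. "SIGSEGV"
}

// Error codes listed under "offline" in the table.
const OFFLINE_CODES = ["ECONNREFUSED", "ETIMEDOUT", "ENOTFOUND", "ENOENT"];

function classify(e: ErrorInfo): ErrorCategory {
  if (e.httpStatus === 401 || e.httpStatus === 403) return "auth";
  if (e.code !== undefined && OFFLINE_CODES.indexOf(e.code) !== -1) return "offline";
  if (e.httpStatus !== undefined && e.httpStatus >= 500 && e.httpStatus <= 599) return "http";
  if (e.exitCode != null || e.signal != null) return "stdio-exit";
  return "other";
}

// Counts failures toward the breaker threshold; auth errors never count.
class FailureCounter {
  private failures = 0;
  private threshold: number;

  constructor(threshold: number = 5) {
    this.threshold = threshold;
  }

  // Records one error and returns true once the breaker should trip.
  record(category: ErrorCategory): boolean {
    if (category !== "auth") this.failures += 1;
    return this.failures >= this.threshold;
  }
}
```

With the default threshold of 5, any number of `auth` errors leaves the counter untouched, while the fifth `offline`, `http`, `stdio-exit`, or `other` error reports that the breaker should trip.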
Server lifecycle
Servers can declare a lifecycle that controls how MetaMCP manages their idle connections:
- keep-alive: The server is exempt from the global idle timeout. It stays connected until MetaMCP shuts down. Optionally, a per-server idleTimeoutMs overrides the global timeout with a custom value.
- ephemeral: The server is torn down as soon as it becomes idle. It is respawned on the next request.
- No declaration: The server follows the global idle timeout (default: 5 minutes).
Lifecycle declarations are set per-server in .mcp.json. See Configuration for syntax.
The idle sweep in the connection pool respects these declarations. Keep-alive servers are skipped during the global sweep. Ephemeral servers are evicted immediately. This allows operators to fine-tune resource usage per server without changing global pool settings.
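The three cases can be summarized as an effective idle timeout per server. This sketch is illustrative only: the `ServerLifecycle` shape and `effectiveIdleTimeout` function are hypothetical and do not reproduce the .mcp.json syntax, which is documented in Configuration.

```typescript
// Hypothetical in-memory view of a server's lifecycle declaration.
interface ServerLifecycle {
  lifecycle?: "keep-alive" | "ephemeral";
  idleTimeoutMs?: number; // optional per-server override for keep-alive
}

// Milliseconds an idle connection may live before eviction.
// Infinity means "never evicted by the sweep"; 0 means "evict immediately".
function effectiveIdleTimeout(
  server: ServerLifecycle,
  globalIdleTimeoutMs: number = 300000
): number {
  switch (server.lifecycle) {
    case "keep-alive":
      // Exempt from the global timeout unless a custom value is given.
      return server.idleTimeoutMs ?? Infinity;
    case "ephemeral":
      return 0;
    default:
      return globalIdleTimeoutMs;
  }
}
```

A keep-alive server with no override returns `Infinity`, a keep-alive server with `idleTimeoutMs: 600000` returns 600000, an ephemeral server returns 0, and an undeclared server falls back to the global 300000.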
Graceful shutdown
MetaMCP handles SIGINT and SIGTERM signals for clean shutdown.
The shutdown sequence:
- Guard activation. A shuttingDown flag prevents re-entrant shutdown if multiple signals arrive.
- Drain active connections. MetaMCP waits for in-progress requests to complete.
- Escalating termination. Child server processes receive progressively more aggressive signals on a timer sequence: 50ms, 100ms, 200ms, 400ms, 800ms delays between escalation steps.
- Force kill. If a child server has not exited after the escalation sequence (approximately 1,550ms total), MetaMCP sends SIGKILL.
This progression ensures well-behaved servers have time to clean up, while misbehaving servers are forcefully terminated within a predictable time window.
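The timing and the re-entrancy guard can be checked with a short sketch. The delay values come from the sequence above. The names `ESCALATION_DELAYS_MS`, `cumulativeDeadlines`, and `createShutdownGuard` are hypothetical, and the code deliberately avoids real signal handling so that it shows only the arithmetic and the guard pattern.

```typescript
// Delays between escalation steps, as listed in the shutdown sequence.
const ESCALATION_DELAYS_MS: readonly number[] = [50, 100, 200, 400, 800];

// Elapsed time at each escalation step, measured from the start of
// the escalation phase.
function cumulativeDeadlines(delays: readonly number[]): number[] {
  let elapsed = 0;
  return delays.map((d) => {
    elapsed += d;
    return elapsed;
  });
}

// Hypothetical guard: runs `shutdown` at most once, however many
// signals arrive. Returns true only for the call that started it.
function createShutdownGuard(shutdown: () => void): () => boolean {
  let shuttingDown = false;
  return () => {
    if (shuttingDown) return false;
    shuttingDown = true;
    shutdown();
    return true;
  };
}
```

`cumulativeDeadlines(ESCALATION_DELAYS_MS)` gives `[50, 150, 350, 750, 1550]`, so the force kill falls about 1,550ms after escalation begins. A guard created with `createShutdownGuard` runs its callback on the first call and ignores later ones, which is the behavior wanted when SIGINT and SIGTERM arrive close together.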
If you observe child server processes lingering after MetaMCP exits, it may indicate the child server does not handle SIGTERM properly. Check the child server's documentation for shutdown behavior.
Next steps
- Connection Pool for advanced pool tuning and monitoring.
- Circuit Breaker for failure isolation patterns.
- The Four Tools for how the meta-tools interact with the architecture.