watchdog
Full name: tenets.mcp.watchdog
Self-termination watchdog for the stdio MCP server.
A stdio MCP server already exits when its stdin hits EOF or it receives SIGTERM. But an MCP client that reconnects may spawn a fresh server and abandon the old one with the pipe still held open (so no EOF arrives) and without sending SIGTERM. The abandoned server then blocks forever on a read that never completes — it leaks, accumulating one stray process per reconnect.
This watchdog is the standard safety net for that case: a daemon thread that terminates the process when it is clearly abandoned, meaning it is either
- orphaned: the parent process exited (the server is reparented to PID 1), or
- idle: no MCP request (ping / list / call) arrived for longer than the timeout.
The client transparently respawns the server on next use, so termination is safe. Orphan detection is always on (a dead parent is unambiguous); the idle check is opt-out via idle_timeout <= 0.
Functions:
should_terminate
should_terminate(
    idle_seconds: float, idle_timeout: float, ppid: int, inflight: int = 0
) -> Tuple[bool, str]
Pure decision: should the server self-terminate now?
Orphan detection (ppid == 1) is always active: a dead parent means the client is gone, so an in-flight request's result has nowhere to go anyway. Otherwise, a request currently being processed (inflight > 0) means the server is busy, not idle, so a long-running call is never reaped mid-flight, however small the timeout. The idle check fires only when nothing is in flight, idle_timeout > 0, and idle_seconds exceeds idle_timeout.
Source code in tenets/mcp/watchdog.py
def should_terminate(
    idle_seconds: float, idle_timeout: float, ppid: int, inflight: int = 0
) -> Tuple[bool, str]:
    """Pure decision: should the server self-terminate now?

    Orphan detection (``ppid == 1``) is always active: a dead parent means the
    client is gone, so an in-flight request's result has nowhere to go anyway.
    Otherwise, a request currently being processed (``inflight > 0``) means the
    server is busy, never idle: a long-running call is never reaped mid-flight,
    however small the timeout. The idle check fires only when nothing is in flight,
    ``idle_timeout > 0``, and the server has been idle longer than it.
    """
    # ORPHAN_PPID is a module-level constant not shown in this excerpt;
    # the docstring implies it is 1.
    if ppid == ORPHAN_PPID:
        return True, "orphaned (parent process exited)"
    # A request in flight means busy, regardless of the timeout.
    if inflight > 0:
        return False, ""
    # The idle check is disabled when idle_timeout <= 0.
    if idle_timeout > 0 and idle_seconds > idle_timeout:
        return True, f"idle {idle_seconds:.0f}s > {idle_timeout:.0f}s timeout"
    return False, ""
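The precedence of the three checks is easiest to see with a few concrete calls. The sketch below restates the function from the listing above so that it runs on its own. It assumes `ORPHAN_PPID` equals 1, which the docstring implies but this page does not show. The PID 4321 and the timeout values are made up for illustration.

```python
from typing import Tuple

ORPHAN_PPID = 1  # assumption: the docstring's "ppid == 1" suggests this value


def should_terminate(
    idle_seconds: float, idle_timeout: float, ppid: int, inflight: int = 0
) -> Tuple[bool, str]:
    # Restated from the listing above so this snippet is self-contained.
    if ppid == ORPHAN_PPID:
        return True, "orphaned (parent process exited)"
    if inflight > 0:
        return False, ""
    if idle_timeout > 0 and idle_seconds > idle_timeout:
        return True, f"idle {idle_seconds:.0f}s > {idle_timeout:.0f}s timeout"
    return False, ""


# Orphaned: terminates even though a request is in flight.
print(should_terminate(0.0, 600.0, ppid=1, inflight=1))
# (True, 'orphaned (parent process exited)')

# Busy: a request is in flight, so the idle timeout is ignored.
print(should_terminate(9999.0, 600.0, ppid=4321, inflight=1))
# (False, '')

# Idle past the timeout with nothing in flight.
print(should_terminate(601.0, 600.0, ppid=4321))
# (True, 'idle 601s > 600s timeout')

# Idle check disabled because idle_timeout <= 0.
print(should_terminate(9999.0, 0.0, ppid=4321))
# (False, '')
```

Note that the comparison is strict: an idle time exactly equal to the timeout does not trigger termination.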
start_idle_watchdog
start_idle_watchdog(
    get_state: Callable[[], Tuple[float, int]],
    idle_timeout: float,
    *,
    poll_interval: float = 20.0,
    log: Optional[Callable[[str], None]] = None,
    monotonic: Callable[[], float] = time.monotonic,
    getppid: Callable[[], int] = os.getppid,
    sleep: Callable[[float], None] = time.sleep,
    terminate: Optional[
        Callable[[str, Optional[Callable[[str], None]]], None]
    ] = None
) -> threading.Thread
Start a daemon thread that self-terminates the process when abandoned.
get_state returns a consistent snapshot (last_activity_monotonic, inflight) read atomically. Reading both together is the point. If the watchdog read the timestamp and the in-flight count separately, a long request could finish between the two reads; the watchdog would then pair a stale (request-start) timestamp with a freshly decremented count of 0 and reap the server right as the request completes, before its response is flushed. The clock / ppid / sleep / terminate seams are injectable so the loop is testable without real threads-of-control or signals.
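One way to provide such a snapshot is to keep the timestamp and the counter behind a single lock. The sketch below is an illustration only: `ActivityTracker` and its method names are hypothetical, and this page does not show how tenets actually tracks activity.

```python
import threading
import time
from typing import Tuple


class ActivityTracker:
    """Hypothetical helper: last-activity time and in-flight count under one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last_activity = time.monotonic()
        self._inflight = 0

    def request_started(self) -> None:
        with self._lock:
            self._inflight += 1
            self._last_activity = time.monotonic()

    def request_finished(self) -> None:
        # Decrement the count and refresh the timestamp in the same critical
        # section, so no reader can see a count of 0 with the old timestamp.
        with self._lock:
            self._inflight -= 1
            self._last_activity = time.monotonic()

    def get_state(self) -> Tuple[float, int]:
        # Both values are read under the lock: a consistent snapshot.
        with self._lock:
            return self._last_activity, self._inflight


tracker = ActivityTracker()
tracker.request_started()
print(tracker.get_state()[1])  # 1
tracker.request_finished()
print(tracker.get_state()[1])  # 0
```

Under these assumptions, `tracker.get_state` has the `Callable[[], Tuple[float, int]]` shape that `start_idle_watchdog` expects for its first argument.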
Source code in tenets/mcp/watchdog.py
def start_idle_watchdog(
    get_state: Callable[[], Tuple[float, int]],
    idle_timeout: float,
    *,
    poll_interval: float = 20.0,
    log: Optional[Callable[[str], None]] = None,
    monotonic: Callable[[], float] = time.monotonic,
    getppid: Callable[[], int] = os.getppid,
    sleep: Callable[[float], None] = time.sleep,
    terminate: Optional[Callable[[str, Optional[Callable[[str], None]]], None]] = None,
) -> threading.Thread:
    """Start a daemon thread that self-terminates the process when abandoned.

    ``get_state`` returns a *consistent* snapshot ``(last_activity_monotonic,
    inflight)`` read atomically. Reading both together is the point: if the watchdog
    read the timestamp and the in-flight count separately, a long request could
    finish between the two reads and the watchdog would pair a stale (request-start)
    timestamp with a freshly-decremented count of 0, reaping the server right as the
    request completes, before its response is flushed. The clock / ppid / sleep /
    terminate seams are injectable so the loop is testable without real
    threads-of-control or signals.
    """
    # Fall back to the module's default terminate hook
    # (_default_terminate is not shown in this excerpt).
    terminate = terminate or _default_terminate

    def _loop() -> None:
        while True:
            sleep(poll_interval)
            # One call yields both values, so they always belong together.
            last_activity, inflight = get_state()
            idle = monotonic() - last_activity
            stop, reason = should_terminate(idle, idle_timeout, getppid(), inflight)
            if stop:
                terminate(reason, log)
                return

    thread = threading.Thread(target=_loop, name="tenets-mcp-watchdog", daemon=True)
    thread.start()
    return thread
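The injectable seams make it possible to drive the loop with a fake clock. The sketch below shows one way a test might do that. It uses condensed copies of both functions so that it runs without tenets installed, it assumes `ORPHAN_PPID` is 1, and it makes every seam a required argument so that the module's default terminate hook (not shown on this page) is not needed. The sketch still starts a thread, but no real time passes and no signal is sent.

```python
import threading
from typing import Callable, List, Optional, Tuple

ORPHAN_PPID = 1  # assumption: the docstring's "ppid == 1" suggests this value


def should_terminate(
    idle_seconds: float, idle_timeout: float, ppid: int, inflight: int = 0
) -> Tuple[bool, str]:
    # Same decision logic as the listing earlier on this page.
    if ppid == ORPHAN_PPID:
        return True, "orphaned (parent process exited)"
    if inflight > 0:
        return False, ""
    if idle_timeout > 0 and idle_seconds > idle_timeout:
        return True, f"idle {idle_seconds:.0f}s > {idle_timeout:.0f}s timeout"
    return False, ""


def start_idle_watchdog(
    get_state: Callable[[], Tuple[float, int]],
    idle_timeout: float,
    *,
    poll_interval: float,
    monotonic: Callable[[], float],
    getppid: Callable[[], int],
    sleep: Callable[[float], None],
    terminate: Callable[[str, Optional[Callable[[str], None]]], None],
    log: Optional[Callable[[str], None]] = None,
) -> threading.Thread:
    # Condensed copy of the loop above; every seam is a required argument here.
    def _loop() -> None:
        while True:
            sleep(poll_interval)
            last_activity, inflight = get_state()
            idle = monotonic() - last_activity
            stop, reason = should_terminate(idle, idle_timeout, getppid(), inflight)
            if stop:
                terminate(reason, log)
                return

    thread = threading.Thread(target=_loop, name="tenets-mcp-watchdog", daemon=True)
    thread.start()
    return thread


clock = [1000.0]         # fake monotonic clock, in seconds
reasons: List[str] = []  # reasons received by the fake terminate hook


def fake_sleep(seconds: float) -> None:
    clock[0] += seconds  # advance the fake clock instead of blocking


thread = start_idle_watchdog(
    get_state=lambda: (1000.0, 0),  # last activity at t=1000, nothing in flight
    idle_timeout=50.0,
    poll_interval=20.0,
    monotonic=lambda: clock[0],
    getppid=lambda: 4321,           # any value other than 1: not orphaned
    sleep=fake_sleep,
    terminate=lambda reason, log: reasons.append(reason),
)
thread.join(timeout=5)
print(reasons)  # ['idle 60s > 50s timeout']
```

The fake clock reaches 1020, 1040, and then 1060 across three polls. Only the third poll sees an idle time (60 s) above the 50 s timeout, so the hook is called exactly once and the loop returns.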