watchdog

Full name: tenets.mcp.watchdog

Self-termination watchdog for the stdio MCP server.

A stdio MCP server already exits when its stdin hits EOF or it receives SIGTERM. But an MCP client that reconnects may spawn a fresh server and abandon the old one with the pipe still held open (so no EOF arrives) and without sending SIGTERM. The abandoned server then blocks forever on a read that never completes, so it leaks: one stray process accumulates per reconnect.

This watchdog is the standard safety net for that case. It is a daemon thread that terminates the process when it is clearly abandoned, which means one of two things:

  • orphaned: the parent process exited (the server is reparented to PID 1), or
  • idle: no MCP request (ping / list / call) arrived for longer than the timeout.

The client transparently respawns the server on next use, so termination is safe. Orphan detection is always on (a dead parent is unambiguous); the idle check can be disabled by passing idle_timeout <= 0.

Functions:

should_terminate

Python
should_terminate(
    idle_seconds: float, idle_timeout: float, ppid: int, inflight: int = 0
) -> Tuple[bool, str]

Pure decision: should the server self-terminate now?

Orphan detection (ppid == 1) is always active: a dead parent means the client is gone, so an in-flight request's result has nowhere to go anyway. Otherwise, a request currently being processed (inflight > 0) means the server is busy, not idle, so a long-running call is never reaped mid-flight, however small the timeout. The idle check fires only when nothing is in flight, idle_timeout > 0, and idle_seconds is strictly greater than idle_timeout.

Source code in tenets/mcp/watchdog.py
Python
def should_terminate(
    idle_seconds: float, idle_timeout: float, ppid: int, inflight: int = 0
) -> Tuple[bool, str]:
    """Pure decision: should the server self-terminate now?

    Orphan detection (``ppid == 1``) is always active — a dead parent means the
    client is gone, so an in-flight request's result has nowhere to go anyway.
    Otherwise, a request currently being processed (``inflight > 0``) means the
    server is busy, never idle: a long-running call is never reaped mid-flight,
    however small the timeout. The idle check fires only when nothing is in flight,
    ``idle_timeout > 0``, and the server has been idle longer than it.
    """
    if ppid == ORPHAN_PPID:
        return True, "orphaned (parent process exited)"
    if inflight > 0:
        return False, ""
    if idle_timeout > 0 and idle_seconds > idle_timeout:
        return True, f"idle {idle_seconds:.0f}s > {idle_timeout:.0f}s timeout"
    return False, ""
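
The precedence of the three checks is easiest to see on concrete inputs. The sketch below repeats the function so that it runs standalone; `ORPHAN_PPID = 1` is an assumption inferred from the docstring's `ppid == 1`, and the parent PID 4242 and the case labels are purely illustrative.

```python
from typing import Dict, Tuple

ORPHAN_PPID = 1  # assumption: inferred from the docstring's "ppid == 1"


def should_terminate(
    idle_seconds: float, idle_timeout: float, ppid: int, inflight: int = 0
) -> Tuple[bool, str]:
    # Same three checks as the source listing, in the same order.
    if ppid == ORPHAN_PPID:
        return True, "orphaned (parent process exited)"
    if inflight > 0:
        return False, ""
    if idle_timeout > 0 and idle_seconds > idle_timeout:
        return True, f"idle {idle_seconds:.0f}s > {idle_timeout:.0f}s timeout"
    return False, ""


# Illustrative inputs; 4242 stands in for any live parent PID.
cases: Dict[str, Tuple[bool, str]] = {
    "orphaned, even mid-request": should_terminate(0.0, 300.0, ppid=1, inflight=1),
    "busy past the timeout": should_terminate(900.0, 300.0, ppid=4242, inflight=1),
    "idle past the timeout": should_terminate(301.0, 300.0, ppid=4242),
    "idle check disabled": should_terminate(9999.0, 0.0, ppid=4242),
    "exactly at the timeout": should_terminate(300.0, 300.0, ppid=4242),
}

for label, (stop, reason) in cases.items():
    print(f"{label:28} stop={stop} reason={reason!r}")
```

Only the orphaned and idle-past-the-timeout cases return True. The comparison is strict, so a server that has been idle for exactly the timeout survives until the next poll.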

start_idle_watchdog

Python
start_idle_watchdog(
    get_state: Callable[[], Tuple[float, int]],
    idle_timeout: float,
    *,
    poll_interval: float = 20.0,
    log: Optional[Callable[[str], None]] = None,
    monotonic: Callable[[], float] = time.monotonic,
    getppid: Callable[[], int] = os.getppid,
    sleep: Callable[[float], None] = time.sleep,
    terminate: Optional[
        Callable[[str, Optional[Callable[[str], None]]], None]
    ] = None
) -> threading.Thread

Start a daemon thread that self-terminates the process when abandoned.

get_state returns a consistent snapshot (last_activity_monotonic, inflight) read atomically. Reading both together is the point: if the watchdog read the timestamp and the in-flight count separately, a long request could finish between the two reads. The watchdog would then pair a stale (request-start) timestamp with a freshly decremented count of 0 and reap the server right as the request completes, before its response is flushed. The clock, parent-PID, sleep, and terminate seams (monotonic, getppid, sleep, terminate) are injectable, so the loop is testable without real sleeps, parent-process changes, or signals.
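
One way a caller might satisfy the atomic-snapshot requirement is to keep both values behind a single lock. The sketch below is a minimal illustration under that assumption. `ActivityTracker` and its method names are hypothetical and not part of `tenets.mcp.watchdog`; the real server may track activity differently.

```python
import threading
import time
from typing import Callable, Tuple


class ActivityTracker:
    """Hypothetical helper: last-activity time and in-flight count under one lock."""

    def __init__(self, monotonic: Callable[[], float] = time.monotonic) -> None:
        self._lock = threading.Lock()
        self._monotonic = monotonic
        self._last_activity = monotonic()
        self._inflight = 0

    def request_started(self) -> None:
        with self._lock:
            self._inflight += 1
            self._last_activity = self._monotonic()

    def request_finished(self) -> None:
        with self._lock:
            self._inflight -= 1
            # Refresh the timestamp in the same critical section as the
            # decrement, so no snapshot can pair inflight == 0 with the
            # stale request-start timestamp.
            self._last_activity = self._monotonic()

    def snapshot(self) -> Tuple[float, int]:
        # Both values are read under the lock, so they always belong together.
        with self._lock:
            return self._last_activity, self._inflight
```

With such a helper, the bound method `tracker.snapshot` has the `Callable[[], Tuple[float, int]]` shape expected for `get_state`.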

Source code in tenets/mcp/watchdog.py
Python
def start_idle_watchdog(
    get_state: Callable[[], Tuple[float, int]],
    idle_timeout: float,
    *,
    poll_interval: float = 20.0,
    log: Optional[Callable[[str], None]] = None,
    monotonic: Callable[[], float] = time.monotonic,
    getppid: Callable[[], int] = os.getppid,
    sleep: Callable[[float], None] = time.sleep,
    terminate: Optional[Callable[[str, Optional[Callable[[str], None]]], None]] = None,
) -> threading.Thread:
    """Start a daemon thread that self-terminates the process when abandoned.

    ``get_state`` returns a *consistent* snapshot ``(last_activity_monotonic,
    inflight)`` read atomically. Reading both together is the point: if the watchdog
    read the timestamp and the in-flight count separately, a long request could
    finish between the two reads and the watchdog would pair a stale (request-start)
    timestamp with a freshly-decremented count of 0 — reaping the server right as the
    request completes, before its response is flushed. The clock / ppid / sleep /
    terminate seams are injectable so the loop is testable without real
    threads-of-control or signals.
    """
    terminate = terminate or _default_terminate

    def _loop() -> None:
        while True:
            sleep(poll_interval)
            last_activity, inflight = get_state()
            idle = monotonic() - last_activity
            stop, reason = should_terminate(idle, idle_timeout, getppid(), inflight)
            if stop:
                terminate(reason, log)
                return

    thread = threading.Thread(target=_loop, name="tenets-mcp-watchdog", daemon=True)
    thread.start()
    return thread
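
The injectable seams make it possible to drive the loop with a fake clock instead of waiting on real time. The sketch below repeats a condensed copy of the decision function and the loop so that it runs standalone. `FakeClock`, `run_demo`, and the parent PID 4242 are illustrative assumptions, and `ORPHAN_PPID = 1` is inferred from the docstring rather than taken from the module.

```python
import threading
from typing import Callable, List, Optional, Tuple

ORPHAN_PPID = 1  # assumption: inferred from the docstring's "ppid == 1"


def should_terminate(idle_seconds, idle_timeout, ppid, inflight=0) -> Tuple[bool, str]:
    if ppid == ORPHAN_PPID:
        return True, "orphaned (parent process exited)"
    if inflight > 0:
        return False, ""
    if idle_timeout > 0 and idle_seconds > idle_timeout:
        return True, f"idle {idle_seconds:.0f}s > {idle_timeout:.0f}s timeout"
    return False, ""


def start_idle_watchdog(get_state, idle_timeout, *, poll_interval, monotonic,
                        getppid, sleep, terminate,
                        log: Optional[Callable[[str], None]] = None):
    # Condensed copy of the loop from the source listing; every seam is required here.
    def _loop() -> None:
        while True:
            sleep(poll_interval)
            last_activity, inflight = get_state()
            idle = monotonic() - last_activity
            stop, reason = should_terminate(idle, idle_timeout, getppid(), inflight)
            if stop:
                terminate(reason, log)
                return

    thread = threading.Thread(target=_loop, daemon=True)
    thread.start()
    return thread


class FakeClock:
    """Hypothetical test double: sleeping advances the clock instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def monotonic(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


def run_demo() -> List[str]:
    clock = FakeClock()
    reasons: List[str] = []
    thread = start_idle_watchdog(
        get_state=lambda: (0.0, 0),        # last activity at t=0, nothing in flight
        idle_timeout=60.0,
        poll_interval=20.0,
        monotonic=clock.monotonic,
        getppid=lambda: 4242,              # a live (non-orphan) parent
        sleep=clock.sleep,
        terminate=lambda reason, log: reasons.append(reason),  # record, do not exit
    )
    thread.join(timeout=5)
    return reasons


print(run_demo())
```

The fake clock reaches 20, 40, and 60 seconds without crossing the strict `> 60` threshold, so the recorded reason comes from the fourth poll at 80 seconds. Because `terminate` only records the reason, the loop returns and the thread ends without killing the test process.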