Skip to content

Architecture

AI-Hydro is designed as an open platform for autonomous hydrological and earth science research. Its architecture separates agent interaction, tool orchestration, and domain computation so that AI models can reason over trustworthy scientific workflows instead of improvising brittle scripts from scratch.


System Overview

flowchart TB
    subgraph EXT["VS Code Extension (TypeScript)"]
        direction LR
        USER["Researcher
(natural language)"] --> LLM["AI Agent
Claude / GPT / Gemini"]
        LLM --> CLIENT["MCP Client
JSON-RPC over stdio"]
    end

    CLIENT -->|stdio| MCP

    subgraph MCP["aihydro-mcp (Python / FastMCP)"]
        direction TB
        TOOLS["Tool Registry
28 built-in tools"]
        PLUGINS["Plugin Discovery
entry_points('aihydro.tools')"]
        TOOLS --- PLUGINS
    end

    MCP -->|fetch| APIS["Federal Data APIs
USGS NWIS · GridMET
3DEP · NLCD · NHDPlus"]
    MCP -->|compute| ML["ML Backends
conceptual · deep learning"]
    MCP -->|read/write| SESSION

    subgraph SESSION["Persistent State (~/.aihydro/)"]
        direction TB
        RP["ResearcherProfile
researcher.json"]
        PS["ProjectSession
projects/<name>/project.json"]
        HS["HydroSession
sessions/<gauge>.json"]
        RP --> PS --> HS
    end

    HS -->|provenance metadata| PROV["HydroResult
{ data, meta }"]
    PROV --> LLM

The flow in plain English:

  1. Researcher describes intent in natural language
  2. The LLM agent decides which tools to call (no user code required)
  3. Tool calls go via JSON-RPC stdio to the Python MCP server
  4. The server fetches data from federal APIs, runs computation, and saves results to ~/.aihydro/
  5. Every result carries HydroResult provenance metadata (source, parameters, timestamp)
  6. The agent interprets results and responds — or writes a standalone script if no tool exists

Memory Hierarchy

flowchart LR
    subgraph MEMORY["Persistent Memory (~/.aihydro/)"]
        direction TB
        RP["🧬 ResearcherProfile
researcher.json

Who you are:
expertise · preferred models
active project · observations"]
        PS["📁 ProjectSession
projects/&lt;name&gt;/project.json

What you're working on:
gauges · journal · literature"]
        HS["💧 HydroSession
sessions/&lt;gauge&gt;.json

What was computed:
watershed · streamflow · signatures
geomorphic · model · notes"]
        RP -->|context for| PS
        PS -->|contains| HS
    end

    HS -->|appended to| RMD[".aihydrorules/research.md
auto-injected into
every conversation"]

Each tier survives VS Code restarts, new conversations, and weeks between sessions. The researcher never re-explains their context — it is always present.


MCP Communication Flow

sequenceDiagram
    participant R as Researcher
    participant LLM as AI Agent (LLM)
    participant MCP as aihydro-mcp
    participant API as USGS NLDI API
    participant DB as HydroSession

    R->>LLM: "Delineate the watershed for gauge 01031500"
    LLM->>MCP: tools/call delineate_watershed(gauge_id="01031500")
    MCP->>API: GET /linked-data/comid/position?coords=...
    API-->>MCP: GeoJSON watershed polygon
    MCP->>DB: session.set_slot("watershed", data, meta)
    DB-->>MCP: saved ✓
    MCP-->>LLM: {area_km2: 1247.3, perimeter_km: 198.6, ...}
    LLM-->>R: "Watershed delineated — 1,247 km², centroid 44.58°N..."

Layer 1 — VS Code Extension

Language: TypeScript
Lineage: Built on top of the open-source Cline base (Apache 2.0), then specialized for hydrological and earth science research workflows

Responsibilities: - Renders the chat interface and tool call log - Manages AI provider connections and API keys - Acts as an MCP client — sends tool call requests, receives results - Handles file reads/writes and terminal execution for standalone scripts - Auto-registers the ai-hydro MCP server on activation

When no tool exists for a task, the agent writes a standalone Python script and executes it via the integrated terminal — combining the reliability of structured tools with the flexibility of the full Python ecosystem.


Layer 2 — MCP Server

Language: Python
Framework: FastMCP
Protocol: Model Context Protocol (JSON-RPC over stdio)

python/ai_hydro/mcp/
├── app.py             — FastMCP singleton + agent instructions
├── __init__.py        — imports all tool modules (triggers registration)
├── tools_analysis.py  — analysis tools
├── tools_session.py   — session tools
├── tools_modelling.py — modelling tools
├── tools_project.py   — project/literature/persona tools
├── tools_docs.py      — version helpers
├── helpers.py         — shared validation, caching, session utilities
└── registry.py        — entry-point plugin discovery

Tool registration happens at import time via @mcp.tool() decorators. Plugin discovery scans aihydro.tools entry points and registers community tools automatically.


Layer 3 — Python Backend

Package: aihydro-tools (PyPI)

Data retrieval

Module Source
data/streamflow.py USGS NWIS
data/forcing.py GridMET
data/landcover.py NLCD
data/soil.py POLARIS

Analysis

Module What
analysis/watershed.py NHDPlus delineation
analysis/signatures.py Flow statistics
analysis/twi.py Terrain analysis
analysis/geomorphic.py Basin morphometry
analysis/curve_number.py CN grid

Session persistence

Class Storage
HydroSession ~/.aihydro/sessions/<gauge>.json
ProjectSession ~/.aihydro/projects/<name>/project.json
ResearcherProfile ~/.aihydro/researcher.json

Dependency Management

Heavy dependencies are lazy-loaded — the server starts successfully even if only the [data] extra is installed:

try:
    import geopandas as gpd
    import pynhd
    _GEO_AVAILABLE = True
except ImportError:
    _GEO_AVAILABLE = False

def delineate_watershed(gauge_id: str) -> dict:
    if not _GEO_AVAILABLE:
        return {"error": "Install aihydro-tools[analysis] for watershed tools."}
    # ... proceed ...

Tools return informative errors for missing extras rather than crashing the server.