Architecture¶
AI-Hydro is built on three layers: the VS Code extension (agent interface), the MCP server (tool execution), and the Python backend (domain computation + session persistence).
System Overview¶
flowchart TB
subgraph EXT["VS Code Extension (TypeScript)"]
direction LR
USER["Researcher
(natural language)"] --> LLM["AI Agent
Claude / GPT / Gemini"]
LLM --> CLIENT["MCP Client
JSON-RPC over stdio"]
end
CLIENT -->|stdio| MCP
subgraph MCP["aihydro-mcp (Python / FastMCP)"]
direction TB
TOOLS["Tool Registry
26 built-in tools"]
PLUGINS["Plugin Discovery
entry_points('aihydro.tools')"]
TOOLS --- PLUGINS
end
MCP -->|fetch| APIS["Federal Data APIs
USGS NWIS · GridMET
3DEP · NLCD · NHDPlus"]
MCP -->|compute| ML["ML Backends
conceptual · deep learning"]
MCP -->|read/write| SESSION
subgraph SESSION["Persistent State (~/.aihydro/)"]
direction TB
RP["ResearcherProfile
researcher.json"]
PS["ProjectSession
projects/<name>/project.json"]
HS["HydroSession
sessions/<gauge>.json"]
RP --> PS --> HS
end
HS -->|provenance metadata| PROV["HydroResult
{ data, meta }"]
PROV --> LLM
style EXT fill:#0f0f1e,stroke:#00a3ff,color:#e0f0ff
style MCP fill:#0f0f1e,stroke:#00a3ff,color:#e0f0ff
style SESSION fill:#0f0f1e,stroke:#7dd3fc,color:#e0f0ff
style APIS fill:#1a1a2e,stroke:#94a3b8,color:#94a3b8
style ML fill:#1a1a2e,stroke:#94a3b8,color:#94a3b8
style PROV fill:#1a1a2e,stroke:#00ddff,color:#00ddff
The flow in plain English:
- Researcher describes intent in natural language
- The LLM agent decides which tools to call (no user code required)
- Tool calls go via JSON-RPC over stdio to the Python MCP server
- The server fetches data from federal APIs, runs computation, and saves results to ~/.aihydro/
- Every result carries HydroResult provenance metadata (source, parameters, timestamp)
- The agent interprets results and responds, or writes a standalone script if no tool exists
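The { data, meta } envelope from the diagram can be pictured as a small dataclass. This is a minimal sketch, not the package's actual class: the meta keys `source`, `parameters`, and `timestamp` follow the list above, while the `make_result` helper and the exact field layout are assumptions made for illustration.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Any


@dataclass
class HydroResult:
    """Hypothetical stand-in for the { data, meta } envelope."""
    data: dict[str, Any]
    meta: dict[str, Any] = field(default_factory=dict)


def make_result(data: dict[str, Any], source: str, parameters: dict[str, Any]) -> HydroResult:
    """Attach provenance (source, parameters, UTC timestamp) to a payload."""
    meta = {
        "source": source,
        "parameters": parameters,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    return HydroResult(data=data, meta=meta)


result = make_result(
    data={"area_km2": 1247.3},
    source="USGS NLDI",
    parameters={"gauge_id": "01031500"},
)
envelope = asdict(result)  # plain dict, ready for json.dumps
```

Keeping provenance next to the payload means a saved result can later be traced back to the request that produced it.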
Memory Hierarchy¶
flowchart LR
subgraph MEMORY["Persistent Memory (~/.aihydro/)"]
direction TB
RP["🧬 ResearcherProfile
researcher.json
Who you are:
expertise · preferred models
active project · observations"]
PS["📁 ProjectSession
projects/<name>/project.json
What you're working on:
gauges · journal · literature"]
HS["💧 HydroSession
sessions/<gauge>.json
What was computed:
watershed · streamflow · signatures
geomorphic · model · notes"]
RP -->|context for| PS
PS -->|contains| HS
end
HS -->|appended to| RMD[".clinerules/research.md
auto-injected into
every conversation"]
style MEMORY fill:#0f0f1e,stroke:#00a3ff,color:#e0f0ff
style RMD fill:#1a1a2e,stroke:#00ddff,color:#00ddff
Each tier survives VS Code restarts, new conversations, and weeks between sessions. The researcher never re-explains their context; it is always present.
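The three tiers map to three JSON files under the storage root. The sketch below resolves those paths and reads whichever tiers exist. The file layout comes from the diagram; the `tier_paths` and `load_context` helpers, the `demo` project name, and the empty-dict fallback are assumptions, and the demo runs against a temporary directory rather than the real ~/.aihydro/.

```python
import json
import tempfile
from pathlib import Path


def tier_paths(root: Path, project: str, gauge: str) -> dict[str, Path]:
    """Map each memory tier to its JSON file under the storage root."""
    return {
        "researcher": root / "researcher.json",
        "project": root / "projects" / project / "project.json",
        "session": root / "sessions" / f"{gauge}.json",
    }


def load_context(root: Path, project: str, gauge: str) -> dict[str, dict]:
    """Read whichever tiers exist; a missing file yields an empty dict."""
    context = {}
    for tier, path in tier_paths(root, project, gauge).items():
        context[tier] = json.loads(path.read_text()) if path.exists() else {}
    return context


# Demo against a temporary directory instead of the real storage root.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "researcher.json").write_text(json.dumps({"expertise": "hydrology"}))
    context = load_context(root, project="demo", gauge="01031500")
```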
MCP Communication Flow¶
sequenceDiagram
participant R as Researcher
participant LLM as AI Agent (LLM)
participant MCP as aihydro-mcp
participant API as USGS NLDI API
participant DB as HydroSession
R->>LLM: "Delineate the watershed for gauge 01031500"
LLM->>MCP: tools/call delineate_watershed(gauge_id="01031500")
MCP->>API: GET /linked-data/comid/position?coords=...
API-->>MCP: GeoJSON watershed polygon
MCP->>DB: session.set_slot("watershed", data, meta)
DB-->>MCP: saved ✓
MCP-->>LLM: {area_km2: 1247.3, perimeter_km: 198.6, ...}
LLM-->>R: "Watershed delineated — 1,247 km², centroid 44.58°N..."
Layer 1 — VS Code Extension¶
Language: TypeScript
Base: Fork of Cline (Apache 2.0)
Responsibilities:
- Renders the chat interface and tool call log
- Manages AI provider connections and API keys
- Acts as an MCP client: sends tool call requests and receives results
- Handles file reads/writes and terminal execution for standalone scripts
- Auto-registers the ai-hydro MCP server on activation
When no tool exists for a task, the agent writes a standalone Python script and executes it via the integrated terminal. This combines the reliability of structured tools with the flexibility of the full Python ecosystem.
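The tool call requests that the extension sends as an MCP client are JSON-RPC 2.0 messages with the method `tools/call`. The sketch below builds one such message for the `delineate_watershed` call shown in the sequence diagram. It is written in Python for consistency with the rest of this page, although the extension itself is TypeScript, and the `tool_call_request` helper is a hypothetical name.

```python
import json


def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as one newline-terminated JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport is newline-delimited, so the JSON must stay on one line.
    return json.dumps(message) + "\n"


line = tool_call_request(1, "delineate_watershed", {"gauge_id": "01031500"})
decoded = json.loads(line)
```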
Layer 2 — MCP Server¶
Language: Python
Framework: FastMCP
Protocol: Model Context Protocol (JSON-RPC over stdio)
python/ai_hydro/mcp/
├── app.py — FastMCP singleton + agent instructions
├── __init__.py — imports all tool modules (triggers registration)
├── tools_analysis.py — analysis tools
├── tools_session.py — session tools
├── tools_modelling.py — modelling tools
├── tools_project.py — project/literature/persona tools
├── tools_docs.py — version helpers
├── helpers.py — shared validation, caching, session utilities
└── registry.py — entry-point plugin discovery
Tool registration happens at import time via @mcp.tool() decorators. Plugin discovery scans aihydro.tools entry points and registers community tools automatically.
Layer 3 — Python Backend¶
Package: aihydro-tools (PyPI)
Data retrieval¶
| Module | Source |
|---|---|
| data/streamflow.py | USGS NWIS |
| data/forcing.py | GridMET |
| data/landcover.py | NLCD |
| data/soil.py | POLARIS |
Analysis¶
| Module | What |
|---|---|
| analysis/watershed.py | NHDPlus delineation |
| analysis/signatures.py | Flow statistics |
| analysis/twi.py | Terrain analysis |
| analysis/geomorphic.py | Basin morphometry |
| analysis/curve_number.py | CN grid |
Session persistence¶
| Class | Storage |
|---|---|
| HydroSession | ~/.aihydro/sessions/<gauge>.json |
| ProjectSession | ~/.aihydro/projects/<name>/project.json |
| ResearcherProfile | ~/.aihydro/researcher.json |
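The `session.set_slot("watershed", data, meta)` call from the communication flow can be sketched as a write-through JSON store. The class below is a hypothetical illustration that borrows the `HydroSession` name and the sessions/<gauge>.json layout; the real class may store more fields or use a different format.

```python
import json
import tempfile
from pathlib import Path


class HydroSession:
    """Hypothetical sketch: one JSON file per gauge, one slot per result type."""

    def __init__(self, root: Path, gauge_id: str):
        self.path = root / "sessions" / f"{gauge_id}.json"
        self.slots = json.loads(self.path.read_text()) if self.path.exists() else {}

    def set_slot(self, name: str, data: dict, meta: dict) -> None:
        """Store a result with its provenance and write the file immediately."""
        self.slots[name] = {"data": data, "meta": meta}
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.slots, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    session = HydroSession(Path(tmp), "01031500")
    session.set_slot("watershed", {"area_km2": 1247.3}, {"source": "USGS NLDI"})
    # A fresh object reading the same file sees the saved slot.
    reloaded = HydroSession(Path(tmp), "01031500").slots
```

Writing on every `set_slot` call trades some I/O for the guarantee that a result is on disk before the tool returns.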
Dependency Management¶
Heavy dependencies are optional and their imports are guarded, so the server starts successfully even if only the [data] extra is installed:
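# Guarded import: record whether the geospatial stack is installed.
try:
    import geopandas as gpd
    import pynhd
    _GEO_AVAILABLE = True
except ImportError:
    _GEO_AVAILABLE = False


def delineate_watershed(gauge_id: str) -> dict:
    # Return an error payload instead of raising when the extra is missing.
    if not _GEO_AVAILABLE:
        return {"error": "Install aihydro-tools[analysis] for watershed tools."}
    # ... proceed ...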
Tools return informative errors for missing extras rather than crashing the server.
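When many tools share the same optional dependencies, the availability check can be factored into a decorator. This is a sketch of one possible approach, not code from the package: the `requires` decorator and both demo functions are hypothetical, and `importlib.util.find_spec` is used here to test for a module without importing it.

```python
import functools
import importlib.util
from typing import Callable


def requires(modules: list[str], extra: str) -> Callable:
    """Return an error dict instead of raising when a module is not installed."""
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # find_spec returns None for a top-level module that is not installed.
            missing = [m for m in modules if importlib.util.find_spec(m) is None]
            if missing:
                return {"error": f"Install aihydro-tools[{extra}] (missing: {', '.join(missing)})."}
            return func(*args, **kwargs)
        return wrapper
    return decorator


@requires(["json"], extra="data")
def always_available() -> dict:
    return {"ok": True}


@requires(["not_a_real_module_xyz"], extra="analysis")
def needs_missing_dependency() -> dict:
    return {"ok": True}


ok_result = always_available()
error_result = needs_missing_dependency()
```

The check runs on each call rather than at import time, so a dependency installed while the server is running is picked up on the next call.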