Architecture
AI-Hydro is designed as an open platform for autonomous hydrological and earth science research. Its architecture separates agent interaction, tool orchestration, and domain computation so that AI models can reason over trustworthy scientific workflows instead of improvising brittle scripts from scratch.
System Overview
flowchart TB
subgraph EXT["VS Code Extension (TypeScript)"]
direction LR
USER["Researcher
(natural language)"] --> LLM["AI Agent
Claude / GPT / Gemini"]
LLM --> CLIENT["MCP Client
JSON-RPC over stdio"]
end
CLIENT -->|stdio| MCP
subgraph MCP["aihydro-mcp (Python / FastMCP)"]
direction TB
TOOLS["Tool Registry
28 built-in tools"]
PLUGINS["Plugin Discovery
entry_points('aihydro.tools')"]
TOOLS --- PLUGINS
end
MCP -->|fetch| APIS["Federal Data APIs
USGS NWIS · GridMET
3DEP · NLCD · NHDPlus"]
MCP -->|compute| ML["ML Backends
conceptual · deep learning"]
MCP -->|read/write| SESSION
subgraph SESSION["Persistent State (~/.aihydro/)"]
direction TB
RP["ResearcherProfile
researcher.json"]
PS["ProjectSession
projects/<name>/project.json"]
HS["HydroSession
sessions/<gauge>.json"]
RP --> PS --> HS
end
HS -->|provenance metadata| PROV["HydroResult
{ data, meta }"]
PROV --> LLM
The flow in plain English:
- Researcher describes intent in natural language
- The LLM agent decides which tools to call (no user code required)
- Tool calls go via JSON-RPC stdio to the Python MCP server
- The server fetches data from federal APIs, runs computation, and saves results to ~/.aihydro/
- Every result carries HydroResult provenance metadata (source, parameters, timestamp)
- The agent interprets results and responds, or writes a standalone script if no tool exists
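The tool-call step in the flow above can be sketched as the message an MCP client writes to the server's standard input. The `tools/call` method and the `name`/`arguments` parameters follow the Model Context Protocol; the helper name `build_tool_call` and the request id are illustrative, and the exact framing used by the extension is an assumption.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a single line of JSON."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The stdio transport sends one JSON message per line, so the
    # serialized request must not contain embedded newlines.
    return json.dumps(message, separators=(",", ":"))


# Hypothetical request matching the watershed example used in this page.
line = build_tool_call(1, "delineate_watershed", {"gauge_id": "01031500"})
print(line)
```

The server replies with a JSON-RPC response carrying the same `id`, which is how the client matches results to pending calls.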
Memory Hierarchy
flowchart LR
subgraph MEMORY["Persistent Memory (~/.aihydro/)"]
direction TB
RP["🧬 ResearcherProfile
researcher.json
Who you are:
expertise · preferred models
active project · observations"]
PS["📁 ProjectSession
projects/<name>/project.json
What you're working on:
gauges · journal · literature"]
HS["💧 HydroSession
sessions/<gauge>.json
What was computed:
watershed · streamflow · signatures
geomorphic · model · notes"]
RP -->|context for| PS
PS -->|contains| HS
end
HS -->|appended to| RMD[".aihydrorules/research.md
auto-injected into
every conversation"]
Each tier survives VS Code restarts, new conversations, and weeks between sessions. The researcher never has to re-explain their context; it is always present.
MCP Communication Flow
sequenceDiagram
participant R as Researcher
participant LLM as AI Agent (LLM)
participant MCP as aihydro-mcp
participant API as USGS NLDI API
participant DB as HydroSession
R->>LLM: "Delineate the watershed for gauge 01031500"
LLM->>MCP: tools/call delineate_watershed(gauge_id="01031500")
MCP->>API: GET /linked-data/comid/position?coords=...
API-->>MCP: GeoJSON watershed polygon
MCP->>DB: session.set_slot("watershed", data, meta)
DB-->>MCP: saved ✓
MCP-->>LLM: {area_km2: 1247.3, perimeter_km: 198.6, ...}
LLM-->>R: "Watershed delineated — 1,247 km², centroid 44.58°N..."
Layer 1 — VS Code Extension
Language: TypeScript
Lineage: Built on top of the open-source Cline base (Apache 2.0), then specialized for hydrological and earth science research workflows
Responsibilities:
- Renders the chat interface and tool call log
- Manages AI provider connections and API keys
- Acts as an MCP client: sends tool call requests, receives results
- Handles file reads/writes and terminal execution for standalone scripts
- Auto-registers the ai-hydro MCP server on activation
When no tool exists for a task, the agent writes a standalone Python script and runs it in the integrated terminal. This combines the reliability of structured tools with the flexibility of the full Python ecosystem.
Layer 2 — MCP Server
Language: Python
Framework: FastMCP
Protocol: Model Context Protocol (JSON-RPC over stdio)
python/ai_hydro/mcp/
├── app.py — FastMCP singleton + agent instructions
├── __init__.py — imports all tool modules (triggers registration)
├── tools_analysis.py — analysis tools
├── tools_session.py — session tools
├── tools_modelling.py — modelling tools
├── tools_project.py — project/literature/persona tools
├── tools_docs.py — version helpers
├── helpers.py — shared validation, caching, session utilities
└── registry.py — entry-point plugin discovery
Tool registration happens at import time via @mcp.tool() decorators. Plugin discovery scans aihydro.tools entry points and registers community tools automatically.
Layer 3 — Python Backend
Package: aihydro-tools (PyPI)
Data retrieval
| Module | Source |
|---|---|
| data/streamflow.py | USGS NWIS |
| data/forcing.py | GridMET |
| data/landcover.py | NLCD |
| data/soil.py | POLARIS |
Analysis
| Module | What |
|---|---|
| analysis/watershed.py | NHDPlus delineation |
| analysis/signatures.py | Flow statistics |
| analysis/twi.py | Terrain analysis |
| analysis/geomorphic.py | Basin morphometry |
| analysis/curve_number.py | CN grid |
Session persistence
| Class | Storage |
|---|---|
| HydroSession | ~/.aihydro/sessions/<gauge>.json |
| ProjectSession | ~/.aihydro/projects/<name>/project.json |
| ResearcherProfile | ~/.aihydro/researcher.json |
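To illustrate how a per-gauge session file might carry both a result and its provenance, here is a minimal sketch. The class bodies are hypothetical and are not the package's implementation: only the names `HydroSession`, `HydroResult`, `set_slot`, the `{ data, meta }` shape, and the `sessions/<gauge>.json` layout come from this page. A temporary directory stands in for `~/.aihydro/`.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class HydroResult:
    """Hypothetical shape: a payload plus provenance metadata."""
    data: dict
    meta: dict = field(default_factory=dict)


class HydroSession:
    """Toy session that stores named slots in sessions/<gauge>.json."""

    def __init__(self, gauge_id: str, root: Path):
        self.path = root / "sessions" / f"{gauge_id}.json"
        # Reload earlier slots if the file already exists.
        self.slots = json.loads(self.path.read_text()) if self.path.exists() else {}

    def set_slot(self, name: str, data: dict, meta: dict) -> None:
        """Store a result under a slot name and write the file immediately."""
        self.slots[name] = asdict(HydroResult(data, meta))
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(self.slots, indent=2))


# A temporary directory stands in for ~/.aihydro/ so the sketch has no side effects.
root = Path(tempfile.mkdtemp())
meta = {
    "source": "USGS NLDI",
    "parameters": {"gauge_id": "01031500"},
    "timestamp": datetime.now(timezone.utc).isoformat(),
}
HydroSession("01031500", root).set_slot("watershed", {"area_km2": 1247.3}, meta)

# A fresh object reads the same file back, which is what lets state survive restarts.
reloaded = HydroSession("01031500", root)
print(reloaded.slots["watershed"]["data"])
```

Writing on every `set_slot` call keeps the file current if the server stops unexpectedly, at the cost of rewriting the whole JSON document each time.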
Dependency Management
Heavy dependencies are optional and imported behind a guard, so the server starts successfully even if only the [data] extra is installed:
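try:
    import geopandas as gpd
    import pynhd
    _GEO_AVAILABLE = True
except ImportError:
    # The geospatial extra is missing; record that instead of failing at import.
    _GEO_AVAILABLE = False


def delineate_watershed(gauge_id: str) -> dict:
    # Return a structured error the agent can relay to the researcher.
    if not _GEO_AVAILABLE:
        return {"error": "Install aihydro-tools[analysis] for watershed tools."}
    # ... proceed ...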
Tools return informative errors for missing extras rather than crashing the server.
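When many tools share the same guard, the check can be factored into a decorator. The following is a sketch of that variation rather than the package's own code: `requires`, `available_tool`, and `unavailable_tool` are hypothetical names, and `importlib.util.find_spec` is used so that availability is checked at call time without importing the module. It is intended for top-level module names.

```python
import functools
import importlib.util


def requires(module: str, extra: str):
    """Return a decorator that reports a missing optional module as an error dict."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # find_spec checks availability without importing the module.
            if importlib.util.find_spec(module) is None:
                return {"error": f"Install aihydro-tools[{extra}] to use {func.__name__}."}
            return func(*args, **kwargs)
        return wrapper
    return decorator


@requires("json", "data")  # standard-library module, always present
def available_tool() -> dict:
    return {"ok": True}


@requires("not_a_real_module_xyz", "analysis")  # deliberately missing
def unavailable_tool() -> dict:
    return {"ok": True}


print(available_tool())
print(unavailable_tool())
```

The first call runs normally, while the second returns an error dictionary naming the extra to install, which matches the behaviour described above.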