Provenance & Session Schema
One of AI-Hydro's central claims is that reproducibility should be a natural byproduct of the analysis process — not an afterthought. This page documents exactly what is recorded, where it is stored, and how it can be used to reconstruct or audit any analysis.
The Core Contract: HydroResult
Every tool in AI-Hydro returns a HydroResult — a structured object with two parts:
from __future__ import annotations  # lets HydroResult reference HydroMeta before it is defined

from dataclasses import dataclass


@dataclass
class HydroResult:
    data: dict        # the actual result values
    meta: HydroMeta   # who computed this, with what, from where, when


@dataclass
class HydroMeta:
    tool: str          # function name
    version: str       # aihydro-tools version
    source: str        # data source description
    retrieved_at: str  # ISO 8601 UTC timestamp
    parameters: dict   # all input parameters used
This is what makes auditing possible. Every result carries the information needed to reproduce the computation independently.
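As a minimal sketch of the contract, the block below shows how a tool could populate both parts before returning. The `mean_discharge` function, its `source` string, and the hard-coded version are hypothetical and exist only for illustration; they are not part of aihydro-tools.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class HydroMeta:
    tool: str
    version: str
    source: str
    retrieved_at: str
    parameters: dict


@dataclass
class HydroResult:
    data: dict
    meta: HydroMeta


def mean_discharge(gauge_id: str, discharge_cms: list[float]) -> HydroResult:
    """Hypothetical tool: average a discharge series and attach provenance."""
    value = sum(discharge_cms) / len(discharge_cms)
    meta = HydroMeta(
        tool="mean_discharge",
        version="1.2.0",  # placeholder, not read from an installed package
        source="in-memory series (illustrative)",
        retrieved_at=datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        parameters={"gauge_id": gauge_id, "n_values": len(discharge_cms)},
    )
    return HydroResult(data={"mean_discharge_cms": value}, meta=meta)


result = mean_discharge("01031500", [12.0, 11.0, 13.0])
record = asdict(result)  # nested dict with "data" and "meta" keys, ready for json.dumps
```

Converting with `asdict` yields the same `data` / `meta` shape that the session files on this page use for each slot.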
HydroSession Schema
Location: ~/.aihydro/sessions/<gauge_id>.json
Full schema, shown as an example session file:
{
  "gauge_id": "01031500",
  "created_at": "2026-04-10T09:00:00Z",
  "updated_at": "2026-04-10T14:22:00Z",
  "version": "1.2.0",
  "watershed": {
    "data": {
      "area_km2": 1247.3,
      "perimeter_km": 198.6,
      "centroid_lat": 44.58,
      "centroid_lon": -70.54,
      "bbox": [-71.2, 44.1, -70.0, 45.1],
      "geometry_wkt": "POLYGON ((-71.2 44.1, ...))"
    },
    "meta": {
      "tool": "delineate_watershed",
      "version": "1.2.0",
      "source": "USGS NLDI / NHDPlus",
      "retrieved_at": "2026-04-10T09:14:22Z",
      "parameters": { "gauge_id": "01031500" }
    }
  },
  "streamflow": {
    "data": {
      "dates": ["2000-01-01", "2000-01-02", "..."],
      "discharge_cms": [12.4, 11.8, "..."],
      "record_count": 9131,
      "missing_days": 14,
      "units": "m3/s"
    },
    "meta": {
      "tool": "fetch_streamflow_data",
      "version": "1.2.0",
      "source": "USGS NWIS (waterservices.usgs.gov)",
      "retrieved_at": "2026-04-10T09:17:05Z",
      "parameters": {
        "gauge_id": "01031500",
        "start_date": "2000-01-01",
        "end_date": "2024-12-31"
      }
    }
  },
  "signatures": {
    "data": {
      "baseflow_index": 0.52,
      "runoff_ratio": 0.41,
      "mean_annual_discharge_cms": 37.2,
      "cv_daily_discharge": 1.14,
      "q5_cms": 118.4,
      "q25_cms": 52.1,
      "q50_cms": 24.3,
      "q75_cms": 11.6,
      "q95_cms": 3.2,
      "fdc_slope": -1.84,
      "high_flow_freq_days_yr": 8.2,
      "low_flow_freq_days_yr": 62.1,
      "recession_constant": 0.967,
      "rising_limb_density": 0.31
    },
    "meta": {
      "tool": "extract_hydrological_signatures",
      "version": "1.2.0",
      "source": "Derived from USGS NWIS streamflow",
      "retrieved_at": "2026-04-10T09:18:30Z",
      "parameters": { "gauge_id": "01031500" }
    }
  },
  "geomorphic": {
    "data": {
      "area_km2": 1247.3,
      "perimeter_km": 198.6,
      "mean_elevation_m": 412.1,
      "max_elevation_m": 1024.0,
      "min_elevation_m": 134.0,
      "relief_m": 890.0,
      "mean_slope_deg": 8.3,
      "elongation_ratio": 0.71,
      "circularity_ratio": 0.40,
      "form_factor": 0.33,
      "drainage_density_km_km2": 0.82,
      "stream_frequency": 1.14
    },
    "meta": {
      "tool": "extract_geomorphic_parameters",
      "version": "1.2.0",
      "source": "3DEP 10m DEM (USGS) via NHDPlus delineation",
      "retrieved_at": "2026-04-10T09:21:10Z",
      "parameters": { "gauge_id": "01031500", "dem_resolution_m": 10 }
    }
  },
  "model": {
    "data": {
      "framework": "hbv",
      "nse_train": 0.84,
      "kge_train": 0.81,
      "rmse_train_cms": 14.2,
      "nse_val": 0.79,
      "kge_val": 0.76,
      "rmse_val_cms": 16.8,
      "train_period": ["2000-10-01", "2007-09-30"],
      "val_period": ["2000-10-01", "2005-09-30"],
      "parameters": {
        "TT": 0.21, "CFMAX": 4.12, "FC": 312.4,
        "LP": 0.78, "BETA": 2.31, "K0": 0.41,
        "K1": 0.18, "K2": 0.04, "UZL": 48.2,
        "PERC": 1.84, "MAXBAS": 2.1
      }
    },
    "meta": {
      "tool": "train_hydro_model",
      "version": "1.2.0",
      "source": "GridMET forcing + CAMELS streamflow",
      "retrieved_at": "2026-04-10T10:44:00Z",
      "parameters": {
        "gauge_id": "01031500",
        "framework": "hbv",
        "train_start": "2000-10-01",
        "train_end": "2007-09-30",
        "epochs": 500,
        "n_restarts": 3
      }
    }
  },
  "notes": [
    {
      "timestamp": "2026-04-10T11:02:00Z",
      "text": "High BFI consistent with fractured bedrock geology. Worth investigating with isotope data."
    }
  ]
}
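Because every slot pairs `data` with `meta`, a session file can be audited with a few lines of standard-library Python. The sketch below assumes the layout shown above; `provenance_rows` and `load_session` are hypothetical helper names, not functions shipped with AI-Hydro, and the example dictionary is a trimmed copy of the session on this page.

```python
import json
from pathlib import Path


def provenance_rows(session: dict) -> list[tuple[str, str, str, str]]:
    """Return (slot, tool, source, retrieved_at) for every slot that has a meta block."""
    rows = []
    for slot, value in session.items():
        # Top-level scalars (gauge_id, version, ...) and the notes list carry no meta.
        if isinstance(value, dict) and "meta" in value:
            meta = value["meta"]
            rows.append((slot, meta["tool"], meta["source"], meta["retrieved_at"]))
    return rows


def load_session(gauge_id: str, root: Path = Path.home() / ".aihydro" / "sessions") -> dict:
    """Read a session file from the location documented above."""
    return json.loads((root / f"{gauge_id}.json").read_text())


# Trimmed copy of the example session, kept inline so the sketch runs without any files.
example = {
    "gauge_id": "01031500",
    "watershed": {
        "data": {"area_km2": 1247.3},
        "meta": {
            "tool": "delineate_watershed",
            "version": "1.2.0",
            "source": "USGS NLDI / NHDPlus",
            "retrieved_at": "2026-04-10T09:14:22Z",
            "parameters": {"gauge_id": "01031500"},
        },
    },
    "notes": [],
}

rows = provenance_rows(example)
for slot, tool, source, retrieved_at in rows:
    print(f"{slot}: {tool} <- {source} at {retrieved_at}")
```

With a real session on disk, `provenance_rows(load_session("01031500"))` would list one row per populated slot.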
ProjectSession Schema
Location: ~/.aihydro/projects/<project_name>/project.json
{
  "name": "New England Basins",
  "description": "Comparing snowmelt-driven runoff across Maine and New Hampshire catchments.",
  "created_at": "2026-04-10T09:00:00Z",
  "updated_at": "2026-04-10T14:22:00Z",
  "version": "1.2.0",
  "gauge_ids": ["01031500", "01013500", "01054200"],
  "topics": ["snowmelt", "baseflow", "New England"],
  "literature_dir": "~/.aihydro/projects/new_england_basins/literature/",
  "literature_indexed": true,
  "literature_index_updated": "2026-04-10T12:00:00Z",
  "journal": [
    {
      "timestamp": "2026-04-10T14:22:00Z",
      "entry": "HBV performed significantly better on the smaller basins. May be related to the prevalence of lakes in the larger ones."
    }
  ],
  "metrics": {}
}
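A project refers to its basins only by `gauge_ids`, so the per-gauge results live in the HydroSession files described earlier. As a rough sketch under that assumption, the following checks which gauges in a project do not yet have a session file. `missing_sessions` is a hypothetical helper, and a temporary directory stands in for `~/.aihydro/sessions/` so the example touches nothing on your machine.

```python
import json
import tempfile
from pathlib import Path


def missing_sessions(project: dict, sessions_dir: Path) -> list[str]:
    """Return the gauge IDs in the project that have no <gauge_id>.json session file."""
    return [
        gauge_id
        for gauge_id in project["gauge_ids"]
        if not (sessions_dir / f"{gauge_id}.json").exists()
    ]


project = {
    "name": "New England Basins",
    "gauge_ids": ["01031500", "01013500", "01054200"],
}

with tempfile.TemporaryDirectory() as tmp:
    sessions_dir = Path(tmp)
    # Create a single session file so that two gauges are reported as missing.
    (sessions_dir / "01031500.json").write_text(json.dumps({"gauge_id": "01031500"}))
    missing = missing_sessions(project, sessions_dir)

print(missing)
```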
ResearcherProfile Schema
Location: ~/.aihydro/researcher.json
{
  "name": "Mohammad Galib",
  "institution": "Purdue University",
  "role": "PhD Researcher",
  "domain": "Computational Hydrology",
  "expertise": ["watershed modelling", "differentiable hydrology", "CAMELS benchmark"],
  "tools_familiarity": {
    "HBV-light": "advanced",
    "NeuralHydrology": "intermediate"
  },
  "preferred_models": ["HBV-light", "LSTM"],
  "research_focus": "Investigating the role of geology in controlling baseflow generation across CAMELS-US catchments.",
  "active_project": "New England Basins",
  "communication_style": "concise, technical",
  "observations": [
    "Prefers NSE and KGE together rather than NSE alone for model evaluation.",
    "Tends to work with 20-year streamflow records for signature extraction."
  ],
  "updated_at": "2026-04-10T14:22:00Z",
  "version": "1.2.0"
}
Privacy
All three files — HydroSession, ProjectSession, and ResearcherProfile — are stored locally at ~/.aihydro/. Nothing is sent to any cloud service. The AI agent reads them via the MCP server running on your own machine.
What the Provenance Enables
| Use case | How |
|---|---|
| Re-run analysis | Parameters and date ranges are stored in meta.parameters for every slot |
| Audit data source | meta.source and meta.retrieved_at identify the exact data API and retrieval time |
| Generate methods paragraph | export_session reads all meta fields and produces a citable text block |
| Compare across basins | search_experiments queries data fields across all sessions in a project |
| Version tracking | meta.version records which aihydro-tools release produced each result |
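The "Re-run analysis" row can be illustrated with a small replay helper: look up the tool named in `meta.tool` and call it with the stored `meta.parameters`. This is only a sketch of the idea. The `TOOLS` registry and `replay` function are hypothetical, and the `fetch_streamflow_data` defined here is a stand-in that echoes its arguments rather than the real tool.

```python
from typing import Callable


def fetch_streamflow_data(gauge_id: str, start_date: str, end_date: str) -> dict:
    """Stand-in for the real tool; it only echoes the request it received."""
    return {"gauge_id": gauge_id, "start_date": start_date, "end_date": end_date}


# Hypothetical registry mapping the names stored in meta.tool to callables.
TOOLS: dict[str, Callable[..., dict]] = {
    "fetch_streamflow_data": fetch_streamflow_data,
}


def replay(slot: dict) -> dict:
    """Re-invoke the tool recorded in a slot with exactly the recorded parameters."""
    meta = slot["meta"]
    return TOOLS[meta["tool"]](**meta["parameters"])


# The streamflow slot's meta block, copied from the HydroSession example.
streamflow_slot = {
    "meta": {
        "tool": "fetch_streamflow_data",
        "version": "1.2.0",
        "source": "USGS NWIS (waterservices.usgs.gov)",
        "retrieved_at": "2026-04-10T09:17:05Z",
        "parameters": {
            "gauge_id": "01031500",
            "start_date": "2000-01-01",
            "end_date": "2024-12-31",
        },
    }
}

rerun = replay(streamflow_slot)
```

Replaying works here because the stored parameter names match the tool's keyword arguments; a real replay would also want to compare `meta.version` against the installed release before trusting that the results are comparable.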
Side-by-Side: Conversation vs. Provenance
{
  "watershed": {
    "data": { "area_km2": 1247.3, ... },
    "meta": {
      "tool": "delineate_watershed",
      "source": "USGS NLDI / NHDPlus",
      "retrieved_at": "2026-04-10T09:14:22Z",
      "parameters": { "gauge_id": "01031500" }
    }
  },
  "signatures": {
    "data": { "baseflow_index": 0.52, ... },
    "meta": {
      "tool": "extract_hydrological_signatures",
      "source": "Derived from USGS NWIS streamflow",
      "retrieved_at": "2026-04-10T09:18:30Z",
      "parameters": { "gauge_id": "01031500" }
    }
  }
}
Watershed boundaries for USGS gauge 01031500 were delineated using
the NHDPlus dataset via the USGS NLDI API (accessed 2026-04-10).
Hydrological signatures were computed from daily discharge records
retrieved from the USGS National Water Information System (2000–2024).
Baseflow was separated using the Eckhardt recursive digital filter,
yielding a baseflow index of 0.52.
The conversation is ephemeral. The provenance record is permanent and machine-readable.
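Because the record is machine-readable, citable text can be derived from it mechanically. The sketch below turns one slot's `meta` block into a single methods-style sentence. `methods_sentence` is a hypothetical helper and its wording is an assumption; it does not show how `export_session` is implemented.

```python
def methods_sentence(slot: dict) -> str:
    """Build a one-line, citable description from a slot's meta block."""
    meta = slot["meta"]
    access_date = meta["retrieved_at"][:10]  # date part of the ISO 8601 timestamp
    params = ", ".join(f"{key}={value}" for key, value in sorted(meta["parameters"].items()))
    return f"{meta['tool']} ({params}) used {meta['source']} (accessed {access_date})."


# The watershed slot's meta block, copied from the example above.
watershed_slot = {
    "meta": {
        "tool": "delineate_watershed",
        "source": "USGS NLDI / NHDPlus",
        "retrieved_at": "2026-04-10T09:14:22Z",
        "parameters": {"gauge_id": "01031500"},
    }
}

sentence = methods_sentence(watershed_slot)
print(sentence)
```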