.. _technical_description_hydroland:

Technical Description
=====================

HydroLand is the Climate DT hydrology application maintained as the standalone
`hydroland <https://github.com/DestinE-Climate-DT/hydroland>`_ package. It
wraps the unmodified `mHM` and `mRM` executables with Python-based orchestration
for global operational runs. The standalone package exposes the workflow steps
through the subcommands `initialisation`, `preprocess`, `mhm`, `mrm`, and
`completion`, while Climate DT runs the same sequence through the workflow
driver `run_hydroland.py`.

Workflow in Climate DT
----------------------

The operational workflow is organized into five main steps that match both the
package structure and the Climate DT wrapper script:

- `initialisation`: creates the HydroLand directory tree, prepares namelists,
  and links the appropriate cold-start or warm-start restart files;
- `preprocess`: reads Climate DT forcing from GSV/OPA, reformats precipitation
  and temperature to `mHM` input conventions, and computes an internal PET
  forcing;
- `mhm`: runs the land-surface hydrology model and writes the `mHM` flux/state
  output together with the next restart file;
- `mrm`: routes the gridded runoff across the river network and assembles a
  global discharge product from multiple routing subdomains;
- `completion`: removes temporary files, keeps the restart and log material
  required for robust restarts, and can preserve forcing files for downstream
  indicator calculations.

This separation is visible directly in the repository CLI and mirrors the way
HydroLand is launched inside the Climate DT workflow.

Forcing and preprocessing
-------------------------

The operational setup relies on precipitation and near-surface air
temperature supplied through the Climate DT interfaces. HydroLand currently
expects one OPA file per variable and per day. Typical examples are:

- daily aggregated OPA files such as
  `1990_01_01_2t_timestep_60_daily_mean.nc` and
  `1990_01_01_avg_tprate_timestep_60_daily_mean.nc`;
- hourly raw OPA files such as
  `1990_01_01_T00_00_to_1990_01_01_T23_00_2t_raw_data.nc` and
  `1990_01_01_T00_00_to_1990_01_01_T23_00_avg_tprate_raw_data.nc`.

When `run-frequency = day`, HydroLand looks up one temperature file and one
precipitation file for the selected day. When `run-frequency = month`, the code
still expects the same per-day OPA files for every day in the requested month
and creates one forcing triplet (`pre`, `tavg`, `pet`) per day for `mHM`.
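
The lookup described above can be sketched as follows. This is a minimal
illustration, not the HydroLand implementation: the function names are
hypothetical, and the `timestep_60` token is copied verbatim from the example
file names rather than derived from any configuration.

.. code-block:: python

    import calendar
    from datetime import date


    def opa_file_name(day, variable, stat_freq):
        """Build the OPA file name for one variable and one day (hypothetical helper)."""
        stamp = day.strftime("%Y_%m_%d")
        if stat_freq == "daily":
            # Assumption: "timestep_60" is fixed, as in the documented examples.
            return f"{stamp}_{variable}_timestep_60_daily_mean.nc"
        if stat_freq == "hourly":
            return f"{stamp}_T00_00_to_{stamp}_T23_00_{variable}_raw_data.nc"
        raise ValueError(f"unsupported stat_freq: {stat_freq!r}")


    def expected_opa_files(first_day, run_frequency, stat_freq,
                           temp="2t", pre="avg_tprate"):
        """Return one (temperature, precipitation) file pair per day of the run."""
        if run_frequency == "day":
            days = [first_day]
        elif run_frequency == "month":
            n_days = calendar.monthrange(first_day.year, first_day.month)[1]
            days = [date(first_day.year, first_day.month, d)
                    for d in range(1, n_days + 1)]
        else:
            raise ValueError(f"unsupported run_frequency: {run_frequency!r}")
        return [(opa_file_name(d, temp, stat_freq),
                 opa_file_name(d, pre, stat_freq)) for d in days]

For January 1990 with hourly input, `expected_opa_files(date(1990, 1, 1),
"month", "hourly")` yields 31 file pairs, one per day.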

The preprocessing step converts the input into `mHM` forcing files such as:

- `mHM_<date>_to_<date>_pre.nc`
- `mHM_<date>_to_<date>_tavg.nc`
- `mHM_<date>_to_<date>_pet.nc`

For example, the hourly OPA pair
`1990_01_01_T00_00_to_1990_01_01_T23_00_2t_raw_data.nc` and
`1990_01_01_T00_00_to_1990_01_01_T23_00_avg_tprate_raw_data.nc`
becomes
`mHM_1990_01_01_to_1990_01_01_tavg.nc`,
`mHM_1990_01_01_to_1990_01_01_pre.nc`, and
`mHM_1990_01_01_to_1990_01_01_pet.nc`.

During this conversion, HydroLand standardizes variable names to `tavg` and
`pre`, converts temperature from kelvin to degrees Celsius, converts
precipitation to millimeters, and normalizes coordinate ordering. PET is
currently estimated internally from the processed temperature and latitude
using a temperature-based formulation.
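
The unit conversions and a temperature-based PET estimate can be illustrated
with the sketch below. It rests on two assumptions that are not confirmed by
this page: that `avg_tprate` is a rate in kg m-2 s-1, and that an Oudin-type
formula stands in for the PET formulation, whose exact form is not specified
here. All function names are hypothetical.

.. code-block:: python

    import math

    LATENT_HEAT = 2.45       # MJ kg-1, latent heat of vaporization
    SOLAR_CONSTANT = 0.0820  # MJ m-2 min-1 (FAO-56 value)


    def kelvin_to_celsius(temp_k):
        return temp_k - 273.15


    def rate_to_mm(rate_kg_m2_s, seconds):
        # Assumption: input is a mass flux; 1 kg m-2 of water is 1 mm of depth.
        return rate_kg_m2_s * seconds


    def extraterrestrial_radiation(lat_deg, day_of_year):
        """Daily extraterrestrial radiation in MJ m-2 day-1 (FAO-56 equations)."""
        phi = math.radians(lat_deg)
        angle = 2.0 * math.pi * day_of_year / 365.0
        inv_dist = 1.0 + 0.033 * math.cos(angle)
        decl = 0.409 * math.sin(angle - 1.39)
        # Clamp so that polar day and polar night do not break acos().
        cos_ws = max(-1.0, min(1.0, -math.tan(phi) * math.tan(decl)))
        ws = math.acos(cos_ws)
        ra = (24.0 * 60.0 / math.pi) * SOLAR_CONSTANT * inv_dist * (
            ws * math.sin(phi) * math.sin(decl)
            + math.cos(phi) * math.cos(decl) * math.sin(ws)
        )
        return max(ra, 0.0)


    def pet_oudin(tavg_c, lat_deg, day_of_year):
        """Illustrative Oudin-type PET in mm/day; not necessarily HydroLand's formula."""
        if tavg_c + 5.0 <= 0.0:
            return 0.0
        ra = extraterrestrial_radiation(lat_deg, day_of_year)
        return ra / LATENT_HEAT * (tavg_c + 5.0) / 100.0

The sketch only needs mean temperature, latitude, and day of year, which
matches the inputs named above; any other temperature-based formulation could
be substituted in `pet_oudin`.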

For monthly runs, the Climate DT workflow can also produce monthly HydroLand
deliverables instead of only day-by-day outputs. The file names depend on the
temporal resolution of the run. Typical monthly products are:

- for daily runs,
  `1990_01_01_to_1990_01_31_mHM_Fluxes_States.nc` and
  `1990_01_01_to_1990_01_31_mRM_Fluxes_States.nc`;
- for hourly runs,
  `1990_01_01_T00_00_to_1990_01_31_T23_00_mHM_Fluxes_States.nc` and
  `1990_01_01_T00_00_to_1990_01_31_T23_00_mRM_Fluxes_States.nc`.

Hydrological core
-----------------

`mHM` resolves land-surface hydrology, including soil water storage,
evapotranspiration, and runoff generation. `mRM` takes the resulting runoff and
routes it through the river network to produce global routed discharge.

The implementation supports both `0.1` and `0.05` degree configurations. River
routing is the most time-consuming part of the application because discharge
in each cell depends on its upstream cells, so runoff transport cannot be
evaluated independently for every grid cell. To reduce runtime, HydroLand
splits the routing problem into large basin-based subdomains, runs them
separately, and merges the subdomain outputs back into one global
river-discharge file at the end. In the current workflow, this subdivision
applies only to `mRM`, not to `mHM`. In the current routing scripts, the `0.1`
degree setup uses `53` routing subdomains, while the `0.05` degree setup uses
`26`.

The number of saved states and fluxes can be expanded through `mhm_setup.py`,
which is how additional output variables are enabled for specialized
experiments.

The published HydroLand flux outputs use date-stamped names. Typical
single-period `mHM` examples are:

- `1990_01_01_to_1990_01_01_mHM_Fluxes_States.nc` for a one-day daily run;
- `1990_01_01_T00_00_to_1990_01_01_T23_00_mHM_Fluxes_States.nc` for a one-day
  hourly run.

`mRM` prepares the `mHM` runoff output for routing, runs one subdomain job per
river-mask partition, and merges the resulting subdomain files into one global
river-discharge product. The published `mRM` outputs follow the same naming
logic, for example:

- `1990_01_01_to_1990_01_01_mRM_Fluxes_States.nc`;
- `1990_01_01_T00_00_to_1990_01_01_T23_00_mRM_Fluxes_States.nc`.
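
The split, run, and merge pattern used for routing can be sketched in a few
lines. This is a schematic only: `route_subdomain` is a hypothetical stand-in
that passes values through unchanged, whereas the real workflow launches the
`mRM` executable for each subdomain.

.. code-block:: python

    from concurrent.futures import ThreadPoolExecutor


    def route_subdomain(subdomain_id, runoff):
        """Hypothetical placeholder for one mRM subdomain job.

        Takes {cell: runoff} and returns {cell: discharge}; here the values
        are simply copied so that the sketch stays runnable.
        """
        return dict(runoff)


    def merge_subdomains(results):
        """Combine per-subdomain outputs into one global mapping."""
        merged = {}
        for result in results:
            overlap = merged.keys() & result.keys()
            if overlap:
                # Basin-based subdomains should not share grid cells.
                raise ValueError(f"cells routed twice: {sorted(overlap)}")
            merged.update(result)
        return merged


    def run_routing(partition):
        """Run every subdomain independently, then merge the outputs."""
        with ThreadPoolExecutor() as pool:
            results = list(
                pool.map(lambda item: route_subdomain(*item), partition.items())
            )
        return merge_subdomains(results)

Because the subdomains follow basin boundaries, no water crosses between them,
which is what makes the independent runs and the final merge valid.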

For consistency and traceability, both the delivered `mHM` and `mRM` flux files
preserve metadata inherited from the driving climate variables, which in the
current setup are temperature and precipitation. This keeps attributes such as
`activity`, `experiment`, `generation`, `model`, `realization`, `stream`, and
`resolution` available in the HydroLand products, together with the global
attribute `application = "HydroLand"`.
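
As an illustration of this metadata inheritance, the following sketch copies
the listed attributes from a forcing file's attribute dictionary into a
product's attributes. The function name and the use of plain dictionaries are
assumptions made for the example; the actual code operates on NetCDF files.

.. code-block:: python

    INHERITED_ATTRIBUTES = (
        "activity", "experiment", "generation", "model",
        "realization", "stream", "resolution",
    )


    def inherit_metadata(forcing_attrs, product_attrs=None):
        """Return product attributes enriched with the forcing metadata."""
        merged = dict(product_attrs or {})
        for key in INHERITED_ATTRIBUTES:
            if key in forcing_attrs:
                merged[key] = forcing_attrs[key]
        # Global attribute that marks the file as a HydroLand product.
        merged["application"] = "HydroLand"
        return merged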

Illustrative Climate DT invocation
----------------------------------

If the required OPA input files and HydroLand initial files are already
available, the Climate DT workflow driver can be called with arguments like the
following, which process January 1990 as a monthly run with hourly data:

.. code-block:: bash

    python /path/to/workflow/runscripts/hydroland/run_hydroland.py \
      --hydroland-opa /path/to/hydroland_opa \
      --init-files /path/to/init_files \
      --app-outpath /path/to/experiment/output \
      --ini-year-chunk 1990 --ini-month-chunk 01 --ini-day-chunk 01 \
      --end-year-chunk 1990 --end-month-chunk 01 --end-day-chunk 31 \
      --ini-year-split 1990 --ini-month-split 01 --ini-day-split 01 \
      --stat-freq hourly \
      --pre avg_tprate \
      --temp 2t \
      --grid 0.1/0.1 \
      --run-frequency month \
      --apply-cdo-mergetime

With this setup, HydroLand expects the hourly raw OPA files for every day of
January 1990 to be present in the directory given by `--hydroland-opa` and
produces the delivered outputs
`1990_01_01_T00_00_to_1990_01_31_T23_00_mHM_Fluxes_States.nc` and
`1990_01_01_T00_00_to_1990_01_31_T23_00_mRM_Fluxes_States.nc`.

For a single daily run, the same wrapper is typically called with
`--run-frequency day --stat-freq daily`; the corresponding delivered outputs
then follow the shorter naming pattern
`1990_01_01_to_1990_01_01_mHM_Fluxes_States.nc` and
`1990_01_01_to_1990_01_01_mRM_Fluxes_States.nc`.
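
The two naming patterns can be summarized in a small helper. The function is
hypothetical and only reproduces the file names quoted on this page; it is not
taken from the HydroLand code.

.. code-block:: python

    from datetime import date


    def delivered_name(start, end, stat_freq, model):
        """Build a delivered flux file name for `model` in {"mHM", "mRM"}."""
        if model not in ("mHM", "mRM"):
            raise ValueError(f"unknown model: {model!r}")
        if stat_freq == "daily":
            span = f"{start:%Y_%m_%d}_to_{end:%Y_%m_%d}"
        elif stat_freq == "hourly":
            # Hourly names cover the first to the last hour of the period.
            span = f"{start:%Y_%m_%d}_T00_00_to_{end:%Y_%m_%d}_T23_00"
        else:
            raise ValueError(f"unsupported stat_freq: {stat_freq!r}")
        return f"{span}_{model}_Fluxes_States.nc"

For example, `delivered_name(date(1990, 1, 1), date(1990, 1, 31), "hourly",
"mRM")` returns
`1990_01_01_T00_00_to_1990_01_31_T23_00_mRM_Fluxes_States.nc`.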

Operational robustness and restarts
-----------------------------------

HydroLand is designed to resume long simulations cleanly:

- if the expected previous `mHM` restart file is missing, the run starts as a
  cold start; otherwise it resumes as a warm start;
- cold starts can pull restart files either from static initialisation material
  or from a previous HydroLand execution using `--restart-from-prior-run` and
  `--prior-app-outpath`;
- when bias adjustment is active, the code backs up and restores the monthly
  `ba` pickle files so that restart behavior stays consistent across monthly
  boundaries;
- if `--delete-files` is enabled, the `completion` step retains restart and log
  files for the first, second-to-last, and last day of each completed month;
- if `--apply-indicators` is enabled, forcing files are preserved so that the
  downstream HydroLand indicator routines can reuse them;
- log files are written for `mHM` and for every `mRM` subdomain, which makes
  failed steps easy to trace.
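
Two of the rules above lend themselves to a short sketch: the cold or warm
start decision and the days whose restart and log files survive
`--delete-files`. Both functions are hypothetical illustrations of the stated
behavior, not excerpts from the package.

.. code-block:: python

    import calendar
    from datetime import date
    from pathlib import Path


    def start_mode(previous_restart):
        """Return "warm" if the previous mHM restart file exists, else "cold"."""
        return "warm" if Path(previous_restart).is_file() else "cold"


    def retained_days(year, month):
        """Days of a completed month whose restart and log files are kept."""
        last = calendar.monthrange(year, month)[1]
        # First, second-to-last, and last day of the month.
        return [date(year, month, day) for day in (1, last - 1, last)]

For February 1990, `retained_days(1990, 2)` gives 1, 27, and 28 February.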

In Climate DT, this is especially useful because historical and projection
simulations are often executed as separate experiments. For example, a
historical HydroLand run for `1990-2014` may already exist under a previous
experiment output tree, while a projection experiment starts in `2015` and
should continue from that simulated hydrological state rather than from a fresh
cold start. In that situation, `--restart-from-prior-run` tells HydroLand to
take the restart files from a previous HydroLand execution, and
`--prior-app-outpath` points to that earlier experiment's
`output/hydroland/` directory. This keeps the projection run physically
continuous with the preceding historical run even though the two are stored as
separate Climate DT experiments.

The resulting output tree follows the structure below:

.. code-block:: text

    hydroland/
    ├── forcings/
    ├── mhm/
    │   ├── current_run/
    │   │   ├── input/
    │   │   │   ├── meteo/
    │   │   │   └── restart/
    │   │   └── output/
    │   ├── fluxes/
    │   ├── log_files/
    │   └── restart_files/
    └── mrm/
        ├── current_run/
        ├── fluxes/
        ├── log_files/
        │   └── subdomain_<n>/
        └── restart_files/
            └── subdomain_<n>/

Indicators
----------

HydroLand can derive indicator layers from both routed discharge (`mRM`) and
land-surface variables (`mHM`). Indicators are computed for the period that is
provided to the workflow, so historical and projection experiments can be
processed separately and compared afterwards.

Discharge-based indicators
~~~~~~~~~~~~~~~~~~~~~~~~~~

These products are based on routed discharge from `mRM` (`Qrouted`). High
percentiles such as :math:`p90`, :math:`p95`, and
:math:`p99` are used for flood-type conditions, while low percentiles such as
:math:`p10`, :math:`p5`, and :math:`p1` are used for drought-type conditions.
HydroLand accepts the percentile list as user input. In the current test
publication, the exposed summaries use the `90th`, `95th`, `10th`, and `5th`
percentiles.

For each selected threshold, HydroLand reports three summary indicators per
grid cell:

- `count`: number of distinct flood or drought events in the analysed period;
- `duration`: number of event time steps, interpreted as days or hours
  depending on the temporal resolution of the routed discharge input;
- `intensity`: accumulated exceedance or deficit relative to the selected
  threshold.

In simplified form, the intensity is:

.. math::

   I = \sum_t |Q(t) - Q_p|

evaluated only over the event time steps. The resulting outputs are written
with names such as
`1990_2014_90th_percentile_discharge_indicators.nc` or
`2020_2039_5th_percentile_discharge_indicators.nc`.
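
For a single grid cell, the three summaries can be computed as in the sketch
below. It assumes strict threshold crossings and a linearly interpolated
percentile; HydroLand's exact conventions for both are not stated here, and
the function names are hypothetical.

.. code-block:: python

    def percentile(values, p):
        """Linearly interpolated percentile of a non-empty sequence (assumption)."""
        ordered = sorted(values)
        rank = (len(ordered) - 1) * p / 100.0
        low = int(rank)
        high = min(low + 1, len(ordered) - 1)
        return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


    def event_indicators(discharge, threshold, kind="flood"):
        """Return (count, duration, intensity) for one discharge series."""
        if kind not in ("flood", "drought"):
            raise ValueError(f"unknown kind: {kind!r}")
        count = 0
        duration = 0
        intensity = 0.0
        in_event = False
        for q in discharge:
            active = q > threshold if kind == "flood" else q < threshold
            if active:
                duration += 1
                intensity += abs(q - threshold)
                if not in_event:
                    count += 1  # a new event starts at this time step
            in_event = active
        return count, duration, intensity

With the series `[1, 5, 6, 1, 1, 7, 1]` and a flood threshold of `4`, the
function finds two events lasting three time steps in total, with an
accumulated exceedance of `6`.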

Dryness indicators
~~~~~~~~~~~~~~~~~~

HydroLand also supports land-surface dryness indicators derived from `mHM`
fluxes and soil-water states.

The `aridity_index` routine combines actual evapotranspiration and
precipitation. It first builds monthly sums, aggregates them to annual sums,
and then computes the ratio of mean annual actual evapotranspiration to mean
annual precipitation:

.. math::

   AI = \frac{\overline{aET_{\mathrm{annual}}}}{\overline{P_{\mathrm{annual}}}}

The output is written as a single period file such as
`1990_2014_aridity_index.nc`.
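
The aggregation chain for one grid cell can be sketched as follows. The
function signature is an assumption made for the example; the real routine
works on gridded `mHM` output rather than on Python lists.

.. code-block:: python

    from collections import defaultdict
    from datetime import date


    def aridity_index(days, aet, precip):
        """Ratio of mean annual actual ET to mean annual precipitation."""
        monthly_aet = defaultdict(float)
        monthly_pre = defaultdict(float)
        for day, e, p in zip(days, aet, precip):
            monthly_aet[(day.year, day.month)] += e
            monthly_pre[(day.year, day.month)] += p

        annual_aet = defaultdict(float)
        annual_pre = defaultdict(float)
        for (year, _month), total in monthly_aet.items():
            annual_aet[year] += total
        for (year, _month), total in monthly_pre.items():
            annual_pre[year] += total

        mean_aet = sum(annual_aet.values()) / len(annual_aet)
        mean_pre = sum(annual_pre.values()) / len(annual_pre)
        # Cells without precipitation have no defined ratio.
        return mean_aet / mean_pre if mean_pre > 0 else float("nan")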

The `soil_moisture_deficit` routine focuses on persistent deficits relative to
the local long-term soil moisture tendency. By default, it uses the upper soil
layer `SWC_L01` and retains only the deficit below the fitted trend:

.. math::

   \epsilon(t) = \max \left( \widehat{SM}(t) - SM(t), 0 \right)

The reported field is the time-mean deficit for the supplied period. In the
current local test publication, the exposed indicator items are centered on
aridity and discharge summaries; the soil moisture deficit workflow is
implemented in HydroLand but is not yet part of the six published example
items.
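
The deficit calculation of the `soil_moisture_deficit` routine can be sketched
for a single grid cell as shown below. The sketch assumes an ordinary
least-squares linear trend over equally spaced time steps; the type of trend
fitted by HydroLand is not specified on this page, and the function name is
reused here only for illustration.

.. code-block:: python

    def soil_moisture_deficit(sm):
        """Time-mean deficit below a fitted linear trend (assumed trend type)."""
        n = len(sm)
        if n < 2:
            raise ValueError("need at least two time steps to fit a trend")
        t_mean = (n - 1) / 2.0
        sm_mean = sum(sm) / n
        var_t = sum((t - t_mean) ** 2 for t in range(n))
        cov = sum((t - t_mean) * (v - sm_mean) for t, v in enumerate(sm))
        slope = cov / var_t
        intercept = sm_mean - slope * t_mean
        # Keep only the part of the series that lies below the trend line.
        deficits = [max(intercept + slope * t - v, 0.0) for t, v in enumerate(sm)]
        return sum(deficits) / n

A perfectly linear series returns a deficit of zero, while a series that
oscillates around its trend returns the mean depth of the dips.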