{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Daily statistics from hourly ERA5 data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# If first time running, uncomment the line below to install any additional dependencies\n",
    "# !bash requirements-for-notebooks.sh"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "from earthkit.data.testing import earthkit_remote_test_data_file\n",
    "\n",
    "from earthkit import data as ekd\n",
    "from earthkit.transforms import aggregate as ekt\n",
    "\n",
    "ekd.settings.set(\"cache-policy\", \"user\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load some test data\n",
    "\n",
    "All `earthkit-transforms` methods can be called with `earthkit-data` objects (Readers and Wrappers) or with the \n",
    "pre-loaded `xarray`.\n",
    "\n",
    "In this example we will use hourly ERA5 2m temperature data on a 0.5x0.5 spatial grid for the year 2015 as\n",
    "our physical data.\n",
    "\n",
    "First we download (if not already cached) lazily load the ERA5 data (please see tutorials in `earthkit-data` for more details in cache management).\n",
    "\n",
    "We convert the data to an `xarray.Dataset` with some additional options which are better suited for the data we are working with."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get some demonstration ERA5 data, this could be any url or path to an ERA5 grib or netCDF file.\n",
    "remote_era5_file = earthkit_remote_test_data_file(\"era5_temperature_europe_2015.grib\")\n",
    "era5_xr = ekd.from_source(\"url\", remote_era5_file)\n",
    "era5_xr = era5_xr.to_xarray(time_dim_mode=\"valid_time\").rename({\"2t\": \"t2m\"})\n",
    "era5_xr"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Calculate the daily mean and standard deviation of the ERA5 data\n",
    "\n",
    "We can calculate the daily mean using daily_mean method in the temporal module. There are similar daily aggregation methods for the `daily_median`, `daily_min`, `daily_max`, `daily_std`, `daily_sum`, and all these again for monthly aggregations in the form `monthly_XXX`.\n",
    "\n",
    "**earthkit-aggregate** is able to understand any data object understood by **earthkit-data** as input. The **earthkit-aggregate** computation is based on xarray datacubes, therefore the returned object is an `xarray.Dataset`. To convert this to an EarthKit object you could use the **earthkit-data** method, `from_object`.\n",
    "\n",
    "If the input data is provided an `xarray.Dataset` then the return object is `xarray.Dataset` and if the input is an `xarray.DataArray` then the return object is an `xarray.DataArray`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "era5_daily_mean = ekt.temporal.daily_mean(era5_xr)\n",
    "era5_daily_std = ekt.temporal.daily_std(era5_xr)\n",
    "era5_daily_mean\n",
    "# ekd.from_object(era5_daily_mean)\n",
    "\n",
    "# Note that the daily_mean and daily_std are convenience wrappers. It is also possible to achieve the same\n",
    "# result using the following syntax:\n",
    "# era5_daily_mean = ekt.temporal.daily_reduce(era5_xr, how=\"mean\")\n",
    "\n",
    "# It is also possible to achieve a similar result using the following syntax:\n",
    "# era5_daily_mean = ekt.temporal.daily_reduce(era5_xr, how=\"mean\")\n",
    "# However, the daily_reduce methods have additional options for handling metadata data,\n",
    "# therefore are recommended"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Calculate the monthly mean and standard deviation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "era5_monthly_mean = ekt.temporal.monthly_mean(era5_xr)\n",
    "era5_monthly_std = ekt.temporal.monthly_std(era5_xr)\n",
    "era5_monthly_std"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Calculate a rolling mean with a 50 timestep window\n",
    "\n",
    "To calculate a rolling mean along the time dimension you can use the rolling_reduce function.\n",
    "\n",
    "NOTE: An improved API to the rolling_reduce method is an ongoing task"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "era5_rolling = ekt.temporal.rolling_reduce(\n",
    "    era5_xr,\n",
    "    50,\n",
    "    how_reduce=\"mean\",\n",
    "    center=True,\n",
    ")\n",
    "era5_rolling"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Account for non-UTC timezones\n",
    "\n",
    "There is a time_shift argument which can be used to account for non-UTC time zones:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "era5_daily_mean_p12 = ekt.temporal.daily_mean(era5_xr, time_shift={\"hours\": 12})\n",
    "era5_daily_mean_p12"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Plot a random point location to see the different aggregation methods"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "isel_kwargs = {\"latitude\": 100, \"longitude\": 100}\n",
    "\n",
    "fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(10, 5))\n",
    "\n",
    "era5_xr.t2m.isel(**isel_kwargs).plot(label=\"Raw data\", ax=ax, color=\"blue\")\n",
    "era5_daily_mean.t2m.isel(**isel_kwargs).plot(label=\"Daily mean\", ax=ax, color=\"orange\")\n",
    "\n",
    "era5_daily_mean_p12.t2m.isel(**isel_kwargs).plot(label=\"Daily mean (12h shift)\", ax=ax, color=\"purple\")\n",
    "\n",
    "# # To add the daily spread as orange dotted lines:\n",
    "# upper_d = era5_daily_mean.t2m + era5_daily_std.t2m\n",
    "# lower_d = era5_daily_mean.t2m - era5_daily_std.t2m\n",
    "# upper_d.isel(\n",
    "#     **isel_kwargs).plot(ax=ax, label='Daily standard deviation spread', linestyle='--', color='orange'\n",
    "#\n",
    "# lower_d.isel(**isel_kwargs).plot(ax=ax, linestyle='--', color='orange')\n",
    "\n",
    "# Add the rolling mean as green line:\n",
    "era5_rolling.t2m.isel(**isel_kwargs).plot(label=\"Rolling mean\", ax=ax, color=\"green\")\n",
    "\n",
    "# Add the monthly mean as a black solid line and spread as black dotted lines:\n",
    "era5_monthly_mean.t2m.isel(**isel_kwargs).plot(label=\"Monthly mean\", ax=ax, color=\"black\")\n",
    "upper_m = era5_monthly_mean.t2m + era5_monthly_std.t2m\n",
    "lower_m = era5_monthly_mean.t2m - era5_monthly_std.t2m\n",
    "upper_m.isel(**isel_kwargs).plot(ax=ax, label=\"Monthly standard deviation spread\", linestyle=\"--\", color=\"black\")\n",
    "lower_m.isel(**isel_kwargs).plot(ax=ax, linestyle=\"--\", color=\"black\")\n",
    "\n",
    "\n",
    "# figure = fig[0].get_figure()\n",
    "ax.legend(loc=2)\n",
    "ax.set_title(\"Aggregation methods\")\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".conda",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}