{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Daily statistics from hourly ERA5 data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# If first time running, uncomment the line below to install any additional dependencies\n", "# !bash requirements-for-notebooks.sh" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "from earthkit.data.testing import earthkit_remote_test_data_file\n", "\n", "from earthkit import data as ekd\n", "from earthkit.transforms import aggregate as ekt\n", "\n", "ekd.settings.set(\"cache-policy\", \"user\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load some test data\n", "\n", "All `earthkit-transforms` methods can be called with `earthkit-data` objects (Readers and Wrappers) or with the \n", "pre-loaded `xarray`.\n", "\n", "In this example we will use hourly ERA5 2m temperature data on a 0.5x0.5 spatial grid for the year 2015 as\n", "our physical data.\n", "\n", "First we download (if not already cached) lazily load the ERA5 data (please see tutorials in `earthkit-data` for more details in cache management).\n", "\n", "We convert the data to an `xarray.Dataset` with some additional options which are better suited for the data we are working with." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get some demonstration ERA5 data, this could be any url or path to an ERA5 grib or netCDF file.\n", "remote_era5_file = earthkit_remote_test_data_file(\"era5_temperature_europe_2015.grib\")\n", "era5_xr = ekd.from_source(\"url\", remote_era5_file)\n", "era5_xr = era5_xr.to_xarray(time_dim_mode=\"valid_time\").rename({\"2t\": \"t2m\"})\n", "era5_xr" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculate the daily mean and standard deviation of the ERA5 data\n", "\n", "We can calculate the daily mean using daily_mean method in the temporal module. There are similar daily aggregation methods for the `daily_median`, `daily_min`, `daily_max`, `daily_std`, `daily_sum`, and all these again for monthly aggregations in the form `monthly_XXX`.\n", "\n", "**earthkit-aggregate** is able to understand any data object understood by **earthkit-data** as input. The **earthkit-aggregate** computation is based on xarray datacubes, therefore the returned object is an `xarray.Dataset`. To convert this to an EarthKit object you could use the **earthkit-data** method, `from_object`.\n", "\n", "If the input data is provided an `xarray.Dataset` then the return object is `xarray.Dataset` and if the input is an `xarray.DataArray` then the return object is an `xarray.DataArray`.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "era5_daily_mean = ekt.temporal.daily_mean(era5_xr)\n", "era5_daily_std = ekt.temporal.daily_std(era5_xr)\n", "era5_daily_mean\n", "# ekd.from_object(era5_daily_mean)\n", "\n", "# Note that the daily_mean and daily_std are convenience wrappers. It is also possible to achieve the same\n", "# result using the following syntax:\n", "# era5_daily_mean = ekt.temporal.daily_reduce(era5_xr, how=\"mean\")\n", "\n", "# It is also possible to achieve a similar result using the following syntax:\n", "# era5_daily_mean = ekt.temporal.daily_reduce(era5_xr, how=\"mean\")\n", "# However, the daily_reduce methods have additional options for handling metadata data,\n", "# therefore are recommended" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculate the monthly mean and standard deviation" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "era5_monthly_mean = ekt.temporal.monthly_mean(era5_xr)\n", "era5_monthly_std = ekt.temporal.monthly_std(era5_xr)\n", "era5_monthly_std" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Calculate a rolling mean with a 50 timestep window\n", "\n", "To calculate a rolling mean along the time dimension you can use the rolling_reduce function.\n", "\n", "NOTE: An improved API to the rolling_reduce method is an ongoing task" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "era5_rolling = ekt.temporal.rolling_reduce(\n", " era5_xr,\n", " 50,\n", " how_reduce=\"mean\",\n", " center=True,\n", ")\n", "era5_rolling" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Account for non-UTC timezones\n", "\n", "There is a time_shift argument which can be used to account for non-UTC time zones:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "era5_daily_mean_p12 = ekt.temporal.daily_mean(era5_xr, time_shift={\"hours\": 12})\n", "era5_daily_mean_p12" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Plot a random point location to see the different aggregation methods" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "isel_kwargs = {\"latitude\": 100, \"longitude\": 100}\n", "\n", "fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(10, 5))\n", "\n", "era5_xr.t2m.isel(**isel_kwargs).plot(label=\"Raw data\", ax=ax, color=\"blue\")\n", "era5_daily_mean.t2m.isel(**isel_kwargs).plot(label=\"Daily mean\", ax=ax, color=\"orange\")\n", "\n", "era5_daily_mean_p12.t2m.isel(**isel_kwargs).plot(label=\"Daily mean (12h shift)\", ax=ax, color=\"purple\")\n", "\n", "# # To add the daily spread as orange dotted lines:\n", "# upper_d = era5_daily_mean.t2m + era5_daily_std.t2m\n", "# lower_d = era5_daily_mean.t2m - era5_daily_std.t2m\n", "# upper_d.isel(\n", "# **isel_kwargs).plot(ax=ax, label='Daily standard deviation spread', linestyle='--', color='orange'\n", "#\n", "# lower_d.isel(**isel_kwargs).plot(ax=ax, linestyle='--', color='orange')\n", "\n", "# Add the rolling mean as green line:\n", "era5_rolling.t2m.isel(**isel_kwargs).plot(label=\"Rolling mean\", ax=ax, color=\"green\")\n", "\n", "# Add the monthly mean as a black solid line and spread as black dotted lines:\n", "era5_monthly_mean.t2m.isel(**isel_kwargs).plot(label=\"Monthly mean\", ax=ax, color=\"black\")\n", "upper_m = era5_monthly_mean.t2m + era5_monthly_std.t2m\n", "lower_m = era5_monthly_mean.t2m - era5_monthly_std.t2m\n", "upper_m.isel(**isel_kwargs).plot(ax=ax, label=\"Monthly standard deviation spread\", linestyle=\"--\", color=\"black\")\n", "lower_m.isel(**isel_kwargs).plot(ax=ax, linestyle=\"--\", color=\"black\")\n", "\n", "\n", "# figure = fig[0].get_figure()\n", "ax.legend(loc=2)\n", "ax.set_title(\"Aggregation methods\")\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": ".conda", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.12" } }, "nbformat": 4, "nbformat_minor": 2 }