{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": {}, "source": [ "# Accumulation to rate examples" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from earthkit.data.testing import earthkit_remote_test_data_file\n", "\n", "from earthkit import data as ekd\n", "from earthkit import transforms as ekt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The examples demonstrated here use sample data that was extracted from the CDS for testing purposes.\n", "\n", "We provide several examples which attempt to cover the range of accumulation types that you may come accross when working with ECMWF data. Please refer to the [ECMWF documentation](https://confluence.ecmwf.int/x/hrTICw) for a detailed summary of the accumulation types." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Accumulated from start of step (e.g. ERA5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "ERA5 accumulates data from the start of each time step, therefore to convert to a rate we need to divide by the length of the time step.\n", "\n", "In this example we use some sample total precipitation data from the [ERA5 single levels dataset](https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels). It is hourly data for January 2024 over Europe at 3˚ⅹ3˚spatial resolution.\n", "\n", "As this is reanalysis data, we open the data using `time_dim_mode=\"valid_time\"` which provides a more intuitive representation of data which can be described with a single time dimension. If required. the accumulation_to_rate function can handle start of step accumulated data if it is opened using the default `time_dim_mode=\"forecast\"`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Open the demonstration ERA5 single levels data with earthkit data\n", "remote_era5_file = earthkit_remote_test_data_file(\"era5-sfc-precip-3deg-202401.grib\")\n", "era5_data = ekd.from_source(\"url\", remote_era5_file)\n", "\n", "# Open the data with xarray using the valid_time time dimension mode.\n", "ds_era5 = era5_data.to_xarray(time_dim_mode=\"valid_time\")\n", "ds_era5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ERA5 single levels data is accumulated from the start of each time step, therefore for the accumulation_to_rate conversion, we set `accumulation_type=\"start_of_step\"`. This is the default value, we include here only for demonstration purposes.\n", "\n", "The accumulation_to_rate function assumes that the data contains consecutive time steps and calculates the step length as the difference between time steps. If this is not the case, then you can provide the `step` as keyword argument." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds_era5_rate = ekt.temporal.accumulation_to_rate(ds_era5, accumulation_type=\"start_of_step\")\n", "ds_era5_rate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The returned `xarray.Dataset` has a `tp_rate` data variable, this automatic suffix can be changed or suppressed using the `rate_label` kwarg in the function call.\n", "\n", "By inspecting the attributes of the `tp_rate` variable above you can see that the units and long_name have been updated to describe the rate data.\n", "\n", "Additionally, a history attribute has been added describing the that data has been converted from an accumulation to a rate. This can be surpressed by setting `provenance=False` in the function call.\n", "\n", "All other variable and dataset attributes have been removed, this is intentional as it cannot be guarenteed that they are still valid for the rate data.\n", "\n", "To see the effect of the accumulation to rate conversion, we plot the data on the same axis. We plot the accumulated data as a bar chart (blue) and the rate data as line (orange). As the data is accumulated since the start of step, this change is essentially just a change in units, which is evident by the identical shapes of the bars and line (with a small offset due to the plotting axis range)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot the raw accumulation and the rate converted data for comparison.\n", "import matplotlib.pyplot as plt\n", "\n", "plot_point = {\"latitude\": 5, \"longitude\": 5}\n", "fig, ax1 = plt.subplots(figsize=(8, 3))\n", "color = \"tab:blue\"\n", "ax1.set_xlabel(\"Time\")\n", "ax1.set_ylabel(f\"Accumulation ({ds_era5['tp'].units})\", color=color)\n", "ax1.bar(ds_era5[\"valid_time\"], ds_era5[\"tp\"].isel(plot_point), color=color, label=\"Accumulation\", width=0.1, lw=0)\n", "ax1.tick_params(axis=\"y\", labelcolor=color)\n", "\n", "ax2 = ax1.twinx() # instantiate a second axes that shares the same x-axis\n", "color = \"tab:orange\"\n", "ax2.set_ylabel(f\"Rate ({ds_era5_rate['tp_rate'].units})\", color=color) # we already handled the\n", "ax2.plot(\n", " ds_era5_rate[\"valid_time\"],\n", " ds_era5_rate[\"tp_rate\"].isel(plot_point),\n", " color=color,\n", " label=\"Rate\", # ls='dotted',\n", ")\n", "ax2.tick_params(axis=\"y\", labelcolor=color)\n", "plt.title(\"Comparison of Accumulation and Rate Converted Data\")\n", "\n", "fig.tight_layout() # otherwise the right y-label is slightly clipped\n", "fig.autofmt_xdate()\n", "\n", "# Plot legend with handles from both axes\n", "handles1, labels1 = ax1.get_legend_handles_labels()\n", "handles2, labels2 = ax2.get_legend_handles_labels()\n", "plt.legend(handles1 + handles2, labels1 + labels2)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Daily accumulation (e.g. ERA5-Land)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ERA5-land is accumulated from the start of each day, which makes it a particularly tricky example as we must intricately deaccumulate the data then convert to a rate. Luckily, all this is simplified using the accumulation_to_rate function.\n", "\n", "In this example we use some sample total precipitation data from the\n", "[ERA5-land dataset](https://cds.climate.copernicus.eu/datasets/reanalysis-era5-land).\n", "It is hourly data for January 2024 over Europe at 3˚ⅹ3˚spatial resolution.\n", "\n", "As this is reanalysis data, we open the data using `time_dim_mode=\"valid_time\"` which provides a more intuitive representation of data which can be described with a single time dimension. If required. the accumulation_to_rate function can handle daily accumulated data if it is opened using the default `time_dim_mode=\"forecast\"`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Open the demonstration ERA5-land data with earthkit data\n", "remote_era5land_file = earthkit_remote_test_data_file(\"era5-land-precip-3deg-202401.grib\")\n", "era5land_data = ekd.from_source(\"url\", remote_era5land_file)\n", "\n", "# Open the data with xarray using the valid_time time dimension mode.\n", "ds_era5land = era5land_data.to_xarray(time_dim_mode=\"valid_time\")\n", "ds_era5land" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ERA5-land is accumulated from the start of each day, therefore we set `accumulation_type=\"start_of_day\"`.\n", "\n", "The accumulation_to_rate function assumes that the data contains consecutive time steps and calculates the step length as the difference along the detected time dimension.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds_era5land_rate = ekt.temporal.accumulation_to_rate(ds_era5land, accumulation_type=\"start_of_day\")\n", "ds_era5land_rate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As with the previous example, the output contains a `tp_rate` variable where the units and long_name attributes have been updated to describe the rate data, and a history attribute added.\n", "\n", "You will also see that the `valid_time` dimension is one less than provided in the input. This is because the function does not know what came before this time step. If we know that the first step starts from zero, then we can set `from_first_step=True`, we demonstrate this in the next, seasonal forecast, example as it is not valid here.\n", "\n", "Below we plot the accumulation and rate data on the same axis using the same styling as the previous example. This time the effect of the accumulation to rate is more evident as there is both a deaccumulation and a unit conversion applied. Dashed vertical lines at the start of each day have been included to demonstrate the restart of the accumulated precipitation.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.dates as mdates\n", "import matplotlib.pyplot as plt\n", "\n", "# Plot the raw accumulation and the rate converted data for comparison.\n", "plot_point = {\"latitude\": 10, \"longitude\": 18}\n", "fig, ax1 = plt.subplots(figsize=(8, 3))\n", "color = \"tab:blue\"\n", "ax1.set_xlabel(\"Time\")\n", "ax1.set_ylabel(f\"Accumulation ({ds_era5land['tp'].units})\", color=color)\n", "ax1.bar(\n", " ds_era5land[\"valid_time\"],\n", " ds_era5land[\"tp\"].isel(plot_point),\n", " color=color,\n", " label=\"Accumulation\",\n", " width=0.1,\n", " lw=0,\n", ")\n", "ax1.tick_params(axis=\"y\", labelcolor=color)\n", "\n", "ax2 = ax1.twinx() # instantiate a second axes that shares the same x-axis\n", "color = \"tab:orange\"\n", "ax2.set_ylabel(f\"Rate ({ds_era5land_rate['tp_rate'].units})\", color=color) # we already handled the\n", "ax2.plot(\n", " ds_era5land_rate[\"valid_time\"],\n", " ds_era5land_rate[\"tp_rate\"].isel(plot_point),\n", " color=color,\n", " label=\"Rate\", # ls='dotted',\n", ")\n", "ax2.tick_params(axis=\"y\", labelcolor=color)\n", "plt.title(\"Comparison of Accumulation and Rate Converted Data\")\n", "fig.tight_layout() # otherwise the right y-label is slightly clipped\n", "fig.autofmt_xdate()\n", "\n", "# Add minor ticks for each day\n", "ax1.xaxis.set_minor_locator(mdates.DayLocator(interval=1))\n", "# Gridlines on major and minor ticks (daily)\n", "ax1.xaxis.grid(True, which=\"both\", linestyle=\"--\", alpha=0.5)\n", "\n", "# Plot legend with handles from both axes\n", "handles1, labels1 = ax1.get_legend_handles_labels()\n", "handles2, labels2 = ax2.get_legend_handles_labels()\n", "plt.legend(handles1 + handles2, labels1 + labels2)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Accumulated from start of forecast (e.g. Seasonal forecasts)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Seasonal forecast data is accumulated from the start of each forecast initialisation, therefore we need to deaccumulate the data and then convert to a rate. It is common place for forecast data to include multiple forecast initialisations in a single file, therefore in the example below we will use the `time_dim_mode=\"forecast\"` representation of the data.\n", "\n", "In this example we use some sample total precipitation data from the\n", "[Seasonal forecast daily and subdaily dataset](https://cds.climate.copernicus.eu/datasets/seasonal-original-single-levels).\n", "It provides 60 days of daily data for two forecasts, initialised January 1st and February 1st 2024, over Europe at 3˚ⅹ3˚spatial resolution." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Open the demonstration Seasonal forecast data with earthkit data\n", "remote_seas5_file = earthkit_remote_test_data_file(\"seas5-precip-3deg-202401-202402.grib\")\n", "seas5_data = ekd.from_source(\"url\", remote_seas5_file)\n", "\n", "# Open the data with xarray using the valid_time time dimension mode.\n", "ds_seas5 = seas5_data.to_xarray(time_dim_mode=\"forecast\", add_valid_time_coord=True)\n", "ds_seas5" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can now caculate the rate by setting the accumulation type to \"start_of_forecast\". " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds_seas5_rate = ekt.temporal.accumulation_to_rate(\n", " ds_seas5,\n", " accumulation_type=\"start_of_forecast\",\n", ")\n", "ds_seas5_rate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Comparing the original data to the rate data we see that there is one step less in the rate data, this is because earthkit-transforms does not assume that the data starts from zero. This can be set manually by setting `from_first_step=True` where it assume the first step accumulated from zero." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds_seas5_rate = ekt.temporal.accumulation_to_rate(ds_seas5, accumulation_type=\"start_of_forecast\", from_first_step=True)\n", "ds_seas5_rate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below is a plot of the accumulated vs rate data for the two forecast initialisations in the file, January and February 2024." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "n_forecasts = len(ds_seas5.forecast_reference_time)\n", "\n", "# Plot the raw accumulation and the rate converted data for comparison.\n", "plot_point = {\"latitude\": 10, \"longitude\": 18, \"number\": 0}\n", "fig, axes = plt.subplots(figsize=(8, 3 * n_forecasts), nrows=n_forecasts)\n", "\n", "for i in range(n_forecasts):\n", " ax1 = axes[i]\n", " _ds_accum = ds_seas5.isel(**plot_point, forecast_reference_time=i)\n", " _ds_rate = ds_seas5_rate.isel(**plot_point, forecast_reference_time=i)\n", " color = \"tab:blue\"\n", " ax1.set_xlabel(\"Time\")\n", " ax1.set_ylabel(f\"Accumulation ({_ds_accum['tp'].units})\", color=color)\n", " ax1.bar(\n", " _ds_accum[\"valid_time\"],\n", " _ds_accum[\"tp\"],\n", " color=color,\n", " label=\"Accumulation\",\n", " width=1.0,\n", " )\n", " ax1.tick_params(axis=\"y\", labelcolor=color)\n", "\n", " ax2 = ax1.twinx() # instantiate a second axes that shares the same x-axis\n", " color = \"tab:orange\"\n", " ax2.set_ylabel(f\"Rate ({_ds_rate['tp_rate'].units})\", color=color) # we already handled the\n", " ax2.plot(\n", " _ds_rate[\"valid_time\"],\n", " _ds_rate[\"tp_rate\"],\n", " color=color,\n", " label=\"Rate\", # ls='dotted',\n", " )\n", " ax2.tick_params(axis=\"y\", labelcolor=color)\n", "\n", "fig.suptitle(\"Comparison of Accumulation and Rate Converted Data\")\n", "fig.tight_layout() # otherwise the right y-label is slightly clipped\n", "fig.autofmt_xdate()\n", "\n", "# Plot legend with handles from both axes\n", "handles1, labels1 = ax1.get_legend_handles_labels()\n", "handles2, labels2 = ax2.get_legend_handles_labels()\n", "fig.legend(handles1 + handles2, labels1 + labels2, loc=\"lower left\")\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Deaccumulate only" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Deaccumulation of data is akin to converting to a rate where the units of the rate are equivalent to the length of the time step. This can be done using the `earthkit.transforms.temporal.deaccumulate` function, as demonstrated below.\n", "\n", "ℹ️ **Note:** The deaccumulation function is a wrapper of the accumulation_to_rate function with `rate_units=\"step_length\"`.\n", "\n", "The example below uses the same seasonal forecast data as the previous example, and deaccumulates the total precipitation such that is represents the accumulation of a single time-step. Comparable with the native ERA5 data in the first example." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "ds_seas5_deaccumulate = ekt.temporal.deaccumulate(ds_seas5, accumulation_type=\"start_of_forecast\", from_first_step=True)\n", "ds_seas5_deaccumulate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below is a plot of the accumulated and de-accumulated data for the two forecast initialisations in the file, January and February 2024. \n", "The plot demonstrates that the deaccumulation is the same as the rate calculated with the previous example but with differing units." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "\n", "n_forecasts = len(ds_seas5.forecast_reference_time)\n", "\n", "# Plot the raw accumulation and the rate converted data for comparison.\n", "plot_point = {\"latitude\": 10, \"longitude\": 18, \"number\": 0}\n", "fig, axes = plt.subplots(figsize=(8, 3 * n_forecasts), nrows=n_forecasts)\n", "\n", "for i in range(n_forecasts):\n", " ax1 = axes[i]\n", " _ds_accum = ds_seas5.isel(**plot_point, forecast_reference_time=i)\n", " _ds_deaccum = ds_seas5_deaccumulate.isel(**plot_point, forecast_reference_time=i)\n", " color = \"tab:blue\"\n", " ax1.set_xlabel(\"Time\")\n", " ax1.set_ylabel(f\"Accumulation ({_ds_accum['tp'].units})\", color=color)\n", " ax1.bar(\n", " _ds_accum[\"valid_time\"],\n", " _ds_accum[\"tp\"],\n", " color=color,\n", " label=\"Accumulation\",\n", " width=1.0,\n", " )\n", " ax1.tick_params(axis=\"y\", labelcolor=color)\n", "\n", " ax2 = ax1.twinx() # instantiate a second axes that shares the same x-axis\n", " color = \"tab:orange\"\n", " ax2.set_ylabel(f\"Deaccumulation ({_ds_deaccum['tp'].units})\", color=color) # we already handled the\n", " ax2.plot(\n", " _ds_deaccum[\"valid_time\"],\n", " _ds_deaccum[\"tp\"],\n", " color=color,\n", " label=\"Deaccumulation\", # ls='dotted',\n", " )\n", " ax2.tick_params(axis=\"y\", labelcolor=color)\n", "\n", "fig.suptitle(\"Comparison of Accumulation and Deaccumulated Data\")\n", "fig.tight_layout() # otherwise the right y-label is slightly clipped\n", "fig.autofmt_xdate()\n", "\n", "# Plot legend with handles from both axes\n", "handles1, labels1 = ax1.get_legend_handles_labels()\n", "handles2, labels2 = ax2.get_legend_handles_labels()\n", "fig.legend(handles1 + handles2, labels1 + labels2, loc=\"lower left\")\n", "\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": ".conda", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.12" } }, "nbformat": 4, "nbformat_minor": 2 }