General temporal aggregation methods

[1]:
# If first time running, uncomment the line below to install any additional dependencies
# !bash requirements-for-notebooks.sh
[2]:
from earthkit.data.testing import earthkit_remote_test_data_file

from earthkit import data as ekd
from earthkit.transforms import aggregate as ekt

ekd.settings.set("cache-policy", "user")

Load some test data

All earthkit-transforms methods can be called with earthkit-data objects (Readers and Wrappers) or with pre-loaded xarray objects.

In this example we will use 6-hourly ERA5 2m temperature data over Europe on a 0.25x0.25 spatial grid for the year 2015 as our physical data.

First we download (if not already cached) and lazily load the ERA5 data (please see the earthkit-data tutorials for more details on cache management).

We convert the data to an xarray dataset, using options suited to the data we are working with. The earthkit-transforms methods can handle earthkit-data objects out of the box, but for clarity we create the xarray object explicitly here.

[3]:
# Get some demonstration ERA5 data, this could be any url or path to an ERA5 grib or netCDF file.
remote_era5_file = earthkit_remote_test_data_file("era5_temperature_europe_2015.grib")
era5_data = ekd.from_source("url", remote_era5_file)
era5_xr = era5_data.to_xarray(time_dim_mode="valid_time").rename({"2t": "t2m"})

Reduce the ERA5 data over the time dimension

The default reduction method is mean; other methods can be applied using the how kwarg.

Note that we do not need to worry about the data format of the input array; earthkit will convert it to the required xarray format internally.

The returned object is an xarray dataset; however, this may change in a future version of the package.
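
As a rough illustration of what a reduction over the time dimension computes, the sketch below applies a mean and a median directly with xarray to a small synthetic dataset. The dataset, its values and the names demo_ds, demo_mean and demo_median are invented for illustration; this is plain xarray and is not presented as the earthkit-transforms implementation.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical toy dataset: 4 six-hourly time steps on a 2x3 grid (values are invented).
times = pd.date_range("2015-01-01", periods=4, freq=pd.Timedelta(hours=6))
values = np.arange(24, dtype=float).reshape(4, 2, 3)
demo_ds = xr.Dataset(
    {"t2m": (("valid_time", "latitude", "longitude"), values)},
    coords={
        "valid_time": times,
        "latitude": [80.0, 79.75],
        "longitude": [-10.0, -9.75, -9.5],
    },
)

# Collapsing the time dimension leaves one value per grid point.
demo_mean = demo_ds.mean(dim="valid_time")
demo_median = demo_ds.median(dim="valid_time")

print(demo_mean.t2m.dims)
```

Only the latitude and longitude dimensions remain in the result, which matches the shape of the reduced ERA5 datasets shown in the cells that follow.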

The mean over the time dimension

[4]:
era5_t_mean = ekt.temporal.reduce(era5_xr)  # how="mean"
era5_t_mean
/tmp/ipykernel_1012/1140177696.py:1: DeprecationWarning: The function 'reduce' from the legacy aggregate module is deprecated and will be removed in version 2.X of earthkit.transforms. Use 'earthkit.transforms.temporal.reduce' instead.
  era5_t_mean = ekt.temporal.reduce(era5_xr)  # how="mean"
[4]:
<xarray.Dataset> Size: 456kB
Dimensions:    (latitude: 201, longitude: 281)
Coordinates:
  * latitude   (latitude) float64 2kB 80.0 79.75 79.5 79.25 ... 30.5 30.25 30.0
  * longitude  (longitude) float64 2kB -10.0 -9.75 -9.5 ... 59.5 59.75 60.0
Data variables:
    t2m        (latitude, longitude) float64 452kB 262.5 262.6 ... 297.4 294.4
Attributes: (12/13)
    param:        2t
    paramId:      167
    class:        ea
    stream:       oper
    levtype:      sfc
    type:         an
    ...           ...
    date:         20150101
    time:         0
    domain:       g
    number:       0
    Conventions:  CF-1.8
    institution:  ECMWF
[5]:
# A simple matplotlib plot to view the data:
era5_t_mean.t2m.plot()
[5]:
<matplotlib.collections.QuadMesh at 0x7bcc5ec9dad0>
[Figure: map of the time-mean t2m over latitude and longitude]

The median over the time dimension

[6]:
era5_t_median = ekt.temporal.reduce(era5_xr, how="median")
era5_t_median
/tmp/ipykernel_1012/1856356623.py:1: DeprecationWarning: The function 'reduce' from the legacy aggregate module is deprecated and will be removed in version 2.X of earthkit.transforms. Use 'earthkit.transforms.temporal.reduce' instead.
  era5_t_median = ekt.temporal.reduce(era5_xr, how="median")
[6]:
<xarray.Dataset> Size: 456kB
Dimensions:    (latitude: 201, longitude: 281)
Coordinates:
  * latitude   (latitude) float64 2kB 80.0 79.75 79.5 79.25 ... 30.5 30.25 30.0
  * longitude  (longitude) float64 2kB -10.0 -9.75 -9.5 ... 59.5 59.75 60.0
Data variables:
    t2m        (latitude, longitude) float64 452kB 262.2 262.2 ... 298.2 294.9
Attributes: (12/13)
    param:        2t
    paramId:      167
    class:        ea
    stream:       oper
    levtype:      sfc
    type:         an
    ...           ...
    date:         20150101
    time:         0
    domain:       g
    number:       0
    Conventions:  CF-1.8
    institution:  ECMWF
[7]:
# A simple matplotlib plot to view the data:
era5_t_median.t2m.plot()
[7]:
<matplotlib.collections.QuadMesh at 0x7bcc9091b7d0>
[Figure: map of the time-median t2m over latitude and longitude]

Calling the temporal reduce method with an arbitrary function

The temporal.reduce method can take any function that is accepted by the xarray reduce method; typically this means the function must take axis as an argument. See the xarray.Dataset.reduce documentation for more details.

[8]:
import numpy as np


def my_method(array, axis=None, **kwargs):
    return np.mean(array, axis=axis, **kwargs) * np.std(array, axis=axis, **kwargs)


era5_t_my_method = ekt.temporal.reduce(era5_xr, how=my_method, how_label="random")
era5_t_my_method
/tmp/ipykernel_1012/4022870570.py:8: DeprecationWarning: The function 'reduce' from the legacy aggregate module is deprecated and will be removed in version 2.X of earthkit.transforms. Use 'earthkit.transforms.temporal.reduce' instead.
  era5_t_my_method = ekt.temporal.reduce(era5_xr, how=my_method, how_label="random")
[8]:
<xarray.Dataset> Size: 456kB
Dimensions:     (latitude: 201, longitude: 281)
Coordinates:
  * latitude    (latitude) float64 2kB 80.0 79.75 79.5 79.25 ... 30.5 30.25 30.0
  * longitude   (longitude) float64 2kB -10.0 -9.75 -9.5 ... 59.5 59.75 60.0
Data variables:
    t2m_random  (latitude, longitude) float64 452kB 2.244e+03 ... 2.904e+03
Attributes: (12/13)
    param:        2t
    paramId:      167
    class:        ea
    stream:       oper
    levtype:      sfc
    type:         an
    ...           ...
    date:         20150101
    time:         0
    domain:       g
    number:       0
    Conventions:  CF-1.8
    institution:  ECMWF
[9]:
# A simple matplotlib plot to view the data:
era5_t_my_method.t2m_random.plot()
[9]:
<matplotlib.collections.QuadMesh at 0x7bcc5cb87c50>
[Figure: map of t2m_random over latitude and longitude]
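
The axis requirement mentioned above can be checked in isolation by passing a function straight to xarray.Dataset.reduce on a small synthetic dataset. In this sketch the reducer value_range and the dataset toy are hypothetical names with invented values; the example only shows the calling convention that xarray uses, and does not involve earthkit-transforms.

```python
import numpy as np
import xarray as xr


def value_range(array, axis=None, **kwargs):
    """Hypothetical reducer: largest minus smallest value along `axis`."""
    return np.max(array, axis=axis, **kwargs) - np.min(array, axis=axis, **kwargs)


# Invented toy data: 3 time steps on a 2x2 grid.
toy = xr.Dataset(
    {
        "t2m": (
            ("valid_time", "latitude", "longitude"),
            np.arange(12, dtype=float).reshape(3, 2, 2),
        )
    }
)

# xarray translates dim="valid_time" into an integer axis and passes it to the function.
toy_range = toy.reduce(value_range, dim="valid_time")

print(toy_range.t2m.values)
```

A function without an axis parameter would have no way to tell which dimension to collapse, which is why the signature matters.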

Calculate a rolling mean with a 50 timestep window

There is no temporal-specific method for a rolling reduction. The general rolling_reduce method can perform this calculation: pass the window length as a keyword argument named after the dimension to roll over (valid_time=50 in the cell below).

[10]:
era5_rolling = ekt.rolling_reduce(
    era5_xr,
    valid_time=50,
    center=True,
)
era5_rolling
[10]:
<xarray.Dataset> Size: 660MB
Dimensions:     (valid_time: 1460, latitude: 201, longitude: 281)
Coordinates:
  * valid_time  (valid_time) datetime64[us] 12kB 2015-01-01 ... 2015-12-31T18...
  * latitude    (latitude) float64 2kB 80.0 79.75 79.5 79.25 ... 30.5 30.25 30.0
  * longitude   (longitude) float64 2kB -10.0 -9.75 -9.5 ... 59.5 59.75 60.0
Data variables:
    t2m         (valid_time, latitude, longitude) float64 660MB dask.array<chunksize=(1459, 26, 36), meta=np.ndarray>
Attributes: (12/13)
    param:        2t
    paramId:      167
    class:        ea
    stream:       oper
    levtype:      sfc
    type:         an
    ...           ...
    date:         20150101
    time:         0
    domain:       g
    number:       0
    Conventions:  CF-1.8
    institution:  ECMWF
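
The effect of the window length and the center argument is easier to see on a short series. The sketch below uses xarray's own rolling(...).mean() on invented data, on the assumption (not confirmed by this page) that rolling_reduce behaves like xarray's rolling machinery. The names series and rolled are hypothetical.

```python
import numpy as np
import xarray as xr

# Invented 1-D series of 7 time steps with values 0..6.
series = xr.DataArray(np.arange(7, dtype=float), dims="valid_time")

# Centred 3-step window: each output is the mean of a value and its two neighbours.
rolled = series.rolling(valid_time=3, center=True).mean()

print(rolled.values)
```

The output keeps the full length of the time dimension, as in the ERA5 result above, but the first and last entries are NaN because their windows are incomplete.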