General temporal aggregation methods

[1]:
# If first time running, uncomment the line below to install any additional dependencies
# !bash requirements-for-notebooks.sh
[2]:
from earthkit.transforms import aggregate as ek_aggregate
from earthkit import data as ek_data

from earthkit.data.testing import earthkit_remote_test_data_file
ek_data.settings.set("cache-policy", "user")

Load some test data

All earthkit-transforms methods can be called with earthkit-data objects (Readers and Wrappers) or with pre-loaded xarray objects.

In this example we will use 6-hourly ERA5 2m temperature data on a 0.25x0.25 degree spatial grid over Europe for the year 2015 as our physical data.

First we download (if not already cached) and lazily load the ERA5 data (please see the tutorials in earthkit-data for more details on cache management).

We inspect the data using the describe method and see we have some 2m air temperature data. For a more detailed representation of the data you can use the to_xarray method.

[3]:
# Get some demonstration ERA5 data. This could be any URL or path to an ERA5 GRIB or netCDF file.
remote_era5_file = earthkit_remote_test_data_file("test-data", "era5_temperature_europe_2015.grib")
era5_data = ek_data.from_source("url", remote_era5_file)
era5_data.describe()
# era5_data.to_xarray()
[3]:
shortName  typeOfLevel  level  date                   time        step  paramId  class  stream  type  experimentVersionNumber
2t         surface      0      20150301,20150302,...  0,1800,...  0     167      ea     oper    an    0001

Reduce the ERA5 data over the time dimension

The default reduction method is mean. Other methods can be applied using the how kwarg.

Note that we do not need to worry about the data format of the input array: earthkit will convert it to the required xarray format internally.

The returned object is an xarray dataset; however, this may change in future versions of the package.
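To make the idea of "reducing over the time dimension" concrete, here is a minimal numpy-only sketch on synthetic data. The helper reduce_over_time and its arguments are hypothetical names used for illustration; this is not how earthkit-transforms is implemented, only the kind of operation it performs for us.

```python
import numpy as np


def reduce_over_time(values, how="mean", time_axis=0):
    """Hypothetical helper: collapse the time axis of `values` using `how`.

    `how` is either the name of a numpy reduction ("mean", "median", ...)
    or any callable that accepts an `axis` keyword argument.
    """
    func = getattr(np, how) if isinstance(how, str) else how
    return func(values, axis=time_axis)


# Synthetic stand-in for a (time, latitude, longitude) temperature cube in kelvin
rng = np.random.default_rng(seed=0)
t2m = 280.0 + rng.normal(scale=5.0, size=(8, 3, 4))

t_mean = reduce_over_time(t2m)                  # default: how="mean"
t_median = reduce_over_time(t2m, how="median")  # a named reduction
t_max = reduce_over_time(t2m, how=np.max)       # any callable taking `axis`

# The time axis disappears; the spatial axes are untouched
print(t2m.shape, "->", t_mean.shape)  # (8, 3, 4) -> (3, 4)
```

Whatever reduction is chosen, the output has one value per grid point, which is why the reduced datasets in this notebook no longer have a time dimension.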

The mean over the time dimension

[4]:
era5_t_mean = ek_aggregate.temporal.reduce(era5_data)  # how="mean"
era5_t_mean
[4]:
<xarray.Dataset> Size: 230kB
Dimensions:    (number: 1, step: 1, surface: 1, latitude: 201, longitude: 281)
Coordinates:
  * number     (number) int64 8B 0
  * step       (step) timedelta64[ns] 8B 00:00:00
  * surface    (surface) float64 8B 0.0
  * latitude   (latitude) float64 2kB 80.0 79.75 79.5 79.25 ... 30.5 30.25 30.0
  * longitude  (longitude) float64 2kB -10.0 -9.75 -9.5 ... 59.5 59.75 60.0
Data variables:
    t2m        (number, step, surface, latitude, longitude) float32 226kB 262...
Attributes:
    GRIB_edition:            1
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2024-07-29T08:52 GRIB to CDM+CF via cfgrib-0.9.1...
[5]:
# A simple matplotlib plot to view the data:
era5_t_mean.t2m.plot()
[5]:
<matplotlib.collections.QuadMesh at 0x14ff47450>
../../../_images/notebooks_aggregate_temporal_01-era5-general-methods_7_1.png

The median over the time dimension

[6]:
era5_t_median = ek_aggregate.temporal.reduce(era5_data, how="median")
era5_t_median
[6]:
<xarray.Dataset> Size: 230kB
Dimensions:    (number: 1, step: 1, surface: 1, latitude: 201, longitude: 281)
Coordinates:
  * number     (number) int64 8B 0
  * step       (step) timedelta64[ns] 8B 00:00:00
  * surface    (surface) float64 8B 0.0
  * latitude   (latitude) float64 2kB 80.0 79.75 79.5 79.25 ... 30.5 30.25 30.0
  * longitude  (longitude) float64 2kB -10.0 -9.75 -9.5 ... 59.5 59.75 60.0
Data variables:
    t2m        (number, step, surface, latitude, longitude) float32 226kB 262...
Attributes:
    GRIB_edition:            1
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2024-07-29T08:52 GRIB to CDM+CF via cfgrib-0.9.1...
[7]:
# A simple matplotlib plot to view the data:
era5_t_median.t2m.plot()
[7]:
<matplotlib.collections.QuadMesh at 0x1533b1e10>
../../../_images/notebooks_aggregate_temporal_01-era5-general-methods_10_1.png

Calling the temporal reduce method with an arbitrary function

The temporal.reduce method can take any function which is accepted by the xarray reduce method. Typically this means it must take axis as an argument. See the xarray.Dataset.reduce documentation for more details.

[15]:
import numpy as np


def my_method(array, axis=None, **kwargs):
    # Product of the mean and the standard deviation along the reduced axis
    return np.mean(array, axis=axis, **kwargs) * np.std(array, axis=axis, **kwargs)

era5_t_my_method = ek_aggregate.temporal.reduce(era5_data, how=my_method, how_label="random")
era5_t_my_method

[15]:
<xarray.Dataset> Size: 230kB
Dimensions:       (number: 1, step: 1, surface: 1, latitude: 201, longitude: 281)
Coordinates:
  * number        (number) int64 8B 0
  * step          (step) timedelta64[ns] 8B 00:00:00
  * surface       (surface) float64 8B 0.0
  * latitude      (latitude) float64 2kB 80.0 79.75 79.5 ... 30.5 30.25 30.0
  * longitude     (longitude) float64 2kB -10.0 -9.75 -9.5 ... 59.5 59.75 60.0
Data variables:
    t2m_random    (number, step, surface, latitude, longitude) float32 226kB ...
Attributes:
    GRIB_edition:            1
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2024-07-29T08:56 GRIB to CDM+CF via cfgrib-0.9.1...
[16]:
# A simple matplotlib plot to view the data:
era5_t_my_method.t2m_random.plot()
[16]:
<matplotlib.collections.QuadMesh at 0x15625c210>
../../../_images/notebooks_aggregate_temporal_01-era5-general-methods_13_1.png
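The axis requirement is easiest to see by calling a custom function directly on a plain numpy array. The xarray documentation describes the expected form as f(x, axis=axis, **kwargs). The sketch below uses a small synthetic cube and a hypothetical function, temporal_range, which is not part of earthkit; the axis is passed by hand to mimic what a reduction over time would pass.

```python
import numpy as np


def temporal_range(array, axis=None, **kwargs):
    """Hypothetical reduction: maximum minus minimum along `axis`."""
    return np.max(array, axis=axis, **kwargs) - np.min(array, axis=axis, **kwargs)


# Synthetic (time, latitude, longitude) cube; each time step adds 6 to every cell
cube = np.arange(24, dtype=float).reshape(4, 2, 3)

# Reducing axis 0 (our time axis) leaves one value per grid point
spatial_field = temporal_range(cube, axis=0)
print(spatial_field.shape)  # (2, 3)

# With axis=None numpy reduces over every axis and returns a single scalar
overall_range = temporal_range(cube)
print(overall_range)  # 23.0
```

A function written this way could be passed as how=temporal_range in the same manner as my_method above, optionally with a how_label to name the resulting variable.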

Calculate a rolling mean with a 50-timestep window

There is no temporal-specific method for a rolling reduction. Instead, the general rolling_reduce method can do this calculation: pass the dimension to roll over and the window length as a keyword argument (here time=50), and use center=True to centre each window on its time step.

[9]:
era5_rolling = ek_aggregate.rolling_reduce(
    era5_data, time=50, center=True,
)
era5_rolling
[9]:
<xarray.Dataset> Size: 330MB
Dimensions:     (number: 1, time: 1460, step: 1, surface: 1, latitude: 201,
                 longitude: 281)
Coordinates:
  * number      (number) int64 8B 0
  * time        (time) datetime64[ns] 12kB 2015-01-01 ... 2015-12-31T18:00:00
  * step        (step) timedelta64[ns] 8B 00:00:00
  * surface     (surface) float64 8B 0.0
  * latitude    (latitude) float64 2kB 80.0 79.75 79.5 79.25 ... 30.5 30.25 30.0
  * longitude   (longitude) float64 2kB -10.0 -9.75 -9.5 ... 59.5 59.75 60.0
    valid_time  (time, step) datetime64[ns] 12kB dask.array<chunksize=(1460, 1), meta=np.ndarray>
Data variables:
    t2m         (number, time, step, surface, latitude, longitude) float32 330MB dask.array<chunksize=(1, 1459, 1, 1, 201, 281), meta=np.ndarray>
Attributes:
    GRIB_edition:            1
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2024-07-29T08:52 GRIB to CDM+CF via cfgrib-0.9.1...
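For intuition about what a centred rolling mean computes, here is a minimal numpy-only sketch with a 3-step window. The helper centred_rolling_mean is a hypothetical name, and filling incomplete windows at the edges with NaN is an assumption based on xarray's default rolling behaviour; it is not taken from the earthkit-transforms source.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def centred_rolling_mean(values, window, axis=0):
    """Hypothetical helper: centred rolling mean along `axis`.

    Positions whose window would run off either end are set to NaN,
    so the output has the same shape as the input.
    """
    values = np.asarray(values, dtype=float)
    # The rolling axis shrinks by window - 1; a new last axis holds each window
    windows = sliding_window_view(values, window_shape=window, axis=axis)
    means = windows.mean(axis=-1)
    # Pad the rolling axis back to its original length with NaN
    before = window // 2
    after = window - 1 - before
    pad_width = [(0, 0)] * values.ndim
    pad_width[axis] = (before, after)
    return np.pad(means, pad_width, constant_values=np.nan)


# Synthetic (time, location) series that increases linearly in time
series = np.arange(12, dtype=float).reshape(6, 2)
smoothed = centred_rolling_mean(series, window=3, axis=0)

print(smoothed.shape)  # (6, 2): same length along time as the input
```

Because the synthetic series is linear in time, the interior values of smoothed equal the input, while the first and last time steps are NaN. The time dimension is preserved, as in the rolling_reduce output above, in contrast to temporal.reduce, which removes it.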