Daily statistics from hourly ERA5 data

[1]:
# If first time running, uncomment the line below to install any additional dependencies
# !bash requirements-for-notebooks.sh
[2]:
from earthkit.transforms import aggregate as ek_aggregate
from earthkit import data as ek_data

from earthkit.data.testing import earthkit_remote_test_data_file
ek_data.settings.set("cache-policy", "user")

import matplotlib.pyplot as plt

Load some test data

All earthkit-transforms methods can be called with earthkit-data objects (Readers and Wrappers) or with pre-loaded xarray objects.

In this example we use hourly ERA5 2m temperature data on a 0.5x0.5 degree spatial grid for the year 2015.

First we download (if not already cached) and lazily load the ERA5 data (please see the earthkit-data tutorials for more details on cache management).

We inspect the data using the describe method and see that we have 2m air temperature data. For a more detailed representation of the data you can use the to_xarray method.

[3]:
# Get some demonstration ERA5 data, this could be any url or path to an ERA5 grib or netCDF file.
remote_era5_file = earthkit_remote_test_data_file("test-data", "era5_temperature_europe_2015.grib")
era5_data = ek_data.from_source("url", remote_era5_file)
# era5_data.to_xarray()
era5_data.describe()
[3]:
                       level                   date        time  step  paramId  class  stream  type  experimentVersionNumber
shortName typeOfLevel
2t        surface          0  20150301,20150302,...  0,1800,...     0      167     ea    oper    an                     0001

Calculate the daily mean and standard deviation of the ERA5 data

We can calculate the daily mean using the daily_mean method in the temporal module. There are similar daily aggregation methods: daily_median, daily_min, daily_max, daily_std and daily_sum, and all of these are also available as monthly aggregations in the form monthly_XXX.

earthkit-transforms accepts any data object understood by earthkit-data as input. The computation is based on xarray datacubes, so the returned object is an xarray.Dataset. To convert this back to an earthkit-data object you can use the earthkit-data method from_object.

If the input is an xarray.Dataset, the returned object is an xarray.Dataset; if the input is an xarray.DataArray, the returned object is an xarray.DataArray.
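This type-preserving behaviour mirrors xarray's own resampling, which the daily aggregations are built on. A minimal sketch with a toy hourly series (plain xarray, no earthkit needed; `resample(time="1D").mean()` here stands in for daily_mean):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy hourly series: two days of hourly values at a single point
times = pd.date_range("2015-01-01", periods=48, freq="h")
da = xr.DataArray(np.arange(48.0), coords={"time": times}, dims="time", name="t2m")
ds = da.to_dataset()

# Resample to daily means: the container type of the input is preserved
daily_da = da.resample(time="1D").mean()  # DataArray in -> DataArray out
daily_ds = ds.resample(time="1D").mean()  # Dataset in -> Dataset out
print(type(daily_da).__name__, type(daily_ds).__name__)
```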

[4]:
era5_daily_mean = ek_aggregate.temporal.daily_mean(era5_data)
era5_daily_std = ek_aggregate.temporal.daily_std(era5_data)
era5_daily_mean
# ek_data.from_object(era5_daily_mean)

# Note that daily_mean and daily_std are convenience wrappers. It is also possible
# to achieve the same result using the following syntax:
# era5_daily_mean = ek_aggregate.temporal.daily_reduce(era5_data, how="mean")
# The daily_reduce method has additional options for handling metadata,
# so it is recommended when you need finer control.
[4]:
<xarray.Dataset> Size: 82MB
Dimensions:    (time: 365, number: 1, step: 1, surface: 1, latitude: 201,
                longitude: 281)
Coordinates:
  * number     (number) int64 8B 0
  * step       (step) timedelta64[ns] 8B 00:00:00
  * surface    (surface) float64 8B 0.0
  * latitude   (latitude) float64 2kB 80.0 79.75 79.5 79.25 ... 30.5 30.25 30.0
  * longitude  (longitude) float64 2kB -10.0 -9.75 -9.5 ... 59.5 59.75 60.0
  * time       (time) datetime64[ns] 3kB 2015-01-01 2015-01-02 ... 2015-12-31
Data variables:
    t2m        (time, number, step, surface, latitude, longitude) float32 82MB ...
Attributes:
    GRIB_edition:            1
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2024-07-29T09:06 GRIB to CDM+CF via cfgrib-0.9.1...

Calculate the monthly mean and standard deviation

[5]:
era5_monthly_mean = ek_aggregate.temporal.monthly_mean(era5_data)
era5_monthly_std = ek_aggregate.temporal.monthly_std(era5_data)
era5_monthly_std
[5]:
<xarray.Dataset> Size: 3MB
Dimensions:    (time: 12, number: 1, step: 1, surface: 1, latitude: 201,
                longitude: 281)
Coordinates:
  * number     (number) int64 8B 0
  * step       (step) timedelta64[ns] 8B 00:00:00
  * surface    (surface) float64 8B 0.0
  * latitude   (latitude) float64 2kB 80.0 79.75 79.5 79.25 ... 30.5 30.25 30.0
  * longitude  (longitude) float64 2kB -10.0 -9.75 -9.5 ... 59.5 59.75 60.0
  * time       (time) datetime64[ns] 96B 2015-01-01 2015-02-01 ... 2015-12-01
Data variables:
    t2m        (time, number, step, surface, latitude, longitude) float32 3MB ...
Attributes:
    GRIB_edition:            1
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2024-07-29T09:06 GRIB to CDM+CF via cfgrib-0.9.1...

Calculate a rolling mean with a 50-timestep window

To calculate a rolling mean along the time dimension you can use the rolling_reduce function.

NOTE: An improved API for the rolling_reduce method is an ongoing task
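Conceptually, what the rolling reduction computes per grid point is a centered moving average over the time axis. A minimal numpy sketch of that idea (window of 5 for illustration; for even windows such as 50, the exact centering convention may differ slightly from xarray's):

```python
import numpy as np

def centered_rolling_mean(x, window):
    """Centered moving average; positions without a full window become NaN
    (similar in spirit to xarray's default min_periods behaviour)."""
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, np.nan)
    kernel = np.ones(window) / window
    valid = np.convolve(x, kernel, mode="valid")  # len(x) - window + 1 values
    half = window // 2
    out[half : half + len(valid)] = valid
    return out

series = np.arange(10.0)
smoothed = centered_rolling_mean(series, 5)
print(smoothed)
```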

[6]:
era5_rolling = ek_aggregate.temporal.rolling_reduce(
    era5_data, 50, how_reduce="mean", center=True,
)
era5_rolling
[6]:
<xarray.Dataset> Size: 330MB
Dimensions:     (number: 1, time: 1460, step: 1, surface: 1, latitude: 201,
                 longitude: 281)
Coordinates:
  * number      (number) int64 8B 0
  * time        (time) datetime64[ns] 12kB 2015-01-01 ... 2015-12-31T18:00:00
  * step        (step) timedelta64[ns] 8B 00:00:00
  * surface     (surface) float64 8B 0.0
  * latitude    (latitude) float64 2kB 80.0 79.75 79.5 79.25 ... 30.5 30.25 30.0
  * longitude   (longitude) float64 2kB -10.0 -9.75 -9.5 ... 59.5 59.75 60.0
    valid_time  (time, step) datetime64[ns] 12kB dask.array<chunksize=(1460, 1), meta=np.ndarray>
Data variables:
    t2m         (number, time, step, surface, latitude, longitude) float32 330MB dask.array<chunksize=(1, 1459, 1, 1, 201, 281), meta=np.ndarray>
Attributes:
    GRIB_edition:            1
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2024-07-29T09:06 GRIB to CDM+CF via cfgrib-0.9.1...

Account for non-UTC timezones

There is a time_shift argument which can be used to account for non-UTC time zones:
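To see why a shift changes the daily grouping, here is a plain-Python illustration (standard library only): adding 12 hours relabels afternoon and evening timestamps into the following calendar day, which is why the shifted result below has 366 daily labels, running into 2016-01-01.

```python
from datetime import datetime, timedelta

# Four hourly timestamps on 2015-01-01; a +12h shift moves the 12:00 and
# 18:00 records into the next calendar day before the daily grouping.
stamps = [datetime(2015, 1, 1, h) for h in (0, 6, 12, 18)]
shifted_days = [(t + timedelta(hours=12)).date().isoformat() for t in stamps]
print(shifted_days)
```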

[7]:
era5_daily_mean_p12 = ek_aggregate.temporal.daily_mean(era5_data, time_shift={"hours": 12})
era5_daily_mean_p12
[7]:
<xarray.Dataset> Size: 83MB
Dimensions:    (time: 366, number: 1, step: 1, surface: 1, latitude: 201,
                longitude: 281)
Coordinates:
  * number     (number) int64 8B 0
  * step       (step) timedelta64[ns] 8B 00:00:00
  * surface    (surface) float64 8B 0.0
  * latitude   (latitude) float64 2kB 80.0 79.75 79.5 79.25 ... 30.5 30.25 30.0
  * longitude  (longitude) float64 2kB -10.0 -9.75 -9.5 ... 59.5 59.75 60.0
  * time       (time) datetime64[ns] 3kB 2015-01-01 2015-01-02 ... 2016-01-01
Data variables:
    t2m        (time, number, step, surface, latitude, longitude) float32 83MB ...
Attributes:
    GRIB_edition:            1
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2024-07-29T09:06 GRIB to CDM+CF via cfgrib-0.9.1...

Use an arbitrary method for the reduction

The temporal.daily_reduce and temporal.monthly_reduce methods can take any method which is accepted by the xarray reduce method; typically this means it must accept axis as an argument. See the xarray.Dataset.reduce documentation for more details.
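As a sketch of the required signature, here is a hypothetical custom reducer (the name `q90` and the example data are illustrative, not part of earthkit-transforms). The `(values, axis=...)` signature requirement comes from xarray's reduce machinery; whether a callable can be passed as `how` (e.g. `daily_reduce(era5_data, how=q90)`) depends on the earthkit-transforms version, so check the package documentation.

```python
import numpy as np

def q90(arr, axis=None, **kwargs):
    """Custom reducer: 90th percentile along the aggregation axis.
    Functions with this (values, axis=...) signature are what
    xarray's reduce machinery expects."""
    return np.nanquantile(arr, 0.9, axis=axis)

# Applied directly to show the signature: 24 hourly values at one point
data = np.arange(24.0).reshape(24, 1)
result = q90(data, axis=0)
print(result)
```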

Plot a random point location to see the different aggregation methods

[8]:
isel_kwargs = {"latitude":100, "longitude":100}

fig, ax = plt.subplots(ncols=1, nrows=1, figsize=(10,5))

era5_data.to_xarray().t2m.isel(**isel_kwargs).plot(label='Raw data', ax=ax, color='blue')
era5_daily_mean.t2m.isel(**isel_kwargs).plot(
    label='Daily mean', ax=ax, color='orange'
)

era5_daily_mean_p12.t2m.isel(**isel_kwargs).plot(
    label='Daily mean (12h shift)', ax=ax, color='purple'
)

# # To add the daily spread as orange dotted lines:
# upper_d = era5_daily_mean.t2m + era5_daily_std.t2m
# lower_d = era5_daily_mean.t2m - era5_daily_std.t2m
# upper_d.isel(**isel_kwargs).plot(ax=ax, label='Daily standard deviation spread', linestyle='--', color='orange')
# lower_d.isel(**isel_kwargs).plot(ax=ax, linestyle='--', color='orange')

# Add the rolling mean as green line:
era5_rolling.t2m.isel(**isel_kwargs).plot(label='Rolling mean', ax=ax, color='green')

# Add the monthly mean as a black solid line and spread as black dotted lines:
era5_monthly_mean.t2m.isel(**isel_kwargs).plot(label='Monthly mean', ax=ax, color='black')
upper_m = era5_monthly_mean.t2m + era5_monthly_std.t2m
lower_m = era5_monthly_mean.t2m - era5_monthly_std.t2m
upper_m.isel(**isel_kwargs).plot(ax=ax, label='Monthly standard deviation spread', linestyle='--', color='black')
lower_m.isel(**isel_kwargs).plot(ax=ax, linestyle='--', color='black')


ax.legend(loc=2)
ax.set_title("Aggregation methods")
plt.show()
[Figure: 2015 time series at a single grid point comparing the raw hourly data, daily means (with and without the 12-hour shift), the 50-step rolling mean, and the monthly mean with its standard-deviation spread.]