Daily statistics from hourly ERA5 data

[1]:
from earthkit import data as ekd
from earthkit import plots as ekp
from earthkit import transforms as ekt
from earthkit.transforms._tools import earthkit_remote_test_data_file

Load some test data

All earthkit-transforms methods can be called with earthkit-data objects (Readers and Wrappers) or with the pre-loaded xarray.

In this example we will use hourly ERA5 2m temperature data on a 0.5x0.5 spatial grid for the year 2015 as our physical data.

First we download (if not already cached) lazily load the ERA5 data (please see tutorials in earthkit-data for more details in cache management).

We convert the data to an xarray.Dataset with some additional options which are better suited for the data we are working with.

[2]:
# Get some demonstration ERA5 data, this could be any url or path to an ERA5 grib or netCDF file.
remote_era5_file = earthkit_remote_test_data_file("era5_temperature_europe_2015.grib")
era5_xr = ekd.from_source("url", remote_era5_file)
era5_xr = era5_xr.to_xarray(time_dims=["valid_time"]).rename({"2t": "t2m"})
era5_xr

[2]:
<xarray.Dataset> Size: 660MB
Dimensions:     (valid_time: 1460, latitude: 201, longitude: 281)
Coordinates:
  * valid_time  (valid_time) datetime64[us] 12kB 2015-01-01 ... 2015-12-31T18...
  * latitude    (latitude) float64 2kB 80.0 79.75 79.5 79.25 ... 30.5 30.25 30.0
  * longitude   (longitude) float64 2kB -10.0 -9.75 -9.5 ... 59.5 59.75 60.0
Data variables:
    t2m         (valid_time, latitude, longitude) float64 660MB ...
Attributes:
    Conventions:  CF-1.8
    institution:  ECMWF

Calculate the daily mean and standard deviation of the ERA5 data

We can calculate the daily mean using daily_mean method in the temporal module. There are similar daily aggregation methods for the daily_median, daily_min, daily_max, daily_std, daily_sum, and all these again for monthly aggregations in the form monthly_XXX.

earthkit-aggregate is able to understand any data object understood by earthkit-data as input. The earthkit-aggregate computation is based on xarray datacubes, therefore the returned object is an xarray.Dataset. To convert this to an EarthKit object you could use the earthkit-data method, from_object.

If the input data is provided an xarray.Dataset then the return object is xarray.Dataset and if the input is an xarray.DataArray then the return object is an xarray.DataArray.

[3]:
era5_daily_mean = ekt.temporal.daily_mean(era5_xr)
era5_daily_std = ekt.temporal.daily_std(era5_xr)
era5_daily_mean
[3]:
<xarray.Dataset> Size: 165MB
Dimensions:     (valid_time: 365, latitude: 201, longitude: 281)
Coordinates:
  * valid_time  (valid_time) datetime64[us] 3kB 2015-01-01 ... 2015-12-31
  * latitude    (latitude) float64 2kB 80.0 79.75 79.5 79.25 ... 30.5 30.25 30.0
  * longitude   (longitude) float64 2kB -10.0 -9.75 -9.5 ... 59.5 59.75 60.0
Data variables:
    t2m         (valid_time, latitude, longitude) float64 165MB 254.4 ... 285.8
Attributes:
    Conventions:  CF-1.8
    institution:  ECMWF

Calculate the monthly mean and standard deviation

[4]:
era5_monthly_mean = ekt.temporal.monthly_mean(era5_xr)
era5_monthly_std = ekt.temporal.monthly_std(era5_xr)
era5_monthly_std
[4]:
<xarray.Dataset> Size: 5MB
Dimensions:     (valid_time: 12, latitude: 201, longitude: 281)
Coordinates:
  * valid_time  (valid_time) datetime64[us] 96B 2015-01-01 ... 2015-12-01
  * latitude    (latitude) float64 2kB 80.0 79.75 79.5 79.25 ... 30.5 30.25 30.0
  * longitude   (longitude) float64 2kB -10.0 -9.75 -9.5 ... 59.5 59.75 60.0
Data variables:
    t2m         (valid_time, latitude, longitude) float64 5MB 6.104 ... 6.375
Attributes:
    Conventions:  CF-1.8
    institution:  ECMWF

Calculate a rolling mean with a 50 timestep window

To calculate a rolling mean along the time dimension you can use the rolling_reduce function.

NOTE: An improved API to the rolling_reduce method is an ongoing task

[5]:
era5_rolling = ekt.temporal.rolling_reduce(
    era5_xr,
    50,
    how_reduce="mean",
    center=True,
)
era5_rolling
[5]:
<xarray.Dataset> Size: 660MB
Dimensions:     (valid_time: 1460, latitude: 201, longitude: 281)
Coordinates:
  * valid_time  (valid_time) datetime64[us] 12kB 2015-01-01 ... 2015-12-31T18...
  * latitude    (latitude) float64 2kB 80.0 79.75 79.5 79.25 ... 30.5 30.25 30.0
  * longitude   (longitude) float64 2kB -10.0 -9.75 -9.5 ... 59.5 59.75 60.0
Data variables:
    t2m         (valid_time, latitude, longitude) float64 660MB dask.array<chunksize=(1459, 26, 36), meta=np.ndarray>
Attributes:
    Conventions:  CF-1.8
    institution:  ECMWF

Account for non-UTC timezones

There is a time_shift argument which can be used to account for non-UTC time zones:

[6]:
era5_daily_mean_p12 = ekt.temporal.daily_mean(era5_xr, time_shift={"hours": 12})
era5_daily_mean_p12
[6]:
<xarray.Dataset> Size: 165MB
Dimensions:     (valid_time: 366, latitude: 201, longitude: 281)
Coordinates:
  * valid_time  (valid_time) datetime64[us] 3kB 2015-01-01 ... 2016-01-01
  * latitude    (latitude) float64 2kB 80.0 79.75 79.5 79.25 ... 30.5 30.25 30.0
  * longitude   (longitude) float64 2kB -10.0 -9.75 -9.5 ... 59.5 59.75 60.0
Data variables:
    t2m         (valid_time, latitude, longitude) float64 165MB 253.5 ... 285.6
Attributes:
    Conventions:  CF-1.8
    institution:  ECMWF

Plot a random point location to see the different aggregation methods

[7]:
latitude = 51.5
longitude = -1.0
sel_kwargs = {"latitude": latitude, "longitude": longitude}

chart = ekp.TimeSeries()

# Plot the raw data and various aggregations on the same chart for comparison:
# Raw data:
chart.line(era5_xr.sel(**sel_kwargs), units="celsius", label="Raw data")
# Daily mean:
chart.line(era5_daily_mean.sel(**sel_kwargs), units="celsius", label="Daily mean")
# Daily mean with 12h shift:
chart.line(era5_daily_mean_p12.sel(**sel_kwargs), units="celsius", label="Daily mean (12h shift)")
# Rolling mean:
chart.line(era5_rolling.sel(**sel_kwargs), units="celsius", label="Rolling mean")

# Add the monthly mean as a black solid line and spread as black dotted lines:
chart.line(era5_monthly_mean.sel(**sel_kwargs), units="celsius", label="Monthly mean", color="black")
upper_m = era5_monthly_mean + era5_monthly_std
lower_m = era5_monthly_mean - era5_monthly_std
chart.line(
    upper_m.sel(**sel_kwargs),
    units="celsius",
    label="Monthly standard deviation spread",
    linestyle="--",
    color="black",
)
chart.line(lower_m.sel(**sel_kwargs), units="celsius", linestyle="--", color="black")

chart.title(
    "Comparison of aggregation methods {variable_name}\n{location:%c}, {location:%C} ({latitude:%Lt}, {longitude:%Ln})"
)

chart.ylabel()

chart.xticks(frequency="M", format="%b", period=True)

chart.legend()

chart.show()
../../_images/tutorials_temporal_02-era5-daily-monthly-statistics_13_0.png
[ ]: