earthkit.transforms.temporal

The transforms.temporal module includes methods for transforming data with respect to the temporal coordinate(s). This includes aggregating the data along the time dimension to a single value, to daily values, or to monthly values, and calculating rates from accumulated data.

Aggregation methods

To aggregate the data along the time dimension to a single value, use the temporal.reduce, temporal.mean, temporal.sum, temporal.min, or temporal.max functions. These functions take an xarray data object and return the aggregated value of the data. The time dimension is detected automatically from the metadata of the data object; to override this, use the time_dim parameter.

To calculate daily or monthly values, use the temporal.daily_reduce or temporal.monthly_reduce functions. These functions reduce the data along the time dimension to daily or monthly values respectively. The how parameter specifies the aggregation method; the default is mean.
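The grouping idea behind a daily reduction can be sketched in plain Python. This is only an illustration of the concept, not the library's implementation: daily_reduce_sketch and its arguments are hypothetical names, and the real functions operate on xarray objects rather than lists.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def daily_reduce_sketch(times, values, how=mean):
    """Group (time, value) samples by calendar day and reduce each group.

    Hypothetical helper for illustration only; `how` is any callable that
    reduces a list of values to a single value.
    """
    groups = defaultdict(list)
    for t, v in zip(times, values):
        groups[t.date()].append(v)
    # One reduced value per day, in chronological order
    return {day: how(vals) for day, vals in sorted(groups.items())}


# Two days of 6-hourly samples with values 0.0, 1.0, ..., 7.0
times = [datetime(2024, 1, 1) + timedelta(hours=6 * i) for i in range(8)]
values = [float(i) for i in range(8)]

daily_mean = daily_reduce_sketch(times, values)          # default reduction: mean
daily_max = daily_reduce_sketch(times, values, how=max)  # alternative reduction
```

Swapping the how callable changes the reduction while the grouping stays the same, which mirrors the role of the how parameter described above.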

earthkit.transforms.temporal.daily_reduce(dataarray, how='mean', time_dim=None, **kwargs)

Group data by day and reduce using the given how method.

Parameters:
  • dataarray (xr.DataArray) – DataArray containing a time dimension.

  • how (str or callable) – Method used to reduce data. Default=’mean’, which will implement the xarray in-built mean. If string, it must be an in-built xarray reduce method, an earthkit how method or any numpy method. If the same name exists in more than one of these, the first match is used, searching in the order: xarray, earthkit, numpy. Otherwise it can be any function which can be called in the form f(x, axis=axis, **kwargs) to return the result of reducing an array over an integer valued axis.

  • time_dim (str) – Name of the time dimension, or coordinate, in the xarray object. The default behaviour is to deduce the time dimension from attributes of coordinates, then fall back to “time”.

  • time_shift ((optional) None, timedelta or dict) – A time shift to apply to the data prior to calculation, e.g. to change the local time zone. It can be provided as any object that can be understood by pandas.Timedelta; a dictionary is passed as kwargs to pandas.Timedelta. Default is None.

  • remove_partial_periods (bool) – If True and a time_shift has been applied, the first and last time steps are removed to ensure equality in sampling periods. Default is False.

  • how_label (str) – Label to append to the name of the variable in the reduced object, default is _daily_{how}

  • extra_reduce_dims (str or list of str) – Additional dimensions to reduce over (in addition to the grouping dimension), for example to calculate a daily global mean you would set this to “longitude” and “latitude”. Default is None.

  • **kwargs – Keyword arguments to be passed to reduce().

Returns:

A dataarray reduced to daily values using the specified method

Return type:

xr.DataArray | xr.Dataset
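The interaction between time_shift and remove_partial_periods can be illustrated with a plain-Python sketch. It rests on two assumptions that the text above does not confirm: that the shift is added to each timestamp before grouping, and that the "first and last time steps" removed are the first and last daily groups of the result. The helper shifted_daily_sum is a hypothetical name, not part of earthkit.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def shifted_daily_sum(times, values, time_shift=timedelta(0),
                      remove_partial_periods=False):
    """Sum values per calendar day after shifting the timestamps.

    Hypothetical helper. Assumes the shift is added to each timestamp and
    that partial periods are the first and last daily groups.
    """
    groups = defaultdict(list)
    for t, v in zip(times, values):
        groups[(t + time_shift).date()].append(v)
    days = sorted(groups)
    if remove_partial_periods and time_shift != timedelta(0):
        # After a shift, the edge days contain fewer samples than the rest
        days = days[1:-1]
    return {day.isoformat(): sum(groups[day]) for day in days}


# Three days of 6-hourly samples, each with value 1.0
times = [datetime(2024, 1, 1) + timedelta(hours=6 * i) for i in range(12)]
values = [1.0] * 12

unshifted = shifted_daily_sum(times, values)
shifted = shifted_daily_sum(times, values, time_shift=timedelta(hours=-6))
trimmed = shifted_daily_sum(times, values, time_shift=timedelta(hours=-6),
                            remove_partial_periods=True)
```

In this sketch the shifted result gains an extra day at the start holding a single sample and a final day holding three, while the trimmed result keeps only the days that contain a full set of four samples.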

earthkit.transforms.temporal.monthly_reduce(dataarray, how='mean', time_dim=None, **kwargs)

Group data by month and reduce using the given how method.

Parameters:
  • dataarray (xr.DataArray) – DataArray containing a time dimension.

  • how (str or callable) – Method used to reduce data. Default=’mean’, which will implement the xarray in-built mean. If string, it must be an in-built xarray reduce method, an earthkit how method or any numpy method. If the same name exists in more than one of these, the first match is used, searching in the order: xarray, earthkit, numpy. Otherwise it can be any function which can be called in the form f(x, axis=axis, **kwargs) to return the result of reducing an array over an integer valued axis.

  • time_dim (str) – Name of the time dimension, or coordinate, in the xarray object to use for the calculation. The default behaviour is to deduce the time dimension from attributes of coordinates, then fall back to “time”.

  • time_shift ((optional) None, timedelta or dict) – A time shift to apply to the data prior to calculation, e.g. to change the local time zone. It can be provided as any object that can be understood by pandas.Timedelta; a dictionary is passed as kwargs to pandas.Timedelta. Default is None.

  • remove_partial_periods (bool) – If True and a time_shift has been applied, the first and last time steps are removed to ensure equality in sampling periods. Default is False.

  • how_label (str) – Label to append to the name of the variable in the reduced object, default is _monthly_{how}

  • extra_reduce_dims (str or list of str) – Additional dimensions to reduce over (in addition to the grouping dimension), for example to calculate a monthly global mean you would set this to “longitude” and “latitude”. Default is None.

  • **kwargs – Keyword arguments to be passed to reduce().

Returns:

A dataarray reduced to monthly values using the specified method

Return type:

xr.DataArray | xr.Dataset

In addition to the daily_reduce and monthly_reduce functions, the temporal module also includes several methods which apply a fixed reduction and therefore take no “how” parameter. These methods are wrappers of the daily_reduce and monthly_reduce methods and are documented in the API reference guide: transforms.temporal:

  • temporal.daily_mean

  • temporal.daily_median

  • temporal.daily_min

  • temporal.daily_max

  • temporal.daily_sum

  • temporal.daily_std

  • temporal.monthly_mean

  • temporal.monthly_median

  • temporal.monthly_min

  • temporal.monthly_max

  • temporal.monthly_sum

  • temporal.monthly_std

Rate calculations

To calculate rates from accumulated data, use the temporal.accumulation_to_rate function. This function takes an xarray data object and returns the rate of change of the data. The time dimension is detected automatically from the metadata of the data object; to override this, use the time_dim parameter. Similarly, the step between time points is detected automatically, but can be overridden using the step parameter. If using the step parameter, the value should be provided in hours unless a different unit is specified with the step_units parameter. step_units can take any time units recognised by pandas, e.g. “minutes”, “days”, “15min”, “3H”, etc.

By default the function will calculate the rate per second, but this can be changed using the rate_units parameter. rate_units can take any time units recognised by pandas, e.g. “minutes”, “days”, “15min”, “3H”, etc. If you are only interested in “deaccumulating” the data, i.e. converting accumulated values to step values, you can set rate_units to “step_length” such that the rate_units will be equal to the step length between time points.
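The effect of rate_units comes down to simple arithmetic, sketched here in plain Python: the amount accumulated over one step is scaled by the ratio of the rate unit to the step length. The function to_rate and the sample numbers are hypothetical and only illustrate the calculation; they are not the library's code.

```python
from datetime import timedelta


def to_rate(step_amount, step, rate_units=timedelta(seconds=1)):
    """Express the amount accumulated over one step as a rate per rate_units.

    Hypothetical helper for illustration only.
    """
    # Dividing one timedelta by another yields a plain float ratio
    return step_amount * (rate_units / step)


step = timedelta(hours=1)
amount = 3.6  # illustrative amount accumulated during one hourly step

per_second = to_rate(amount, step)  # default: per second
per_15min = to_rate(amount, step, rate_units=timedelta(minutes=15))
# Analogue of rate_units="step_length": the rate unit equals the step,
# so the step amount is returned unchanged ("deaccumulated" value)
per_step = to_rate(amount, step, rate_units=step)
```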

The accumulation_type parameter is used to specify the type of accumulation used in the input data. The options are:

  • “start_of_step”: accumulation restarts at the beginning of each time step, e.g. ERA5.

  • “start_of_forecast”: accumulation restarts at the beginning of each forecast, e.g. seasonal forecasts.

  • “start_of_day”: accumulation restarts at the beginning of each day, e.g. ERA5-Land.
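The difference between the three accumulation types can be sketched in plain Python by recovering the amount accumulated in each step. This is an illustration under stated assumptions, not the library's algorithm: step_values is a hypothetical helper, the handling of the first time step is not modelled, and for “start_of_day” it assumes that a value stamped 00:00 closes the previous day, so the value that follows starts a fresh accumulation.

```python
from datetime import datetime, time, timedelta


def step_values(times, values, accumulation_type="start_of_step"):
    """Convert accumulated values into the amount accumulated in each step.

    Hypothetical helper for illustration only.
    """
    if accumulation_type == "start_of_step":
        # Each value already covers exactly one step
        return list(values)
    if accumulation_type not in ("start_of_forecast", "start_of_day"):
        raise ValueError(f"unknown accumulation_type: {accumulation_type!r}")
    steps = []
    for i in range(1, len(values)):
        # Assumption: a value stamped 00:00 closes the previous day, so the
        # next value is the start of a new accumulation
        restarted = (
            accumulation_type == "start_of_day"
            and times[i - 1].time() == time(0, 0)
        )
        steps.append(values[i] if restarted else values[i] - values[i - 1])
    return steps


# 6-hourly time points starting at midnight; true step amounts are 1..6
times = [datetime(2024, 1, 1) + timedelta(hours=6 * i) for i in range(7)]

from_forecast = step_values(
    times, [0.0, 1.0, 3.0, 6.0, 10.0, 15.0, 21.0], "start_of_forecast"
)
from_day = step_values(
    times, [0.0, 1.0, 3.0, 6.0, 10.0, 5.0, 11.0], "start_of_day"
)
from_step = step_values(
    times[1:], [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], "start_of_step"
)
```

All three inputs encode the same underlying sequence of step amounts, so the three results agree; only the bookkeeping needed to recover them differs.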

earthkit.transforms.temporal.accumulation_to_rate(dataarray, *_args, **_kwargs)

Convert a variable accumulated from the beginning of the forecast to a rate.

The rate is computed by considering first-order discrete differences in data along the inferred, or specified, time dimension. The differences are converted to a rate by dividing by the time step duration, unless specified otherwise.

Parameters:
  • dataarray (xr.DataArray | xr.Dataset) – Data accumulated along time to be converted into rate (per second).

  • step (timedelta | str, optional) – Interval between consecutive time steps. If a string, it should be a valid pandas time frequency string (e.g., ‘15min’, ‘3h’, ‘1 day’). If not provided, the step will be inferred from the data.

  • rate_units (timedelta | str, optional) – Units for the output rate. If a string, it must be a valid pandas time frequency string (e.g., ‘15min’, ‘3h’, ‘1 day’) or simple units like ‘seconds’, ‘minutes’, ‘hours’, ‘days’. If set to ‘step_length’, the rate will be accumulation per time step (“deaccumulated”) and the returned object will preserve the units and long_name attributes of the input dataarray. The default is ‘seconds’.

  • rate_label (str | None = None, optional) – Suffix to append to the name of the output dataarray. If None, defaults to ‘rate’ or ‘per_step’ depending on the rate_units.

  • xp (T.Any) – The array namespace to use for the reduction. If None, it will be inferred from the dataarray.

  • time_dim (str, optional) – Name of the time dimension, or coordinate, in the xarray object to use for the calculation, default behaviour is to deduce time dimension from attributes of coordinates, then fall back to “time”.

  • accumulation_type (str, optional) –

    Type of accumulation used in the input data. Options are:

    • “start_of_step”: accumulation restarts at the beginning of each time step.

    • “start_of_forecast”: accumulation starts at the beginning of the forecast and continues throughout the forecast period.

    • “start_of_day”: accumulation restarts at the beginning of each day (00:00 UTC).

    Default is “start_of_step”.

  • from_first_step (bool, optional) – Only used if accumulation_type is “start_of_forecast”. If True, the first time step’s rate is calculated by dividing the first accumulation value by the step duration. Default is False.

  • provenance (bool, optional) – If True, appends a history entry to the output dataarray’s attributes indicating that the transformation was applied. Default is True.

Returns:

Data object with rate calculated based on the accumulation data.

Return type:

xr.DataArray | xr.Dataset