Temporal computations
The earthkit.transforms.temporal module includes methods for transforming data with respect to the temporal coordinate(s). This includes aggregating the data along the time dimension to a single value, daily values or monthly values, and calculating rates from accumulated data.
Aggregation methods
To aggregate the data along the time dimension to a single value, you can use the temporal.reduce, temporal.mean, temporal.sum, temporal.min and temporal.max functions. These functions take an xarray data object and return the aggregated value of the data. The time dimension is detected automatically from the metadata of the data object; to override this, use the time_dim parameter.
Show API documentation for reduce
- earthkit.transforms.temporal.reduce(dataarray: Dataset | DataArray, *_args, time_dim: str | None = None, **kwargs) -> Dataset | DataArray
Reduce an xarray.DataArray or xarray.Dataset along the time/date dimension using a specified how method, with the option to apply weights either directly or using a specified weights method.
- Parameters:
dataarray (xarray.DataArray or xarray.Dataset) – Data object to reduce
time_dim (str) – Name of the time dimension, or coordinate, in the xarray object. The default behaviour is to deduce the time dimension from the attributes of the coordinates, then fall back to “time”. If you do not want to aggregate along the time dimension, use earthkit.transforms.aggregate.reduce.
how (str or callable) – Method used to reduce data. Default=’mean’, which will implement the xarray in-built mean. If a string, it must be an in-built xarray reduce method, an earthkit how method or any numpy method. In the case of duplicate names, the method is selected in the order: xarray, earthkit, numpy. Otherwise it can be any function which can be called in the form f(x, axis=axis, **kwargs) to return the result of reducing an array over an integer-valued axis.
weights (str) – Choose a recognised method to apply weighting. Currently available methods are: ‘latitude’.
how_label (str) – Label to append to the name of the variable in the reduced object. Default is _{how}.
how_dropna (str) – Choose how to drop NaN values. Default is None, in which case NaN values are preserved. Options are ‘any’ and ‘all’.
**kwargs – Keyword arguments recognised by the how function (see earthkit.transforms.aggregate.reduce).
- Returns:
A dataarray reduced in the time dimension using the specified method
- Return type:
Daily and monthly aggregations
To calculate daily or monthly values you can use the temporal.daily_reduce or temporal.monthly_reduce functions. These functions reduce the data along the time dimension to daily or monthly values respectively. The how parameter specifies the aggregation method; the default is mean.
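The grouping idea behind a daily reduction can be sketched in plain Python. This is only an illustration of the concept, not earthkit's implementation: the function name daily_reduce_sketch and its list-based inputs are hypothetical, and the real functions operate on xarray objects. The time_shift argument here mimics the idea of the time_shift parameter documented for daily_reduce by shifting timestamps before they are assigned to a calendar day.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def daily_reduce_sketch(times, values, how=mean, time_shift=timedelta(0)):
    """Group (time, value) pairs by calendar day and reduce each group.

    Hypothetical stand-in for illustration; not earthkit's implementation.
    """
    groups = defaultdict(list)
    for t, v in zip(times, values):
        # Shift the timestamp first, then use its calendar date as the group key.
        groups[(t + time_shift).date()].append(v)
    # Apply the reduction to each day's values, in chronological order.
    return {day: how(vals) for day, vals in sorted(groups.items())}


# Two days of 6-hourly samples starting at 2024-01-01 00:00.
times = [datetime(2024, 1, 1) + timedelta(hours=6 * i) for i in range(8)]
values = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]

daily_mean = daily_reduce_sketch(times, values)
daily_max = daily_reduce_sketch(times, values, how=max)
shifted = daily_reduce_sketch(times, values, time_shift=timedelta(hours=6))
```

With the six-hour shift, the first and last groups in this sketch contain fewer samples than a full day, which is the kind of unequal sampling the remove_partial_periods option is described as addressing.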
Show API documentation for daily_reduce
- earthkit.transforms.temporal.daily_reduce(dataarray: Dataset | DataArray, how: str | Callable = 'mean', time_dim: str | None = None, **kwargs) -> Dataset | DataArray
Group data by day and reduce using the given how method.
- Parameters:
dataarray (xarray.DataArray) – DataArray containing a time dimension.
how (str or callable) – Method used to reduce data. Default=’mean’, which will implement the xarray in-built mean. If a string, it must be an in-built xarray reduce method, an earthkit how method or any numpy method. In the case of duplicate names, the method is selected in the order: xarray, earthkit, numpy. Otherwise it can be any function which can be called in the form f(x, axis=axis, **kwargs) to return the result of reducing an array over an integer-valued axis.
time_dim (str) – Name of the time dimension, or coordinate, in the xarray object. The default behaviour is to deduce the time dimension from the attributes of the coordinates, then fall back to “time”.
time_shift (None, timedelta, dict, str or xarray.DataArray, optional) – A time shift to apply to the data prior to calculation, e.g. to change the local time zone. It can be provided as any object that can be understood by pandas.Timedelta; a dictionary is passed as kwargs to pandas.Timedelta. A string that cannot be parsed as a timedelta is interpreted as a reference to a coordinate of the input dataarray, allowing, e.g., for spatially varying per-gridpoint time zone offsets. An xarray.DataArray can also be provided directly. Default is None.
remove_partial_periods (bool) – If True and a time_shift has been applied, the first and last time steps are removed to ensure equality in sampling periods. Default is False.
how_label (str) – Label to append to the name of the variable in the reduced object, default is _daily_{how}
extra_reduce_dims (str or list of str) – Additional dimensions to reduce over (in addition to the grouping dimension), for example to calculate a daily global mean you would set this to “longitude” and “latitude”. Default is None.
**kwargs – Keyword arguments to be passed to reduce().
- Returns:
A dataarray reduced to daily values using the specified method
- Return type:
Show API documentation for monthly_reduce
- earthkit.transforms.temporal.monthly_reduce(dataarray: Dataset | DataArray, how: str | Callable = 'mean', time_dim: str | None = None, **kwargs)
Group data by month and reduce using the given how method.
- Parameters:
dataarray (xarray.DataArray) – DataArray containing a time dimension.
how (str or callable) – Method used to reduce data. Default=’mean’, which will implement the xarray in-built mean. If a string, it must be an in-built xarray reduce method, an earthkit how method or any numpy method. In the case of duplicate names, the method is selected in the order: xarray, earthkit, numpy. Otherwise it can be any function which can be called in the form f(x, axis=axis, **kwargs) to return the result of reducing an array over an integer-valued axis.
time_dim (str) – Name of the time dimension, or coordinate, in the xarray object to use for the calculation. The default behaviour is to deduce the time dimension from the attributes of the coordinates, then fall back to “time”.
time_shift (None, timedelta, dict, str or xarray.DataArray, optional) – A time shift to apply to the data prior to calculation, e.g. to change the local time zone. It can be provided as any object that can be understood by pandas.Timedelta; a dictionary is passed as kwargs to pandas.Timedelta. A string that cannot be parsed as a timedelta is interpreted as a reference to a coordinate of the input dataarray, allowing, e.g., for spatially varying per-gridpoint time zone offsets. An xarray.DataArray can also be provided directly. Default is None.
remove_partial_periods (bool) – If True and a time_shift has been applied, the first and last time steps are removed to ensure equality in sampling periods. Default is False.
how_label (str) – Label to append to the name of the variable in the reduced object, default is _monthly_{how}
extra_reduce_dims (str or list of str) – Additional dimensions to reduce over (in addition to the grouping dimension), for example to calculate a monthly global mean you would set this to “longitude” and “latitude”. Default is None.
**kwargs – Keyword arguments to be passed to reduce().
- Returns:
A dataarray reduced to monthly values using the specified method
- Return type:
In addition to the daily_reduce and monthly_reduce functions, the temporal module also includes several methods which calculate a specific reduction without needing the how parameter. These methods are wrappers of the daily_reduce and monthly_reduce methods and are documented in the API reference guide for the earthkit.transforms.temporal package:
temporal.daily_mean
temporal.daily_median
temporal.daily_min
temporal.daily_max
temporal.daily_sum
temporal.daily_std
temporal.monthly_mean
temporal.monthly_median
temporal.monthly_min
temporal.monthly_max
temporal.monthly_sum
temporal.monthly_std
Rate calculations
To calculate rates from accumulated data you can use the temporal.accumulation_to_rate function. This function takes an xarray data object and returns the rate of change of the data. The time dimension is detected automatically from the metadata of the data object; to override this, use the time_dim parameter. Similarly, the step between time points is detected automatically, but can be overridden using the step parameter. If you use the step parameter, provide the value in hours unless you specify a different unit with the step_units parameter. step_units can take any time units recognised by pandas, e.g. “minutes”, “days”, “15min”, “3h”, etc.
By default the function calculates the rate per second, but this can be changed using the rate_units parameter. rate_units can take any time units recognised by pandas, e.g. “minutes”, “days”, “15min”, “3h”, etc. If you are only interested in “deaccumulating” the data, i.e. converting accumulated values to step values, you can set rate_units to “step_length” so that the rate units equal the step length between time points.
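The relationship between the step length and the rate units can be illustrated with a small standard-library sketch. It is an assumption-laden illustration of the arithmetic only, not earthkit's implementation: the helper name scale_to_rate is hypothetical, and it takes a single amount accumulated over one step rather than an xarray object.

```python
from datetime import timedelta


def scale_to_rate(step_amount, step, rate_units=timedelta(seconds=1)):
    """Convert an amount accumulated over one step into a rate per rate_units.

    Passing the string "step_length" returns the per-step amount unchanged.
    Hypothetical helper for illustration; not earthkit's implementation.
    """
    if rate_units == "step_length":
        return step_amount
    # Dividing two timedeltas gives the number of rate-unit intervals per step.
    return step_amount / (step / rate_units)


step = timedelta(hours=3)
amount = 10.8  # e.g. 10.8 mm accumulated during one 3-hour step

per_second = scale_to_rate(amount, step)  # divided by 10800 seconds
per_hour = scale_to_rate(amount, step, rate_units=timedelta(hours=1))  # divided by 3
per_step = scale_to_rate(amount, step, rate_units="step_length")  # unchanged
```

In this sketch the default corresponds to a rate per second, while "step_length" leaves the per-step amount as it is, mirroring the "deaccumulating" use described above.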
The accumulation_type parameter is used to specify the type of accumulation used in the input data.
The options are:
“start_of_step”: accumulation restarts at the beginning of each time step, e.g. ERA5.
“start_of_forecast”: accumulation restarts at the beginning of each forecast, e.g. seasonal forecasts.
“start_of_day”: accumulation restarts at the beginning of each day, e.g. ERA5-Land.
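The difference between the accumulation types can be sketched on a plain list of values. This is a minimal illustration under stated assumptions, not earthkit's implementation: the function name deaccumulate is hypothetical, "start_of_day" is omitted because it needs timestamps to locate the daily reset, and when from_first_step is False this sketch simply drops the first step, which may differ from how the library represents it.

```python
def deaccumulate(values, accumulation_type="start_of_step", from_first_step=False):
    """Return the amount attributable to each time step.

    Hypothetical helper for illustration; not earthkit's implementation.
    """
    if accumulation_type == "start_of_step":
        # Each value already covers exactly one step.
        return list(values)
    if accumulation_type == "start_of_forecast":
        # Running total: the per-step amount is the first-order difference.
        diffs = [b - a for a, b in zip(values, values[1:])]
        # The first value has no predecessor; optionally treat it as the
        # amount accumulated during the first step.
        first = [values[0]] if from_first_step else []
        return first + diffs
    raise ValueError(f"unsupported accumulation_type: {accumulation_type!r}")


running_total = [1.0, 3.0, 6.0, 10.0]
per_step = deaccumulate(running_total, "start_of_forecast")
with_first = deaccumulate(running_total, "start_of_forecast", from_first_step=True)
already_per_step = deaccumulate([5.0, 7.0])
```

Dividing each per-step amount by the step duration would then give a rate, as described for accumulation_to_rate.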
Show API documentation for accumulation_to_rate
- earthkit.transforms.temporal.accumulation_to_rate(dataarray: Dataset | DataArray, *_args, **_kwargs) -> Dataset | DataArray
Convert a variable accumulated from the beginning of the forecast to a rate.
The rate is computed by considering first-order discrete differences in the data along the inferred, or specified, time dimension. The differences are converted to a rate by dividing by the time step duration, unless specified otherwise.
- Parameters:
dataarray (xarray.DataArray | xarray.Dataset) – Data accumulated along time to be converted into rate (per second).
step (timedelta | str, optional) – Interval between consecutive time steps. If a string, it should be a valid pandas time frequency string (e.g., ‘15min’, ‘3h’, ‘1 day’). If not provided, it will be inferred from the data.
rate_units (timedelta | str, optional) – Units for the output rate. If a string, it must be a valid pandas time frequency string (e.g., ‘15min’, ‘3h’, ‘1 day’) or simple units like ‘seconds’, ‘minutes’, ‘hours’, ‘days’. If set to ‘step_length’, the rate will be accumulation per time step (“deaccumulated”) and the returned object will preserve the units and long_name attributes of the input dataarray. The default is ‘seconds’.
rate_label (str or None, optional) – Suffix to append to the name of the output dataarray. If None, defaults to ‘rate’ or ‘per_step’ depending on the rate_units.
xp (T.Any) – The array namespace to use for the reduction. If None, it will be inferred from the dataarray.
time_dim (str, optional) – Name of the time dimension, or coordinate, in the xarray object to use for the calculation, default behaviour is to deduce time dimension from attributes of coordinates, then fall back to “time”.
accumulation_type (str, optional) –
Type of accumulation used in the input data. Default is “start_of_step”.
Options are:
“start_of_step”: accumulation restarts at the beginning of each time step.
“start_of_forecast”: accumulation starts at the beginning of the forecast and continues throughout the forecast period.
“start_of_day”: accumulation restarts at the beginning of each day (00:00 UTC).
from_first_step (bool, optional) – Only used if accumulation_type is “start_of_forecast”. If True, the first time step’s rate is calculated by dividing the first accumulation value by the step duration. Default is False.
provenance (bool, optional) – If True, appends a history entry to the output dataarray’s attributes indicating that the transformation was applied. Default is True.
- Returns:
Data object with rate calculated based on the accumulation data.
- Return type: