Spatial aggregations and masking
The earthkit.transforms.spatial module includes methods for aggregating data in space, including masking data with geometries and aggregating data within them.
To mask data you can use the mask() function. This function takes an xarray
data object and a geometry object and returns the data object with the values
outside the geometry masked.
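Conceptually, masking tests each grid point against the geometry and blanks out the points that fall outside. The NumPy sketch below illustrates that idea with a hand-written even-odd (ray-casting) point-in-polygon test applied to grid-cell centres. It is not how mask() is implemented, which works from GeoDataFrame geometries; point_in_polygon and mask_grid are hypothetical helpers, and setting outside points to NaN is an assumption of the sketch.

```python
import numpy as np


def point_in_polygon(x: float, y: float, vertices: list[tuple[float, float]]) -> bool:
    """Even-odd ray-casting test (hypothetical helper): is (x, y) inside the polygon?"""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def mask_grid(values, lats, lons, vertices):
    """Return a copy of `values` with points outside the polygon set to NaN.

    `values` has shape (len(lats), len(lons)); polygon vertices are (lon, lat).
    """
    out = np.array(values, dtype=float)
    for i, lat in enumerate(lats):
        for j, lon in enumerate(lons):
            if not point_in_polygon(lon, lat, vertices):
                out[i, j] = np.nan
    return out


lats = np.array([0.5, 1.5, 2.5])
lons = np.array([0.5, 1.5, 2.5])
data = np.arange(9.0).reshape(3, 3)
# A triangle covering roughly the lower-left half of the 3x3 domain
triangle = [(0.0, 0.0), (3.2, 0.0), (0.0, 3.2)]
masked = mask_grid(data, lats, lons, triangle)
print(masked)
```

Only the three cell centres whose longitude plus latitude exceeds 3.2 fall outside the triangle, so exactly three values become NaN.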
- earthkit.transforms.spatial.mask(dataarray: Dataset | DataArray, geodataframe: GeoDataFrame | None = None, mask_dim: str | None = None, lat_key: str | None = None, lon_key: str | None = None, chunk: bool = True, union_geometries: bool = False, area: dict | None = None, **mask_kwargs) → Dataset | DataArray
Apply multiple shape masks to some gridded data.
Each feature in the geodataframe is treated as an individual mask to apply to the data. Unless union_geometries is True, the data is returned with an additional dimension whose length equals the number of features. This can produce very large results that slow down your script, so it may be better to loop over individual features, or to apply the masks directly with reduce().
- Parameters:
dataarray – Xarray data object (must have geospatial coordinates).
geodataframe – GeoPandas GeoDataFrame containing the polygons for aggregations. Either geodataframe or area must be provided, but not both.
area (dict, optional) – Dictionary with keys "north", "south", "east", "west" defining a bounding box. Converted to a single-polygon GeoDataFrame internally. Areas that cross the anti-meridian (west > east) are not currently supported.
mask_dim – dimension that will be created to accommodate the masked arrays, default is the index of the geodataframe.
all_touched – If True, all pixels touched by a geometry are included; if False, only pixels whose centre lies within the geometry are included. Default is False. Only valid for regular data.
lat_key – key for latitude variable, default behaviour is to detect variable keys.
lon_key – key for longitude variable, default behaviour is to detect variable keys.
chunk (bool) – Boolean to indicate whether to use chunking, default = True. This is advised because spatial masks can create large results. If you are working with small arrays, or you have implemented your own chunking rules, you may wish to disable it.
union_geometries (bool) – Boolean to indicate whether to union all geometries before masking. Default is False, which will apply each geometry in the geodataframe as a separate mask.
mask_kwargs – Any kwargs to pass into the mask method
- Returns:
A masked data array with dimensions [feature_id] + [data.dims]. Each slice along the feature dimension corresponds to one feature in the geodataframe.
- Return type:
Dataset | DataArray
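The warning about result size follows from the array shapes involved: with union_geometries=False every feature contributes its own full-size copy of the grid, stacked along a new dimension, whereas union_geometries=True combines the geometries into a single mask. The NumPy sketch below mimics that bookkeeping with boolean masks (True = keep). The helper stack_masks and the example sizes are illustrative assumptions, not part of earthkit-transforms.

```python
import numpy as np


def stack_masks(data, feature_masks, union=False):
    """Apply boolean masks (True = keep) to a 2-D field (hypothetical helper).

    union=False: one masked copy per feature -> shape (n_features, ny, nx).
    union=True:  OR the masks together first -> shape (ny, nx).
    """
    data = np.asarray(data, dtype=float)
    masks = np.asarray(feature_masks, dtype=bool)
    if union:
        combined = masks.any(axis=0)
        return np.where(combined, data, np.nan)
    # Broadcasting (n_features, ny, nx) against (ny, nx) adds the feature dimension
    return np.where(masks, data, np.nan)


field = np.ones((4, 5))
masks = np.zeros((2, 4, 5), dtype=bool)
masks[0, :2, :] = True   # feature 0: top two rows
masks[1, :, :2] = True   # feature 1: left two columns

per_feature = stack_masks(field, masks)
unioned = stack_masks(field, masks, union=True)
print(per_feature.shape, unioned.shape)  # (2, 4, 5) (4, 5)

# Rough size of a per-feature float64 result for 200 features on a 0.25-degree global grid
n_features, ny, nx = 200, 721, 1440
print(f"{n_features * ny * nx * 8 / 1e9:.2f} GB per time step")  # 1.66 GB per time step
```

The second print shows why the docstring suggests looping over features or reducing directly: the per-feature stack grows linearly with the number of features.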
To calculate an aggregated value for the data within a geometry you can use the
reduce() function. This function takes an xarray data object and a geometry
object and returns the aggregated value of the data within the geometry. The how
parameter can be used to specify the aggregation method. The default is mean.
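In array terms, the reduction amounts to masking followed by a NaN-aware aggregation over the spatial dimensions, once per feature. The NumPy sketch below shows that idea; the HOW table is a hypothetical stand-in for the how parameter, chosen to mirror the documented default in which 'mean' calls a nanmean function, and it is not the library's own dispatch code.

```python
import numpy as np

# Hypothetical mapping from a `how` string to a NaN-aware NumPy reduction
HOW = {"mean": np.nanmean, "sum": np.nansum, "max": np.nanmax, "min": np.nanmin}


def reduce_features(data, feature_masks, how="mean"):
    """Aggregate a 2-D field inside each boolean mask (True = inside).

    Returns a 1-D array with one value per feature: the spatial
    dimensions are replaced by a feature dimension.
    """
    func = HOW[how]
    data = np.asarray(data, dtype=float)
    masked = np.where(np.asarray(feature_masks, dtype=bool), data, np.nan)
    # Collapse the two spatial axes, leaving the leading feature axis
    return func(masked, axis=(1, 2))


field = np.arange(12.0).reshape(3, 4)
masks = np.zeros((2, 3, 4), dtype=bool)
masks[0, 0, :] = True   # feature 0: first row, values 0, 1, 2, 3
masks[1, :, 3] = True   # feature 1: last column, values 3, 7, 11
print(reduce_features(field, masks))
print(reduce_features(field, masks, how="max"))
```

Because the values outside each mask are NaN, the nan-prefixed NumPy functions ignore them, which is why the feature means here are 1.5 and 7.0.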
- earthkit.transforms.spatial.reduce(dataarray: Dataset | DataArray, geodataframe: GeoDataFrame | None = None, mask_arrays: DataArray | list[DataArray] | None = None, area: dict | None = None, **kwargs) → Dataset | DataArray
Apply a shape object to an xarray.DataArray object using the specified ‘how’ method.
Geospatial coordinates are reduced to a dimension representing the list of features in the shape object.
- Parameters:
dataarray – Xarray data object (must have geospatial coordinates).
geodataframe – GeoPandas GeoDataFrame containing the polygons for aggregations. Cannot be provided together with area.
area (dict, optional) – Dictionary with keys "north", "south", "east", "west" defining a bounding box. Converted to a single-polygon GeoDataFrame internally. Areas that cross the anti-meridian (west > east) are not currently supported.
mask_arrays – precomputed mask array[s], if provided this will be used instead of creating a new mask. They must be on the same spatial grid as the dataarray.
how – aggregation method applied to the data within each geometry. Default is 'mean', which calls xp.nanmean.
weights – weights for the aggregation; also accepts recognised keys for weights, e.g. 'latitude'.
lat_key/lon_key – key for latitude/longitude variable, default behaviour is to detect variable keys.
extra_reduce_dims – any additional dimensions to aggregate over when reducing over spatial dimensions
mask_dim – dimension that will be created after the reduction of the spatial dimensions, default is the index of the dataframe
all_touched – If True, all pixels touched by a geometry are included; if False, only pixels whose centre lies within the geometry are included. Default is False. Only valid for regular data.
mask_kwargs – Any kwargs to pass into the mask method
return_as – format of the returned object, either pandas or xarray. Work in progress.
compact – If True, return a compact pandas.DataFrame with the reduced data as a new column. If False, return a fully expanded pandas.DataFrame. Only valid if return_as is pandas
how_label – label to append to variable name in returned object, default is not to append
kwargs – kwargs recognised by the how function
- Returns:
A data array with dimensions [features] + the dimensions of data other than 'lat' and 'lon'. Each slice along the feature dimension corresponds to one feature in the geodataframe.
- Return type:
Dataset | DataArray (a pandas.DataFrame when return_as is pandas)
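The weights parameter accepts recognised keys such as 'latitude'. On a regular latitude-longitude grid, a common choice for latitude weighting is the cosine of latitude, which is approximately proportional to grid-cell area. The sketch below assumes that convention (the documentation does not state which formula earthkit-transforms uses) and computes a NaN-aware weighted mean over one masked region; weighted_nanmean is a hypothetical name.

```python
import numpy as np


def weighted_nanmean(data, lats):
    """Cos(latitude)-weighted mean of a (lat, lon) field, skipping NaNs.

    Hypothetical helper; the cos(lat) weighting is an assumption of this sketch.
    """
    data = np.asarray(data, dtype=float)
    # One weight per latitude row, broadcast across longitudes
    w = np.cos(np.deg2rad(np.asarray(lats, dtype=float)))[:, None] * np.ones_like(data)
    valid = ~np.isnan(data)
    # Exclude masked (NaN) cells from both the numerator and the total weight
    return float(np.sum(np.where(valid, data * w, 0.0)) / np.sum(np.where(valid, w, 0.0)))


lats = np.array([0.0, 60.0])
field = np.array([[10.0, 10.0], [40.0, np.nan]])
print(weighted_nanmean(field, lats))  # pulled towards the equatorial values
print(np.nanmean(field))              # unweighted mean for comparison
```

With cos(60°) = 0.5, the high-latitude cell counts half as much as each equatorial cell, giving a weighted mean of 16 against an unweighted mean of 20.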
Using a bounding box instead of a geometry
Both mask() and reduce() accept an area keyword argument as an
alternative to providing a geodataframe. area is a dictionary with keys
"north", "south", "east" and "west" that defines a simple
bounding box:
import earthkit.transforms as ekt
# ds: an xarray Dataset with latitude/longitude coordinates
ekt.spatial.reduce(
    ds, area={"north": 60, "south": 30, "east": 40, "west": -10}, how="mean"
)
The bounding box is converted internally to a single-polygon GeoDataFrame.
Providing both area and geodataframe raises a ValueError.
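The conversion and the two documented restrictions (area and geodataframe are mutually exclusive; west > east is unsupported) can be pictured with the plain-Python sketch below. area_to_polygon is a hypothetical helper that returns the bounding box as a closed ring of (lon, lat) vertices; earthkit-transforms builds a GeoDataFrame instead, and rejecting west > east with an error is this sketch's own choice, not documented behaviour.

```python
def area_to_polygon(area, geodataframe=None):
    """Turn an area dict into a closed (lon, lat) ring (hypothetical helper)."""
    if geodataframe is not None:
        # Documented behaviour: supplying both area and geodataframe raises a ValueError
        raise ValueError("Provide either area or geodataframe, not both")
    missing = {"north", "south", "east", "west"} - set(area)
    if missing:
        raise ValueError(f"area is missing keys: {sorted(missing)}")
    n, s, e, w = area["north"], area["south"], area["east"], area["west"]
    if w > e:
        # Anti-meridian crossing is unsupported; this sketch rejects it outright
        raise ValueError("Areas crossing the anti-meridian (west > east) are not supported")
    # Counter-clockwise ring, first vertex repeated to close it
    return [(w, s), (e, s), (e, n), (w, n), (w, s)]


print(area_to_polygon({"north": 60, "south": 30, "east": 40, "west": -10}))
```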
Note
Areas that cross the anti-meridian (where west > east) are not currently
supported.
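Both functions also document an all_touched option. For a bounding box on a regular grid its effect can be worked out by hand: by default a cell is selected only if its centre lies inside the box, whereas all_touched=True also selects cells that merely overlap it. The NumPy sketch below computes both selections one axis at a time, since a rectangle is separable. It assumes regularly spaced cell centres, ignores the exact edge-contact rules of the underlying rasterisation, and select_cells is a hypothetical name; the area values are chosen to fall between cell centres.

```python
import numpy as np


def select_cells(centres, lo, hi, all_touched=False):
    """Boolean selection of 1-D grid cells for the interval [lo, hi] (hypothetical helper).

    Assumes at least two regularly spaced cell centres.
    all_touched=False: keep cells whose centre lies in [lo, hi].
    all_touched=True:  keep cells whose extent overlaps (lo, hi).
    """
    centres = np.asarray(centres, dtype=float)
    half = abs(centres[1] - centres[0]) / 2
    if all_touched:
        return (centres + half > lo) & (centres - half < hi)
    return (centres >= lo) & (centres <= hi)


lons = np.arange(-20.0, 50.0, 10.0)  # centres -20, -10, ..., 40; cells 10 degrees wide
lats = np.arange(20.0, 80.0, 10.0)   # centres 20, 30, ..., 70
area = {"north": 67, "south": 23, "east": 37, "west": -17}

for touched in (False, True):
    rows = select_cells(lats, area["south"], area["north"], touched)
    cols = select_cells(lons, area["west"], area["east"], touched)
    # A cell is selected if both its row and its column are selected
    mask = rows[:, None] & cols[None, :]
    print(f"all_touched={touched}: {int(mask.sum())} cells")
```

With these values the centre test keeps 4 × 5 = 20 cells, while all_touched=True keeps the full 6 × 7 = 42 because every edge cell overlaps the box.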
In addition to the above functions, the spatial module includes several methods for computing the intermediate steps of the aggregation process. These are documented in the API reference guide: earthkit.transforms.spatial package
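One reason to work with the intermediate steps is reuse: the reduce() parameters note that precomputed mask_arrays can be passed in place of building a new mask, so the geometry work is done once per grid rather than repeated. The NumPy sketch below illustrates that pattern by building the feature masks once and then applying them to every time step. The helper names are hypothetical, the features are simple boxes to keep the code short, and nothing here calls the earthkit.transforms.spatial functions.

```python
import numpy as np


def build_masks(lats, lons, boxes):
    """Precompute one boolean (lat, lon) mask per feature (hypothetical helper).

    Features are (south, north, west, east) boxes; cell centres are tested,
    as in the default all_touched=False behaviour.
    """
    lat2d, lon2d = np.meshgrid(lats, lons, indexing="ij")
    return np.stack([
        (lat2d >= s) & (lat2d <= n) & (lon2d >= w) & (lon2d <= e)
        for s, n, w, e in boxes
    ])


def reduce_with_masks(data, masks):
    """Mean inside each mask for every time step: (time, lat, lon) -> (time, feature)."""
    # Add a feature axis to the data and a time axis to the masks so they broadcast
    masked = np.where(masks[None, :, :, :], data[:, None, :, :], np.nan)
    return np.nanmean(masked, axis=(2, 3))


lats = np.array([0.0, 1.0, 2.0])
lons = np.array([0.0, 1.0, 2.0, 3.0])
masks = build_masks(lats, lons, [(0, 1, 0, 1), (2, 2, 0, 3)])  # computed once

rng = np.random.default_rng(0)
data = rng.random((5, lats.size, lons.size))  # 5 time steps
result = reduce_with_masks(data, masks)       # masks reused for every step
print(result.shape)
```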