{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Masking data-cubes using geometry objects" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "from earthkit.data.testing import earthkit_remote_test_data_file\n", "\n", "from earthkit import data as ekd\n", "from earthkit import transforms as ekt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Load some test data\n", "\n", "All `earthkit-transforms` methods can be called with `earthkit-data` objects (Readers and Wrappers) or with the \n", "pre-loaded `xarray` or `geopandas` objects.\n", "\n", "In this example we will use hourly ERA5 2m temperature data on a 0.5x0.5 spatial grid for the year 2015 as\n", "our physical data; and we will use the NUTS geometries which are stored in a geojson file.\n", "\n", "First we lazily load the ERA5 data and NUTS geometries from our test-data repository.\n", "\n", "Note the data is only downloaded when\n", "we use it, e.g. at the `.to_xarray` line, additionally, the download is cached so the next time you run this\n", "cell you will not need to re-download the file (unless it has been a very long time since you have run the\n", "code, please see tutorials in `earthkit-data` for more details in cache management)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Get some demonstration ERA5 data, this could be any url or path to an ERA5 grib or netCDF file.\n", "# remote_era5_file = earthkit_remote_test_data_file(\"era5_temperature_europe_2015.grib\") # Large file\n", "remote_era5_file = earthkit_remote_test_data_file(\"era5_temperature_europe_20150101.grib\")\n", "era5_data = ekd.from_source(\"url\", remote_era5_file)\n", "\n", "# Open as an xarray dataset, renaming the 2m temperature variable to something more manageable\n", "era5_xr = era5_data.to_xarray(time_dim_mode=\"valid_time\").rename({\"2t\": \"t2m\"})\n", "era5_xr" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Use some demonstration polygons stored, this could be any url or path to geojson file\n", "remote_nuts_url = earthkit_remote_test_data_file(\"NUTS_RG_60M_2021_4326_LEVL_0.geojson\")\n", "nuts_data = ekd.from_source(\"url\", remote_nuts_url)\n", "\n", "nuts_data.to_pandas()[:5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Mask dataarray with geodataframe\n", "\n", "`shapes.mask` applies all the features in the geometry object (`nuts_data`) to the data object (`era5_data`).\n", "It returns an xarray object the same shape and type as the input xarray object with all points outside of\n", "the geometry masked" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "single_masked_data = ekt.spatial.mask(era5_xr, nuts_data, union_geometries=True)\n", "single_masked_data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, axes = plt.subplots(1, 2, figsize=(10, 4))\n", "era5_xr.t2m.mean(dim=\"valid_time\").plot(ax=axes[0])\n", "axes[0].set_title(\"Original data\")\n", "# Single masked data\n", "single_masked_data.t2m.mean(dim=\"valid_time\").plot(ax=axes[1])\n", "axes[1].set_title(\"Masked data\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`shapes.masks` applies the features in the geometry object (`nuts_data`) to the data object (`era5_data`).\n", "It returns an xarray object with an additional dimension, and coordinate variable, corresponding to the \n", "features in the geometry object.\n", "By default this is the index of the input geodataframe, in this example the index is just an integer\n", "count so it takes the default name `index`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "masked_data = ekt.spatial.mask(era5_xr, nuts_data)\n", "masked_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is possible to specify a column in the geodataframe to use for the new dimension, for example in NUTS the\n", "`FID` (= feature id) which contains the two letter identier code for each feature:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "masked_data = ekt.spatial.mask(era5_xr, nuts_data, mask_dim=\"FID\")\n", "masked_data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we demonstrate what we have done by plotting the masked objects we have produced" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, axes = plt.subplots(1, 3, figsize=(15, 3))\n", "era5_xr.t2m.mean(dim=\"valid_time\").plot(ax=axes[0])\n", "axes[0].set_title(\"Original data\")\n", "masked_data.t2m.sel(FID=\"DE\").mean(dim=\"valid_time\").plot(ax=axes[1])\n", "axes[1].set_title(\"Masked for Germany\")\n", "germany_data = masked_data.sel(FID=\"DE\").dropna(dim=\"latitude\", how=\"all\").dropna(dim=\"longitude\", how=\"all\")\n", "germany_data.t2m.mean(dim=\"valid_time\").plot(ax=axes[2])\n", "axes[2].set_title(\"Masked Germany Zoom\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": ".conda", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.7" }, "orig_nbformat": 4 }, "nbformat": 4, "nbformat_minor": 2 }