Skip to content
Snippets Groups Projects
nodata_management.ipynb 8.37 KiB
Newer Older
  • Learn to ignore specific revisions
  • mhagdorn's avatar
    mhagdorn committed
    {
     "cells": [
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "# Nodata Management"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
    
        "If you have opened a dataset and the nodata value can be determined, you can access it via the `rio.nodata` or `rio.encoded_nodata` accessors.\n",
    
    mhagdorn's avatar
    mhagdorn committed
        "\n",
        "If your dataset's nodata value cannot be determined, you can use the `rio.write_nodata` method.\n",
        "\n",
        "### Search order for nodata (DataArray only):\n",
        "1. Look in attributes (`attrs`) of your data array for the `_FillValue` then `missing_value` then `fill_value` and finally `nodata`.\n",
        "2. Look in the `nodatavals` attribute. This is for backwards compatibility with `xarray.open_rasterio`. We recommend using `rioxarray.open_rasterio` instead.\n",
        "\n",
        "### API Documentation\n",
        "\n",
    
        "- [rio.write_nodata()](../rioxarray.rst#rioxarray.raster_array.RasterArray.write_nodata)\n",
        "- [rio.nodata](../rioxarray.rst#rioxarray.raster_array.RasterArray.nodata)\n",
        "- [rio.encoded_nodata](../rioxarray.rst#rioxarray.raster_array.RasterArray.encoded_nodata)"
    
    mhagdorn's avatar
    mhagdorn committed
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 1,
       "metadata": {},
       "outputs": [],
       "source": [
        "import rioxarray\n",
        "import xarray\n",
        "\n",
        "file_path = \"../../test/test_data/input/tmmx_20190121.nc\""
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Example of loading unmaksed data\n",
        "\n",
        "In this case, the nodata value is in the attributes."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 2,
       "metadata": {},
       "outputs": [],
       "source": [
        "xds = xarray.open_dataset(file_path, mask_and_scale=False) # performs mask_and_scale by default\n",
        "rds = rioxarray.open_rasterio(file_path)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 3,
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "nodata:\n",
          "- xarray.open_dataset: 32767\n",
          "- rioxarray.open_rasterio: 32767.0\n",
          "\n",
          "encoded_nodata:\n",
          "- xarray.open_dataset: None\n",
          "- rioxarray.open_rasterio: None\n"
         ]
        }
       ],
       "source": [
        "print(\"nodata:\")\n",
        "print(f\"- xarray.open_dataset: {xds.air_temperature.rio.nodata}\")\n",
        "print(f\"- rioxarray.open_rasterio: {rds.air_temperature.rio.nodata}\")\n",
        "print(\"\\nencoded_nodata:\")\n",
        "print(f\"- xarray.open_dataset: {xds.air_temperature.rio.encoded_nodata}\")\n",
        "print(f\"- rioxarray.open_rasterio: {rds.air_temperature.rio.encoded_nodata}\")"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 4,
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "attributes:\n",
          "\n",
          "- xarray.open_dataset:\n",
          "    {'_FillValue': 32767, 'units': 'K', 'description': 'Daily Maximum Temperature', 'long_name': 'tmmx', 'standard_name': 'tmmx', 'missing_value': 32767, 'dimensions': 'lon lat time', 'grid_mapping': 'crs', 'coordinate_system': 'WGS84,EPSG:4326', 'scale_factor': 0.1, 'add_offset': 220.0, '_Unsigned': 'true'}\n",
          "\n",
          "- rioxarray.open_rasterio:\n",
          "    {'add_offset': 220.0, 'coordinates': 'day', 'coordinate_system': 'WGS84,EPSG:4326', 'description': 'Daily Maximum Temperature', 'dimensions': 'lon lat time', 'grid_mapping': 'crs', 'long_name': 'tmmx', 'missing_value': 32767, 'scale_factor': 0.1, 'standard_name': 'tmmx', 'units': 'K', '_FillValue': 32767.0, '_Unsigned': 'true'}\n"
         ]
        }
       ],
       "source": [
        "print(\"attributes:\")\n",
        "print(f\"\\n- xarray.open_dataset:\\n    {xds.air_temperature.attrs}\")\n",
        "print(f\"\\n- rioxarray.open_rasterio:\\n    {rds.air_temperature.attrs}\")"
       ]
      },
      {
       "cell_type": "markdown",
       "metadata": {},
       "source": [
        "## Example of data loaded in with mask_and_scale=True\n",
        "\n",
        "When the dataset is opened with `mask_and_scale=True` with `rioxarray.open_rasterio` or `xarray.open_dataset`, the\n",
        "nodata metadata is written to the encoding attribute. Then, when the dataset is written using\n",
        "`to_netcdf` or `rio.to_raster` the data is decoded and it writes the original nodata value to the raster.\n",
        "\n",
        "When this happens, `rio.nodata` returns `numpy.nan` and `rio.encoded_nodata` contains the original value."
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 5,
       "metadata": {},
       "outputs": [
        {
         "name": "stderr",
         "output_type": "stream",
         "text": [
          "~/miniconda/envs/rioxarray/lib/python3.8/site-packages/xarray/conventions.py:487: SerializationWarning: variable 'air_temperature' has _Unsigned attribute but is not of integer type. Ignoring attribute.\n",
          "  new_vars[k] = decode_cf_variable(\n",
          "~/scripts/rioxarray/rioxarray/_io.py:501: SerializationWarning: variable 'air_temperature' has _Unsigned attribute but is not of integer type. Ignoring attribute.\n",
          "  rioda = open_rasterio(\n"
         ]
        }
       ],
       "source": [
        "xds = xarray.open_dataset(file_path) # performs mask_and_scale by default\n",
        "rds = rioxarray.open_rasterio(file_path, mask_and_scale=True)"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 6,
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "nodata:\n",
          "- xarray.open_dataset: nan\n",
          "- rioxarray.open_rasterio: nan\n",
          "\n",
          "encoded_nodata:\n",
          "- xarray.open_dataset: 32767\n",
          "- rioxarray.open_rasterio: 32767.0\n"
         ]
        }
       ],
       "source": [
        "print(\"nodata:\")\n",
        "print(f\"- xarray.open_dataset: {xds.air_temperature.rio.nodata}\")\n",
        "print(f\"- rioxarray.open_rasterio: {rds.air_temperature.rio.nodata}\")\n",
        "print(\"\\nencoded_nodata:\")\n",
        "print(f\"- xarray.open_dataset: {xds.air_temperature.rio.encoded_nodata}\")\n",
        "print(f\"- rioxarray.open_rasterio: {rds.air_temperature.rio.encoded_nodata}\")"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 7,
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "attributes:\n",
          "\n",
          "- xarray.open_dataset:\n",
          "    {'units': 'K', 'description': 'Daily Maximum Temperature', 'long_name': 'tmmx', 'standard_name': 'tmmx', 'dimensions': 'lon lat time', 'grid_mapping': 'crs', 'coordinate_system': 'WGS84,EPSG:4326'}\n",
          "\n",
          "- rioxarray.open_rasterio:\n",
          "    {'coordinates': 'day', 'coordinate_system': 'WGS84,EPSG:4326', 'description': 'Daily Maximum Temperature', 'dimensions': 'lon lat time', 'grid_mapping': 'crs', 'long_name': 'tmmx', 'standard_name': 'tmmx', 'units': 'K'}\n"
         ]
        }
       ],
       "source": [
        "print(\"attributes:\")\n",
        "print(f\"\\n- xarray.open_dataset:\\n    {xds.air_temperature.attrs}\")\n",
        "print(f\"\\n- rioxarray.open_rasterio:\\n    {rds.air_temperature.attrs}\")"
       ]
      },
      {
       "cell_type": "code",
       "execution_count": 8,
       "metadata": {},
       "outputs": [
        {
         "name": "stdout",
         "output_type": "stream",
         "text": [
          "encoding:\n",
          "\n",
          "- xarray.open_dataset:\n",
          "    {'zlib': True, 'shuffle': True, 'complevel': 5, 'fletcher32': False, 'contiguous': False, 'chunksizes': (585, 1386), 'source': '~/scripts/rioxarray/test/test_data/input/tmmx_20190121.nc', 'original_shape': (585, 1386), 'dtype': dtype('uint16'), '_Unsigned': 'true', 'missing_value': 32767, '_FillValue': 32767, 'scale_factor': 0.1, 'add_offset': 220.0, 'coordinates': 'day'}\n",
          "\n",
          "- rioxarray.open_rasterio:\n",
          "    {'_Unsigned': 'true', 'scale_factor': 0.1, 'add_offset': 220.0, '_FillValue': 32767.0, 'missing_value': 32767}\n"
         ]
        }
       ],
       "source": [
        "print(\"encoding:\")\n",
        "print(f\"\\n- xarray.open_dataset:\\n    {xds.air_temperature.encoding}\")\n",
        "print(f\"\\n- rioxarray.open_rasterio:\\n    {rds.air_temperature.encoding}\")"
       ]
      }
     ],
     "metadata": {
      "kernelspec": {
       "display_name": "Python 3",
       "language": "python",
       "name": "python3"
      },
      "language_info": {
       "codemirror_mode": {
        "name": "ipython",
        "version": 3
       },
       "file_extension": ".py",
       "mimetype": "text/x-python",
       "name": "python",
       "nbconvert_exporter": "python",
       "pygments_lexer": "ipython3",
       "version": "3.8.3"
      }
     },
     "nbformat": 4,
     "nbformat_minor": 4
    }