When you want to process raster data (arrays with geo-coordinates) with Python, there are several options. It was the most straightforward for me to use Python, read all the data in xarray and process it with rasterio using rioxarray (tutorial by University of Colorado Boulder here).
Rioxarray requires data arrays, x-y coordinate array, and coordinate reference system information (CRS) to function. However, it is not always easy to find out and set the CRS, as different files stores CRS in different places (I wish for more advancement of data-standardization!).
In this article, I summarize 4 ways to find out and set the CRS in a xarray dataset.
Disclaimer: I am not the super expert on the projection system, so please use the code at your own risk and let me know when you find any mistakes!
To start off, I import libraries as follows.
# Import library
import rioxarray as rio
import xarray as xr
from osgeo import gdal
from pyproj import CRS
PythonMethod 1. Get CRS info with gdal (the easiest option)
If the data is well-prepared for geo-processing, you will not need to dig into the data file. Gdal will do the job.
# Open data with Gdal and get the CRS information
ds0 = gdal.Open(file_path)
crs = CRS.from_string(ds0.GetProjection())
# Open data with xarray and set the CRS information obtained above
ds = xr.open_dataset(file_path)
ds.rio.write_crs(crs.to_string(), inplace = True)
PythonMethod 2. Read a wkt or proj4 string from a file
If the Method 1 does not work, you have to dig into the data file and find out where the CRS information is located.
# Open data with xarray
ds = xr.open_dataset(file_path)
# Locate CRS info in the file manually
cc = ds.TEMP.attrs.crs
# Set the CRS information obtained above
ds.rio.write_crs(cc.to_string(), inplace = True)
PythonMethod 3. Read CF from a file, and convert it to CRS
Similar to the second method. However, sometimes in the NetCDF file, the coordinate information is stored in the CF convention, which need conversion to CRS (details in pyproj website).
# Open data with xarray
ds = xr.open_dataset(file_path)
# Locate CRS info in CF convention in the variable file, convert it to CRS format, and read the string as cc
cc = CRS.from_cf(ds.crs.attrs)
# Set the CRS information obtained above
ds.rio.write_crs(cc.to_string(), inplace = True)
PythonMethod 4. Write a wkt or proj4 string
Last, if the above all did not work, you can create a CRS string by yourself. Find out where the projection, the parameters for the projection, the datum, and the spheroid are located.
Here are some useful links for this method.
To find out the compatibility of different parameter names for common projection: https://cfconventions.org/wkt-proj-4.html
To find out the formatting template for wkt or proj4 strings (you will only need to tweak the parameters): http://epsg.io/4326
# Open data with xarray
ds = xr.open_dataset(file_path)
# Create a CRS string
crs =
'GEOGCS["HRAP_Sphere",DATUM["",SPHEROID["",6371200.0,0.0],PRIMEM["Greenwich",0.0], PARAMETER["false_easting",1905000],PARAMETER["false_northing",7620000], PARAMETER["latitude_of_origin",60], PARAMETER["longitude_of_origin",-105], UNIT["Degree",0.0174532925199433]]'
# Set the CRS information obtained above
ds.rio.write_crs(crs, inplace=True)
Python
By the way, specifically for the WRF-Hydro grid, I wrote as follows (referring to this blogpost). WRF-Hydro is national hydrologic model in the U.S., and it uses their own grid definitions.
# Open data with xarray
ds0 = xr.open_dataset(file_path)
# Create a CRS string
crs = '+proj=lcc +lat_1=39 +lat_2=50 +lat_0=39 +lon_0=-135 +a=6370 +b=6370 +towgs84=0,0,0 +no_defs'
# Set the CRS information obtained above
ds0.rio.write_crs(crs1, inplace=True)
# Re-create an xarray object
ds = xr.DataArray(
data= ds0.values,
dims=['y', 'x'],
coords={
'y': ds0.y.values,
'x': ds0.x.values
},
attrs={'_FillValue': np.nan}
)
# Re-project the xarray to WGS1984
ds.rio.write_crs("EPSG:4326", inplace=True)<br><br><br>
PythonHope it helps!!