NetCDF Tutorial

NetCDF

Overview:

NetCDF stands for Network Common Data Form. It was developed so that scientists would have a standard way to store data. NetCDF organizes data using three components:

  • attributes - store data about the data (metadata); often used to describe a specific variable (local attribute), but can also describe a whole dataset (global attribute)
  • dimensions - may be used to represent physical dimensions (e.g. height, time, latitude); each dimension must have a name and a length
  • variables - store the bulk of the data; each variable must have a name, a data type, and a shape described by its dimensions

For a more detailed look at NetCDF, see the Unidata NetCDF tutorial: http://www.unidata.ucar.edu/software/netcdf/docs_beta/netcdf-tutorial.html

A simple example is a file that contains terrain elevation for a model domain [note: model vertical coordinates are above ground level (AGL)]. Each grid cell in the modeled domain has a terrain height above sea level. In this example, the modeled domain has 6 rows, 10 columns, and 1 vertical layer. Row numbers increase from south to north, and column numbers increase from west to east.
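One way to picture this grid is as a 3-D NumPy array. The sketch below assumes the dimension order (layer, row, col), which is common in C-ordered data but may not match the order used in the listings. The heights are placeholders, not the values from Listing 1. Index [0, 0, 0] is the southwest corner and [0, 5, 9] is the northeast corner.

```python
# Sketch of the 1-layer, 6-row, 10-column terrain grid as a NumPy array.
# ASSUMED: dimension order (layer, row, col); heights are placeholders.
import numpy as np

NLAYERS, NROWS, NCOLS = 1, 6, 10

# Every cell in row r gets 100 + 10*r metres, so heights rise toward the north
terrain = np.zeros((NLAYERS, NROWS, NCOLS))
terrain[0] = 100.0 + 10.0 * np.arange(NROWS)[:, np.newaxis]  # (6, 1) broadcasts to (6, 10)

print(terrain.shape)     # (1, 6, 10)
print(terrain[0, 0, 0])  # 100.0 (southwest corner: row 0, col 0)
print(terrain[0, 5, 9])  # 150.0 (northeast corner: row 5, col 9)
```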

Example 1: Writing

(You will need the Python netCDF4 library to continue. Open Python and type: import netCDF4. If you don’t get an error, you’re all set. If an error appears, ask for help.)

a. Task

Use Common Data form Language (CDL - a convenient way of describing NetCDF datasets; see section 2.1.3 in the Unidata NetCDF tutorial) and Python to create two identical NetCDF files.

b. Steps

  1. Copy and paste CDL from Listing 1 (below) into a text editor (e.g. BBEdit, Nano, TextEdit) and save as terrain.cdl.
  2. Open Terminal and type: ncgen -o terrain.ncgen.nc terrain.cdl
  3. Copy and paste Listing 2 (below) into a text editor and save as terrain.py.
  4. Run the python file in Terminal by typing: python terrain.py

c. Discussion

In steps 2 and 4, you should have created two identical netCDF files. How can you be sure the .nc files are in fact identical? There are two ways: one uses the Unix "diff" command, and the other is a tedious line-by-line comparison of two CDL files.

Let's start with diff. Open a terminal window and navigate to the directory that contains terrain.ncgen.nc and terrain.python.nc. diff reports the differences between two files, and its -s option also makes it say when two files are identical. Type: diff -s terrain.ncgen.nc terrain.python.nc. Why do the binary .nc files differ? They differ because the netCDF4 Python library writes the NETCDF4 format by default, while ncgen writes NETCDF3_CLASSIC by default. Edit the Python script to add the "format" keyword with a value of "NETCDF3_CLASSIC" to the Dataset call (i.e., after the 'w'), as in Dataset('terrain.python.nc', 'w', format='NETCDF3_CLASSIC'). Re-run the script and re-diff the files. They should now be identical. Even though the original files were different, their data were identical. We can confirm this by converting the data to text and comparing. (This next method is generally not how you would compare two files, but the exercise introduces a new command-line skill: redirecting.)

Because netCDF files are written in binary, the data isn't directly human-readable. We can, however, use the ncdump command to convert a netCDF file to CDL, which is very human-readable. ncdump is the inverse of the ncgen command used in step 2 above. Type this at your command prompt: ncdump terrain.ncgen.nc. What happened? Now try this: ncdump terrain.python.nc. Do the two outputs look the same? It isn't easy to compare the two outputs in the terminal, so we'll now "redirect" ncdump's output into new files that we can open in a text editor. Type these two commands:

ncdump terrain.ncgen.nc > terrain.ncgen.cdl

ncdump terrain.python.nc > terrain.python.cdl

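Think of the greater-than sign (>) as an arrow that "directs" the output of ncdump into a new file. Now you can locate the two new .cdl files in the Finder and compare them directly. They should be identical except for the first line, because ncdump names each dataset after its file (netcdf terrain.ncgen { versus netcdf terrain.python {). So why did the diff command say the original .nc files were different? diff compares raw bytes, and the NETCDF4 and NETCDF3_CLASSIC formats store the same data in different binary layouts. For the gory details, ask a PhD student.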
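You can also compare the two text files from Python with the standard-library difflib module. The CDL text in the sketch is a hypothetical, abbreviated ncdump output, not the real terrain files. It shows that when the data match, only the header line naming the dataset shows up as a difference. For the real files, pass in open("terrain.ncgen.cdl").read() and open("terrain.python.cdl").read().

```python
# Line-by-line comparison of two CDL texts with difflib.
# The CDL below is a hypothetical, abbreviated ncdump output.
import difflib

def cdl_diff(text1, text2, name1="file1.cdl", name2="file2.cdl"):
    """Return unified-diff lines between two CDL texts (empty list if identical)."""
    return list(difflib.unified_diff(
        text1.splitlines(), text2.splitlines(),
        fromfile=name1, tofile=name2, lineterm=""))

body = """dimensions:
\tx = 2 ;
variables:
\tfloat v(x) ;
data:

 v = 1, 2 ;
}"""
cdl_a = "netcdf terrain.ncgen {\n" + body
cdl_b = "netcdf terrain.python {\n" + body

for line in cdl_diff(cdl_a, cdl_b, "terrain.ncgen.cdl", "terrain.python.cdl"):
    print(line)
```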

Example 2: Reading

a. Task

Use Python to read and compare two identical NetCDF files. For this example, we will use NetCDF files from Example 1.

b. Steps

  1. Create a Python file that reads the gridded terrain data from both files and compares the values (see Listing 4 below).
  2. Run the Python file interactively (i.e. python -i filename.py).
  3. Explore the NetCDF files in interactive mode.

Review Exercises

Exercise 1:

From scratch, create a NetCDF file for the instantaneous hours of the day (0-24) using CDL.

  1. Create a CDL text file with one dimension, one variable, and a data statement that fills in the data.
  2. Use ncgen to create a NetCDF file.
  3. Use ncdump to create a copy of the CDL file. Type: ncdump filepath > new_filepath
  4. Compare your CDL with the ncdump CDL using the UNIX "diff" command (or just compare the files in a text editor).
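A quick check on the size of the dimension: if both endpoints of "0-24" are included, there are 25 values, not 24. The short sketch below assumes both endpoints are included. It confirms the count and builds the comma-separated list, which you could paste into the CDL data statement instead of typing it by hand.

```python
# Hours 0 through 24 inclusive (ASSUMED: both endpoints are included).
import numpy as np

hours = np.arange(0, 25)  # arange excludes its stop value, so this is 0, 1, ..., 24
data_line = ", ".join(str(h) for h in hours)

print(len(hours))  # 25
print(data_line)
```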

Questions 1:

  1. How does the ncdump version of CDL differ from yours (if it does)?
  2. Can you add a global attribute called “creator” with your name as the value?

Exercise 2:

Add data for a second time step to a CDL dataset.

  1. Create a CDL file from the example in the Unidata NetCDF Data Model documentation (http://www.unidata.ucar.edu/software/netcdf/docs/netcdf/The-NetCDF-Data-Model.html#The-NetCDF-Data-Model).
  2. Convert unmodified CDL to NetCDF.
  3. Modify the CDL to include rh values for a second hour (use 2 times the first).

Questions 2:

  1. Is there anything wrong with the temp variable?
  2. What is wrong with the 2nd hour relative humidity values?

Exercise 3:

Create identical NetCDF files using CDL/ncgen and Python. Use “diff -s file1.nc file2.nc” to check that the files are identical. Follow the steps from Example 1 for these examples from the Unidata NetCDF tutorial:

  1. sfc_pres_temp (http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-tutorial.html#sfc_005fpres_005ftemp)
  2. pres_temp_4D (http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-tutorial.html#pres_005ftemp_005f4D)

*NumPy arrays have reshape and swapaxes methods that will be very useful. Try arange(9).reshape((3,3)) and then arange(9).reshape((3,3)).swapaxes(0,1) to see why.
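Here is that experiment written out. CDL data sections list values in row-major order, with the last dimension varying fastest. As a result, reshape turns a flat list into the declared shape, and swapaxes exchanges two dimensions when your array's axis order differs from the order declared in the CDL or in the Python script.

```python
# reshape vs. swapaxes on a small 3x3 example
import numpy as np

a = np.arange(9).reshape((3, 3))  # fill row by row: last axis varies fastest
print(a)
# [[0 1 2]
#  [3 4 5]
#  [6 7 8]]

b = a.swapaxes(0, 1)  # exchange axis 0 and axis 1 (a transpose for 2-D arrays)
print(b)
# [[0 3 6]
#  [1 4 7]
#  [2 5 8]]
```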