pseudonetcdf tutorial

Getting Started with PseudoNetCDF

Problem Statement

Air quality data is usually too big and complex for Excel, which makes it difficult to interact with. For starters, it is 4-dimensional (time, layer, latitude, longitude), which isn't well represented by simple rows and columns. To add insult to injury, the data isn't human readable; you need tools just to make it comprehensible. Of course, you'll also want features like sum, count, and average. Normally, these would come from Excel... but we already talked about that.

The Solution

Scientists got together many years ago and decided they needed to standardize data so that common tools could be made. They agreed on a common data language and created visualization tools.

Still a Problem

We still have lots of other formats.

New Solution: Make believe

Overview

PseudoNetCDF makes everything act like NetCDF. That's useful because there are lots of NetCDF tools that you could leverage. In addition, PseudoNetCDF makes several otherwise difficult things very easy.

  • Converting your data to NetCDF
  • Spatial/Temporal subsetting
  • Spatial/Temporal aggregation
  • Plotting

Look at the data

pncdump -f bpch ctm.bpch

The pncdump command is called with the -f bpch option, which tells PseudoNetCDF that the file being read (i.e., ctm.bpch) is a Binary Punch file. The result is written to the Linux standard output. By default, this is your screen, so the result will spew out for you to see.

If you want to capture this result in a file, you can redirect the output (see below). The output is much larger than the original file, because numbers have been converted from binary 32-bit to text -- 8-bits per character.
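To see why the text dump is larger than the binary file, here is a small Python sketch (not part of PseudoNetCDF) that compares a few values stored as 32-bit binary floats with the same values written as text. The sample values and the number formatting are assumptions for illustration; the text formatting that pncdump actually uses may differ.

```python
import struct

# Illustrative values only; they are not taken from ctm.bpch
values = [1.2345678e-9, 3.1415927, 2.9979246e8]

# Binary: each 32-bit float takes exactly 4 bytes
binary = struct.pack("<%df" % len(values), *values)

# Text: one byte per character, plus a separator between values
text = ", ".join("%.7e" % v for v in values)

print(len(binary), "bytes as binary")
print(len(text.encode("ascii")), "bytes as text")
```

The text form needs one byte for every digit, sign, exponent character, and separator, so it grows well beyond 4 bytes per value.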

pncdump -f bpch ctm.bpch > ctm.bpch.txt

More commonly, you dump a file just to take a quick look. If you only want to see the header (not all the data), use the --header option.

If you want to look at the data for a single variable, use the -v VARNAME option, where VARNAME is the full variable name. Not sure what it will be called? Look for it in the --header output first.

Convert data to NetCDF

Converting the data to NetCDF is a good option because NetCDF is a standard file format and is viewable using many tools. For example, this can be viewed with Panoply (free from NASA) or ArcGIS (ESRI).

How do I do it?

pncgen -f bpch ctm.bpch ctm.bpch.nc

Slicing and Aggregating

It is common to be interested in just a particular part of the model. You may only be interested in the surface, just the PBL, just the troposphere, etc. PseudoNetCDF supplies tools to extract a subset of any dimension (time, layer, latitude, longitude). The -s dim,start[,stop[,step]] option selects a "slice" of data for dimension dim that starts at element start, ends before stop, and takes only every step-th element.

One tricky thing is that start and stop are 0-based indices. 0-based means that year 1 is indexed using 0. Conceptually, that's because from day 1 to day 365 you have completed 0 years. The same is true for spatial dimensions: from 0 degrees to 3.999 degrees north, you are within the 0th 4-degree grid cell.
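A minimal Python sketch of 0-based indexing may help. The helper names month_index and cell_index are hypothetical (they are not PseudoNetCDF functions), and the grid follows the simplified picture in the text, counting 4-degree cells from 0 degrees, rather than any real model grid.

```python
def month_index(month_number):
    """0-based index of a calendar month (January is month 1, index 0)."""
    return month_number - 1


def cell_index(degrees_north, cell_size=4.0):
    """0-based index of the grid cell, counting cells from 0 degrees."""
    return int(degrees_north // cell_size)


print(month_index(3))     # March -> 2
print(cell_index(3.999))  # still inside the 0th cell -> 0
print(cell_index(4.0))    # first value in the next cell -> 1
```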

Assume ctm.bpch has monthly data for 5 years. The following command would output March for all 5 years.

pncdump -f bpch ctm.bpch -s time,2,60,12 | less

and the next command would save that output to a new file.

pncgen -f bpch ctm.bpch -s time,2,60,12 ctm.bpch.marches.nc
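The start, stop, and step in -s time,2,60,12 behave like a Python slice, so the selection can be checked with a few lines of Python. This sketch assumes the file starts in January, so that time index 0 is the first January; the month_names list is only there for illustration.

```python
month_names = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# 5 years of monthly data: time indices 0..59
n_times = 5 * 12

# -s time,2,60,12 corresponds to start=2, stop=60, step=12
selected = list(range(n_times))[2:60:12]

print(selected)                                  # [2, 14, 26, 38, 50]
print([month_names[i % 12] for i in selected])   # ['Mar', 'Mar', 'Mar', 'Mar', 'Mar']
```

Index 2 is the third month (March), and a step of 12 jumps to the same month of the next year.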

The next option you have is to reduce a dimension (-r dim,aggfunc) by applying an aggregate statistic. Aggregation can be done with min, max, mean, std, etc. This reduces the dimension by replacing the individual elements with a single statistic of all elements. Aggregation is always done after slicing.

So, the following command would output an average March:

pncgen -f bpch ctm.bpch -s time,2,60,12 -r time,mean ctm.bpch.march.nc
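The same slice-then-reduce idea can be sketched in plain Python on a one-dimensional time series. The monthly values are made up for illustration, and the sketch only mimics the arithmetic; it does not read a bpch file.

```python
from statistics import mean

# Made-up monthly values for 5 years (60 time steps); index 0 is the first January
monthly = [float(i) for i in range(60)]

marches = monthly[2:60:12]    # slice first, like -s time,2,60,12
march_mean = mean(marches)    # then reduce, like -r time,mean

print(marches)      # [2.0, 14.0, 26.0, 38.0, 50.0]
print(march_mean)   # 26.0
```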

The -s and -r options can be applied to any dimension, and they can be applied many times in one call. Multiples are applied sequentially (slices first).

The following command would take Marches, and then take only the surface layer. Next, it would average over time. Finally, it would average across longitude.

pncgen -f bpch ctm.bpch -s time,2,60,12 -r time,mean -s layer,0 -r longitude,mean ctm.bpch.march.surface.zonalmean.nc

Order of operations matters!
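A tiny Python example shows why the order of reductions can change the answer. The 2 x 2 field is made up, and the sketch illustrates the arithmetic only, not how PseudoNetCDF is implemented.

```python
from statistics import mean

# Made-up 2 x 2 field: rows are time steps, columns are longitudes
field = [[1.0, 5.0],
         [3.0, 1.0]]

# Reduce time with max, then reduce longitude with mean
max_over_time = [max(column) for column in zip(*field)]   # [3.0, 5.0]
max_then_mean = mean(max_over_time)

# Reduce longitude with mean, then reduce time with max
mean_over_lon = [mean(row) for row in field]              # [3.0, 2.0]
mean_then_max = max(mean_over_lon)

print(max_then_mean, mean_then_max)   # 4.0 3.0
```

The same data and the same two statistics give different results depending on which reduction runs first.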

Do math in dumped output or save result to new file

This requires name mangling. To mangle is to "severely mutilate, disfigure, or damage by cutting, tearing, or crushing". I didn't make this name up. Some names in GEOS-Chem use symbols that are reserved for math. For instance, A+B is an expression and can be evaluated to yield a result, so A+B cannot be a variable name. If it were, we'd convert it to AplusB. Likewise, A-B becomes AhyphenB, etc.

The key symbols for GEOS-Chem are the hyphen (-) and the dollar sign ($). These symbols are used in "diagnostic group names." They are not, however, allowed in Python names (Python is the language of PseudoNetCDF). So, we must convert hyphens and dollar signs to words (-: hyphen, $: dollar).
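The idea of mangling can be sketched in a few lines of Python. The function mangle and the table SYMBOL_WORDS are hypothetical; they are built only from the examples in the text and are not PseudoNetCDF's actual implementation, which may handle more symbols or differ in detail.

```python
# Hypothetical mapping, taken from the examples in the text
SYMBOL_WORDS = {"+": "plus", "-": "hyphen", "$": "dollar"}


def mangle(name):
    """Replace each math symbol in a name with its word equivalent."""
    return "".join(SYMBOL_WORDS.get(ch, ch) for ch in name)


print(mangle("A+B"))        # AplusB
print(mangle("A-B"))        # AhyphenB
print(mangle("IJ-AVG-$"))   # IJhyphenAVGhyphendollar
```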

So, to calculate O3 from Ox and NOx, which are both in the IJ-AVG-$ group:

To see:

pncdump -f "bpch,nogroup=True" ctm.bpch --expr="O3=Ox-NOx"

To save:

pncgen -f "bpch,nogroup=True" ctm.bpch ctm.bpch_withO3.nc --expr="O3=Ox-NOx"

Exercise

Use the test file (/scratch/lfs/barronh/geos-chem/simulations/4x5/geos5/SOA/ctm.bpch.2007010100). This file has data from 2007-01-01 00:00:00 to 2007-07-01 00:00:00, where each time element is a 1-month average.

  1. Use pncdump to calculate the mean dimethyl sulfide surface concentration (CH3SCH3; aka DMS).
  2. Calculate the total atmospheric concentration

C_i [kg] = X_i [ppb] * 1e-9 [vmr/ppb] * C_a [molec/cm3] / Avogadro [molec/mole] * 1e6 [cm3 / m3] * AREA [m2] * HEIGHT [m] * 0.064 [kgS/mole DMS]

KGDMS = IJ-AVG-$_DMS * 1e-9 * BXHGHT-$_-N(AIR) / Avogadro * 1e6 * AREA * BXHGHT-$_BXHEIGHT * 0.064

KGDMS = DMS * 1e-9 * N(AIR) / Avogadro * 1e6 * AREA * BXHEIGHT * 0.064
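The unit conversion can be checked step by step in Python before writing it as an expression. This is a sketch for a single grid box: the function name dms_mass_kg and all input values are made up for illustration, and the 0.064 factor is copied from the formula above.

```python
AVOGADRO = 6.02214076e23  # molec/mole


def dms_mass_kg(x_ppb, n_air, area_m2, height_m, kg_per_mole=0.064):
    """Mass in one grid box, following the conversion in the text."""
    vmr = x_ppb * 1e-9                             # ppb -> volume mixing ratio
    molec_per_cm3 = vmr * n_air                    # molec/cm3 of the species
    moles_per_m3 = molec_per_cm3 / AVOGADRO * 1e6  # mole/m3
    return moles_per_m3 * area_m2 * height_m * kg_per_mole


# Made-up values for one grid box (not taken from the test file)
box_mass = dms_mass_kg(x_ppb=0.1, n_air=2.5e19, area_m2=1.0e11, height_m=100.0)
print(box_mass)
```

Summing this quantity over all grid boxes would give a total for the whole atmosphere.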
