cdms2_101 Tutorial

download iPython Notebook

Basic CDMS Tutorial

Preparing your environment

Installing cdms2

conda create -n cdms -c conda-forge cdms
source activate cdms

Bringing in some data

from __future__ import print_function
import cdat_info
import os, sys

data_path = cdat_info.get_sampledata_path()
cdat_info.download_sample_data_files(os.path.join(sys.prefix,"share","cdms2","test_data_files.txt"),data_path)

Opening and querying a file for reading

# Open a sample file
import cdms2

filename = os.path.join(data_path,"clt.nc")
f = cdms2.open(filename)
# Query variables in the file
var = f.listvariable()
print("variables in the file:",var)
variables in the file: ['clt', 'u', 'v']
# Query dimensions in the file
dims = f.listdimension()
print("Dimensions in the file:",dims)
Dimensions in the file: ['plev', 'latitude1', 'latitude2', 'time1', 'longitude1', 'longitude2', 'longitude', 'time', 'latitude', 'plev1', 'time2']
# Query file attributes
attr = f.listglobal()
print("File attributes:",attr)
File attributes: ['model', 'center', 'comments', 'Conventions']

Querying Variables (in file)

You can further query the variables in the file without having to read them in memory

To create a file variable simply use square bracket: [ and ]

clt = f["clt"]  # This is a file variable, not in memory
# Print variable info to screen
clt.info()
*** Description of Slab clt ***
id: clt
shape: (120, 46, 72)
filename: /Users/doutriaux1/anaconda2/envs/2.12-nox/share/uvcdat/sample_data/clt.nc
missing_value: None
comments: YONU_AMIP1
grid_name: YONU4X5
grid_type: gaussian
time_statistic: average
long_name: Total cloudiness
units: %
Grid has Python id 0x119056c50.
Gridtype: gaussian
Grid shape: (46, 72)
Order: yx
** Dimension 1 **
   id: time
   Designated a time axis.
   units:  months since 1979-1-1 0
   Length: 120
   First:  0.0
   Last:   119.0
   Python id:  0x119056750
** Dimension 2 **
   id: latitude
   Designated a latitude axis.
   units:  degrees_north
   Length: 46
   First:  -90.0
   Last:   90.0
   Other axis attributes:
      long_name: Latitude
   Python id:  0x119056850
** Dimension 3 **
   id: longitude
   Designated a longitude axis.
   units:  degrees_east
   Length: 72
   First:  -180.0
   Last:   175.0
   Other axis attributes:
      long_name: Longitude
   Python id:  0x1190a2450
*** End of description for clt ***
# Variable shape
sh = clt.shape
print("The variable shape is:",sh)
The variable shape is: (120, 46, 72)
# Variable id
name = clt.id
print("Variable id/name:",name)
Variable id/name: clt
# The variable dimensions
axes = clt.getAxisList()
print("variable dimensions:",axes)
variable dimensions: [   id: time
   Designated a time axis.
   units:  months since 1979-1-1 0
   Length: 120
   First:  0.0
   Last:   119.0
   Python id:  0x119056750
,    id: latitude
   Designated a latitude axis.
   units:  degrees_north
   Length: 46
   First:  -90.0
   Last:   90.0
   Other axis attributes:
      long_name: Latitude
   Python id:  0x119056850
,    id: longitude
   Designated a longitude axis.
   units:  degrees_east
   Length: 72
   First:  -180.0
   Last:   175.0
   Other axis attributes:
      long_name: Longitude
   Python id:  0x1190a2450
]
# Variable attributes
attributes = clt.attributes
print("Variable attributes:",attributes.keys())
Variable attributes: ['time_statistic', 'comments', 'long_name', 'grid_name', 'units', 'missing_value', 'grid_type']

Dimensions

# Determine if an axis is time
for a in axes:
    if a.isTime():
        print("Axes %s is a time axis" % a.id)
    else:
        print("Axes %s is not a time axis" % a.id)
Axes time is a time axis
Axes latitude is not a time axis
Axes longitude is not a time axis
# Similar functions exist for level, latitude and longitude
for a in axes:
    print(a.isLatitude(), a.isLongitude(), a.isLevel())
False False False
True False False
False True False
# Similarly we can get one of these 4 types of dimension automatically
aTime = clt.getTime()
lat = clt.getLatitude()
lon = clt.getLongitude()
# if such dimension does not exists None is returned
lev = clt.getLevel()
print("Level dim:",lev)
Level dim: None
# Any dimension can also by retrieved by its index
dim0 = clt.getAxis(0)
print("The first dim name is:",dim0.id)
The first dim name is: time
# Dimension information
dim0.info()
   id: time
   Designated a time axis.
   units:  months since 1979-1-1 0
   Length: 120
   First:  0.0
   Last:   119.0
   Python id:  0x119056750
# Accessing axis values
print("Latitude values:",clt.getLatitude()[:])
Latitude values: [-90. -86. -82. -78. -74. -70. -66. -62. -58. -54. -50. -46. -42. -38. -34.
 -30. -26. -22. -18. -14. -10.  -6.  -2.   2.   6.  10.  14.  18.  22.  26.
  30.  34.  38.  42.  46.  50.  54.  58.  62.  66.  70.  74.  78.  82.  86.
  90.]

Time dimensions

cdms is really good at dealing with times (see decdicated cdtime jupyter notebook for more on time)

# Rather than raw (in file) values or indices it can be usefull to show/manipulate time 
# as 'component time'
tim = clt.getTime()
tc = tim.asComponentTime()
print("First 2 times are:",tc[:2])
# or 'relative times'
tr = tim.asRelativeTime("days since 2017")
print("first 2 times in days since 2017:", tr[:2])
First 2 times are: [1979-1-1 0:0:0.0, 1979-2-1 0:0:0.0]
first 2 times in days since 2017: [-13880.000000 days since 2017, -13849.000000 days since 2017]

Retrieving data

# Whole
clt =f("clt") # parentheis means read in memory
print("Shape:",clt.shape)
Shape: (120, 46, 72)
# Partial, based on values in file
clt = f("clt",latitude=(0,90),longitude=(-180,180))
print("Shape:",clt.shape)
Shape: (120, 23, 73)
# Based on indices
clt = f("clt",time=slice(0,12))
print("Shape:",clt.shape)
Shape: (12, 46, 72)
# time can be retirieved based on actual dates (provided units are good in file)
clt = f("clt",time=("1980","1983-12-31"))
print("Shape:",clt.shape)
Shape: (48, 46, 72)
# Data can also be read directly from a file variable
CLT = f["clt"]
clt = CLT(time=("1980","1984-12-31"),latitude=(0,90),longitude=slice(0,None))
print("Shape:",clt.shape)
Shape: (60, 23, 72)
# Or from an exisitng variavle
clt2 = clt(time=slice(0,4))
print("Shape:",clt2.shape)
Shape: (4, 23, 72)
# data can also be reordered based on dimensions
clt = f("clt",order="xty")
print("Shape:",clt.shape)
Shape: (72, 120, 46)
# or use dimension indices
clt=f("clt", order="210")
print("Shape:",clt.shape)
Shape: (72, 46, 120)
# or use dimension names
clt = f("clt",order="(longitude)(time)(latitude)")
print("Shape:",clt.shape)
Shape: (72, 120, 46)

Manipulating Data

cdms variables are subclass of numpy, so for the most part anything you can do with numpy can be done with cdms variables

# Extract same month every years (from monthly data)
clt=f("clt")
subset = clt[::12]
print("Shape:",subset.shape)
Shape: (10, 46, 72)
# cdms variable can be converted to raw numpy
nparray = clt.filled()
print(type(clt),type(nparray))
<class 'cdms2.tvariable.TransientVariable'> <type 'numpy.ndarray'>
# or masked arrays
maarray = clt.asma()
print(type(clt),type(maarray))
<class 'cdms2.tvariable.TransientVariable'> <class 'numpy.ma.core.MaskedArray'>

Creating MV2 and storing them in files

import MV2
# Create a cdms variable from a numpy (or numpy.ma) array
myvar = MV2.array(nparray)
myvar.id = "newclt"
myvar.info()
*** Description of Slab newclt ***
id: newclt
shape: (120, 46, 72)
filename: 
missing_value: 1e+20
comments: 
grid_name: N/A
grid_type: N/A
time_statistic: 
long_name: 
units: 
tileIndex: None
No grid present.
** Dimension 1 **
   id: axis_0
   Length: 120
   First:  0.0
   Last:   119.0
   Python id:  0x119056f10
** Dimension 2 **
   id: axis_1
   Length: 46
   First:  0.0
   Last:   45.0
   Python id:  0x119056d10
** Dimension 3 **
   id: axis_2
   Length: 72
   First:  0.0
   Last:   71.0
   Python id:  0x119056a50
*** End of description for newclt ***
# We can . add axes from other variables
myvar.setAxisList(clt.getAxisList())
myvar.info()
*** Description of Slab newclt ***
id: newclt
shape: (120, 46, 72)
filename: 
missing_value: 1e+20
comments: 
grid_name: <None>
grid_type: generic
time_statistic: 
long_name: 
units: 
tileIndex: None
Grid has Python id 0x1190925d0.
Gridtype: generic
Grid shape: (46, 72)
Order: yx
** Dimension 1 **
   id: time
   Designated a time axis.
   units:  months since 1979-1-1 0
   Length: 120
   First:  0.0
   Last:   119.0
   Other axis attributes:
      calendar: gregorian
      axis: T
      realtopology: linear
   Python id:  0x119056ed0
** Dimension 2 **
   id: latitude
   Designated a latitude axis.
   units:  degrees_north
   Length: 46
   First:  -90.0
   Last:   90.0
   Other axis attributes:
      long_name: Latitude
      axis: Y
      realtopology: linear
   Python id:  0x119056550
** Dimension 3 **
   id: longitude
   Designated a longitude axis.
   units:  degrees_east
   Length: 72
   First:  -180.0
   Last:   175.0
   Other axis attributes:
      modulo: 360.0
      realtopology: circular
      long_name: Longitude
      topology: circular
      axis: X
   Python id:  0x119056b90
*** End of description for newclt ***
# we can also add axes one at a time
for i in range(myvar.ndim):
    ax = clt.getAxis(i)
    print("Setting axis %i to %s" % (i,ax.id))
    myvar.setAxis(i,ax)
myvar.info()
Setting axis 0 to time
Setting axis 1 to latitude
Setting axis 2 to longitude
*** Description of Slab newclt ***
id: newclt
shape: (120, 46, 72)
filename: 
missing_value: 1e+20
comments: 
grid_name: <None>
grid_type: generic
time_statistic: 
long_name: 
units: 
tileIndex: None
Grid has Python id 0x1190925d0.
Gridtype: generic
Grid shape: (46, 72)
Order: yx
** Dimension 1 **
   id: time
   Designated a time axis.
   units:  months since 1979-1-1 0
   Length: 120
   First:  0.0
   Last:   119.0
   Other axis attributes:
      calendar: gregorian
      axis: T
      realtopology: linear
   Python id:  0x119056ed0
** Dimension 2 **
   id: latitude
   Designated a latitude axis.
   units:  degrees_north
   Length: 46
   First:  -90.0
   Last:   90.0
   Other axis attributes:
      long_name: Latitude
      axis: Y
      realtopology: linear
   Python id:  0x119056550
** Dimension 3 **
   id: longitude
   Designated a longitude axis.
   units:  degrees_east
   Length: 72
   First:  -180.0
   Last:   175.0
   Other axis attributes:
      modulo: 360.0
      realtopology: circular
      long_name: Longitude
      topology: circular
      axis: X
   Python id:  0x119056b90
*** End of description for newclt ***
# We can also create axes manually
newtime = cdms2.createAxis(range(120))
newtime.id = "time" # name of dimension
newtime.designateTime()  # tell cdms to add attributes that make it time
newtime.units = "months since 2017"
myvar.setAxis(0,newtime)
myvar.info()  # Notice tikme changed
*** Description of Slab newclt ***
id: newclt
shape: (120, 46, 72)
filename: 
missing_value: 1e+20
comments: 
grid_name: <None>
grid_type: generic
time_statistic: 
long_name: 
units: 
tileIndex: None
Grid has Python id 0x1190925d0.
Gridtype: generic
Grid shape: (46, 72)
Order: yx
** Dimension 1 **
   id: time
   Designated a time axis.
   units:  months since 2017
   Length: 120
   First:  0
   Last:   119
   Other axis attributes:
      calendar: gregorian
      axis: T
   Python id:  0x1190a2150
** Dimension 2 **
   id: latitude
   Designated a latitude axis.
   units:  degrees_north
   Length: 46
   First:  -90.0
   Last:   90.0
   Other axis attributes:
      long_name: Latitude
      axis: Y
      realtopology: linear
   Python id:  0x119056550
** Dimension 3 **
   id: longitude
   Designated a longitude axis.
   units:  degrees_east
   Length: 72
   First:  -180.0
   Last:   175.0
   Other axis attributes:
      modulo: 360.0
      realtopology: circular
      long_name: Longitude
      topology: circular
      axis: X
   Python id:  0x119056b90
*** End of description for newclt ***

Saving data

 # By default cdms2 will save files in NetCDF4 compressed with no shuffle by defalted at level 1
print("Default Shuffle:",cdms2.getNetcdfShuffleFlag())
print("Default Deflate:",cdms2.getNetcdfDeflateFlag())
print("Default Deflate Level:",cdms2.getNetcdfDeflateLevelFlag())
Default Shuffle: 0
Default Deflate: 1
Default Deflate Level: 1
# Let's turn it all off so we get NetCDF3 classic files
value = 0
cdms2.setNetcdfShuffleFlag(value) ## where value is either 0 or 1
cdms2.setNetcdfDeflateFlag(value) ## where value is either 0 or 1
cdms2.setNetcdfDeflateLevelFlag(value) ## where value is a integer between 0 and 9 included
print("Shuffle:",cdms2.getNetcdfShuffleFlag())
print("Deflate:",cdms2.getNetcdfDeflateFlag())
print("Deflate Level:",cdms2.getNetcdfDeflateLevelFlag())
Shuffle: 0
Deflate: 0
Deflate Level: 0
# Let's open a file for writing
f2 = cdms2.open("mydata.nc","w") # "w" means open file for writing and erase if already here
f2.write(myvar)
f2.close()