# Using NetCDF4 Compression with CDMS<a id='top' class="tocSkip"> </a>


CDMS2 writes data using the [NetCDF library](https://www.unidata.ucar.edu/software/netcdf/).

NetCDF4 supports file compression. A good blog post about NetCDF4 and compression can be found [here](http://www.unidata.ucar.edu/blogs/developer/entry/netcdf_compression).

From this blog:

*"The netCDF-4 libraries inherit the capability for data compression from the HDF5 storage layer underneath the netCDF-4 interface. Linking a program that uses netCDF to a netCDF-4 library allows the program to read compressed data without changing a single line of the program source code."*

and

*"Also, we're only dealing with lossless compression"*

This Notebook shows how to control NetCDF4 compression (shuffling/deflating) capabilities via cdms2.

The CDAT software was developed by LLNL. This tutorial was written by Charles Doutriaux. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

[Download the Jupyter Notebook](NetCDF4_Compression.ipynb)

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Preparing-The-Notebook" data-toc-modified-id="Preparing-The-Notebook-1">Preparing The Notebook<a id="prepare"></a></a></span></li><li><span><a href="#Default-Settings" data-toc-modified-id="Default-Settings-2">Default Settings<a id="defaults"></a></a></span></li><li><span><a href="#Turning-Off-Compression" data-toc-modified-id="Turning-Off-Compression-3">Turning Off Compression<a id="nocompress"></a></a></span></li><li><span><a href="#Pure-NetCDF3" data-toc-modified-id="Pure-NetCDF3-4">Pure NetCDF3<a id="netcdf3"></a></a></span></li><li><span><a href="#NetCDF4-non-classic" data-toc-modified-id="NetCDF4-non-classic-5">NetCDF4 non classic<a id="nc4_no_classic"></a></a></span></li><li><span><a href="#Using-Shuffling" data-toc-modified-id="Using-Shuffling-6">Using Shuffling<a id="shuffle"></a></a></span></li><li><span><a href="#Controling-Deflate-Level" data-toc-modified-id="Controling-Deflate-Level-7">Controling Deflate Level<a id="deflate"></a></a></span></li><li><span><a href="#Summarizing-All-Options" data-toc-modified-id="Summarizing-All-Options-8">Summarizing All Options<a id="summary"></a></a></span></li></ul></div>

# Preparing The Notebook<a id="prepare"></a>

The easiest way to inspect the contents of a NetCDF file is [ncdump](https://www.unidata.ucar.edu/software/netcdf/netcdf-4/newdocs/netcdf/ncdump.html). The `NCINFO` class below runs it from within Python and formats its output for the Notebook.

We also download a sample file and prepare some random data.

[Back To Top](#top)

In [None]:
from __future__ import print_function
import subprocess
import shlex
import numpy
import os
import io
import time
import cdms2  # used by dump() below

# Get file size
def size_it(filename):
    statinfo = os.stat(filename)
    return statinfo.st_size

# Write and return time
def dump(data,filename="example.nc"):
    start = time.time()
    f = cdms2.open(filename,"w")
    f.write(data,id="data")
    f.close()
    return time.time()-start,size_it(filename)

class HTML(object):
    def __init__(self,html):
        self.html = html
    def _repr_html_(self):
        return self.html


# Nice html output for ncdump
class NCINFO(object):
    def __init__(self, filename, variable=None, options=""):
        self.filename = filename
        self.variable = variable
        self.options = options
    def _repr_html_(self):
        out = self.nc_info()
        lines = []
        for l in out.split("\n"):
            # Bold the lines that describe the format, chunking and deflate settings
            if any(kw in l.lower() for kw in ["chunk","deflate","classic","netcdf4","netcdf-4"]):
                l = "<b>{0}</b>".format(l)
            lines.append(l.replace("\t","&emsp;&emsp;"))
        return "{0}".format("<br>".join(lines))
    def nc_info(self):
        """Calls ncdump on the file.
    Can pass a variable or optional ncdump arguments.
    Default call: `ncdump -hs filename`"""
        with io.BytesIO() as out:
            ncdumpOptions = "-hs {options}".format(options=self.options)
            if self.variable is not None:
                # Leading space keeps "-v" separate from any preceding option
                ncdumpOptions += " -v {variable}".format(variable=self.variable)
            cmd = "ncdump {options} {file}".format(options=ncdumpOptions, file=self.filename)
            out.write("Running {0}\n".format(cmd).encode("utf8"))
            cmd = shlex.split(cmd)
            p = subprocess.Popen(cmd,stdout=subprocess.PIPE,stderr=subprocess.PIPE)
            o, e = p.communicate()
            out.write(b'-------\n')
            out.write(o)
            out.write(b'-------\n')
            out.write("File Size {0} bytes".format(size_it(self.filename)).encode("utf8"))
            # Decode so that callers receive text rather than bytes
            return out.getvalue().decode("utf8")
        
import requests
def download(fnm):
    r = requests.get("https://uvcdat.llnl.gov/cdat/sample_data/%s" % fnm,stream=True)
    with open(fnm,"wb") as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)

download("clt.nc")
data = numpy.random.random((120,180,360))
# Random data do not compress well at all, switching to 0/1
data = numpy.greater(data,.5).astype(numpy.float64)  # numpy.float was removed in NumPy 1.24

# Default Settings<a id="defaults"></a>

By default, cdms2 writes data in NetCDF4 ***classic*** format, with no ***shuffling*** and a ***deflate*** level of 1.

[Back To Top](#top)

To see the NetCDF settings currently used for writing data, use the following functions:

In [None]:
import cdms2
print("NetCDF4? ",cdms2.getNetcdf4Flag())
print("NetCDF Classic?",cdms2.getNetcdfClassicFlag())
print("NetCDF4 Shuffling",cdms2.getNetcdfShuffleFlag())
print("NetCDF4 Deflate?",cdms2.getNetcdfDeflateFlag())
print("NetCDF4 Deflate Level?",cdms2.getNetcdfDeflateLevelFlag())
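
When comparing several runs it can help to capture all five settings at once. The sketch below is one possible helper and is not part of cdms2: `netcdf_settings` is a hypothetical name, and it takes the module as an argument (pass `cdms2`) so that it relies only on the getter functions shown above.

```python
def netcdf_settings(cdms):
    """Return the current NetCDF write settings as a dict.

    `cdms` is expected to be the cdms2 module. This helper is hypothetical
    and only wraps the getters used in this Notebook.
    """
    return {
        "netcdf4": cdms.getNetcdf4Flag(),
        "classic": cdms.getNetcdfClassicFlag(),
        "shuffle": cdms.getNetcdfShuffleFlag(),
        "deflate": cdms.getNetcdfDeflateFlag(),
        "deflate_level": cdms.getNetcdfDeflateLevelFlag(),
    }
```

With cdms2 imported, `netcdf_settings(cdms2)` returns the same five values that the cell above prints.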

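These values are read at the time you **open** the file for writing, so set them before calling `cdms2.open`.

In the `ncdump` output, note the lines in **bold**: they report the file format, chunking and deflate settings.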

In [None]:
dump(data)
NCINFO("example.nc")
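
The highlighted lines come from the special virtual attributes that `ncdump -s` prints, such as `_Format`, `_Shuffle`, `_DeflateLevel` and `_ChunkSizes`. As a rough sketch, they can also be extracted from the text programmatically. The function name `special_attributes` is made up for illustration, and the sample text is a hand-written excerpt in the style of ncdump output, not output captured from this Notebook.

```python
import re

# Matches lines such as:   data:_DeflateLevel = 1 ;   or   :_Format = "netCDF-4" ;
_SPECIAL = re.compile(r'^\s*(\w*):(_\w+)\s*=\s*(.*?)\s*;\s*$')

def special_attributes(ncdump_text):
    """Collect underscore-prefixed attributes as {(variable, name): value}.

    Global attributes are stored under the variable name "".
    Ordinary attributes that start with "_" (e.g. _FillValue) are included too.
    """
    found = {}
    for line in ncdump_text.splitlines():
        match = _SPECIAL.match(line)
        if match:
            variable, name, value = match.groups()
            found[(variable, name)] = value.strip('"')
    return found

# Hand-written sample in the style of `ncdump -hs` output (illustrative only)
sample = """
        data:_Storage = "chunked" ;
        data:_ChunkSizes = 120, 180, 360 ;
        data:_DeflateLevel = 1 ;
        :_Format = "netCDF-4 classic model" ;
"""
for key, value in sorted(special_attributes(sample).items()):
    print(key, value)
```

In the Notebook, the text returned by `NCINFO("example.nc").nc_info()` could be passed in place of `sample`.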

# Turning Off Compression<a id="nocompress"></a>

[Back to Top](#top)

We can turn off compression by running:

In [None]:
value = 0
cdms2.setNetcdfShuffleFlag(value) ## where value is either 0 or 1
cdms2.setNetcdfDeflateFlag(value) ## where value is either 0 or 1
cdms2.setNetcdfDeflateLevelFlag(value) ## where value is an integer between 0 and 9 inclusive
dump(data)
NCINFO("example.nc")
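
These flags are module-level settings, so a change stays in effect for every later write until it is changed again. One way to limit a change to a single write is a small context manager. This is a sketch only: `netcdf_flags` is a hypothetical helper, not a cdms2 function, and it assumes the object passed in (normally the `cdms2` module) provides the getter and setter functions used in this Notebook. It restores only the flags it changed.

```python
from contextlib import contextmanager

# Short keyword -> suffix of the getNetcdf<suffix> / setNetcdf<suffix> functions
_FLAG_SUFFIXES = {
    "netcdf4": "4Flag",
    "classic": "ClassicFlag",
    "shuffle": "ShuffleFlag",
    "deflate": "DeflateFlag",
    "deflate_level": "DeflateLevelFlag",
}

@contextmanager
def netcdf_flags(cdms, **flags):
    """Temporarily apply NetCDF write flags, restoring the old values on exit."""
    saved = {}
    try:
        for key, value in flags.items():
            suffix = _FLAG_SUFFIXES[key]  # KeyError for an unknown keyword
            saved[suffix] = getattr(cdms, "getNetcdf" + suffix)()
            getattr(cdms, "setNetcdf" + suffix)(value)
        yield
    finally:
        # Put back whatever was changed, even if the body raised
        for suffix, value in saved.items():
            getattr(cdms, "setNetcdf" + suffix)(value)
```

With cdms2 imported, usage would look like `with netcdf_flags(cdms2, shuffle=0, deflate=0, deflate_level=0): dump(data)`. Because the settings are read when the file is opened, the call that opens the file has to happen inside the `with` block.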

# Pure NetCDF3<a id="netcdf3"></a>

[Back To Top](#top)

Setting all of these options to 0 makes cdms2 write NetCDF3 (as the warning above shows). The same result can be obtained with a single command:

In [None]:
cdms2.useNetcdf3()
# or for versions earlier than 2.12.2017.10.25
value = 0
cdms2.setNetcdfShuffleFlag(value) ## where value is either 0 or 1
cdms2.setNetcdfDeflateFlag(value) ## where value is either 0 or 1
cdms2.setNetcdfDeflateLevelFlag(value) ## where value is an integer between 0 and 9 inclusive
cdms2.setNetcdf4Flag(0)
dump(data)
NCINFO("example.nc")
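
A quick way to confirm which on-disk format was written, independent of `ncdump`, is to look at the first bytes of the file: classic NetCDF files begin with `CDF` followed by a version byte, while NetCDF4 files are HDF5 files and begin with the HDF5 signature. The sketch below assumes the signature sits at the very start of the file (HDF5 also permits later offsets, which this check ignores). `netcdf_kind` is a hypothetical helper name.

```python
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

def netcdf_kind(filename):
    """Guess the container format of a NetCDF file from its leading bytes."""
    with open(filename, "rb") as f:
        head = f.read(8)
    if head.startswith(HDF5_SIGNATURE):
        return "NetCDF4 (HDF5)"
    # Version byte: 1 = classic, 2 = 64-bit offset, 5 = 64-bit data (CDF-5)
    if head[:3] == b"CDF" and head[3:4] in (b"\x01", b"\x02", b"\x05"):
        return "classic (CDF-{0})".format(ord(head[3:4]))
    return "unknown"
```

Calling `netcdf_kind("example.nc")` after the cell above should report a classic file. The check cannot tell NetCDF4 from NetCDF4 classic, since both are stored as HDF5; for that distinction use the `_Format` line in the `ncdump -hs` output.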

# NetCDF4 non classic<a id="nc4_no_classic"></a>

[Back To Top](#top)

We can also turn off the classic option for NetCDF4:

In [None]:
cdms2.setNetcdf4Flag(1)
cdms2.setNetcdfClassicFlag(0)
dump(data)
NCINFO("example.nc")

# Using Shuffling<a id="shuffle"></a>

[Back To Top](#top)

We can turn shuffling on or off:

In [None]:
cdms2.setNetcdf4Flag(1)
cdms2.setNetcdfClassicFlag(0)
cdms2.setNetcdfShuffleFlag(1)
dump(data)
NCINFO("example.nc")
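
To see why shuffling can help, recall what the filter does: before deflate runs, it regroups the bytes of the values so that all first bytes are stored together, then all second bytes, and so on. The standard-library sketch below imitates this on a ramp of integers and compresses both versions with `zlib`. It is not the HDF5 implementation, and the helper names are made up for illustration.

```python
import array
import zlib

def shuffle_bytes(raw, itemsize):
    """Group byte 0 of every value, then byte 1, and so on."""
    return b"".join(raw[i::itemsize] for i in range(itemsize))

def unshuffle_bytes(shuffled, itemsize):
    """Invert shuffle_bytes, restoring the original byte order."""
    count = len(shuffled) // itemsize
    out = bytearray(len(shuffled))
    for i in range(itemsize):
        out[i::itemsize] = shuffled[i * count:(i + 1) * count]
    return bytes(out)

values = array.array("i", range(10000))  # slowly varying integers
raw = values.tobytes()
shuffled = shuffle_bytes(raw, values.itemsize)

plain_size = len(zlib.compress(raw, 1))
shuffled_size = len(zlib.compress(shuffled, 1))
print("deflate only:     ", plain_size, "bytes")
print("shuffle + deflate:", shuffled_size, "bytes")
```

On data like this the shuffled stream deflates to fewer bytes, because the high-order bytes form long constant runs. On data whose bytes are all effectively random, the gain can be small or absent.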

# Controling Deflate Level<a id="deflate"></a>

[Back To Top](#top)

We can choose the deflate level. Higher levels generally produce smaller files at the expense of write time:

In [None]:
cdms2.setNetcdfShuffleFlag(0)
cdms2.setNetcdfDeflateFlag(1)
cdms2.setNetcdfDeflateLevelFlag(5)
dump(data)
NCINFO("example.nc")
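
The time/size trade-off is a property of deflate itself and can be explored without writing any NetCDF file. As a rough illustration, the sketch below runs Python's `zlib` (the deflate implementation that NetCDF4 relies on through HDF5) at several levels on a stand-in payload of 0/1 bytes. The payload is not the Notebook's array and timings are machine-dependent, so treat the numbers as indicative only. `deflate_report` is a hypothetical helper name.

```python
import random
import time
import zlib

random.seed(0)
# Stand-in payload: 200,000 bytes that are each 0 or 1
payload = bytes(random.getrandbits(1) for _ in range(200000))

def deflate_report(data, levels=(1, 5, 9)):
    """Return {level: (seconds, compressed_size)} for each deflate level."""
    report = {}
    for level in levels:
        start = time.time()
        compressed = zlib.compress(data, level)
        report[level] = (time.time() - start, len(compressed))
        # Deflate is lossless: decompressing gives back the input exactly
        assert zlib.decompress(compressed) == data
    return report

for level, (seconds, size) in sorted(deflate_report(payload).items()):
    print("level {0}: {1:.4f} s, {2} bytes".format(level, seconds, size))
```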

# Summarizing All Options<a id="summary"></a>

[Back To Top](#top)

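Let's try with a real-life example: the `clt` variable from the sample file downloaded earlier. Each table cell reports the write time in seconds and the resulting file size in bytes.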

In [None]:
f=cdms2.open("clt.nc")
clt = f("clt")

html = "<table border='2'><tr><th>Deflate Level</th><th>NC3</th><th>NC4 Classic no shuffle</th><th>NC4 Classic shuffled</th><th>NC4 no shuffle</th><th>NC4 shuffled</th></tr>"

def addCell():
    t,s = dump(clt)
    return "<td align='center'>{:.2f}/{:d}</td>".format(t,s)

def nc4s():
    out = ""
    for classic in [1,0]:
        cdms2.setNetcdfClassicFlag(classic)
        for shuffle in [0,1]:
            cdms2.setNetcdfShuffleFlag(shuffle)
            out+=addCell()
    out+="</tr>"
    return out

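# Deflate level 0 row: NetCDF3 first, then the four NetCDF4 variants with deflate off
html+="<tr><td align='center'>0</td>"
cdms2.useNetcdf3()
cdms2.setNetcdf4Flag(0)
html+=addCell()
cdms2.setNetcdf4Flag(1)
html+=nc4s()
# Remaining rows: deflate levels 1 to 9 (not applicable to NetCDF3)
cdms2.setNetcdfDeflateFlag(1)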
for i in range(1,10):
    cdms2.setNetcdfDeflateLevelFlag(i)
    html += "<tr><td align='center'>{0}</td><td align='center'>N/A</td>".format(i)
    html += nc4s()
html+="<caption>Time to write the NetCDF file (seconds) / file size (bytes) for various NetCDF settings</caption></table>"
HTML(html)