# Exploring TC PRIMED, Chapter 1a: NetCDF Files
- Creators: Naufal Razin, Chris Slocum, and Kathy Haynes
- Affiliations: CIRA and NESDIS/STAR

---

## Overview
TC PRIMED uses NetCDF Groups to store data. Not all datasets stored in the NetCDF file format use NetCDF Groups. However, since TC PRIMED is a compilation of data from various sources, NetCDF Groups help keep those data organized. In this notebook, you will learn how to read a TC PRIMED NetCDF file and retrieve the information stored in its different groups.

## Prerequisites
To successfully navigate and use this notebook, you should be familiar with:
- the basics of Python programming such as loading modules, assigning variables, and list/array indexing

## Learning Outcomes
By working through this notebook, you should be able to:
- understand the NetCDF file structure, particularly one that contains groups
- interact with (e.g., load and plot) data from a NetCDF file

## Background
[NetCDF](https://www.unidata.ucar.edu/software/netcdf/), or Network Common Data Form, "is a set of software libraries and machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data" (Unidata 2023). Scientific data stored in NetCDF files are meant to be self-describing: the file itself includes information about the data it contains. This self-describing information is known as attributes, or metadata. Like GeoTIFF files, NetCDF files are often used to store raster (gridded) data, but NetCDF is more general and is designed for multidimensional arrays.

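[NetCDF Groups](https://www.unidata.ucar.edu/software/netcdf/workshops/2011/groups-types/GroupsIntro.html) are like directories or folders on your computer, except that they are contained within the NetCDF file. For example, you may have a file within a folder, and that folder is stored within a parent folder. The path to that file on your computer would be `parent_folder/child_folder/file`. Similarly, a variable inside nested NetCDF groups is identified by its path within the file; in TC PRIMED, for example, the latitude of the passive microwave `S1` data is stored at `passive_microwave/S1/latitude`.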

In this tutorial, you will learn how to access the attributes of the different groups in a TC PRIMED file, as well as learn how to load TC PRIMED variables within the groups.

## Software
This tutorial uses the Python programming language and the following packages:
- `netCDF4` to load the TC PRIMED file
- `numpy` for simple array operations
- `requests` to retrieve the TC PRIMED file from the cloud

### Install Packages
Let's first check if we have the necessary Python packages to run this notebook. If we don't, let's install them.

In [None]:
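import subprocess, sys

# Packages used in this notebook
packages = ["netCDF4", "numpy", "requests"]
for package in packages:
    try:
        # Try to import the package; if that fails, install it with pip
        __import__(package)
    except ImportError:
        subprocess.check_call([sys.executable, '-m', 'pip', 'install', package])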

Now, let's import what we need from these packages, either specific objects (e.g., the `Dataset` class from `netCDF4`) or whole packages under a shorter name (e.g., `import numpy as np`) for cleaner use throughout the notebook.

In [None]:
# Load the Python packages we will use in this notebook
from netCDF4 import Dataset
import numpy as np

## Read File Online
Finally, let's retrieve information from the TC PRIMED file that we will use in this example: a Global Precipitation Measurement (GPM) Microwave Imager (GMI) overpass of Hurricane Florence (2018). We will use the Python `requests` package to retrieve the file directly from the TC PRIMED [Amazon Web Services (AWS) S3 bucket](https://noaa-nesdis-tcprimed-pds.s3.amazonaws.com/index.html), which is part of the [NOAA Open Data Dissemination (NODD) program](https://www.noaa.gov/information-technology/open-data-dissemination), without saving the file to disk. We will then use the `netCDF4` package to read the file contents into a `Dataset` instance called `DS`.

Below, `NODD_URL` mirrors the TC PRIMED directory structure on NODD and takes the following form:

`<TC PRIMED AWS URL>/<TC PRIMED version>/<TC PRIMED version type>/<season>/<basin>/<annual number>/`

where

- `<TC PRIMED AWS URL>` is https://noaa-nesdis-tcprimed-pds.s3.amazonaws.com
- `<TC PRIMED version>` is the version number, currently v01r00
- `<TC PRIMED version type>` is the version type, either final or preliminary (this example uses `final`)
- `<season>` is the four-digit tropical cyclone season. For the Northern Hemisphere, the season is the calendar year. For the Southern Hemisphere, the season runs from July 1 through June 30 and is labeled by the calendar year in which it ends (i.e., storms from July 1 onward take the calendar year plus one).
- `<basin>` is the two-letter ocean basin code:
    - AL – North Atlantic basin, north of the Equator
    - SL – South Atlantic basin, south of the Equator
    - EP – North East Pacific basin, east of 140 degrees west longitude
    - CP – North Central Pacific basin, between the dateline and 140 degrees west longitude
    - WP – North West Pacific basin, west of the dateline
    - IO – North Indian Ocean basin, north of the Equator between 40 and 100 degrees east longitude
    - SH – South Pacific Ocean basin and South Indian Ocean basin
- `<annual number>` is the annual cyclone number, from 01 to 49
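The URL pattern above can also be assembled programmatically. The sketch below is a hypothetical helper, `build_tcprimed_url`, not part of any TC PRIMED package; it assumes the pattern and basin codes exactly as listed above and only builds a string (it does not check that the folder exists on NODD).

```python
# Basin codes listed above
TC_PRIMED_BASINS = {"AL", "SL", "EP", "CP", "WP", "IO", "SH"}
TC_PRIMED_AWS_URL = "https://noaa-nesdis-tcprimed-pds.s3.amazonaws.com"


def build_tcprimed_url(season, basin, annual_number,
                       version="v01r00", version_type="final"):
    """Hypothetical helper: build the NODD folder URL for one tropical cyclone."""
    if basin not in TC_PRIMED_BASINS:
        raise ValueError(f"Unknown basin code: {basin!r}")
    if not 1 <= annual_number <= 49:
        raise ValueError(f"Annual number must be between 1 and 49, got {annual_number}")
    # Four-digit season, two-digit zero-padded annual number
    return (f"{TC_PRIMED_AWS_URL}/{version}/{version_type}/"
            f"{season:04d}/{basin}/{annual_number:02d}/")


# Hurricane Florence (2018) is storm 06 in the North Atlantic (AL) basin
florence_url = build_tcprimed_url(2018, "AL", 6)
print(florence_url)
```

For Florence, this reproduces the `NODD_URL` string used in the next cell.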

In [None]:
import requests

# Specify the URL to the TC PRIMED folder on NODD
NODD_URL = "https://noaa-nesdis-tcprimed-pds.s3.amazonaws.com/v01r00/final/2018/AL/06/"

# Specify the name of the file we will use from the TC PRIMED folder on NODD
FILE_NAME = "TCPRIMED_v01r00-final_AL062018_GMI_GPM_025677_20180905051405.nc"

# Join NODD_URL and FILE_NAME to produce a complete link, then
# retrieve the contents of the TC PRIMED file from that link
url_response = requests.get(NODD_URL + FILE_NAME)

# Stop with an informative error if the request failed (e.g., HTTP 404)
url_response.raise_for_status()

# Load the in-memory contents of the TC PRIMED file into a Dataset instance called DS;
# with memory=..., FILE_NAME only sets the file path label of the dataset
DS = Dataset(FILE_NAME, memory=url_response.content)
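The file name itself also encodes information about the overpass. The sketch below splits the example file name into fields using a hypothetical helper, `parse_tcprimed_overpass_name`. The field meanings are inferred from this single example and are assumptions: in particular, we assume the sixth field is an orbit number and the last field is the overpass time in `YYYYMMDDhhmmss` (UTC).

```python
import os
from datetime import datetime, timezone


def parse_tcprimed_overpass_name(file_name):
    """Hypothetical helper: split a TC PRIMED overpass file name into fields.

    Field meanings are assumptions inferred from the example file name.
    """
    stem, _ = os.path.splitext(os.path.basename(file_name))
    _, version_part, storm_id, sensor, platform, orbit, stamp = stem.split("_")
    version, version_type = version_part.split("-")
    return {
        "version": version,                # e.g., v01r00
        "version_type": version_type,      # e.g., final
        "basin": storm_id[:2],             # e.g., AL
        "annual_number": storm_id[2:4],    # e.g., 06
        "season": int(storm_id[4:]),       # e.g., 2018
        "sensor": sensor,                  # e.g., GMI
        "platform": platform,              # e.g., GPM
        "orbit": orbit,                    # assumed orbit number
        # Assumed overpass time, interpreted as UTC
        "time": datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc),
    }


# Same file name as FILE_NAME above
fields = parse_tcprimed_overpass_name(
    "TCPRIMED_v01r00-final_AL062018_GMI_GPM_025677_20180905051405.nc")
for key, value in fields.items():
    print(f"{key}: {value}")
```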

As mentioned above, the `netCDF4` package stores the contents of the file in a `Dataset` instance. Let's first look at this instance at the file's "root" group.

The root group is the outermost "directory" or "folder" in a NetCDF file. When you open a NetCDF file with `Dataset`, the returned instance (here, `DS`) represents the root group. For datasets that do not use NetCDF Groups, all variables are stored in the root group. In TC PRIMED, however, the root group stores only the "global" attributes (information about the whole file) and the different sub-groups.

Let's print the root group instance.

In [None]:
# DS would automatically contain information from the root group
# Print the instance of the file from the root group
print(DS)

The printout for the root group above shows all of the global attributes (e.g., dataset title, file ID, and product version), so you can read that information directly from the output.

Once you know an attribute's name, you can load that attribute directly from the root group using the `getncattr` method.

In [None]:
# Load the "summary" attribute from the root group as root_summary
root_summary = DS.getncattr("summary")

# Print root_summary
print(root_summary)

<div class="alert alert-block alert-success">
<h3>Exercise 1</h3>
Using the example above, uncomment the template below, and change the code as necessary to load the "title" attribute from the root group.
</div>

<div class="alert alert-block alert-info">
<b>Hint:</b> If you are unsure about what attributes are available, print out the root group instance again.
</div>

In [None]:
# Load the "title" attribute from the root group as root_title
#root_title = DS.getncattr("insert_attr_here")

# Print root_title
#print(root_title)

Awesome! Now, let's step back and print the root group again.

In [None]:
# Print the instance of the file from the root group
print(DS)

Notice that the end of the printout lists the "dimensions," "variables," and "groups" available in the root group. **These are not part of the TC PRIMED file metadata, but are part of the standard output of the Python netCDF4 package as additional information**. Nonetheless, we can use that information. Six groups are listed under groups:
- `overpass_metadata`
- `overpass_storm_metadata`
- `passive_microwave`
- `GPROF`
- `radar_radiometer`
- `infrared`

<div class="alert alert-block alert-danger">
<b>Be careful.</b> The <code>radar_radiometer</code> group is only available for satellites with precipitation radars, such as the Tropical Rainfall Measuring Mission (TRMM) and GPM satellites. The other groups are available across the different sensors and satellites in TC PRIMED.
</div>

Now, let's access the instance of the `passive_microwave` group.

In [None]:
# Using the file instance, load the passive_microwave group as a
# group instance called passive_microwave_group
passive_microwave_group = DS["passive_microwave"]

# Print the passive_microwave group instance
print(passive_microwave_group)

Notice that, as with the root group, the printout above lists the dimensions, variables, and groups available in the `passive_microwave` group. In this example, the group contains:
- a `time` dimension of length 1
- a `time` variable with a `time` dimension; therefore, a `time` variable of length 1
- two sub-groups: `S1` and `S2`

Let's load and print out the `time` variable instance.

In [None]:
# Using the file instance, load the variable instance for time
# in the passive_microwave group by specifying its "path" in the file
pm_time_instance = DS["passive_microwave/time"]

# Print the variable instance of passive microwave time
print(pm_time_instance)

The printout of the variable instance shows the attributes and properties of the `time` variable. As you saw when printing the passive microwave group instance above, `time` is a one-dimensional array with one element (`current shape = (1,)`). Let's first retrieve the `long_name` attribute of this variable instance using the `getncattr` method we used above.

In [None]:
# Using the file instance, load the variable instance for time
# in the passive_microwave group by specifying its "path" in the file
# Then, retrieve the long_name attribute using getncattr
pm_time_long_name = DS["passive_microwave/time"].getncattr("long_name")

# Print the time variable long_name
print(pm_time_long_name)

Finally, let's retrieve the values of the `time` variable from the `time` variable instance. To do so, index the variable instance with `[:]`.

In [None]:
# Using the file instance, load the time variable from the
# passive_microwave group by specifying its "path" in the file and
# using [:] to load the variable
pm_time = DS["passive_microwave/time"][:]

# Print the passive microwave time variable
print(pm_time)

You've loaded a NetCDF variable! As you have seen in the cells above, it is a one-dimensional array with one element. The `time` variable in TC PRIMED files is in units of seconds since 1970-01-01T00:00:00. We will briefly discuss this `time` unit in Chapter 1c.
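As a quick preview, here is one way to convert seconds since 1970-01-01T00:00:00 into readable dates with NumPy and the Python standard library. The example value is hypothetical (it is built from 2018-09-05T05:14:05, the timestamp in this notebook's file name, rather than read from the file), and we assume the reference time is UTC; to convert the real values, you would pass `np.asarray(pm_time)` in place of `example_seconds`.

```python
from datetime import datetime, timezone

import numpy as np

# Hypothetical example value, built from a known UTC date and time
example_seconds = np.array([datetime(2018, 9, 5, 5, 14, 5, tzinfo=timezone.utc).timestamp()])

# Option 1: NumPy datetime64 (round to whole seconds before casting)
as_datetime64 = np.rint(example_seconds).astype("int64").astype("datetime64[s]")

# Option 2: timezone-aware Python datetime objects
as_datetime = [datetime.fromtimestamp(float(t), tz=timezone.utc) for t in example_seconds]

print(as_datetime64)
print(as_datetime)
```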

For now, let's go back to the `passive_microwave` group instance.

In [None]:
# Using the file instance, load the passive_microwave group as a
# group instance called passive_microwave_group
passive_microwave_group = DS["passive_microwave"]

# Print the passive_microwave group instance
print(passive_microwave_group)

<div class="alert alert-block alert-success">
<h3>Exercise 2</h3> Notice that within the <code>passive_microwave</code> group, there are two additional groups called <code>S1</code> and <code>S2</code>. We will discuss these groups in more detail in Chapter 1b. But for now, using the knowledge you've obtained about NetCDF groups, instances, and variables, uncomment the template below and change the code as necessary to print out the <code>S1</code> group instance from the <code>passive_microwave</code> group.
</div>

In [None]:
# Using the file instance, load the S1 group within the passive_microwave
# group as an instance called S1_group
#S1_group = DS["insert_S1_group_here"]

# Print the S1 group instance
#print(S1_group)

From the printout of the `S1` group instance above, you should see two dimensions:
- `scan` with length 171
- `pixel` with length 221

You should also see a list of variables, such as `latitude` and `longitude`, each of which has either a `scan` dimension or both `scan` and `pixel` dimensions.

<div class="alert alert-block alert-success">
<h3>Exercise 3</h3> Using the knowledge you've obtained about NetCDF groups, instances, and variables, uncomment the template below and change the code as necessary to load and print out the <code>latitude</code> variable.
</div>

<div class="alert alert-block alert-info">
<b>Hint:</b> When loading variables, don't forget to use <code>[:]</code>.
</div>

In [None]:
# Using the file instance, load the latitude variable from the
# passive_microwave group and S1 sub-group by specifying its "path"
# in the file, and using [:] to load the variable
#S1_latitude = DS["insert_latitude_var_here"]

# Print the latitude variable from the passive_microwave group and
# the S1 sub-group
#print(S1_latitude)

When loading NetCDF variables, the `[:]` operator loads the full array of the variable. But what if you only want the first 10 entries of the `scan` dimension? First, check the dimensions of the variable by printing out the group or variable instance, as you've done in the cells above. Then, supply the appropriate index for each dimension inside the square brackets. Let's look at an example by printing out the shape of the `latitude` array being loaded.

In [None]:
# Load the full array of the latitude variable using only [:]
S1_full_latitude = DS["passive_microwave/S1/latitude"][:]

# Print the shape of the full latitude array
print(S1_full_latitude.shape)

# Load a subset of the latitude variable for the first 10 entries of the
# scan dimension and all of the pixel dimension
# (the slice [0:10, :] selects the same entries)
S1_subset_latitude = DS["passive_microwave/S1/latitude"][np.arange(0, 10, 1), :]

# Print the shape of the latitude variable subset
print(S1_subset_latitude.shape)

<div class="alert alert-block alert-success">
<h3>Exercise 4</h3> Uncomment the template below and change the code as necessary to load the first 20 entries of the <code>scan</code> dimension and the first 10 entries of the <code>pixel</code> dimension for the latitude variable. Then, print out the shape of the variable array.
</div>

In [None]:
# Load a subset of the latitude variable for the first 20 entries of the
# scan dimension and the first 10 entries of the pixel dimension
#S1_subset_latitude = DS["insert_latitude_var_here"]

# Print the shape of the latitude variable subset
#print(S1_subset_latitude.shape)

## Close the File
When loading data from a NetCDF file, **always remember to close the file**. Best practice is to close the file immediately after loading the variables or attributes of interest. However, since we load various variables and attributes throughout this notebook, we close the file at the end of the notebook using the command below.

In [None]:
DS.close()

## Final Thoughts
There are other ways to query NetCDF files, like using [Unidata's NetCDF Utilities](https://docs.unidata.ucar.edu/nug/current/netcdf_utilities_guide.html). However, in this tutorial, you learned how to use the Python `netCDF4` package to:
- retrieve information from a remote NetCDF file
- print out NetCDF instances
- obtain attributes from NetCDF instances
- retrieve a variable from NetCDF instances
- navigate groups in a NetCDF file
- load only a subset of data from a NetCDF variable

These are crucial steps for getting familiar with the structure of TC PRIMED files before you move on to analyzing TC PRIMED data.

## Data Statement
- Razin, Muhammad Naufal; Slocum, Christopher J.; Knaff, John A.; Brown, Paula J. 2023. Tropical Cyclone PRecipitation, Infrared, Microwave, and Environmental Dataset (TC PRIMED). v01r00. NOAA National Centers for Environmental Information. https://doi.org/10.25921/dmy1-0595.

## References
- Unidata, 2023: Network Common Data Form (NetCDF). Accessed 13 June 2023, https://www.unidata.ucar.edu/software/netcdf/.

## Metadata
- Language / package
    - Python
    - netCDF4
    - numpy
    - requests
- Application keywords
    - NetCDF
    - NetCDF Groups
- Geophysical keywords
    - Tropical Cyclones