{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "uKpOdQvxbeB4" }, "source": [ "# MagNet: Model the Geomagnetic Field Chapter 2\n", "## Explainable AI (XAI)\n", "\n", "![HELIO_GRAPHIC_URL](https://ngdc.noaa.gov/geomag/img/challenge-banner.png \"HELIO\")\n", "\n", "* Creator(s): Rob.Redmon@noaa.gov (1,2), Manoj.C.Nair@noaa.gov (2,3), LiYin.Young@noaa.gov (2,3)\n", "* Affiliation(s):\n", " * 1. National Centers for Environmental Information ([NCEI](https://www.ncei.noaa.gov/)), National Oceanic and Atmospheric Administration (NOAA),\n", " * 2. NOAA Center for Artificial Intelligence ([NCAI](https://noaa.gov/ai)),\n", " * 3. Cooperative Institute for Research in Environmental Sciences ([CIRES](https://cires.colorado.edu/)).\n", "* History\n", " * 2023-08: Content reorganized for the [NCAI](https://noaa.gov/ai) Learning Journey library. No significant technical changes from previous version.\n", " * 2022-06: Initial notebook version developed for the [TAI4ES 2022 Summer School](https://www2.cisl.ucar.edu/events/tai4es-2022-summer-school).\n", "* Acknowledgements:\n", " * Original funding support was provided by the NCEI Innovates program.\n", " * Post-model inference and evaluation were created for the NCAR and [AI2ES](https://www.ai2es.org/) [TAI4ES 2022 Summer School](https://www2.cisl.ucar.edu/events/tai4es-2022-summer-school)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview\n", "Chapter 2, \"Explainable AI (XAI)\", of this two-notebook series focuses on evaluating the benchmark model developed in Chapter 1, \"Develop the LSTM Model\", which predicts the disturbance storm time (Dst) index, an indicator of space weather storms." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Prerequisites\n", "\n", "* Chapter 1: The first chapter builds the model with benchmark or user-defined hyperparameter settings.\n", "* Python intermediate proficiency for data science: SciPy, Pandas, NumPy, Matplotlib,\n", "* Machine Learning intermediate experience: ML for supervised modeling of time series data using neural networks. We use the Keras framework for TensorFlow in this notebook to create a Long Short-Term Memory (LSTM) recurrent neural network,\n", "* Space Weather introductory knowledge: Basic familiarity with the Solar Wind and the Disturbance Storm Time activity index (Dst). For introductory materials on space weather and its effects on the technological systems we rely on, two resources are:\n", " * [NASA's Space Place](https://spaceplace.nasa.gov/spaceweather/),\n", " * [NOAA's Space Weather Prediction Center (SWPC)](https://www.swpc.noaa.gov/), in particular their community dashboards." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Targeted Level\n", "This notebook is targeted towards learners with beginner to intermediate experience in space weather topics, and intermediate experience in modeling time series data with neural networks." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Learning Outcomes\n", "\n", "By engaging in this notebook series, the learner will get introductory experience with Explainable AI (XAI) topics through:\n", "1. Evaluating input (feature) relative importance to model performance via the \"permutation importance\" technique,\n", "2. Evaluating the trained model on classical space weather events." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Info: \n", "In this notebook, you'll notice color-coded boxes, which provide hints, exercises, and warnings. Here is the color-coding breakdown:\n", "
\n", "\n", "* Hint/Tip/Info: Helpful context and guidance, as a blue alert-info box\n", "* Exercise: Interactive activity / exercise, as a green alert-success box\n", "* Be Aware: Caution / Caveat, as a yellow alert-warn box\n", "* Danger: Conditions under which code may create an error, as a red alert-danger box" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Tutorial Material" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Info: See Chapter 1 in the MagNet LSTM series.\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Software\n", "This notebook has been tested using the following environments:\n", "* Google Colaboratory (Python 3.10.12) with no need to install additional packages.\n", " * CPU, GPU, TPU tested\n", "* Anaconda (Python 3.9.16) with the following key package versions:\n", " * Keras TensorFlow 2.8.0\n", " * Pandas 1.5.3\n", " * Matplotlib 3.7.1\n", " * CPU, and GPU tested" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": { "id": "She8zOM6Evvk" }, "source": [ "### Acquire Data" ] }, { "cell_type": "markdown", "metadata": { "id": "hRXm1TTrEvvm" }, "source": [ "
\n", "Info: \n", "The competition discussed above used public data for development and the public leaderboard. A private dataset was kept internal during the competition for use in scoring by the organizers. Since the competition has passed, both datasets are publicly accessible from NOAA. We will build and evaluate the model using the competition's public data and evaluate storm event case studies using the competition's private data.\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "sEnsRahUEvvm" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloaded https://ngdc.noaa.gov/geomag/data/geomag/magnet/public.zip , now unzipping.\n", "Downloaded https://ngdc.noaa.gov/geomag/data/geomag/magnet/private.zip , now unzipping.\n", "\n", "Data files for input (features) and output Dst (labels):\n", "data/public/\n", "\t satellite_positions.csv\n", "\t dst_labels.csv\n", "\t solar_wind.csv\n", "\t sunspots.csv\n", "data/private/\n", "\t satellite_positions.csv\n", "\t dst_labels.csv\n", "\t solar_wind.csv\n", "\t sunspots.csv\n" ] } ], "source": [ "# Download data we need: If directory \"data/\" already exists, we'll assume the data are already downloaded.\n", "# The files are 381 MB zipped and 1.2 GB unzipped\n", "# Retrieving these files from NOAA takes 30-60 seconds on a home internet connection.\n", "\n", "import os, urllib, zipfile\n", "\n", "dir_data = 'data/'\n", "if not os.path.isdir(dir_data):\n", " os.mkdir(dir_data)\n", "\n", " # Zenodo URLs\n", " urls = ['https://zenodo.org/record/8197443/files/public.zip?download=1',\n", " 'https://zenodo.org/record/8197443/files/private.zip?download=1']\n", " \n", " # NOAA URLs (same exact data as on Zenodo) -- uncomment to download from NOAA\n", " # urls = ['https://ngdc.noaa.gov/geomag/data/geomag/magnet/public.zip',\n", " # 'https://ngdc.noaa.gov/geomag/data/geomag/magnet/private.zip']\n", "\n", " # Download and unzip each file\n", " for url in urls:\n", " zip_path, _ = urllib.request.urlretrieve(url)\n", " with zipfile.ZipFile(zip_path, \"r\") as f:\n", " print('Downloaded ', url, ', now unzipping.')\n", " f.extractall(dir_data)\n", "\n", "# Print list of data files:\n", "print('\\nData files for input (features) and output Dst (labels):')\n", "for dir_pubpriv in ['public/', 'private/']:\n", " print(dir_data+dir_pubpriv)\n", " for path, dirs, files in os.walk(dir_data+dir_pubpriv): \n", " for f in files: 
print('\\t', f)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "id": "9DmJY2oQEvvk" }, "outputs": [], "source": [ "# Modules we need to get started and Matplotlib configuration:\n", "import numpy as np, pandas as pd, pprint\n", "import matplotlib.pyplot as plt\n", "\n", "# The next two lines are nice for Jupyter, but not available for Colab:\n", "#%load_ext nb_black\n", "#%matplotlib inline\n", "\n", "# Matplotlib configuration: default font for all figures\n", "font = {'family' : 'sans-serif',\n", " 'weight' : 'normal',\n", " 'size' : 14}\n", "plt.rc('font', **font)" ] }, { "cell_type": "markdown", "metadata": { "id": "_p0ic5qPEvvm" }, "source": [ "#### Import Input (Features) and Output (Labels) as Pandas DataFrames\n", "
\n", "Info: As described above, the input data is a time series of solar wind measurements at L1 along with sunspot number, and the output data is a time series of Dst. Recall that for the past competition, the competitors did not have the real geophysical date/time. Here, we reconstruct a column of real geophysical date/time by adding each timedelta to its period's start time from the table shown in \"Data Summary\".\n", "
" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "id": "l6kkNSINEvvm" }, "outputs": [], "source": [ "# From our time range table in the \"Data Summary\" section:\n", "period_ranges = {\n", " 'train_a':[pd.Timestamp('1998/2/16 00:00:00'), pd.Timestamp('2001/5/31 23:59:00')],\n", " 'train_b':[pd.Timestamp('2013/6/1 00:00:00'), pd.Timestamp('2019/5/31 23:59:00')],\n", " 'train_c':[pd.Timestamp('2004/5/1 00:00:00'), pd.Timestamp('2010/12/31 23:59:00')],\n", " 'test_a' :[pd.Timestamp('2001/6/1 00:00:00'), pd.Timestamp('2004/4/30 23:59:00')],\n", " 'test_b' :[pd.Timestamp('2011/1/1 00:00:00'), pd.Timestamp('2013/5/31 23:59:00')],\n", " 'test_c' :[pd.Timestamp('2019/6/1 00:00:00'), pd.Timestamp('2020/10/31 23:59:00')]}\n", "\n", "def convert_timedelta_to_datetime( df ):\n", " \"\"\"Adds real geophysical datetimes to our DataFrame using the original \"index\" timestamps.\n", " \n", " The relative \"index\" timestamps were used in the MagNet competition datasets since all of the data\n", " were in the public domain.\n", " \n", " Parameters\n", " ----------\n", " df: pd.DataFrame\n", " Includes index time\n", " \n", " Returns\n", " -------\n", " df_datetimes: pd.DataFrame\n", " Adds datetimes to the input pd.DataFrame\n", " \"\"\"\n", " df_datetimes = pd.DataFrame(index=df.index)\n", " df_datetimes['datetime'] = pd.NaT # like Numpy NaN\n", "\n", " i = 0\n", " for period_name, timedelta in df.index:\n", " start_time = period_ranges[period_name][0]\n", " datetime = timedelta + start_time # add Pandas Timedelta to Pandas Timestamp\n", " df_datetimes['datetime'].values[i] = datetime\n", " i += 1\n", "\n", " #print('%s: %s + %s = %s' % (period_name, timedelta, start_time, df['datetime'].values[i]))\n", "\n", " return df_datetimes" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "id": "J6uYH3zGEvvm" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Reading in the Dst output data...\n", "Reading in the Sunspot input data...\n", 
"Reading in the Solarwind input data...\n", "Reading in the Satellite position input data...\n" ] } ], "source": [ "# Import as Pandas DataFrames\n", "from pathlib import Path\n", "DATA_PATH = Path(\"data/public/\")\n", "\n", "print('Reading in the Dst output data...')\n", "dst = pd.read_csv(DATA_PATH / \"dst_labels.csv\")\n", "dst.timedelta = pd.to_timedelta(dst.timedelta)\n", "dst.set_index([\"period\", \"timedelta\"], inplace=True)\n", "\n", "print('Reading in the Sunspot input data...')\n", "sunspots = pd.read_csv(DATA_PATH / \"sunspots.csv\")\n", "sunspots.timedelta = pd.to_timedelta(sunspots.timedelta)\n", "sunspots.set_index([\"period\", \"timedelta\"], inplace=True)\n", "\n", "print('Reading in the Solarwind input data...')\n", "solar_wind = pd.read_csv(DATA_PATH / \"solar_wind.csv\")\n", "solar_wind.timedelta = pd.to_timedelta(solar_wind.timedelta)\n", "solar_wind.set_index([\"period\", \"timedelta\"], inplace=True)\n", "\n", "print('Reading in the Satellite position input data...')\n", "satellite_positions = pd.read_csv(DATA_PATH / \"satellite_positions.csv\")\n", "satellite_positions.timedelta = pd.to_timedelta(satellite_positions.timedelta)\n", "satellite_positions.set_index([\"period\", \"timedelta\"], inplace=True)\n" ] }, { "cell_type": "markdown", "metadata": { "id": "uB6NzWi0Evvr", "tags": [] }, "source": [ "### Feature Relationships" ] }, { "cell_type": "markdown", "metadata": { "id": "k0H4VUz2Evvq" }, "source": [ "Data gaps in the Solar Wind data are a common issue with real-time data\n", "\n", "
\n", "Be Aware: Gaps in our input (features) are something we'll need to handle carefully in the preprocessing steps below.\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "WTaLLrKFEvvr" }, "source": [ "
\n", "Info: These \"operational\" observations of the solar wind pose several challenges (e.g. missing data) that we will need to solve before modeling.\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "aZXj0zgCEvv2" }, "source": [ "### Feature Generation" ] }, { "cell_type": "markdown", "metadata": { "id": "Me6lZ5KyEvv3" }, "source": [ "#### Set seeds for reproducibility" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "id": "KSkdq3U0Evv3" }, "outputs": [], "source": [ "from numpy.random import seed\n", "from tensorflow.random import set_seed\n", "\n", "seed(2020)\n", "set_seed(2021)" ] }, { "cell_type": "markdown", "metadata": { "id": "pSVefEirEvv3" }, "source": [ "#### Feature / Input Data we'll use to Train the Model" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Info: It's good to think about what features we'd recommend for use in developing our model. An additional exercise at the end of this notebook has learners try different sets of features. You can do so simply by adjusting the \"SOLAR_WIND_FEATURES\" list below. \n", "
" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "id": "gjL2Me5oEvv3" }, "outputs": [], "source": [ "# subset of solar wind features to use for modeling\n", "SOLAR_WIND_FEATURES = [\n", " \"bt\",\n", " \"temperature\",\n", " \"bx_gsm\",\n", " \"by_gsm\",\n", " \"bz_gsm\",\n", " \"speed\",\n", " \"density\",\n", "]" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "id": "Tx6RB1ZREvv4" }, "outputs": [], "source": [ "# The model will be built on feature statistics, mean and standard deviation\n", "XCOLS = (\n", " [col + \"_mean\" for col in SOLAR_WIND_FEATURES]\n", " + [col + \"_std\" for col in SOLAR_WIND_FEATURES]\n", " + [\"smoothed_ssn\"]\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "-c1rhKvzEvv4" }, "source": [ "
\n", "Info: As discussed above, we'll need to fill in gaps and create statistical summaries (hourly means and standard deviations) of our features before modeling. The following routines provide this \"preprocessing\" functionality of gap filling, and scaling by features' statistics.\n", "
" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "id": "BbhKxEukEvv4" }, "outputs": [], "source": [ "from sklearn.preprocessing import StandardScaler\n", "\n", "def impute_features(feature_df):\n", " \"\"\"Imputes (inplace) missing input (feature) data.\n", " \n", " Imputes using the following methods:\n", " `smoothed_ssn` - forward fill\n", " `solar_wind` - interpolation\n", " \n", " Parameters\n", " ----------\n", " feature_df : pd.DataFrame\n", " Our original input (feature) data which has gaps.\n", " \n", " Returns\n", " -------\n", " feature_df : pd.DataFrame\n", " Updated input (feature) data with gaps filled, inplace using the input DataFrame.\n", " \"\"\"\n", "\n", " # forward fill sunspot data for the rest of the month\n", " feature_df.smoothed_ssn = feature_df.smoothed_ssn.fillna(method=\"ffill\")\n", " # interpolate between missing solar wind values\n", " feature_df = feature_df.interpolate()\n", " return feature_df\n", "\n", "\n", "def aggregate_hourly(feature_df, aggs=[\"mean\", \"std\"]):\n", " \"\"\"Aggregates input (features) to the floor of each hour using mean and standard deviation.\n", " \n", " e.g. 
All values from \"11:00:00\" to \"11:59:00\" will be aggregated to \"11:00:00\".\n", " \n", " Parameters\n", " ----------\n", " feature_df : pd.DataFrame\n", " Our original input (feature) data to be aggregated.\n", " \n", " aggs : list of str, optional\n", " Aggregation methods to apply; defaults to [\"mean\", \"std\"].\n", " \n", " Returns\n", " -------\n", " agged : pd.DataFrame\n", " New input (feature) data aggregated per chosen method.\n", " \"\"\"\n", "\n", " # group by period and by the floor of each hour, using the timedelta index level\n", " agged = feature_df.groupby(\n", " [\"period\", feature_df.index.get_level_values(1).floor(\"H\")]\n", " ).agg(aggs)\n", " # flatten hierarchical column index\n", " agged.columns = [\"_\".join(x) for x in agged.columns]\n", " return agged\n", "\n", "\n", "def preprocess_features(solar_wind, sunspots, scaler=None, subset=None):\n", " \"\"\"Preprocesses the input (feature) data.\n", "\n", " Preprocessing steps:\n", " - Subset the data\n", " - Aggregate hourly\n", " - Join solar wind and sunspot data\n", " - Scale using standard scaler\n", " - Impute missing values\n", " \n", " Parameters\n", " ----------\n", " solar_wind : pd.DataFrame\n", " Will be aggregated (hourly), joined to sunspots, scaled, and imputed (gap filled).\n", " \n", " sunspots : pd.DataFrame\n", " Will be joined to the hourly aggregated solar_wind, then scaled and imputed.\n", " \n", " scaler : sklearn.preprocessing.StandardScaler, None, optional\n", " If not provided, a StandardScaler() instance is created.\n", " \n", " subset: None, iterable, optional\n", " Subset of the \"solar_wind\" features we'd like processed.\n", "\n", " Returns\n", " -------\n", " imputed : pd.DataFrame\n", " This is the solar_wind hourly aggregated joined with \"sunspots\", and scaled.\n", "\n", " scaler : sklearn.preprocessing.StandardScaler\n", " The scaler that was used to normalize the solar_wind and sunspots.\n", " \n", " \"\"\"\n", "\n", " # select features we want to use\n", " if subset:\n", " solar_wind = solar_wind[subset]\n", "\n", " # 
aggregate solar wind data and join with sunspots\n", " hourly_features = aggregate_hourly(solar_wind).join(sunspots)\n", "\n", " # subtract mean and divide by standard deviation\n", " if scaler is None:\n", " scaler = StandardScaler()\n", " scaler.fit(hourly_features)\n", "\n", " normalized = pd.DataFrame(\n", " scaler.transform(hourly_features),\n", " index=hourly_features.index,\n", " columns=hourly_features.columns,\n", " )\n", "\n", " # impute missing values\n", " imputed = impute_features(normalized)\n", "\n", " # we want to return the scaler object as well to use later during prediction\n", " return imputed, scaler" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "id": "qrs16z0gEvv5" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(139872, 15)\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
bt_meanbt_stdtemperature_meantemperature_stdbx_gsm_meanbx_gsm_stdby_gsm_meanby_gsm_stdbz_gsm_meanbz_gsm_stdspeed_meanspeed_stddensity_meandensity_stdsmoothed_ssn
periodtimedelta
train_a0 days 00:00:000.4997052.443614-0.3752670.383941-1.600307-0.3817270.4344240.0211560.292754-0.645095-0.7385460.862524-0.775827-0.2057240.139444
0 days 01:00:000.547177-0.224580-0.4794300.953178-1.759200-0.8680440.189021-0.2828450.433737-0.511040-0.9869040.995063-0.861692-0.0582150.139444
0 days 02:00:000.739905-0.770240-0.574831-0.192518-1.913422-1.1146490.193116-0.8315260.747220-0.870482-1.0135480.554085-0.846222-0.2200120.139444
\n", "
" ], "text/plain": [ " bt_mean bt_std temperature_mean \\\n", "period timedelta \n", "train_a 0 days 00:00:00 0.499705 2.443614 -0.375267 \n", " 0 days 01:00:00 0.547177 -0.224580 -0.479430 \n", " 0 days 02:00:00 0.739905 -0.770240 -0.574831 \n", "\n", " temperature_std bx_gsm_mean bx_gsm_std \\\n", "period timedelta \n", "train_a 0 days 00:00:00 0.383941 -1.600307 -0.381727 \n", " 0 days 01:00:00 0.953178 -1.759200 -0.868044 \n", " 0 days 02:00:00 -0.192518 -1.913422 -1.114649 \n", "\n", " by_gsm_mean by_gsm_std bz_gsm_mean bz_gsm_std \\\n", "period timedelta \n", "train_a 0 days 00:00:00 0.434424 0.021156 0.292754 -0.645095 \n", " 0 days 01:00:00 0.189021 -0.282845 0.433737 -0.511040 \n", " 0 days 02:00:00 0.193116 -0.831526 0.747220 -0.870482 \n", "\n", " speed_mean speed_std density_mean density_std \\\n", "period timedelta \n", "train_a 0 days 00:00:00 -0.738546 0.862524 -0.775827 -0.205724 \n", " 0 days 01:00:00 -0.986904 0.995063 -0.861692 -0.058215 \n", " 0 days 02:00:00 -1.013548 0.554085 -0.846222 -0.220012 \n", "\n", " smoothed_ssn \n", "period timedelta \n", "train_a 0 days 00:00:00 0.139444 \n", " 0 days 01:00:00 0.139444 \n", " 0 days 02:00:00 0.139444 " ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "features, scaler = preprocess_features(solar_wind, sunspots, subset=SOLAR_WIND_FEATURES)\n", "print(features.shape)\n", "features.head(n=3)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "id": "4Os6-0W7Evv5" }, "outputs": [], "source": [ "# check to make sure missing values are filled\n", "assert (features.isna().sum() == 0).all()" ] }, { "cell_type": "markdown", "metadata": { "id": "2zLrR9YWEvv5" }, "source": [ "
\n", "Info: We also need to prepare our output (labels), i.e. our space weather storm index Dst, which is already a time series with an hourly cadence. The modeling task is to predict Dst at hour t0 and the next hour t1.\n", "
" ] }, { "cell_type": "code", "execution_count": 11, "metadata": { "id": "q_HKa3mqEvv5" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
t0t1
periodtimedelta
train_a0 days 00:00:00-7-10.0
0 days 01:00:00-10-10.0
0 days 02:00:00-10-6.0
0 days 03:00:00-6-2.0
0 days 04:00:00-23.0
\n", "
" ], "text/plain": [ " t0 t1\n", "period timedelta \n", "train_a 0 days 00:00:00 -7 -10.0\n", " 0 days 01:00:00 -10 -10.0\n", " 0 days 02:00:00 -10 -6.0\n", " 0 days 03:00:00 -6 -2.0\n", " 0 days 04:00:00 -2 3.0" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "YCOLS = [\"t0\", \"t1\"]\n", "\n", "\n", "def process_labels(dst):\n", " \"\"\"Create dst[t0] (current time) and dst[t1] (next hour) labels and group by training periods.\n", " \n", " This is needed because we wish to train the model on predicting Dst at the current time (t0)\n", " and for the next hour (t1). The method is a simple Pandas DataFrame array timeshift from dst[0:] to get dst[1:].\n", " \n", " Parameters\n", " ----------\n", " dst : pd.DataFrame\n", " \n", " Returns\n", " -------\n", " y : pd.DataFrame\n", " New copy of dst pd.DataFrame now including shifted Dst, and is grouped by training period.\n", " This is what we will train the model on.\n", " \"\"\"\n", "\n", " y = dst.copy()\n", " y[\"t0\"] = y.groupby(\"period\").dst.shift( 0)\n", " y[\"t1\"] = y.groupby(\"period\").dst.shift(-1)\n", " return y[YCOLS]\n", "\n", "\n", "labels = process_labels(dst)\n", "labels.head(n=5)" ] }, { "cell_type": "markdown", "metadata": { "id": "yi-13zy1Evv6" }, "source": [ "
\n", "Tip: For convenience, join our processed solar wind hourly inputs (features) and our Dst (labels) into one Pandas DataFrame.\n", "
" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "id": "3WrOK4diEvv6" }, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
t0t1bt_meanbt_stdtemperature_meantemperature_stdbx_gsm_meanbx_gsm_stdby_gsm_meanby_gsm_stdbz_gsm_meanbz_gsm_stdspeed_meanspeed_stddensity_meandensity_stdsmoothed_ssn
periodtimedelta
train_a0 days 00:00:00-7-10.00.4997052.443614-0.3752670.383941-1.600307-0.3817270.4344240.0211560.292754-0.645095-0.7385460.862524-0.775827-0.2057240.139444
0 days 01:00:00-10-10.00.547177-0.224580-0.4794300.953178-1.759200-0.8680440.189021-0.2828450.433737-0.511040-0.9869040.995063-0.861692-0.0582150.139444
0 days 02:00:00-10-6.00.739905-0.770240-0.574831-0.192518-1.913422-1.1146490.193116-0.8315260.747220-0.870482-1.0135480.554085-0.846222-0.2200120.139444
\n", "
" ], "text/plain": [ " t0 t1 bt_mean bt_std temperature_mean \\\n", "period timedelta \n", "train_a 0 days 00:00:00 -7 -10.0 0.499705 2.443614 -0.375267 \n", " 0 days 01:00:00 -10 -10.0 0.547177 -0.224580 -0.479430 \n", " 0 days 02:00:00 -10 -6.0 0.739905 -0.770240 -0.574831 \n", "\n", " temperature_std bx_gsm_mean bx_gsm_std \\\n", "period timedelta \n", "train_a 0 days 00:00:00 0.383941 -1.600307 -0.381727 \n", " 0 days 01:00:00 0.953178 -1.759200 -0.868044 \n", " 0 days 02:00:00 -0.192518 -1.913422 -1.114649 \n", "\n", " by_gsm_mean by_gsm_std bz_gsm_mean bz_gsm_std \\\n", "period timedelta \n", "train_a 0 days 00:00:00 0.434424 0.021156 0.292754 -0.645095 \n", " 0 days 01:00:00 0.189021 -0.282845 0.433737 -0.511040 \n", " 0 days 02:00:00 0.193116 -0.831526 0.747220 -0.870482 \n", "\n", " speed_mean speed_std density_mean density_std \\\n", "period timedelta \n", "train_a 0 days 00:00:00 -0.738546 0.862524 -0.775827 -0.205724 \n", " 0 days 01:00:00 -0.986904 0.995063 -0.861692 -0.058215 \n", " 0 days 02:00:00 -1.013548 0.554085 -0.846222 -0.220012 \n", "\n", " smoothed_ssn \n", "period timedelta \n", "train_a 0 days 00:00:00 0.139444 \n", " 0 days 01:00:00 0.139444 \n", " 0 days 02:00:00 0.139444 " ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = labels.join(features)\n", "data.head(n=3)" ] }, { "cell_type": "markdown", "metadata": { "id": "XoQxxeShEvv6" }, "source": [ "### Splitting the Data\n", "\n", "
\n", "Info: We'll split our features and labels into Training, Testing and Validation sets for each of the 3 training periods, named train_a, train_b, train_c (see Data Summary for additional details).\n", "
" ] }, { "cell_type": "code", "execution_count": 13, "metadata": { "id": "uSdIiKQLOOEQ" }, "outputs": [], "source": [ "def get_train_test_val(data, test_per_period, val_per_period):\n", " \"\"\"Splits data across periods into train, test, and validation\n", " \n", " Parameters\n", " ----------\n", " data : pd.DataFrame\n", " This is our input (features) and output (labels) DataFrame.\n", " \n", " test_per_period : int\n", " The number of timestamps to use in test period.\n", " \n", " val_per_period : int\n", " The number of timestamps to use in validation period.\n", "\n", " Returns\n", " -------\n", " test : pd.DataFrame\n", " Test data grouped by the desired period size\n", "\n", " val : pd.DataFrame\n", " Validation data grouped by the desired period size\n", "\n", " train : pd.DataFrame\n", " Remaining data as Training data\n", "\n", " \"\"\"\n", " \n", " # assign the last `test_per_period` rows from each period to test\n", " test = data.groupby(\"period\").tail(test_per_period)\n", " interim = data[~data.index.isin(test.index)]\n", " # assign the last `val_per_period` from the remaining rows to validation\n", " val = interim.groupby(\"period\").tail(val_per_period)\n", " # the remaining rows are assigned to train\n", " train = interim[~interim.index.isin(val.index)]\n", " return train, test, val\n", "\n", "\n", "train, test, val = get_train_test_val(data, test_per_period=6_000, val_per_period=3_000)" ] }, { "cell_type": "markdown", "metadata": { "id": "U4FM-_JaTygx" }, "source": [ "### Load a Pre-Trained Model" ] }, { "cell_type": "markdown", "metadata": { "id": "OXK_ZR8TEvv8" }, "source": [ "#### Load Model, Scaler, History and Configuration" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " Exercise: Choose from the following pre-trained models developed in the Chapter 1 notebook.\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "YTt6vnSWEvwA" }, "source": [ "
\n", "Be Aware: The smallest model size option in the Chapter 1 notebook is set for notebook execution speed and training will not fully converge. In this notebook it's recommended that you load a model from Chapter 1 that's at least as performant as the MagNet benchmark case for convergence and benchmark performance. \n", "
" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "id": "YpYjboipHS0w" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Here is a list of pre-trained models:\n", "\n", " 0: trained_models_lstm/model_lstm_nepochs-04_nneurons-0016/\n", " 1: trained_models_lstm/model_lstm_nepochs-20_nneurons-0512/\n" ] }, { "name": "stdin", "output_type": "stream", "text": [ "Enter number of pre-trained model: 1\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "\n", "Loading pre-trained model from: trained_models_lstm/model_lstm_nepochs-20_nneurons-0512/\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "2023-08-14 21:32:01.838162: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA\n", "To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Model: \"sequential\"\n", "_________________________________________________________________\n", " Layer (type) Output Shape Param # \n", "=================================================================\n", " lstm (LSTM) (None, 512) 1081344 \n", " \n", " dense (Dense) (None, 2) 1026 \n", " \n", "=================================================================\n", "Total params: 1,082,370\n", "Trainable params: 1,082,370\n", "Non-trainable params: 0\n", "_________________________________________________________________\n", "\n", "Scaler:\n", "StandardScaler()\n", "\n", "History:\n", "{'loss': [322.2149353027344,\n", " 270.9563903808594,\n", " 211.81797790527344,\n", " 176.59622192382812,\n", " 154.60304260253906,\n", " 140.24449157714844,\n", " 131.95513916015625,\n", " 128.2601318359375,\n", " 123.83348846435547,\n", " 119.80756378173828,\n", " 119.03910064697266,\n", " 116.14312744140625,\n", " 
111.89155578613281,\n", " 109.86148071289062,\n", " 114.11012268066406,\n", " 112.21226501464844,\n", " 109.86665344238281,\n", " 104.17289733886719,\n", " 103.00347900390625,\n", " 100.32203674316406],\n", " 'val_loss': [528.2766723632812,\n", " 428.2550048828125,\n", " 365.3229064941406,\n", " 296.94671630859375,\n", " 269.1596984863281,\n", " 257.41168212890625,\n", " 245.89346313476562,\n", " 224.30567932128906,\n", " 217.13438415527344,\n", " 207.34120178222656,\n", " 200.94464111328125,\n", " 192.1378173828125,\n", " 180.00282287597656,\n", " 176.0775604248047,\n", " 197.2196502685547,\n", " 191.03514099121094,\n", " 183.9971160888672,\n", " 168.11105346679688,\n", " 169.42469787597656,\n", " 173.28759765625]}\n", "\n", "Configuration:\n", "{'batch_size': 32,\n", " 'solar_wind_subset': ['bt',\n", " 'temperature',\n", " 'bx_gsm',\n", " 'by_gsm',\n", " 'bz_gsm',\n", " 'speed',\n", " 'density'],\n", " 'timesteps': 32}\n" ] } ], "source": [ "import tensorflow.keras as keras\n", "\n", "import glob\n", "# List existing LSTM models:\n", "dir_list = glob.glob('trained_models_lstm/model_lstm_*/')\n", "print('Here is a list of pre-trained models:\\n')\n", "for i in range(len(dir_list)):\n", " print(' %d: %s' % (i, dir_list[i]))\n", "\n", "dir_model = dir_list[int(input('Enter number of pre-trained model: '))]\n", "\n", "import json\n", "import pickle\n", "\n", "# Load in serialized model, config, and scaler\n", "print('\\nLoading pre-trained model from: %s' % dir_model)\n", "model = keras.models.load_model(dir_model)\n", "model.summary()\n", "\n", "# Load Scaler\n", "with open(dir_model+\"/scaler.pck\", \"rb\") as f:\n", " scaler = pickle.load(f)\n", "print('\\nScaler:')\n", "pprint.pprint(scaler)\n", "\n", "# Load History\n", "with open(dir_model+\"/history.pck\", \"rb\") as f:\n", " history = pickle.load(f)\n", "print('\\nHistory:')\n", "pprint.pprint(history)\n", "\n", "# Load Configuration\n", "with open(dir_model+\"/config.json\", \"r\") as f:\n", " data_config = 
json.load(f)\n", "print('\\nConfiguration:')\n", "pprint.pprint(data_config)" ] }, { "cell_type": "markdown", "metadata": { "id": "xg8cMNbnEvv-" }, "source": [ "#### BatchDataset: Training, Validation and Test Data\n", "In order to evaluate / test our Pre-trained model, we'll create [tensorflow.python.data.ops.dataset_ops.BatchDataset](https://www.tensorflow.org/guide/data#batching_dataset_elements) structures for our Test DataFrames." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", " Info: Note that we only need our original \"test\" data for the model evaluations below, e.g. storm event prediction, and feature importance. You can uncomment the \"train_ds\" and \"val_ds\" lines of code if you'd like to look at those as well.\n", "
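The window-to-target alignment behind these datasets can be sketched with plain NumPy. This is a minimal illustration on a toy series, not the Keras implementation; the helper make_windows is a hypothetical name introduced here. Each window of `timesteps` consecutive inputs is paired with the target that immediately follows it, which is the effect of trimming the inputs with `[:-timesteps]` and the targets with `[timesteps:]` in the helper function of this section.

```python
import numpy as np

def make_windows(x, y, timesteps):
    # Hypothetical helper for illustration only (not a Keras function).
    # Trim inputs and targets the same way the notebook helper does.
    inputs = x[:-timesteps]
    targets = y[timesteps:]
    # Window i covers inputs[i : i + timesteps]; its label is targets[i],
    # i.e. the value at original index i + timesteps.
    n_windows = len(inputs) - timesteps + 1
    windows = np.stack([inputs[i:i + timesteps] for i in range(n_windows)])
    labels = targets[:n_windows]
    return windows, labels

x = np.arange(10)        # toy feature series: 0, 1, ..., 9
y = np.arange(10) * 10   # toy target series: 0, 10, ..., 90
windows, labels = make_windows(x, y, timesteps=3)
print(windows[0], labels[0])  # prints: [0 1 2] 30
```

With timesteps=3 the first window holds the inputs at indices 0 to 2 and its label is the target at index 3; with the configured 32 timesteps the first window is paired with the 33rd target.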
\n", "\n", "Additional information: The competition discussed in this notebook used public data for development and the public leaderboard. A private dataset was kept internal during the competition for use in scoring by the organizers. Since the competition has passed, both datasets are publicly accessible from NOAA. We built the model in the Chapter 1 notebook and will evaluate it here using the competition's public data. In this notebook we will also evaluate input (feature) importance and storm event case studies using the competition's private data." ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "id": "XD7CJDs0PzQ4" }, "outputs": [], "source": [ "import tensorflow.keras as keras\n", "from keras import preprocessing\n", "\n", "def timeseries_dataset_from_df(df, batch_size):\n", " \"\"\"Build a batched tf.data.Dataset of (input sequence, target) pairs from a DataFrame\n", " \n", " Parameters\n", " ----------\n", " df : pd.DataFrame\n", " Preprocessed features and labels, grouped by \"period\".\n", " batch_size : int\n", " Number of sequences per batch.\n", "\n", " Returns\n", " -------\n", " dataset : tf.data.Dataset\n", " Batched data.\n", " \"\"\"\n", "\n", " dataset = None\n", " timesteps = data_config[\"timesteps\"]\n", "\n", " # iterate through periods\n", " for _, period_df in df.groupby(\"period\"):\n", " # realign features and labels so that first sequence of 32 is aligned with the 33rd target\n", " inputs = period_df[XCOLS][:-timesteps]\n", " outputs = period_df[YCOLS][timesteps:]\n", "\n", " period_ds = keras.preprocessing.timeseries_dataset_from_array(\n", " inputs,\n", " outputs,\n", " timesteps,\n", " batch_size=batch_size,\n", " )\n", "\n", " if dataset is None:\n", " dataset = period_ds\n", " else:\n", " dataset = dataset.concatenate(period_ds)\n", "\n", " return dataset\n", "\n" ] }, { "cell_type": "code", "execution_count": 16, "metadata": { "id": "G65lroOLEvv-" }, "outputs": [ { "name": "stdout", "output_type": "stream", 
"text": [ "Number of test batches: 558\n" ] } ], "source": [ "#train_ds = timeseries_dataset_from_df(train, data_config[\"batch_size\"])\n", 
"#val_ds = timeseries_dataset_from_df(val, data_config[\"batch_size\"])\n", "test_ds = timeseries_dataset_from_df(test, data_config[\"batch_size\"])\n", "\n", "#print(f\"Number of training batches: {len(train_ds)}\")\n", "#print(f\"Number of validation batches: {len(val_ds)}\")\n", "print(f\"Number of test batches: {len(test_ds)}\")" ] }, { "cell_type": "markdown", "metadata": { "id": "wkwxYtOqT8WT" }, "source": [ "### Evaluate Trained Model" ] }, { "cell_type": "code", "execution_count": 17, "metadata": { "id": "87GD27_tT-JY" }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "for name, values in history.items():\n", " plt.plot(values, 's-', label=name)\n", "plt.legend(fontsize=14)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 18, "metadata": { "id": "rvj69RaTUoBw" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "558/558 [==============================] - 39s 69ms/step - loss: 163.6485\n", "Test RMSE: 12.79 nano-Tesla\n" ] } ], "source": [ "rmse = model.evaluate(test_ds)**0.5\n", "print(f\"Test RMSE: {rmse:.2f} nano-Tesla\")" ] }, { "cell_type": "markdown", "metadata": { "id": "X2o3MHneUD0M" }, "source": [ "### Model Performance Evaluation\n", "\n", "Here you'll get experience with:\n", "* Introductory explainable AI (XAI) via Permutation Importance\n", "* Model performance on user chosen storm events" ] }, { "cell_type": "markdown", "metadata": { "id": "ZxmhOeEBNnE2" }, "source": [ "#### Permutation Importance - Easy Approximation\n", "\n", "
\n", "Info: Based on Christoph Molnar's \"Interpretable Machine Learning\" section and Fisher, Rudin, and Dominici (2018), we will \"split the dataset in half and swap the values of feature j of the two halves instead of permuting feature j\". \n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "C6z_7ulNEvwB" }, "source": [ "Additional Resources:\n", "* Christoph Molnar's \"Interpretable Machine Learning\" section on [Permutation Feature Importance](https://christophm.github.io/interpretable-ml-book/feature-importance.html), and see also their argument for using Test data for Permutation Importance evaluation, which we have chosen to do here.\n", "* [Illustrative graphic demonstrating single- and multi-pass Permutation Importance](https://permutationimportance.readthedocs.io/en/latest/methods.html#permutation-importance)\n", "* [Permutation Feature Importance in the scikit-learn module](https://scikit-learn.org/stable/modules/permutation_importance.html)\n" ] }, { "cell_type": "markdown", "metadata": { "id": "o1I9DXilEvwC" }, "source": [ "Basically, we can split and swap the feature datasets one feature at a time and compare the resultant RMSE. We take a programming convenience shortcut and simply reverse each feature vector rather than split and swap and we expect the same results. We'll do this, i.e. permute each feature vector, one at a time." ] }, { "cell_type": "markdown", "metadata": { "id": "Qsx6AbePEvwC" }, "source": [ "
\n", "Tip: Recall that our test_ds, which we used to evaluate the model performance, is a tensorflow.python.data.ops.dataset_ops.BatchDataset, and these are honestly kind of hard to modify in place. So for each permutation we make a deep copy of the test DataFrame, permute one feature in the copy, and build a new BatchDataset from it, so that we don't corrupt the original.\n", "
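As a rough illustration of the split-and-swap idea and of the reversal shortcut described above, here is a minimal NumPy sketch on a toy array. The helper names swap_halves and reverse_column are hypothetical and are not part of this notebook. Both operations keep the values of the chosen feature, so its distribution is unchanged, but they break its alignment with the other features and with the target. Note that the two reorderings are not identical row by row.

```python
import numpy as np

def swap_halves(data, col):
    # Hypothetical helper: swap the two halves of one feature column.
    out = data.copy()
    half = len(out) // 2
    out[:half, col] = data[-half:, col]
    out[-half:, col] = data[:half, col]
    return out

def reverse_column(data, col):
    # Hypothetical helper: reverse one feature column (the shortcut above).
    out = data.copy()
    out[:, col] = data[::-1, col]
    return out

data = np.arange(12, dtype=float).reshape(6, 2)  # 6 samples, 2 features
swapped = swap_halves(data, col=0)
reversed_ = reverse_column(data, col=0)
print(swapped[:, 0])    # column 0 with its halves exchanged
print(reversed_[:, 0])  # column 0 in reverse order
```

In both cases the second feature column is left untouched, which is what lets the change in RMSE be attributed to the single permuted feature.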
" ] }, { "cell_type": "code", "execution_count": 19, "metadata": { "id": "JhBr9xlT3IdZ" }, "outputs": [], "source": [ "# Note: We shouldn't need these two lines below but they seem needed generalizing to run w/o\n", "# issues on both Colaboratory (Python 3.7) and Jupyter server with Python 3.9.\n", "# Contact POCs if you get an error such as:\n", "# AttributeError: module 'keras.preprocessing' has no attribute 'timeseries_dataset_from_array'\n", "import tensorflow.keras as keras\n", "from keras import preprocessing" ] }, { "cell_type": "code", "execution_count": 20, "metadata": { "id": "K92V7O4cPCiK" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "558/558 [==============================] - 38s 68ms/step - loss: 190.4299\n", "bt_mean: 13.799636 rmse nano-Tesla\n", "558/558 [==============================] - 46s 82ms/step - loss: 161.5839\n", "temperature_mean: 12.711564 rmse nano-Tesla\n", "558/558 [==============================] - 36s 65ms/step - loss: 167.0069\n", "bx_gsm_mean: 12.923114 rmse nano-Tesla\n", "558/558 [==============================] - 41s 73ms/step - loss: 173.1632\n", "by_gsm_mean: 13.159147 rmse nano-Tesla\n", "558/558 [==============================] - 34s 60ms/step - loss: 334.1352\n", "bz_gsm_mean: 18.279366 rmse nano-Tesla\n", "558/558 [==============================] - 53s 96ms/step - loss: 216.5030\n", "speed_mean: 14.714041 rmse nano-Tesla\n", "558/558 [==============================] - 54s 97ms/step - loss: 169.8606\n", "density_mean: 13.033059 rmse nano-Tesla\n", "558/558 [==============================] - 56s 100ms/step - loss: 169.9884\n", "bt_std: 13.037961 rmse nano-Tesla\n", "558/558 [==============================] - 58s 103ms/step - loss: 164.2722\n", "temperature_std: 12.816871 rmse nano-Tesla\n", "558/558 [==============================] - 59s 105ms/step - loss: 167.5227\n", "bx_gsm_std: 12.943056 rmse nano-Tesla\n", "558/558 [==============================] - 61s 109ms/step - loss: 173.0972\n", 
"by_gsm_std: 13.156641 rmse nano-Tesla\n", "558/558 [==============================] - 64s 114ms/step - loss: 165.3865\n", "bz_gsm_std: 12.860270 rmse nano-Tesla\n", "558/558 [==============================] - 63s 112ms/step - loss: 164.6986\n", "speed_std: 12.833495 rmse nano-Tesla\n", "558/558 [==============================] - 59s 105ms/step - loss: 165.6094\n", "density_std: 12.868932 rmse nano-Tesla\n", "558/558 [==============================] - 59s 106ms/step - loss: 175.3159\n", "smoothed_ssn: 13.240692 rmse nano-Tesla\n" ] } ], "source": [ "# A couple of ways to learn about the contents of a BatchDataset:\n", "# print(list(test_ds.as_numpy_iterator()))\n", "# type(test_ds)\n", "\n", "\n", "rmse_permute_df = pd.DataFrame(np.zeros((1,len(XCOLS))), columns=XCOLS)\n", "for fname in XCOLS:\n", "\n", " # We're going to edit this data so make a deep copy of our preprocessed training dataset.\n", " test_for_permute = test.copy(deep=True)\n", "\n", " # Approximate split permutation by simply reversing the data in this feature\n", " test_for_permute[fname].values[:] = test_for_permute[fname].values[::-1]\n", "\n", " # create TensorFlow BatchDataset\n", " permute_ds = timeseries_dataset_from_df(test_for_permute, data_config[\"batch_size\"])\n", "\n", " # evaluate model\n", " rmse_permute_df[fname] = model.evaluate(permute_ds)**0.5\n", "\n", " print('%s: %f rmse nano-Tesla' % (fname, rmse_permute_df[fname]))" ] }, { "cell_type": "markdown", "metadata": { "id": "7jfs_EkGEvwF" }, "source": [ "
\n", "Info: Permutation Importance is evaluated as the influence a feature has relative to our unpermuted baseline performance. It's typical to use either a ratio or subtraction to relate to our baseline. Here we use a ratio.\n", "
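The two ways of relating a permuted score to the baseline can be sketched as follows. The RMSE values are rounded copies of a few numbers printed in the outputs above; the variable names are illustrative assumptions, not names used elsewhere in this notebook.

```python
import pandas as pd

# Test RMSE on the unpermuted data (nT), rounded from the output above.
baseline_rmse = 12.79

# RMSE after permuting one feature at a time (nT), rounded from the output above.
permuted_rmse = pd.Series(
    {'bz_gsm_mean': 18.28, 'speed_mean': 14.71, 'temperature_mean': 12.71}
)

ratio = permuted_rmse / baseline_rmse       # above 1 means the permutation hurt
difference = permuted_rmse - baseline_rmse  # same ranking, expressed in nT

summary = pd.DataFrame({'ratio': ratio, 'difference_nT': difference})
print(summary.sort_values('ratio', ascending=False).round(3))
```

Because the baseline is a single positive constant, the ratio and the difference always rank the features in the same order; the ratio is unitless, while the difference keeps the units of the target.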
" ] }, { "cell_type": "code", "execution_count": 21, "metadata": { "id": "DG43wGSwETt7" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "In order of most important feature first to least important by rmse(j)/rmse:\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " 0\n", "bz_gsm_mean 1.428911\n", "speed_mean 1.150207\n", "bt_mean 1.078727\n", "smoothed_ssn 1.035034\n", "by_gsm_mean 1.028660\n", "by_gsm_std 1.028464\n", "bt_std 1.019187\n", "density_mean 1.018803\n", "bx_gsm_std 1.011768\n", "bx_gsm_mean 1.010209\n", "density_std 1.005974\n", "bz_gsm_std 1.005296\n", "speed_std 1.003203\n", "temperature_std 1.001904\n", "temperature_mean 0.993672" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Ratio the Permuted RMSE to the overall RMSE and sort in order of importance\n", "print('In order of most important feature first to least important by rmse(j)/rmse:')\n", "rmse_ratio_df = (rmse_permute_df/rmse).sort_values(ascending=False, by=0, axis=1)\n", "rmse_ratio_df.T" ] }, { "cell_type": "code", "execution_count": 22, "metadata": { "id": "FasRNQv_EvwG" }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Visualize the Permutation Importance outcome\n", "plt.plot(rmse_ratio_df.columns, rmse_ratio_df.values.T, 'x-')\n", "plt.xticks(rotation=270)\n", "plt.ylabel('RMSE Ratio %')\n", "plt.grid(True)\n", "plt.show()\n", "\n", "# Uncomment for a Pandas barplot:\n", "# (Note that there might be too many colors to easily interpret)\n", "#rmse_ratio_df.plot(kind='bar', figsize=(10, 5))\n", "#plt.title('Permutation Feature Importance')\n", "#plt.xlim(-0.25,)\n", "#plt.ylim(0.95, rmse_ratio_df.iloc[0,0])\n", "#plt.grid(True)\n", "#plt.tight_layout()" ] }, { "cell_type": "markdown", "metadata": { "id": "yLbZKYfgEjoT" }, "source": [ "#### Use these Feature Importances to Aid Evaluation" ] }, { "cell_type": "markdown", "metadata": { "id": "6NOmthg4EvwG" }, "source": [ "
\n", "Exercise: What are the model sensitivities to the input parameters?\n", "\n", "- How do our input data (features) compare in their influence on performance in predicting Dst?\n", " \n", "- How does the order of this list compare to your intuition from the Feature Correlation Heatmap we made in the beginning of this notebook?\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "pA-g5jrAbeCL" }, "source": [ "### Event Case Studies\n", "\n", "To fully evaluate our model's performance, we need to get familiar with how well it generalizes to a diverse set of geomagnetic storm events including their preconditioning. Even though we split our data into 'Train', 'Validation', and 'Test' using best practices to avoid over/under fitting, looking at specific events while leveraging our space weather geomagnetic intuition will help us gain insight into how the model performs across different phases of different types of storms.\n", "\n", "
\n", "Tip: You can use these storm phase descriptions for contextualizing your experience:\n", "\n", "* Climatology / quiet periods: Dst is generally horizontal and nearly 0 nano-Tesla.\n", "* Sudden Impulse: Dst rises from near 0 to positive values rapidly over a few hours.\n", "* Storm Sudden Commencement and Main Phase: Dst drops sharply and remains significantly negative for up to several days.\n", "* Storm Peak: Dst reaches its minimum (most negative) value.\n", "* Recovery Phase: Dst recovers from large negative values back to climatology, near 0 nano-Tesla.\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "ZqWamOEKbeCL" }, "source": [ "#### Define Prediction Function" ] }, { "cell_type": "code", "execution_count": 23, "metadata": { "id": "qHpsrh-WbeCL" }, "outputs": [], "source": [ "from typing import Tuple\n", "\n", "TIMESTEPS = data_config['timesteps']\n", "\n", "def predict_dst(\n", " solar_wind_7d: pd.DataFrame,\n", " satellite_positions_7d: pd.DataFrame,\n", " latest_sunspot_number: float,\n", ") -> Tuple[float, float]:\n", " \"\"\"Take all of the data up until time t-1, and then make predictions for times t and t+1.\n", " \n", " Parameters\n", " ----------\n", " solar_wind_7d: pd.DataFrame\n", " The last 7 days of satellite data up until (t - 1) minutes [exclusive of t]\n", " satellite_positions_7d: pd.DataFrame\n", " The last 7 days of satellite position data up until the present time [inclusive of t]\n", " latest_sunspot_number: float\n", " The latest monthly sunspot number (SSN) to be available\n", " \n", " Returns\n", " -------\n", " predictions : Tuple[float, float]\n", " A tuple of two predictions, for (t and t + 1 hour) respectively; these should\n", " be between -2,000 and 500.\n", " \"\"\"\n", "\n", " # Re-format data to fit into our pipeline\n", " sunspots = pd.DataFrame(index=solar_wind_7d.index, columns=[\"smoothed_ssn\"])\n", " sunspots[\"smoothed_ssn\"].values[:] = latest_sunspot_number\n", "\n", " # Process our features and grab last 32 (timesteps) hours\n", " features, s = preprocess_features(\n", " solar_wind_7d, sunspots, scaler=scaler, subset=SOLAR_WIND_FEATURES\n", " )\n", " model_input = features[-TIMESTEPS:][XCOLS].values.reshape(\n", " (1, TIMESTEPS, features.shape[1])\n", " )\n", " #pprint.pprint(features)\n", "\n", " # Make a prediction\n", " prediction_at_t0, prediction_at_t1 = model.predict(model_input)[0]\n", "\n", " # Optional check for unexpected values\n", " if not np.isfinite(prediction_at_t0):\n", " prediction_at_t0 = -12\n", " if not np.isfinite(prediction_at_t1):\n", " 
prediction_at_t1 = -12\n", "\n", " return prediction_at_t0, prediction_at_t1" ] }, { "cell_type": "markdown", "metadata": { "id": "02qv90XObeCM" }, "source": [ "#### Ingest Real Event Data from Competition's \"Private\" Data\n", "Recall the data we call \"private\" here is now publicly accessible, since the MagNet competition has ended. It was the \"private\" data held back from the competitors for use by the evaluators to judge the competition entries." ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "id": "Z0LXe_NxbeCM" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Importing data from: data/private\n", "Reading in the Dst output data...\n", "Reading in the Sunspot input data...\n", "Reading in the Solarwind input data...\n", "Reading in the Satellite position input data...\n" ] } ], "source": [ "# Real Event Data that was previously held back from the competitors\n", "DATA_PATH = Path(\"data/private/\")\n", "print('Importing data from: %s' % DATA_PATH)\n", "\n", "print('Reading in the Dst output data...')\n", "dst = pd.read_csv(DATA_PATH / \"dst_labels.csv\")\n", "dst.timedelta = pd.to_timedelta(dst.timedelta)\n", "dst.set_index([\"period\", \"timedelta\"], inplace=True)\n", "\n", "print('Reading in the Sunspot input data...')\n", "sunspots = pd.read_csv(DATA_PATH / \"sunspots.csv\")\n", "sunspots.timedelta = pd.to_timedelta(sunspots.timedelta)\n", "sunspots.set_index([\"period\", \"timedelta\"], inplace=True)\n", "\n", "print('Reading in the Solarwind input data...')\n", "solar_wind = pd.read_csv(DATA_PATH / \"solar_wind.csv\")\n", "solar_wind.timedelta = pd.to_timedelta(solar_wind.timedelta)\n", "solar_wind.set_index([\"period\", \"timedelta\"], inplace=True)\n", "\n", "print('Reading in the Satellite position input data...')\n", "satellite_positions = pd.read_csv(DATA_PATH / \"satellite_positions.csv\")\n", "satellite_positions.timedelta = pd.to_timedelta(satellite_positions.timedelta)\n", 
"satellite_positions.set_index([\"period\", \"timedelta\"], inplace=True)" ] }, { "cell_type": "markdown", "metadata": { "id": "w0lreT8xbeCM" }, "source": [ "#### Event: Geomagnetic storm with Dst minimum of approx. -180 nT" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "id": "wR54LwzwbeCM" }, "outputs": [ { "data": { "text/plain": [ "\"\\n# Summarize final block of Input data\\nprint('\\nSummarizing final block of input data (head and tail):')\\npprint.pprint(solar_wind_7d_by_min['bz_gsm'].head())\\npprint.pprint(solar_wind_7d_by_min['bz_gsm'].tail())\\npprint.pprint(satellite_positions_7d_by_day['gse_x_ace'].head())\\npprint.pprint(satellite_positions_7d_by_day['gse_x_ace'].tail())\\npprint.pprint(latest_sunspot_number)\\n\"" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "''' User Choice of Case Study Event '''\n", "# Here we've chosen a specific event case study, you can choose different\n", "# case studied by looking at the sibling notebook magnet_cnn_tutorial.ipynb\n", "# TODO: for the correct sunspot value, currently, you have to manually peek\n", "# into the sunspots DataFrame and identify the closest match manually.\n", "event_start_day = 140\n", "latest_sunspot_number = sunspots.iloc[5] # The nearest match for Day 140.\n", "''' End User Config '''\n", "\n", "# Setup our range indices\n", "idx_event_1day = range(event_start_day, event_start_day + 7 )\n", "idx_event_1hr = range(event_start_day*24, event_start_day*24 + 7*24 )\n", "idx_event_1min = range(event_start_day*24*60, event_start_day*24*60 + 7*24*60)\n", "\n", "dst_predicted_t0 = np.nan * np.zeros(len(idx_event_1hr))\n", "dst_predicted_t1 = np.nan * np.zeros(len(idx_event_1hr))\n", "i_dst = 0\n", "###idx_1min = range((event_start_day-7)*24*60, event_start_day*24*60)\n", "for i_offset_hour in range(-7*24, 0):\n", "\n", " # for the\n", " idx_7day_1min = range(idx_event_1min[0]+i_offset_hour*60 - 1, idx_event_1min[-1]+i_offset_hour*60 - 
1)\n", "\n", " idx_7day_1day = range(idx_event_1day[0]+i_offset_hour//24, idx_event_1day[-1]+i_offset_hour//24)\n", "\n", " # Subset to 7 days around event\n", " solar_wind_7d_by_min = solar_wind.iloc[idx_7day_1min]\n", " satellite_positions_7d_by_day = satellite_positions.iloc[idx_7day_1day]\n", "\n", " # Predict Dst\n", " dst_t0_t1 = predict_dst(solar_wind_7d=solar_wind_7d_by_min, satellite_positions_7d=satellite_positions_7d_by_day, latest_sunspot_number=latest_sunspot_number)\n", "\n", " dst_predicted_t0[i_dst] = dst_t0_t1[0]\n", " dst_predicted_t1[i_dst] = dst_t0_t1[1]\n", "\n", " i_dst += 1\n", "\n", " # Uncomment to see the input and output data every hour:\n", " #print('Hour %4d: SSN %.1f, Bz %.1f nT, V %.0f km/s, Dst [t0,t1] = [%.1f, %.1f] nT'\n", " # % (i_offset_hour, latest_sunspot_number, solar_wind_7d_by_min['bz_gsm'].mean(),\n", " # solar_wind_7d_by_min['speed'].mean(), dst_t0_t1[0], dst_t0_t1[1]))\n", "\n", "\n", "'''\n", "# Summarize final block of Input data\n", "print('\\nSummarizing final block of input data (head and tail):')\n", "pprint.pprint(solar_wind_7d_by_min['bz_gsm'].head())\n", "pprint.pprint(solar_wind_7d_by_min['bz_gsm'].tail())\n", "pprint.pprint(satellite_positions_7d_by_day['gse_x_ace'].head())\n", "pprint.pprint(satellite_positions_7d_by_day['gse_x_ace'].tail())\n", "pprint.pprint(latest_sunspot_number)\n", "'''" ] }, { "cell_type": "code", "execution_count": 26, "metadata": { "id": "7N7RgNYVbeCN" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "RMSE to t0 prediction: 43.441645 nT\n", "RMSE to t1 prediction: 42.952979 nT\n" ] } ], "source": [ "# RMSE for this event:\n", "# Remember to line the indices up for observed Dst and predicted Dst[t1]\n", "rmse_t0 = np.mean((dst['dst'][idx_event_1hr] - dst_predicted_t0 )**2)**0.5\n", "rmse_t1 = np.mean((dst['dst'][idx_event_1hr][1:] - dst_predicted_t1[:-1])**2)**0.5\n", "print('RMSE to t0 prediction: %f nT' % rmse_t0 )\n", "print('RMSE to t1 prediction: %f nT' % 
rmse_t1 )" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "id": "_sYxIvcXbeCN" }, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Get real geophysical datetimes from our timedelta:\n", "dst_datetime = convert_timedelta_to_datetime(dst)\n", "x_values = dst_datetime['datetime'][idx_event_1hr].values\n", "\n", "# Dst Observed\n", "title = 'Dst \\n%s to %s\\nIndex Time: %s to %s' % \\\n", " (dst_datetime['datetime'][idx_event_1hr][0],\n", " dst_datetime['datetime'][idx_event_1hr][-1],\n", " dst_datetime.index[idx_event_1hr[0]][1], dst_datetime.index[idx_event_1hr[-1]][1])\n", "\n", "fig = plt.figure(figsize=(15,8))\n", "plt.plot( x_values, dst['dst'][idx_event_1hr].values, label='Dst Observed')\n", "plt.xticks(rotation=25)\n", "plt.title( title )\n", "\n", "# Dst Predicted at t0 and t1\n", "# Shift Dst[t1] to the right one hour to line up with the time axis\n", "ax = plt.gca()\n", "ax.plot(x_values, dst_predicted_t0, 'b.', label='Dst Predicted t0')\n", "ax.plot(x_values, np.concatenate(([np.nan],dst_predicted_t1[:-1])), 'r.', label='Dst Predicted t1')\n", "plt.grid(True)\n", "plt.legend(fontsize=14)\n", "plt.tight_layout()" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "id": "aXHnV7nubeCN" }, "outputs": [ { "data": { "text/plain": [ "\" Uncomment this block if you want to plot just the Dst predictions.\\n# Dst Predicted\\nfig = plt.figure(figsize=(15, 8))\\n\\nplt.plot(dst_predicted_t0, 'b', label='Dst Predicted t0')\\nplt.plot(dst_predicted_t1, 'r', label='Dst Predicted t1')\\nplt.grid(True)\\nplt.legend(fontsize=14)\\nplt.title('Dst Predicted')\\n\\nplt.tight_layout()\\n\"" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "''' Uncomment this block if you want to plot just the Dst predictions.\n", "# Dst Predicted\n", "fig = plt.figure(figsize=(15, 8))\n", "\n", "plt.plot(dst_predicted_t0, 'b', label='Dst Predicted t0')\n", "plt.plot(dst_predicted_t1, 'r', label='Dst Predicted t1')\n", "plt.grid(True)\n", "plt.legend(fontsize=14)\n", "plt.title('Dst 
Predicted')\n", "\n", "plt.tight_layout()\n", "'''" ] }, { "cell_type": "markdown", "metadata": { "id": "-b9wR3IOEvwJ" }, "source": [ "
\n", "Exercise: Prediction Performance:\n", "\n", "How does our LSTM model output for this event look compared to our observed Dst?\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "jn5uRW1kDq2p" }, "source": [ "## Student Exercise: Additional Case Studies and Degraded Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercises\n", "\n", "### Additional Excercises - Using the current notebook\n", "\n", "The following exercises are designed to help expand your intuition, by extending concepts in earlier sections. You should find these straight forward to engage in, using the materials in this notebook plus a couple of supplementary resources as indicated below." ] }, { "cell_type": "markdown", "metadata": { "id": "de9Wow0cEvv3" }, "source": [ "
\n", " Exercise: Together with the Chapter 1 notebook you can adjust the list of inputs (features) used to train the model and compare the losses, as well as performance in predicting specific storms.\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "mR73zQn-Evv3" }, "source": [ "
\n", "Exercise: Are there any additional inputs (features) we should consider adding to improve our prediction of Dst?\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "-b9wR3IOEvwJ" }, "source": [ "
\n", "Exercise: Improving Performance: \n", " \n", "What hyperparameter changes to the LSTM architecture might you explore to increase its performance?\n", "
\n", "\n", "
\n", "Hint: You can use the model loading section of this notebook to load a different pre-trained LSTM model, or use the model definition section of the Chapter 1 notebook to adjust the hyperparameters and train a new model.\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "6NOmthg4EvwG" }, "source": [ "
\n", "Exercise: Instrument Availability:\n", "\n", "If one or more solar wind instruments were to degrade on orbit, how might this impact model performance?\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "Hint: Review the model sensitivities to the input parameters via the permutation importance section.\n", "
" ] }, { "cell_type": "markdown", "metadata": { "id": "HV8qus9mEvwG" }, "source": [ "
\n", "Exercise: LSTM versus CNN:\n", "\n", "- For the baseline storm event above, how does our LSTM model's performance compare to that of the model in the TAI4ES CNN notebook?\n", "- Compare and contrast the influence of features on this notebook's LSTM model with those in the TAI4ES CNN notebook. \n", "- How might these differences speak to differences in the performance of the two models?\n", "- Are there storm events where the LSTM model is close to performing as well as the CNN model?\n", "- Are there phases or characteristics of different storm events where one of the two models clearly does better than the other?\n", "
\n", "\n", "
\n", "Tip: You can use these storm phase descriptions for contextualizing your findings:\n", "\n", "* Climatology / quiet periods: Dst is generally horizontal and nearly 0 nano-Tesla.\n", "* Sudden Impulse: Dst rises from near 0 to positive values rapidly over a few hours.\n", "* Storm Sudden Commencement and Main Phase: Dst drops sharply and remains significantly negative for up to several days.\n", "* Storm Peak: Dst reaches its minimum (most negative) value.\n", "* Recovery Phase: Dst recovers from large negative values back to climatology, near 0 nano-Tesla." ] }, { "cell_type": "markdown", "metadata": { "id": "9kEgSgWZbeCN" }, "source": [ "
\n", "Exercise: Degraded Observations\n", "\n", "Degrade the instrument measurements and run the model to see how the performance is impacted. Start simple by adding Gaussian noise (mean 0), to the least important and the most important input parameters (aka features) and evaluating a specific event.\n", "
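One possible starting point for this exercise is sketched below. It assumes a pandas DataFrame with a feature column such as bz_gsm; the helper add_gaussian_noise, the toy values, and the noise level are hypothetical choices made for illustration and are not part of this notebook.

```python
import numpy as np
import pandas as pd

def add_gaussian_noise(df, column, sigma, seed=0):
    # Hypothetical helper: return a copy of df with zero-mean Gaussian
    # noise (standard deviation sigma) added to a single column.
    rng = np.random.default_rng(seed)
    noisy = df.copy(deep=True)
    noisy[column] = noisy[column] + rng.normal(0.0, sigma, size=len(noisy))
    return noisy

# Toy stand-in for solar wind data (values are made up for illustration).
toy = pd.DataFrame({'bz_gsm': np.linspace(-5.0, 5.0, 1000),
                    'speed': np.full(1000, 400.0)})
noisy = add_gaussian_noise(toy, 'bz_gsm', sigma=2.0)

# The spread of the added noise should be close to the requested sigma.
print((noisy['bz_gsm'] - toy['bz_gsm']).std())
```

A degraded copy built this way could then be passed through the same prediction and RMSE steps used for the event case study, once for the most important feature and once for the least important one, to compare how much each degradation changes the result.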
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Next steps\n", "\n", "Congragulations on engaging with the learning objectives of this Chapter 2 LSTM focused notebook--the benchmark from the NOAA MagNet competition. There is one additional Chapter 1 notebook in the MagNet LSTM series, on model development in case you didn't start there.\n", "\n", "There is an additional NCAI notebook in preparation for this MagNet series:\n", "A higher performing ensemble Convolutional Neural Netowork (CNN) from the NOAA Geomagnetism team based on the 2nd place entry from the MagNet competition. \n", "As mentioned in an earlier section, this notebook's precursor is the [TAI4ES Space Weather CNN Notebook](https://github.com/ai2es/tai4es-trustathon-2022/tree/main/space)\n", "\n", "Additionally, a web search will provide other Dst modeling notebooks and publications using ML techniques." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Examples in the community\n", "\n", "For a comprehensive treatment of the need to build robust predictions of the Dst space weather storm indicator (e.g. for magnetic navigation applications), see Nair et al., 2023 and references therein:\n", "* Nair et al., 2023 (in press) (TODO: Update with public URL as soon as available),\n", "\n", "For a summary, see:\n", "* https://www.drivendata.org/competitions/73/noaa-magnetic-forecasting/\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Data statement\n", "The competition discussed above used public data for development and the public leaderboard. A private dataset was kept internal during the competition for use in scoring by the organizers. 
Since the competition has passed, both datasets are publicly accessible from NOAA.\n", "\n", "All data used in this notebook are publicly available here:\n", "* https://ngdc.noaa.gov/geomag/data/geomag/magnet/public.zip\n", "* https://ngdc.noaa.gov/geomag/data/geomag/magnet/private.zip" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References\n", "\n", "* Nair, M., Redmon, R.J., Young, L.Y., Chulliat, A., Trotta, B., Chung, C., Lipstein, G., Slavitt, I. (2023), \"MagNet - a data-science competition to predict Disturbance Storm-time index (Dst) from solar wind data\", Space Weather, In Press.\n", "* [CIRES GeoMag MagNet repository](https://github.com/liyo6397/MagNet/), TODO: update URL to new CIRES repo.\n", "* [Trustworthy Artificial Intelligence for Environmental Science 2022 Summer School](https://www2.cisl.ucar.edu/events/tai4es-2022-summer-school), TAI4ES, accessed July 2022.\n", "* [TAI4ES Space Weather Notebooks (LSTM, CNN)](https://github.com/ai2es/tai4es-trustathon-2022/tree/main/space), GitHub, accessed July 2022.\n", "* [MagNet: Model the Geomagnetic Field](https://ngdc.noaa.gov/geomag/mag-net-challenge.html), NOAA, accessed March 2022.\n", "* Chung, C. (2020), \"HOW TO PREDICT DISTURBANCES IN THE GEOMAGNETIC FIELD WITH LSTMS - BENCHMARK\", Blogpost, Accessed March 2022, Available Online: https://drivendata.co/blog/model-geomagnetic-field-benchmark/.\n", "* DrivenData (2020), \"MagNet: Model the Geomagnetic Field\", Web Resource, Accessed March 2022, Available Online: https://www.drivendata.org/competitions/73/noaa-magnetic-forecasting/.\n", "* [Interpretable Machine Learning by Christoph Molnar](https://christophm.github.io/interpretable-ml-book/shap.html)\n", "* Redmon, R. J., Seaton, D. B., Steenburgh, R., He, J., & Rodriguez, J. V. (2018). September 2017's geoeffective space weather and impacts to Caribbean radio communications during hurricane response. 
Space Weather, 16, 1190–1201. https://doi.org/10.1029/2018SW001897" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Metadata\n", " * Language / package(s):\n", " * Language: Python\n", " * Packages: Keras (TensorFlow), Matplotlib, NumPy, Pandas, Scikit-learn\n", " * Scientific domain:\n", " * Space Weather, Geomagnetic modeling\n", " * Application keywords\n", " * Magnetic Navigation\n", " * Geophysical keywords\n", " * Disturbance Storm Time index (Dst), Solar Wind\n", " * AI keywords\n", " * Long Short-Term Memory (LSTM)\n", " * Explainable AI (XAI), Permutation Feature Importance" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## License\n", "\n", "### Software and Content Description License\n", "Software code created by U.S. Government employees is not subject to copyright in the United States (17 U.S.C. §105). The United States/Department of Commerce reserve all rights to seek and obtain copyright protection in countries other than the United States for Software authored in its entirety by the Department of Commerce. To this end, the Department of Commerce hereby grants to Recipient a royalty-free, nonexclusive license to use, copy, and create derivative works of the Software outside of the United States." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Disclaimer\n", "\n", "> This Jupyter notebook is a scientific product and is not official communication of the National Oceanic and Atmospheric Administration, or the United States Department of Commerce. All NOAA Jupyter notebooks are provided on an 'as is' basis and the user assumes responsibility for its use. 
Any claims against the Department of Commerce or Department of Commerce bureaus stemming from the use of this Jupyter notebook will be governed by all applicable Federal law. Any reference to specific commercial products, processes, or services by service mark, trademark, manufacturer, or otherwise does not constitute or imply their endorsement, recommendation or favoring by the Department of Commerce. The Department of Commerce seal and logo, or the seal and logo of a DOC bureau, shall not be used in any manner to imply endorsement of any commercial product or activity by DOC or the United States Government." ] } ], "metadata": { "accelerator": "GPU", "colab": { "collapsed_sections": [ "v003LoxwbeB6", "kRzCSuRwbeB7", "YIeZxbJXbeB7", "CdZDisojbeB8", "kJQsAZGlbeB8", "GCrRyAmibeB9", "5WEhRi6RMX0o", "AVRI8V_AI3NS", "ZDtowUzbMXNx", "Vk0h7YPGMxUw", "Is66dqN0M3VP", "5HoW8LxqM8eS", "J261z_FvOK_p", "HqKPO3qXOWDb", "_islsePpOgRI", "DHXdMwzzTHaF", "wkwxYtOqT8WT", "nl-bgU9WHXQO", "yLbZKYfgEjoT", "dWs2MeMe0ZFm", "acm3GxIUbeCL", "jn5uRW1kDq2p", "9kEgSgWZbeCN" ], "gpuType": "V100", "machine_shape": "hm", "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.16" } }, "nbformat": 4, "nbformat_minor": 4 }