{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "![](../docs/banner.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Chapter 8: Basic Data Wrangling With Pandas" ] }, { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "

Chapter Outline

\n", "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Chapter Learning Objectives\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Inspect a dataframe with `df.head()`, `df.tail()`, `df.info()`, `df.describe()`.\n", "- Obtain dataframe summaries with `df.info()` and `df.describe()`.\n", "- Manipulate how a dataframe displays in Jupyter by modifying Pandas configuration options such as `pd.set_option(\"display.max_rows\", n)`.\n", "- Rename columns of a dataframe using the `df.rename()` function or by accessing the `df.columns` attribute.\n", "- Modify the index name and index values of a dataframe using `.set_index()`, `.reset_index()` , `df.index.name`, `.index`.\n", "- Use `df.melt()` and `df.pivot()` to reshape dataframes, specifically to make tidy dataframes.\n", "- Combine dataframes using `df.merge()` and `pd.concat()` and know when to use these different methods.\n", "- Apply functions to a dataframe `df.apply()` and `df.applymap()`\n", "- Perform grouping and aggregating operations using `df.groupby()` and `df.agg()`.\n", "- Perform aggregating methods on grouped or ungrouped objects such as finding the minimum, maximum and sum of values in a dataframe using `df.agg()`.\n", "- Remove or fill missing values in a dataframe with `df.dropna()` and `df.fillna()`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. DataFrame Characteristics\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Last chapter we looked at how we can create dataframes. Let's now look at some helpful ways we can view our dataframe." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Head/Tail" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `.head()` and `.tail()` methods allow you to view the top/bottom *n* (default 5) rows of a dataframe. Let's load in the cycling data set from last chapter and try them out:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateNameTypeTimeDistanceComments
010 Sep 2019, 00:13:04Afternoon RideRide208412.62Rain
110 Sep 2019, 13:52:18Morning RideRide253113.03rain
211 Sep 2019, 00:23:50Afternoon RideRide186312.52Wet road but nice weather
311 Sep 2019, 14:06:19Morning RideRide219212.84Stopped for photo of sunrise
412 Sep 2019, 00:28:05Afternoon RideRide189112.48Tired by the end of the week
\n", "
" ], "text/plain": [ " Date Name Type Time Distance \\\n", "0 10 Sep 2019, 00:13:04 Afternoon Ride Ride 2084 12.62 \n", "1 10 Sep 2019, 13:52:18 Morning Ride Ride 2531 13.03 \n", "2 11 Sep 2019, 00:23:50 Afternoon Ride Ride 1863 12.52 \n", "3 11 Sep 2019, 14:06:19 Morning Ride Ride 2192 12.84 \n", "4 12 Sep 2019, 00:28:05 Afternoon Ride Ride 1891 12.48 \n", "\n", " Comments \n", "0 Rain \n", "1 rain \n", "2 Wet road but nice weather \n", "3 Stopped for photo of sunrise \n", "4 Tired by the end of the week " ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv('data/cycling_data.csv')\n", "df.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The default return value is 5 rows, but we can pass in any number we like. For example, let's take a look at the top 10 rows:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateNameTypeTimeDistanceComments
010 Sep 2019, 00:13:04Afternoon RideRide208412.62Rain
110 Sep 2019, 13:52:18Morning RideRide253113.03rain
211 Sep 2019, 00:23:50Afternoon RideRide186312.52Wet road but nice weather
311 Sep 2019, 14:06:19Morning RideRide219212.84Stopped for photo of sunrise
412 Sep 2019, 00:28:05Afternoon RideRide189112.48Tired by the end of the week
516 Sep 2019, 13:57:48Morning RideRide227212.45Rested after the weekend!
617 Sep 2019, 00:15:47Afternoon RideRide197312.45Legs feeling strong!
717 Sep 2019, 13:43:34Morning RideRide228512.60Raining
818 Sep 2019, 13:49:53Morning RideRide290314.57Raining today
918 Sep 2019, 00:15:52Afternoon RideRide210112.48Pumped up tires
\n", "
" ], "text/plain": [ " Date Name Type Time Distance \\\n", "0 10 Sep 2019, 00:13:04 Afternoon Ride Ride 2084 12.62 \n", "1 10 Sep 2019, 13:52:18 Morning Ride Ride 2531 13.03 \n", "2 11 Sep 2019, 00:23:50 Afternoon Ride Ride 1863 12.52 \n", "3 11 Sep 2019, 14:06:19 Morning Ride Ride 2192 12.84 \n", "4 12 Sep 2019, 00:28:05 Afternoon Ride Ride 1891 12.48 \n", "5 16 Sep 2019, 13:57:48 Morning Ride Ride 2272 12.45 \n", "6 17 Sep 2019, 00:15:47 Afternoon Ride Ride 1973 12.45 \n", "7 17 Sep 2019, 13:43:34 Morning Ride Ride 2285 12.60 \n", "8 18 Sep 2019, 13:49:53 Morning Ride Ride 2903 14.57 \n", "9 18 Sep 2019, 00:15:52 Afternoon Ride Ride 2101 12.48 \n", "\n", " Comments \n", "0 Rain \n", "1 rain \n", "2 Wet road but nice weather \n", "3 Stopped for photo of sunrise \n", "4 Tired by the end of the week \n", "5 Rested after the weekend! \n", "6 Legs feeling strong! \n", "7 Raining \n", "8 Raining today \n", "9 Pumped up tires " ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or the bottom 5 rows:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateNameTypeTimeDistanceComments
284 Oct 2019, 01:08:08Afternoon RideRide187012.63Very tired, riding into the wind
299 Oct 2019, 13:55:40Morning RideRide214912.70Really cold! But feeling good
3010 Oct 2019, 00:10:31Afternoon RideRide184112.59Feeling good after a holiday break!
3110 Oct 2019, 13:47:14Morning RideRide246312.79Stopped for photo of sunrise
3211 Oct 2019, 00:16:57Afternoon RideRide184311.79Bike feeling tight, needs an oil and pump
\n", "
" ], "text/plain": [ " Date Name Type Time Distance \\\n", "28 4 Oct 2019, 01:08:08 Afternoon Ride Ride 1870 12.63 \n", "29 9 Oct 2019, 13:55:40 Morning Ride Ride 2149 12.70 \n", "30 10 Oct 2019, 00:10:31 Afternoon Ride Ride 1841 12.59 \n", "31 10 Oct 2019, 13:47:14 Morning Ride Ride 2463 12.79 \n", "32 11 Oct 2019, 00:16:57 Afternoon Ride Ride 1843 11.79 \n", "\n", " Comments \n", "28 Very tired, riding into the wind \n", "29 Really cold! But feeling good \n", "30 Feeling good after a holiday break! \n", "31 Stopped for photo of sunrise \n", "32 Bike feeling tight, needs an oil and pump " ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.tail()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### DataFrame Summaries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Three very helpful attributes/functions for getting high-level summaries of your dataframe are:\n", "- `.shape`\n", "- `.info()`\n", "- `.describe()`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`.shape` is just like the ndarray attribute we've seen previously. 
It gives the shape (rows, cols) of your dataframe:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(33, 6)" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`.info()` prints information about the dataframe itself, such as dtypes, memory usage, non-null counts, etc.:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "<class 'pandas.core.frame.DataFrame'>\n", "RangeIndex: 33 entries, 0 to 32\n", "Data columns (total 6 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 Date 33 non-null object \n", " 1 Name 33 non-null object \n", " 2 Type 33 non-null object \n", " 3 Time 33 non-null int64 \n", " 4 Distance 31 non-null float64\n", " 5 Comments 33 non-null object \n", "dtypes: float64(1), int64(1), object(4)\n", "memory usage: 1.7+ KB\n" ] } ], "source": [ "df.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`.describe()` provides summary statistics of the values within a dataframe:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TimeDistance
count33.00000031.000000
mean3512.78787912.667419
std8003.3092330.428618
min1712.00000011.790000
25%1863.00000012.480000
50%2118.00000012.620000
75%2285.00000012.750000
max48062.00000014.570000
\n", "
" ], "text/plain": [ " Time Distance\n", "count 33.000000 31.000000\n", "mean 3512.787879 12.667419\n", "std 8003.309233 0.428618\n", "min 1712.000000 11.790000\n", "25% 1863.000000 12.480000\n", "50% 2118.000000 12.620000\n", "75% 2285.000000 12.750000\n", "max 48062.000000 14.570000" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, `.describe()` only print summaries of numeric features. We can force it to give summaries on all features using the argument `include='all'` (although they may not make sense!):" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateNameTypeTimeDistanceComments
count33333333.00000031.00000033
unique3321NaNNaN25
top20 Sep 2019, 01:02:05Afternoon RideRideNaNNaNRested after the weekend!
freq11733NaNNaN3
meanNaNNaNNaN3512.78787912.667419NaN
stdNaNNaNNaN8003.3092330.428618NaN
minNaNNaNNaN1712.00000011.790000NaN
25%NaNNaNNaN1863.00000012.480000NaN
50%NaNNaNNaN2118.00000012.620000NaN
75%NaNNaNNaN2285.00000012.750000NaN
maxNaNNaNNaN48062.00000014.570000NaN
\n", "
" ], "text/plain": [ " Date Name Type Time Distance \\\n", "count 33 33 33 33.000000 31.000000 \n", "unique 33 2 1 NaN NaN \n", "top 20 Sep 2019, 01:02:05 Afternoon Ride Ride NaN NaN \n", "freq 1 17 33 NaN NaN \n", "mean NaN NaN NaN 3512.787879 12.667419 \n", "std NaN NaN NaN 8003.309233 0.428618 \n", "min NaN NaN NaN 1712.000000 11.790000 \n", "25% NaN NaN NaN 1863.000000 12.480000 \n", "50% NaN NaN NaN 2118.000000 12.620000 \n", "75% NaN NaN NaN 2285.000000 12.750000 \n", "max NaN NaN NaN 48062.000000 14.570000 \n", "\n", " Comments \n", "count 33 \n", "unique 25 \n", "top Rested after the weekend! \n", "freq 3 \n", "mean NaN \n", "std NaN \n", "min NaN \n", "25% NaN \n", "50% NaN \n", "75% NaN \n", "max NaN " ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.describe(include='all')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Displaying DataFrames" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Displaying your dataframes effectively can be an important part of your workflow. If a dataframe has more than 60 rows, Pandas will only display the first 5 and last 5 rows:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0
00.643224
10.617756
20.650490
30.289595
40.469394
......
950.549403
960.187836
970.016904
980.733392
990.875343
\n", "

100 rows × 1 columns

\n", "
" ], "text/plain": [ " 0\n", "0 0.643224\n", "1 0.617756\n", "2 0.650490\n", "3 0.289595\n", "4 0.469394\n", ".. ...\n", "95 0.549403\n", "96 0.187836\n", "97 0.016904\n", "98 0.733392\n", "99 0.875343\n", "\n", "[100 rows x 1 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.DataFrame(np.random.rand(100))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For dataframes of less than 60 rows, Pandas will print the whole dataframe:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " 
\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateNameTypeTimeDistanceComments
010 Sep 2019, 00:13:04Afternoon RideRide208412.62Rain
110 Sep 2019, 13:52:18Morning RideRide253113.03rain
211 Sep 2019, 00:23:50Afternoon RideRide186312.52Wet road but nice weather
311 Sep 2019, 14:06:19Morning RideRide219212.84Stopped for photo of sunrise
412 Sep 2019, 00:28:05Afternoon RideRide189112.48Tired by the end of the week
516 Sep 2019, 13:57:48Morning RideRide227212.45Rested after the weekend!
617 Sep 2019, 00:15:47Afternoon RideRide197312.45Legs feeling strong!
717 Sep 2019, 13:43:34Morning RideRide228512.60Raining
818 Sep 2019, 13:49:53Morning RideRide290314.57Raining today
918 Sep 2019, 00:15:52Afternoon RideRide210112.48Pumped up tires
1019 Sep 2019, 00:30:01Afternoon RideRide4806212.48Feeling good
1119 Sep 2019, 13:52:09Morning RideRide209012.59Getting colder which is nice
1220 Sep 2019, 01:02:05Afternoon RideRide296112.81Feeling good
1323 Sep 2019, 13:50:41Morning RideRide246212.68Rested after the weekend!
1424 Sep 2019, 00:35:42Afternoon RideRide207612.47Oiled chain, bike feels smooth
1524 Sep 2019, 13:41:24Morning RideRide232112.68Bike feeling much smoother
1625 Sep 2019, 00:07:21Afternoon RideRide177512.10Feeling really tired
1725 Sep 2019, 13:35:41Morning RideRide212412.65Stopped for photo of sunrise
1826 Sep 2019, 00:13:33Afternoon RideRide186012.52raining
1926 Sep 2019, 13:42:43Morning RideRide235012.91Detour around trucks at Jericho
2027 Sep 2019, 01:00:18Afternoon RideRide171212.47Tired by the end of the week
2130 Sep 2019, 13:53:52Morning RideRide211812.71Rested after the weekend!
221 Oct 2019, 00:15:07Afternoon RideRide1732NaNLegs feeling strong!
231 Oct 2019, 13:45:55Morning RideRide222212.82Beautiful morning! Feeling fit
242 Oct 2019, 00:13:09Afternoon RideRide1756NaNA little tired today but good weather
252 Oct 2019, 13:46:06Morning RideRide213413.06Bit tired today but good weather
263 Oct 2019, 00:45:22Afternoon RideRide172412.52Feeling good
273 Oct 2019, 13:47:36Morning RideRide218212.68Wet road
284 Oct 2019, 01:08:08Afternoon RideRide187012.63Very tired, riding into the wind
299 Oct 2019, 13:55:40Morning RideRide214912.70Really cold! But feeling good
3010 Oct 2019, 00:10:31Afternoon RideRide184112.59Feeling good after a holiday break!
3110 Oct 2019, 13:47:14Morning RideRide246312.79Stopped for photo of sunrise
3211 Oct 2019, 00:16:57Afternoon RideRide184311.79Bike feeling tight, needs an oil and pump
\n", "
" ], "text/plain": [ " Date Name Type Time Distance \\\n", "0 10 Sep 2019, 00:13:04 Afternoon Ride Ride 2084 12.62 \n", "1 10 Sep 2019, 13:52:18 Morning Ride Ride 2531 13.03 \n", "2 11 Sep 2019, 00:23:50 Afternoon Ride Ride 1863 12.52 \n", "3 11 Sep 2019, 14:06:19 Morning Ride Ride 2192 12.84 \n", "4 12 Sep 2019, 00:28:05 Afternoon Ride Ride 1891 12.48 \n", "5 16 Sep 2019, 13:57:48 Morning Ride Ride 2272 12.45 \n", "6 17 Sep 2019, 00:15:47 Afternoon Ride Ride 1973 12.45 \n", "7 17 Sep 2019, 13:43:34 Morning Ride Ride 2285 12.60 \n", "8 18 Sep 2019, 13:49:53 Morning Ride Ride 2903 14.57 \n", "9 18 Sep 2019, 00:15:52 Afternoon Ride Ride 2101 12.48 \n", "10 19 Sep 2019, 00:30:01 Afternoon Ride Ride 48062 12.48 \n", "11 19 Sep 2019, 13:52:09 Morning Ride Ride 2090 12.59 \n", "12 20 Sep 2019, 01:02:05 Afternoon Ride Ride 2961 12.81 \n", "13 23 Sep 2019, 13:50:41 Morning Ride Ride 2462 12.68 \n", "14 24 Sep 2019, 00:35:42 Afternoon Ride Ride 2076 12.47 \n", "15 24 Sep 2019, 13:41:24 Morning Ride Ride 2321 12.68 \n", "16 25 Sep 2019, 00:07:21 Afternoon Ride Ride 1775 12.10 \n", "17 25 Sep 2019, 13:35:41 Morning Ride Ride 2124 12.65 \n", "18 26 Sep 2019, 00:13:33 Afternoon Ride Ride 1860 12.52 \n", "19 26 Sep 2019, 13:42:43 Morning Ride Ride 2350 12.91 \n", "20 27 Sep 2019, 01:00:18 Afternoon Ride Ride 1712 12.47 \n", "21 30 Sep 2019, 13:53:52 Morning Ride Ride 2118 12.71 \n", "22 1 Oct 2019, 00:15:07 Afternoon Ride Ride 1732 NaN \n", "23 1 Oct 2019, 13:45:55 Morning Ride Ride 2222 12.82 \n", "24 2 Oct 2019, 00:13:09 Afternoon Ride Ride 1756 NaN \n", "25 2 Oct 2019, 13:46:06 Morning Ride Ride 2134 13.06 \n", "26 3 Oct 2019, 00:45:22 Afternoon Ride Ride 1724 12.52 \n", "27 3 Oct 2019, 13:47:36 Morning Ride Ride 2182 12.68 \n", "28 4 Oct 2019, 01:08:08 Afternoon Ride Ride 1870 12.63 \n", "29 9 Oct 2019, 13:55:40 Morning Ride Ride 2149 12.70 \n", "30 10 Oct 2019, 00:10:31 Afternoon Ride Ride 1841 12.59 \n", "31 10 Oct 2019, 13:47:14 Morning Ride Ride 2463 12.79 \n", "32 11 
Oct 2019, 00:16:57 Afternoon Ride Ride 1843 11.79 \n", "\n", " Comments \n", "0 Rain \n", "1 rain \n", "2 Wet road but nice weather \n", "3 Stopped for photo of sunrise \n", "4 Tired by the end of the week \n", "5 Rested after the weekend! \n", "6 Legs feeling strong! \n", "7 Raining \n", "8 Raining today \n", "9 Pumped up tires \n", "10 Feeling good \n", "11 Getting colder which is nice \n", "12 Feeling good \n", "13 Rested after the weekend! \n", "14 Oiled chain, bike feels smooth \n", "15 Bike feeling much smoother \n", "16 Feeling really tired \n", "17 Stopped for photo of sunrise \n", "18 raining \n", "19 Detour around trucks at Jericho \n", "20 Tired by the end of the week \n", "21 Rested after the weekend! \n", "22 Legs feeling strong! \n", "23 Beautiful morning! Feeling fit \n", "24 A little tired today but good weather \n", "25 Bit tired today but good weather \n", "26 Feeling good \n", "27 Wet road \n", "28 Very tired, riding into the wind \n", "29 Really cold! But feeling good \n", "30 Feeling good after a holiday break! \n", "31 Stopped for photo of sunrise \n", "32 Bike feeling tight, needs an oil and pump " ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I find the 60 row threshold to be a little too much; I prefer something more like 20. You can change the setting using `pd.set_option(\"display.max_rows\", 20)` so that anything with more than 20 rows will be summarised by the first and last 5 rows as before:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateNameTypeTimeDistanceComments
010 Sep 2019, 00:13:04Afternoon RideRide208412.62Rain
110 Sep 2019, 13:52:18Morning RideRide253113.03rain
211 Sep 2019, 00:23:50Afternoon RideRide186312.52Wet road but nice weather
311 Sep 2019, 14:06:19Morning RideRide219212.84Stopped for photo of sunrise
412 Sep 2019, 00:28:05Afternoon RideRide189112.48Tired by the end of the week
.....................
284 Oct 2019, 01:08:08Afternoon RideRide187012.63Very tired, riding into the wind
299 Oct 2019, 13:55:40Morning RideRide214912.70Really cold! But feeling good
3010 Oct 2019, 00:10:31Afternoon RideRide184112.59Feeling good after a holiday break!
3110 Oct 2019, 13:47:14Morning RideRide246312.79Stopped for photo of sunrise
3211 Oct 2019, 00:16:57Afternoon RideRide184311.79Bike feeling tight, needs an oil and pump
\n", "

33 rows × 6 columns

\n", "
" ], "text/plain": [ " Date Name Type Time Distance \\\n", "0 10 Sep 2019, 00:13:04 Afternoon Ride Ride 2084 12.62 \n", "1 10 Sep 2019, 13:52:18 Morning Ride Ride 2531 13.03 \n", "2 11 Sep 2019, 00:23:50 Afternoon Ride Ride 1863 12.52 \n", "3 11 Sep 2019, 14:06:19 Morning Ride Ride 2192 12.84 \n", "4 12 Sep 2019, 00:28:05 Afternoon Ride Ride 1891 12.48 \n", ".. ... ... ... ... ... \n", "28 4 Oct 2019, 01:08:08 Afternoon Ride Ride 1870 12.63 \n", "29 9 Oct 2019, 13:55:40 Morning Ride Ride 2149 12.70 \n", "30 10 Oct 2019, 00:10:31 Afternoon Ride Ride 1841 12.59 \n", "31 10 Oct 2019, 13:47:14 Morning Ride Ride 2463 12.79 \n", "32 11 Oct 2019, 00:16:57 Afternoon Ride Ride 1843 11.79 \n", "\n", " Comments \n", "0 Rain \n", "1 rain \n", "2 Wet road but nice weather \n", "3 Stopped for photo of sunrise \n", "4 Tired by the end of the week \n", ".. ... \n", "28 Very tired, riding into the wind \n", "29 Really cold! But feeling good \n", "30 Feeling good after a holiday break! \n", "31 Stopped for photo of sunrise \n", "32 Bike feeling tight, needs an oil and pump \n", "\n", "[33 rows x 6 columns]" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.set_option(\"display.max_rows\", 20)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are also other display options you can change, such as how many columns are shown, how numbers are formatted, etc. See the [official documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html#options-and-settings) for more.\n", "\n", "One display option I will point out is that Pandas allows you to style your tables, for example by highlighting negative values, or adding conditional colour maps to your dataframe. Below I'll style values based on their value ranging from negative (purple) to postive (yellow) but you can see the [styling documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html#Styling) for more examples." 
] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
feature_0 feature_1 feature_2 feature_3 feature_4
row_0-0.519284-0.553179-1.7688101.023342-0.668625
row_1-1.0645820.2055380.350555-1.9427611.594635
row_21.0578860.074919-0.998969-0.0307250.677628
row_30.898228-0.697138-0.134935-1.592920-2.127225
row_40.0395820.372877-0.299539-2.1329722.194797
" ], "text/plain": [ "" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test = pd.DataFrame(np.random.randn(5, 5),\n", " index = [f\"row_{_}\" for _ in range(5)],\n", " columns = [f\"feature_{_}\" for _ in range(5)])\n", "test.style.background_gradient(cmap='plasma')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Views vs Copies" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In previous chapters we've discussed views (\"looking\" at a part of an existing object) and copies (making a new copy of the object in memory). These things get a little abstract with Pandas and \"...it’s very hard to predict whether it will return a view or a copy\" (that's a quote straight [from a dedicated section in the Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy)).\n", "\n", "Basically, it depends on the operation you are trying to perform, your dataframe's structure and the memory layout of the underlying array. But don't worry, let me tell you all you need to know. Firstly, the most common warning you'll encounter in Pandas is the `SettingWithCopy`, Pandas raises it as a warning that you might not be doing what you think you're doing. Let's see an example. You may recall there is one outlier `Time` in our dataframe:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateNameTypeTimeDistanceComments
1019 Sep 2019, 00:30:01Afternoon RideRide4806212.48Feeling good
\n", "
" ], "text/plain": [ " Date Name Type Time Distance Comments\n", "10 19 Sep 2019, 00:30:01 Afternoon Ride Ride 48062 12.48 Feeling good" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['Time'] > 4000]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Imagine we wanted to change this to `2000`. You'd probably do the following:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/opt/miniconda3/lib/python3.7/site-packages/ipykernel_launcher.py:1: SettingWithCopyWarning: \n", "A value is trying to be set on a copy of a slice from a DataFrame.\n", "Try using .loc[row_indexer,col_indexer] = value instead\n", "\n", "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n", " \"\"\"Entry point for launching an IPython kernel.\n" ] } ], "source": [ "df[df['Time'] > 4000]['Time'] = 2000" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ah, there's that warning. Did our dataframe get changed?" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateNameTypeTimeDistanceComments
1019 Sep 2019, 00:30:01Afternoon RideRide4806212.48Feeling good
\n", "
" ], "text/plain": [ " Date Name Type Time Distance Comments\n", "10 19 Sep 2019, 00:30:01 Afternoon Ride Ride 48062 12.48 Feeling good" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['Time'] > 4000]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "No it didn't, even though you probably thought it did. What happened above is that `df[df['Time'] > 4000]` was executed first and returned a copy of the dataframe, we can confirm by using `id()`:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The id of the original dataframe is: 5762156560\n", " The id of the indexed dataframe is: 5781171152\n" ] } ], "source": [ "print(f\"The id of the original dataframe is: {id(df)}\")\n", "print(f\" The id of the indexed dataframe is: {id(df[df['Time'] > 4000])}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We then tried to set a value on this new object by appending `['Time'] = 2000`. Pandas is warning us that we are doing that operation on a copy of the original dataframe, which is probably not what we want. To fix this, you need to index in a single go, using `.loc[]` for example:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "df.loc[df['Time'] > 4000, 'Time'] = 2000" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "No error this time! And let's confirm the change:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateNameTypeTimeDistanceComments
\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [Date, Name, Type, Time, Distance, Comments]\n", "Index: []" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df['Time'] > 4000]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The second thing you need to know is that if you're ever in doubt about whether something is a view or a copy, you can just use the `.copy()` method to force a copy of a dataframe. Just like this:" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "df2 = df[df['Time'] > 4000].copy()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That way, your guaranteed a copy that you can modify as you wish." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Basic DataFrame Manipulations\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Renaming Columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can rename columns two ways:\n", "1. Using `.rename()` (to selectively change column names)\n", "2. By setting the `.columns` attribute (to change all column names at once)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateNameTypeTimeDistanceComments
010 Sep 2019, 00:13:04Afternoon RideRide208412.62Rain
110 Sep 2019, 13:52:18Morning RideRide253113.03rain
211 Sep 2019, 00:23:50Afternoon RideRide186312.52Wet road but nice weather
311 Sep 2019, 14:06:19Morning RideRide219212.84Stopped for photo of sunrise
412 Sep 2019, 00:28:05Afternoon RideRide189112.48Tired by the end of the week
.....................
284 Oct 2019, 01:08:08Afternoon RideRide187012.63Very tired, riding into the wind
299 Oct 2019, 13:55:40Morning RideRide214912.70Really cold! But feeling good
3010 Oct 2019, 00:10:31Afternoon RideRide184112.59Feeling good after a holiday break!
3110 Oct 2019, 13:47:14Morning RideRide246312.79Stopped for photo of sunrise
3211 Oct 2019, 00:16:57Afternoon RideRide184311.79Bike feeling tight, needs an oil and pump
\n", "

33 rows × 6 columns

\n", "
" ], "text/plain": [ " Date Name Type Time Distance \\\n", "0 10 Sep 2019, 00:13:04 Afternoon Ride Ride 2084 12.62 \n", "1 10 Sep 2019, 13:52:18 Morning Ride Ride 2531 13.03 \n", "2 11 Sep 2019, 00:23:50 Afternoon Ride Ride 1863 12.52 \n", "3 11 Sep 2019, 14:06:19 Morning Ride Ride 2192 12.84 \n", "4 12 Sep 2019, 00:28:05 Afternoon Ride Ride 1891 12.48 \n", ".. ... ... ... ... ... \n", "28 4 Oct 2019, 01:08:08 Afternoon Ride Ride 1870 12.63 \n", "29 9 Oct 2019, 13:55:40 Morning Ride Ride 2149 12.70 \n", "30 10 Oct 2019, 00:10:31 Afternoon Ride Ride 1841 12.59 \n", "31 10 Oct 2019, 13:47:14 Morning Ride Ride 2463 12.79 \n", "32 11 Oct 2019, 00:16:57 Afternoon Ride Ride 1843 11.79 \n", "\n", " Comments \n", "0 Rain \n", "1 rain \n", "2 Wet road but nice weather \n", "3 Stopped for photo of sunrise \n", "4 Tired by the end of the week \n", ".. ... \n", "28 Very tired, riding into the wind \n", "29 Really cold! But feeling good \n", "30 Feeling good after a holiday break! \n", "31 Stopped for photo of sunrise \n", "32 Bike feeling tight, needs an oil and pump \n", "\n", "[33 rows x 6 columns]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's give it a go:" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateNameTypeTimeDistanceComments
010 Sep 2019, 00:13:04Afternoon RideRide208412.62Rain
110 Sep 2019, 13:52:18Morning RideRide253113.03rain
211 Sep 2019, 00:23:50Afternoon RideRide186312.52Wet road but nice weather
311 Sep 2019, 14:06:19Morning RideRide219212.84Stopped for photo of sunrise
412 Sep 2019, 00:28:05Afternoon RideRide189112.48Tired by the end of the week
.....................
284 Oct 2019, 01:08:08Afternoon RideRide187012.63Very tired, riding into the wind
299 Oct 2019, 13:55:40Morning RideRide214912.70Really cold! But feeling good
3010 Oct 2019, 00:10:31Afternoon RideRide184112.59Feeling good after a holiday break!
3110 Oct 2019, 13:47:14Morning RideRide246312.79Stopped for photo of sunrise
3211 Oct 2019, 00:16:57Afternoon RideRide184311.79Bike feeling tight, needs an oil and pump
\n", "

33 rows × 6 columns

\n", "
" ], "text/plain": [ " Date Name Type Time Distance \\\n", "0 10 Sep 2019, 00:13:04 Afternoon Ride Ride 2084 12.62 \n", "1 10 Sep 2019, 13:52:18 Morning Ride Ride 2531 13.03 \n", "2 11 Sep 2019, 00:23:50 Afternoon Ride Ride 1863 12.52 \n", "3 11 Sep 2019, 14:06:19 Morning Ride Ride 2192 12.84 \n", "4 12 Sep 2019, 00:28:05 Afternoon Ride Ride 1891 12.48 \n", ".. ... ... ... ... ... \n", "28 4 Oct 2019, 01:08:08 Afternoon Ride Ride 1870 12.63 \n", "29 9 Oct 2019, 13:55:40 Morning Ride Ride 2149 12.70 \n", "30 10 Oct 2019, 00:10:31 Afternoon Ride Ride 1841 12.59 \n", "31 10 Oct 2019, 13:47:14 Morning Ride Ride 2463 12.79 \n", "32 11 Oct 2019, 00:16:57 Afternoon Ride Ride 1843 11.79 \n", "\n", " Comments \n", "0 Rain \n", "1 rain \n", "2 Wet road but nice weather \n", "3 Stopped for photo of sunrise \n", "4 Tired by the end of the week \n", ".. ... \n", "28 Very tired, riding into the wind \n", "29 Really cold! But feeling good \n", "30 Feeling good after a holiday break! \n", "31 Stopped for photo of sunrise \n", "32 Bike feeling tight, needs an oil and pump \n", "\n", "[33 rows x 6 columns]" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.rename(columns={\"Date\": \"Datetime\",\n", " \"Comments\": \"Notes\"})\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Wait? What happened? Nothing changed? In the code above we did actually rename columns of our dataframe but we didn't modify the dataframe inplace, we made a copy of it. There are generally two options for making permanent dataframe changes:\n", "- 1. Use the argument `inplace=True`, e.g., `df.rename(..., inplace=True)`, available in most functions/methods\n", "- 2. Re-assign, e.g., `df = df.rename(...)`\n", "The Pandas team recommends **Method 2 (re-assign)**, for a [few reasons](https://www.youtube.com/watch?v=hK6o_TDXXN8&t=700) (mostly to do with how memory is allocated under the hood)." 
] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DatetimeNameTypeTimeDistanceNotes
010 Sep 2019, 00:13:04Afternoon RideRide208412.62Rain
110 Sep 2019, 13:52:18Morning RideRide253113.03rain
211 Sep 2019, 00:23:50Afternoon RideRide186312.52Wet road but nice weather
311 Sep 2019, 14:06:19Morning RideRide219212.84Stopped for photo of sunrise
412 Sep 2019, 00:28:05Afternoon RideRide189112.48Tired by the end of the week
.....................
284 Oct 2019, 01:08:08Afternoon RideRide187012.63Very tired, riding into the wind
299 Oct 2019, 13:55:40Morning RideRide214912.70Really cold! But feeling good
3010 Oct 2019, 00:10:31Afternoon RideRide184112.59Feeling good after a holiday break!
3110 Oct 2019, 13:47:14Morning RideRide246312.79Stopped for photo of sunrise
3211 Oct 2019, 00:16:57Afternoon RideRide184311.79Bike feeling tight, needs an oil and pump
\n", "

33 rows × 6 columns

\n", "
" ], "text/plain": [ " Datetime Name Type Time Distance \\\n", "0 10 Sep 2019, 00:13:04 Afternoon Ride Ride 2084 12.62 \n", "1 10 Sep 2019, 13:52:18 Morning Ride Ride 2531 13.03 \n", "2 11 Sep 2019, 00:23:50 Afternoon Ride Ride 1863 12.52 \n", "3 11 Sep 2019, 14:06:19 Morning Ride Ride 2192 12.84 \n", "4 12 Sep 2019, 00:28:05 Afternoon Ride Ride 1891 12.48 \n", ".. ... ... ... ... ... \n", "28 4 Oct 2019, 01:08:08 Afternoon Ride Ride 1870 12.63 \n", "29 9 Oct 2019, 13:55:40 Morning Ride Ride 2149 12.70 \n", "30 10 Oct 2019, 00:10:31 Afternoon Ride Ride 1841 12.59 \n", "31 10 Oct 2019, 13:47:14 Morning Ride Ride 2463 12.79 \n", "32 11 Oct 2019, 00:16:57 Afternoon Ride Ride 1843 11.79 \n", "\n", " Notes \n", "0 Rain \n", "1 rain \n", "2 Wet road but nice weather \n", "3 Stopped for photo of sunrise \n", "4 Tired by the end of the week \n", ".. ... \n", "28 Very tired, riding into the wind \n", "29 Really cold! But feeling good \n", "30 Feeling good after a holiday break! \n", "31 Stopped for photo of sunrise \n", "32 Bike feeling tight, needs an oil and pump \n", "\n", "[33 rows x 6 columns]" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = df.rename(columns={\"Date\": \"Datetime\",\n", " \"Comments\": \"Notes\"})\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you wish to change all of the columns of a dataframe, you can do so by setting the `.columns` attribute:" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Column 1Column 2Column 3Column 4Column 5Column 6
010 Sep 2019, 00:13:04Afternoon RideRide208412.62Rain
110 Sep 2019, 13:52:18Morning RideRide253113.03rain
211 Sep 2019, 00:23:50Afternoon RideRide186312.52Wet road but nice weather
311 Sep 2019, 14:06:19Morning RideRide219212.84Stopped for photo of sunrise
412 Sep 2019, 00:28:05Afternoon RideRide189112.48Tired by the end of the week
.....................
284 Oct 2019, 01:08:08Afternoon RideRide187012.63Very tired, riding into the wind
299 Oct 2019, 13:55:40Morning RideRide214912.70Really cold! But feeling good
3010 Oct 2019, 00:10:31Afternoon RideRide184112.59Feeling good after a holiday break!
3110 Oct 2019, 13:47:14Morning RideRide246312.79Stopped for photo of sunrise
3211 Oct 2019, 00:16:57Afternoon RideRide184311.79Bike feeling tight, needs an oil and pump
\n", "

33 rows × 6 columns

\n", "
" ], "text/plain": [ " Column 1 Column 2 Column 3 Column 4 Column 5 \\\n", "0 10 Sep 2019, 00:13:04 Afternoon Ride Ride 2084 12.62 \n", "1 10 Sep 2019, 13:52:18 Morning Ride Ride 2531 13.03 \n", "2 11 Sep 2019, 00:23:50 Afternoon Ride Ride 1863 12.52 \n", "3 11 Sep 2019, 14:06:19 Morning Ride Ride 2192 12.84 \n", "4 12 Sep 2019, 00:28:05 Afternoon Ride Ride 1891 12.48 \n", ".. ... ... ... ... ... \n", "28 4 Oct 2019, 01:08:08 Afternoon Ride Ride 1870 12.63 \n", "29 9 Oct 2019, 13:55:40 Morning Ride Ride 2149 12.70 \n", "30 10 Oct 2019, 00:10:31 Afternoon Ride Ride 1841 12.59 \n", "31 10 Oct 2019, 13:47:14 Morning Ride Ride 2463 12.79 \n", "32 11 Oct 2019, 00:16:57 Afternoon Ride Ride 1843 11.79 \n", "\n", " Column 6 \n", "0 Rain \n", "1 rain \n", "2 Wet road but nice weather \n", "3 Stopped for photo of sunrise \n", "4 Tired by the end of the week \n", ".. ... \n", "28 Very tired, riding into the wind \n", "29 Really cold! But feeling good \n", "30 Feeling good after a holiday break! \n", "31 Stopped for photo of sunrise \n", "32 Bike feeling tight, needs an oil and pump \n", "\n", "[33 rows x 6 columns]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.columns = [f\"Column {_}\" for _ in range(1, 7)]\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Changing the Index" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can change the index labels of a dataframe in 3 main ways:\n", "1. `.set_index()` to make one of the columns of the dataframe the index\n", "2. Directly modify `df.index.name` to change the index name\n", "3. `.reset_index()` to move the current index as a column and to reset the index with integer labels starting from 0\n", "4. Directly modify the `.index()` attribute" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Column 1Column 2Column 3Column 4Column 5Column 6
010 Sep 2019, 00:13:04Afternoon RideRide208412.62Rain
110 Sep 2019, 13:52:18Morning RideRide253113.03rain
211 Sep 2019, 00:23:50Afternoon RideRide186312.52Wet road but nice weather
311 Sep 2019, 14:06:19Morning RideRide219212.84Stopped for photo of sunrise
412 Sep 2019, 00:28:05Afternoon RideRide189112.48Tired by the end of the week
.....................
284 Oct 2019, 01:08:08Afternoon RideRide187012.63Very tired, riding into the wind
299 Oct 2019, 13:55:40Morning RideRide214912.70Really cold! But feeling good
3010 Oct 2019, 00:10:31Afternoon RideRide184112.59Feeling good after a holiday break!
3110 Oct 2019, 13:47:14Morning RideRide246312.79Stopped for photo of sunrise
3211 Oct 2019, 00:16:57Afternoon RideRide184311.79Bike feeling tight, needs an oil and pump
\n", "

33 rows × 6 columns

\n", "
" ], "text/plain": [ " Column 1 Column 2 Column 3 Column 4 Column 5 \\\n", "0 10 Sep 2019, 00:13:04 Afternoon Ride Ride 2084 12.62 \n", "1 10 Sep 2019, 13:52:18 Morning Ride Ride 2531 13.03 \n", "2 11 Sep 2019, 00:23:50 Afternoon Ride Ride 1863 12.52 \n", "3 11 Sep 2019, 14:06:19 Morning Ride Ride 2192 12.84 \n", "4 12 Sep 2019, 00:28:05 Afternoon Ride Ride 1891 12.48 \n", ".. ... ... ... ... ... \n", "28 4 Oct 2019, 01:08:08 Afternoon Ride Ride 1870 12.63 \n", "29 9 Oct 2019, 13:55:40 Morning Ride Ride 2149 12.70 \n", "30 10 Oct 2019, 00:10:31 Afternoon Ride Ride 1841 12.59 \n", "31 10 Oct 2019, 13:47:14 Morning Ride Ride 2463 12.79 \n", "32 11 Oct 2019, 00:16:57 Afternoon Ride Ride 1843 11.79 \n", "\n", " Column 6 \n", "0 Rain \n", "1 rain \n", "2 Wet road but nice weather \n", "3 Stopped for photo of sunrise \n", "4 Tired by the end of the week \n", ".. ... \n", "28 Very tired, riding into the wind \n", "29 Really cold! But feeling good \n", "30 Feeling good after a holiday break! \n", "31 Stopped for photo of sunrise \n", "32 Bike feeling tight, needs an oil and pump \n", "\n", "[33 rows x 6 columns]" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below I will set the index as `Column 1` and rename the index to \"New Index\":" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Column 2Column 3Column 4Column 5Column 6
New Index
10 Sep 2019, 00:13:04Afternoon RideRide208412.62Rain
10 Sep 2019, 13:52:18Morning RideRide253113.03rain
11 Sep 2019, 00:23:50Afternoon RideRide186312.52Wet road but nice weather
11 Sep 2019, 14:06:19Morning RideRide219212.84Stopped for photo of sunrise
12 Sep 2019, 00:28:05Afternoon RideRide189112.48Tired by the end of the week
..................
4 Oct 2019, 01:08:08Afternoon RideRide187012.63Very tired, riding into the wind
9 Oct 2019, 13:55:40Morning RideRide214912.70Really cold! But feeling good
10 Oct 2019, 00:10:31Afternoon RideRide184112.59Feeling good after a holiday break!
10 Oct 2019, 13:47:14Morning RideRide246312.79Stopped for photo of sunrise
11 Oct 2019, 00:16:57Afternoon RideRide184311.79Bike feeling tight, needs an oil and pump
\n", "

33 rows × 5 columns

\n", "
" ], "text/plain": [ " Column 2 Column 3 Column 4 Column 5 \\\n", "New Index \n", "10 Sep 2019, 00:13:04 Afternoon Ride Ride 2084 12.62 \n", "10 Sep 2019, 13:52:18 Morning Ride Ride 2531 13.03 \n", "11 Sep 2019, 00:23:50 Afternoon Ride Ride 1863 12.52 \n", "11 Sep 2019, 14:06:19 Morning Ride Ride 2192 12.84 \n", "12 Sep 2019, 00:28:05 Afternoon Ride Ride 1891 12.48 \n", "... ... ... ... ... \n", "4 Oct 2019, 01:08:08 Afternoon Ride Ride 1870 12.63 \n", "9 Oct 2019, 13:55:40 Morning Ride Ride 2149 12.70 \n", "10 Oct 2019, 00:10:31 Afternoon Ride Ride 1841 12.59 \n", "10 Oct 2019, 13:47:14 Morning Ride Ride 2463 12.79 \n", "11 Oct 2019, 00:16:57 Afternoon Ride Ride 1843 11.79 \n", "\n", " Column 6 \n", "New Index \n", "10 Sep 2019, 00:13:04 Rain \n", "10 Sep 2019, 13:52:18 rain \n", "11 Sep 2019, 00:23:50 Wet road but nice weather \n", "11 Sep 2019, 14:06:19 Stopped for photo of sunrise \n", "12 Sep 2019, 00:28:05 Tired by the end of the week \n", "... ... \n", "4 Oct 2019, 01:08:08 Very tired, riding into the wind \n", "9 Oct 2019, 13:55:40 Really cold! But feeling good \n", "10 Oct 2019, 00:10:31 Feeling good after a holiday break! \n", "10 Oct 2019, 13:47:14 Stopped for photo of sunrise \n", "11 Oct 2019, 00:16:57 Bike feeling tight, needs an oil and pump \n", "\n", "[33 rows x 5 columns]" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = df.set_index(\"Column 1\")\n", "df.index.name = \"New Index\"\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I can send the index back to a column and have a default integer index using `.reset_index()`:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
New IndexColumn 2Column 3Column 4Column 5Column 6
010 Sep 2019, 00:13:04Afternoon RideRide208412.62Rain
110 Sep 2019, 13:52:18Morning RideRide253113.03rain
211 Sep 2019, 00:23:50Afternoon RideRide186312.52Wet road but nice weather
311 Sep 2019, 14:06:19Morning RideRide219212.84Stopped for photo of sunrise
412 Sep 2019, 00:28:05Afternoon RideRide189112.48Tired by the end of the week
.....................
284 Oct 2019, 01:08:08Afternoon RideRide187012.63Very tired, riding into the wind
299 Oct 2019, 13:55:40Morning RideRide214912.70Really cold! But feeling good
3010 Oct 2019, 00:10:31Afternoon RideRide184112.59Feeling good after a holiday break!
3110 Oct 2019, 13:47:14Morning RideRide246312.79Stopped for photo of sunrise
3211 Oct 2019, 00:16:57Afternoon RideRide184311.79Bike feeling tight, needs an oil and pump
\n", "

33 rows × 6 columns

\n", "
" ], "text/plain": [ " New Index Column 2 Column 3 Column 4 Column 5 \\\n", "0 10 Sep 2019, 00:13:04 Afternoon Ride Ride 2084 12.62 \n", "1 10 Sep 2019, 13:52:18 Morning Ride Ride 2531 13.03 \n", "2 11 Sep 2019, 00:23:50 Afternoon Ride Ride 1863 12.52 \n", "3 11 Sep 2019, 14:06:19 Morning Ride Ride 2192 12.84 \n", "4 12 Sep 2019, 00:28:05 Afternoon Ride Ride 1891 12.48 \n", ".. ... ... ... ... ... \n", "28 4 Oct 2019, 01:08:08 Afternoon Ride Ride 1870 12.63 \n", "29 9 Oct 2019, 13:55:40 Morning Ride Ride 2149 12.70 \n", "30 10 Oct 2019, 00:10:31 Afternoon Ride Ride 1841 12.59 \n", "31 10 Oct 2019, 13:47:14 Morning Ride Ride 2463 12.79 \n", "32 11 Oct 2019, 00:16:57 Afternoon Ride Ride 1843 11.79 \n", "\n", " Column 6 \n", "0 Rain \n", "1 rain \n", "2 Wet road but nice weather \n", "3 Stopped for photo of sunrise \n", "4 Tired by the end of the week \n", ".. ... \n", "28 Very tired, riding into the wind \n", "29 Really cold! But feeling good \n", "30 Feeling good after a holiday break! \n", "31 Stopped for photo of sunrise \n", "32 Bike feeling tight, needs an oil and pump \n", "\n", "[33 rows x 6 columns]" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = df.reset_index()\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Like with column names, we can also modify the index directly, but I can't remember ever doing this, usually I'll use `.set_index()`:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RangeIndex(start=0, stop=33, step=1)" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.index" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
New IndexColumn 2Column 3Column 4Column 5Column 6
10010 Sep 2019, 00:13:04Afternoon RideRide208412.62Rain
10110 Sep 2019, 13:52:18Morning RideRide253113.03rain
10211 Sep 2019, 00:23:50Afternoon RideRide186312.52Wet road but nice weather
10311 Sep 2019, 14:06:19Morning RideRide219212.84Stopped for photo of sunrise
10412 Sep 2019, 00:28:05Afternoon RideRide189112.48Tired by the end of the week
.....................
1284 Oct 2019, 01:08:08Afternoon RideRide187012.63Very tired, riding into the wind
1299 Oct 2019, 13:55:40Morning RideRide214912.70Really cold! But feeling good
13010 Oct 2019, 00:10:31Afternoon RideRide184112.59Feeling good after a holiday break!
13110 Oct 2019, 13:47:14Morning RideRide246312.79Stopped for photo of sunrise
13211 Oct 2019, 00:16:57Afternoon RideRide184311.79Bike feeling tight, needs an oil and pump
\n", "

33 rows × 6 columns

\n", "
" ], "text/plain": [ " New Index Column 2 Column 3 Column 4 Column 5 \\\n", "100 10 Sep 2019, 00:13:04 Afternoon Ride Ride 2084 12.62 \n", "101 10 Sep 2019, 13:52:18 Morning Ride Ride 2531 13.03 \n", "102 11 Sep 2019, 00:23:50 Afternoon Ride Ride 1863 12.52 \n", "103 11 Sep 2019, 14:06:19 Morning Ride Ride 2192 12.84 \n", "104 12 Sep 2019, 00:28:05 Afternoon Ride Ride 1891 12.48 \n", ".. ... ... ... ... ... \n", "128 4 Oct 2019, 01:08:08 Afternoon Ride Ride 1870 12.63 \n", "129 9 Oct 2019, 13:55:40 Morning Ride Ride 2149 12.70 \n", "130 10 Oct 2019, 00:10:31 Afternoon Ride Ride 1841 12.59 \n", "131 10 Oct 2019, 13:47:14 Morning Ride Ride 2463 12.79 \n", "132 11 Oct 2019, 00:16:57 Afternoon Ride Ride 1843 11.79 \n", "\n", " Column 6 \n", "100 Rain \n", "101 rain \n", "102 Wet road but nice weather \n", "103 Stopped for photo of sunrise \n", "104 Tired by the end of the week \n", ".. ... \n", "128 Very tired, riding into the wind \n", "129 Really cold! But feeling good \n", "130 Feeling good after a holiday break! \n", "131 Stopped for photo of sunrise \n", "132 Bike feeling tight, needs an oil and pump \n", "\n", "[33 rows x 6 columns]" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.index = range(100, 133, 1)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding/Removing Columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are two main ways to add/remove columns of a dataframe:\n", "1. Use `[]` to add columns\n", "2. Use `.drop()` to drop columns\n", "\n", "Let's re-read in a fresh copy of the cycling dataset." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateNameTypeTimeDistanceComments
010 Sep 2019, 00:13:04Afternoon RideRide208412.62Rain
110 Sep 2019, 13:52:18Morning RideRide253113.03rain
211 Sep 2019, 00:23:50Afternoon RideRide186312.52Wet road but nice weather
311 Sep 2019, 14:06:19Morning RideRide219212.84Stopped for photo of sunrise
412 Sep 2019, 00:28:05Afternoon RideRide189112.48Tired by the end of the week
.....................
284 Oct 2019, 01:08:08Afternoon RideRide187012.63Very tired, riding into the wind
299 Oct 2019, 13:55:40Morning RideRide214912.70Really cold! But feeling good
3010 Oct 2019, 00:10:31Afternoon RideRide184112.59Feeling good after a holiday break!
3110 Oct 2019, 13:47:14Morning RideRide246312.79Stopped for photo of sunrise
3211 Oct 2019, 00:16:57Afternoon RideRide184311.79Bike feeling tight, needs an oil and pump
\n", "

33 rows × 6 columns

\n", "
" ], "text/plain": [ " Date Name Type Time Distance \\\n", "0 10 Sep 2019, 00:13:04 Afternoon Ride Ride 2084 12.62 \n", "1 10 Sep 2019, 13:52:18 Morning Ride Ride 2531 13.03 \n", "2 11 Sep 2019, 00:23:50 Afternoon Ride Ride 1863 12.52 \n", "3 11 Sep 2019, 14:06:19 Morning Ride Ride 2192 12.84 \n", "4 12 Sep 2019, 00:28:05 Afternoon Ride Ride 1891 12.48 \n", ".. ... ... ... ... ... \n", "28 4 Oct 2019, 01:08:08 Afternoon Ride Ride 1870 12.63 \n", "29 9 Oct 2019, 13:55:40 Morning Ride Ride 2149 12.70 \n", "30 10 Oct 2019, 00:10:31 Afternoon Ride Ride 1841 12.59 \n", "31 10 Oct 2019, 13:47:14 Morning Ride Ride 2463 12.79 \n", "32 11 Oct 2019, 00:16:57 Afternoon Ride Ride 1843 11.79 \n", "\n", " Comments \n", "0 Rain \n", "1 rain \n", "2 Wet road but nice weather \n", "3 Stopped for photo of sunrise \n", "4 Tired by the end of the week \n", ".. ... \n", "28 Very tired, riding into the wind \n", "29 Really cold! But feeling good \n", "30 Feeling good after a holiday break! \n", "31 Stopped for photo of sunrise \n", "32 Bike feeling tight, needs an oil and pump \n", "\n", "[33 rows x 6 columns]" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv('data/cycling_data.csv')\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can add a new column to a dataframe by simply using `[]` with a new column name and value(s):" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateNameTypeTimeDistanceCommentsRiderAvg Speed
010 Sep 2019, 00:13:04Afternoon RideRide208412.62RainTom Beuzen6.055662
110 Sep 2019, 13:52:18Morning RideRide253113.03rainTom Beuzen5.148163
211 Sep 2019, 00:23:50Afternoon RideRide186312.52Wet road but nice weatherTom Beuzen6.720344
311 Sep 2019, 14:06:19Morning RideRide219212.84Stopped for photo of sunriseTom Beuzen5.857664
412 Sep 2019, 00:28:05Afternoon RideRide189112.48Tired by the end of the weekTom Beuzen6.599683
...........................
284 Oct 2019, 01:08:08Afternoon RideRide187012.63Very tired, riding into the windTom Beuzen6.754011
299 Oct 2019, 13:55:40Morning RideRide214912.70Really cold! But feeling goodTom Beuzen5.909725
3010 Oct 2019, 00:10:31Afternoon RideRide184112.59Feeling good after a holiday break!Tom Beuzen6.838675
3110 Oct 2019, 13:47:14Morning RideRide246312.79Stopped for photo of sunriseTom Beuzen5.192854
3211 Oct 2019, 00:16:57Afternoon RideRide184311.79Bike feeling tight, needs an oil and pumpTom Beuzen6.397179
\n", "

33 rows × 8 columns

\n", "
" ], "text/plain": [ " Date Name Type Time Distance \\\n", "0 10 Sep 2019, 00:13:04 Afternoon Ride Ride 2084 12.62 \n", "1 10 Sep 2019, 13:52:18 Morning Ride Ride 2531 13.03 \n", "2 11 Sep 2019, 00:23:50 Afternoon Ride Ride 1863 12.52 \n", "3 11 Sep 2019, 14:06:19 Morning Ride Ride 2192 12.84 \n", "4 12 Sep 2019, 00:28:05 Afternoon Ride Ride 1891 12.48 \n", ".. ... ... ... ... ... \n", "28 4 Oct 2019, 01:08:08 Afternoon Ride Ride 1870 12.63 \n", "29 9 Oct 2019, 13:55:40 Morning Ride Ride 2149 12.70 \n", "30 10 Oct 2019, 00:10:31 Afternoon Ride Ride 1841 12.59 \n", "31 10 Oct 2019, 13:47:14 Morning Ride Ride 2463 12.79 \n", "32 11 Oct 2019, 00:16:57 Afternoon Ride Ride 1843 11.79 \n", "\n", " Comments Rider Avg Speed \n", "0 Rain Tom Beuzen 6.055662 \n", "1 rain Tom Beuzen 5.148163 \n", "2 Wet road but nice weather Tom Beuzen 6.720344 \n", "3 Stopped for photo of sunrise Tom Beuzen 5.857664 \n", "4 Tired by the end of the week Tom Beuzen 6.599683 \n", ".. ... ... ... \n", "28 Very tired, riding into the wind Tom Beuzen 6.754011 \n", "29 Really cold! But feeling good Tom Beuzen 5.909725 \n", "30 Feeling good after a holiday break! Tom Beuzen 6.838675 \n", "31 Stopped for photo of sunrise Tom Beuzen 5.192854 \n", "32 Bike feeling tight, needs an oil and pump Tom Beuzen 6.397179 \n", "\n", "[33 rows x 8 columns]" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df['Rider'] = 'Tom Beuzen'\n", "df['Avg Speed'] = df['Distance'] * 1000 / df['Time'] # avg. speed in m/s\n", "df" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateNameTypeTimeDistanceComments
010 Sep 2019, 00:13:04Afternoon RideRide208412.62Rain
110 Sep 2019, 13:52:18Morning RideRide253113.03rain
211 Sep 2019, 00:23:50Afternoon RideRide186312.52Wet road but nice weather
311 Sep 2019, 14:06:19Morning RideRide219212.84Stopped for photo of sunrise
412 Sep 2019, 00:28:05Afternoon RideRide189112.48Tired by the end of the week
.....................
284 Oct 2019, 01:08:08Afternoon RideRide187012.63Very tired, riding into the wind
299 Oct 2019, 13:55:40Morning RideRide214912.70Really cold! But feeling good
3010 Oct 2019, 00:10:31Afternoon RideRide184112.59Feeling good after a holiday break!
3110 Oct 2019, 13:47:14Morning RideRide246312.79Stopped for photo of sunrise
3211 Oct 2019, 00:16:57Afternoon RideRide184311.79Bike feeling tight, needs an oil and pump
\n", "

33 rows × 6 columns

\n", "
" ], "text/plain": [ " Date Name Type Time Distance \\\n", "0 10 Sep 2019, 00:13:04 Afternoon Ride Ride 2084 12.62 \n", "1 10 Sep 2019, 13:52:18 Morning Ride Ride 2531 13.03 \n", "2 11 Sep 2019, 00:23:50 Afternoon Ride Ride 1863 12.52 \n", "3 11 Sep 2019, 14:06:19 Morning Ride Ride 2192 12.84 \n", "4 12 Sep 2019, 00:28:05 Afternoon Ride Ride 1891 12.48 \n", ".. ... ... ... ... ... \n", "28 4 Oct 2019, 01:08:08 Afternoon Ride Ride 1870 12.63 \n", "29 9 Oct 2019, 13:55:40 Morning Ride Ride 2149 12.70 \n", "30 10 Oct 2019, 00:10:31 Afternoon Ride Ride 1841 12.59 \n", "31 10 Oct 2019, 13:47:14 Morning Ride Ride 2463 12.79 \n", "32 11 Oct 2019, 00:16:57 Afternoon Ride Ride 1843 11.79 \n", "\n", " Comments \n", "0 Rain \n", "1 rain \n", "2 Wet road but nice weather \n", "3 Stopped for photo of sunrise \n", "4 Tired by the end of the week \n", ".. ... \n", "28 Very tired, riding into the wind \n", "29 Really cold! But feeling good \n", "30 Feeling good after a holiday break! \n", "31 Stopped for photo of sunrise \n", "32 Bike feeling tight, needs an oil and pump \n", "\n", "[33 rows x 6 columns]" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = df.drop(columns=['Rider', 'Avg Speed'])\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding/Removing Rows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You won't often be adding rows to a dataframe manually (you'll usually add rows through concatenating/joining - that's coming up next). You can add/remove rows of a dataframe in two ways:\n", "1. Use `.append()` to add rows\n", "2. Use `.drop()` to drop rows" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateNameTypeTimeDistanceComments
010 Sep 2019, 00:13:04Afternoon RideRide208412.62Rain
110 Sep 2019, 13:52:18Morning RideRide253113.03rain
211 Sep 2019, 00:23:50Afternoon RideRide186312.52Wet road but nice weather
311 Sep 2019, 14:06:19Morning RideRide219212.84Stopped for photo of sunrise
412 Sep 2019, 00:28:05Afternoon RideRide189112.48Tired by the end of the week
.....................
284 Oct 2019, 01:08:08Afternoon RideRide187012.63Very tired, riding into the wind
299 Oct 2019, 13:55:40Morning RideRide214912.70Really cold! But feeling good
3010 Oct 2019, 00:10:31Afternoon RideRide184112.59Feeling good after a holiday break!
3110 Oct 2019, 13:47:14Morning RideRide246312.79Stopped for photo of sunrise
3211 Oct 2019, 00:16:57Afternoon RideRide184311.79Bike feeling tight, needs an oil and pump
\n", "

33 rows × 6 columns

\n", "
" ], "text/plain": [ " Date Name Type Time Distance \\\n", "0 10 Sep 2019, 00:13:04 Afternoon Ride Ride 2084 12.62 \n", "1 10 Sep 2019, 13:52:18 Morning Ride Ride 2531 13.03 \n", "2 11 Sep 2019, 00:23:50 Afternoon Ride Ride 1863 12.52 \n", "3 11 Sep 2019, 14:06:19 Morning Ride Ride 2192 12.84 \n", "4 12 Sep 2019, 00:28:05 Afternoon Ride Ride 1891 12.48 \n", ".. ... ... ... ... ... \n", "28 4 Oct 2019, 01:08:08 Afternoon Ride Ride 1870 12.63 \n", "29 9 Oct 2019, 13:55:40 Morning Ride Ride 2149 12.70 \n", "30 10 Oct 2019, 00:10:31 Afternoon Ride Ride 1841 12.59 \n", "31 10 Oct 2019, 13:47:14 Morning Ride Ride 2463 12.79 \n", "32 11 Oct 2019, 00:16:57 Afternoon Ride Ride 1843 11.79 \n", "\n", " Comments \n", "0 Rain \n", "1 rain \n", "2 Wet road but nice weather \n", "3 Stopped for photo of sunrise \n", "4 Tired by the end of the week \n", ".. ... \n", "28 Very tired, riding into the wind \n", "29 Really cold! But feeling good \n", "30 Feeling good after a holiday break! \n", "31 Stopped for photo of sunrise \n", "32 Bike feeling tight, needs an oil and pump \n", "\n", "[33 rows x 6 columns]" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's add a new row to the bottom of this dataframe:" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateNameTypeTimeDistanceComments
010 Sep 2019, 00:13:04Afternoon RideRide208412.62Rain
110 Sep 2019, 13:52:18Morning RideRide253113.03rain
211 Sep 2019, 00:23:50Afternoon RideRide186312.52Wet road but nice weather
311 Sep 2019, 14:06:19Morning RideRide219212.84Stopped for photo of sunrise
412 Sep 2019, 00:28:05Afternoon RideRide189112.48Tired by the end of the week
.....................
299 Oct 2019, 13:55:40Morning RideRide214912.70Really cold! But feeling good
3010 Oct 2019, 00:10:31Afternoon RideRide184112.59Feeling good after a holiday break!
3110 Oct 2019, 13:47:14Morning RideRide246312.79Stopped for photo of sunrise
3211 Oct 2019, 00:16:57Afternoon RideRide184311.79Bike feeling tight, needs an oil and pump
3312 Oct 2019, 00:10:57Morning RideRide233112.67Washed and oiled bike last night
\n", "

34 rows × 6 columns

\n", "
" ], "text/plain": [ " Date Name Type Time Distance \\\n", "0 10 Sep 2019, 00:13:04 Afternoon Ride Ride 2084 12.62 \n", "1 10 Sep 2019, 13:52:18 Morning Ride Ride 2531 13.03 \n", "2 11 Sep 2019, 00:23:50 Afternoon Ride Ride 1863 12.52 \n", "3 11 Sep 2019, 14:06:19 Morning Ride Ride 2192 12.84 \n", "4 12 Sep 2019, 00:28:05 Afternoon Ride Ride 1891 12.48 \n", ".. ... ... ... ... ... \n", "29 9 Oct 2019, 13:55:40 Morning Ride Ride 2149 12.70 \n", "30 10 Oct 2019, 00:10:31 Afternoon Ride Ride 1841 12.59 \n", "31 10 Oct 2019, 13:47:14 Morning Ride Ride 2463 12.79 \n", "32 11 Oct 2019, 00:16:57 Afternoon Ride Ride 1843 11.79 \n", "33 12 Oct 2019, 00:10:57 Morning Ride Ride 2331 12.67 \n", "\n", " Comments \n", "0 Rain \n", "1 rain \n", "2 Wet road but nice weather \n", "3 Stopped for photo of sunrise \n", "4 Tired by the end of the week \n", ".. ... \n", "29 Really cold! But feeling good \n", "30 Feeling good after a holiday break! \n", "31 Stopped for photo of sunrise \n", "32 Bike feeling tight, needs an oil and pump \n", "33 Washed and oiled bike last night \n", "\n", "[34 rows x 6 columns]" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "another_row = pd.DataFrame([[\"12 Oct 2019, 00:10:57\", \"Morning Ride\", \"Ride\",\n", " 2331, 12.67, \"Washed and oiled bike last night\"]],\n", " columns = df.columns,\n", " index = [33])\n", "df = df.append(another_row)\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can drop all rows above index 30 using `.drop()`:" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
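A note on `.append()`: in newer versions of pandas, `DataFrame.append()` was deprecated (and later removed), so the same row addition is usually written with `pd.concat()` instead. Here's a minimal sketch with made-up values (the column names and numbers are illustrative, not from the cycling data):

```python
import pandas as pd

df = pd.DataFrame({"Time": [2084, 2531], "Distance": [12.62, 13.03]})

# Wrap the new row in a one-row DataFrame so pd.concat can stack it on axis 0
another_row = pd.DataFrame([[2331, 12.67]],
                           columns=df.columns,
                           index=[2])
df = pd.concat([df, another_row])
```

The idea is identical to `.append()`: both the existing dataframe and the new row are dataframes, and `pd.concat()` stacks them vertically, preserving the index you supply.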
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateNameTypeTimeDistanceComments
010 Sep 2019, 00:13:04Afternoon RideRide208412.62Rain
110 Sep 2019, 13:52:18Morning RideRide253113.03rain
211 Sep 2019, 00:23:50Afternoon RideRide186312.52Wet road but nice weather
311 Sep 2019, 14:06:19Morning RideRide219212.84Stopped for photo of sunrise
412 Sep 2019, 00:28:05Afternoon RideRide189112.48Tired by the end of the week
.....................
252 Oct 2019, 13:46:06Morning RideRide213413.06Bit tired today but good weather
263 Oct 2019, 00:45:22Afternoon RideRide172412.52Feeling good
273 Oct 2019, 13:47:36Morning RideRide218212.68Wet road
284 Oct 2019, 01:08:08Afternoon RideRide187012.63Very tired, riding into the wind
299 Oct 2019, 13:55:40Morning RideRide214912.70Really cold! But feeling good
\n", "

30 rows × 6 columns

\n", "
" ], "text/plain": [ " Date Name Type Time Distance \\\n", "0 10 Sep 2019, 00:13:04 Afternoon Ride Ride 2084 12.62 \n", "1 10 Sep 2019, 13:52:18 Morning Ride Ride 2531 13.03 \n", "2 11 Sep 2019, 00:23:50 Afternoon Ride Ride 1863 12.52 \n", "3 11 Sep 2019, 14:06:19 Morning Ride Ride 2192 12.84 \n", "4 12 Sep 2019, 00:28:05 Afternoon Ride Ride 1891 12.48 \n", ".. ... ... ... ... ... \n", "25 2 Oct 2019, 13:46:06 Morning Ride Ride 2134 13.06 \n", "26 3 Oct 2019, 00:45:22 Afternoon Ride Ride 1724 12.52 \n", "27 3 Oct 2019, 13:47:36 Morning Ride Ride 2182 12.68 \n", "28 4 Oct 2019, 01:08:08 Afternoon Ride Ride 1870 12.63 \n", "29 9 Oct 2019, 13:55:40 Morning Ride Ride 2149 12.70 \n", "\n", " Comments \n", "0 Rain \n", "1 rain \n", "2 Wet road but nice weather \n", "3 Stopped for photo of sunrise \n", "4 Tired by the end of the week \n", ".. ... \n", "25 Bit tired today but good weather \n", "26 Feeling good \n", "27 Wet road \n", "28 Very tired, riding into the wind \n", "29 Really cold! But feeling good \n", "\n", "[30 rows x 6 columns]" ] }, "execution_count": 34, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.drop(index=range(30, 34))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. DataFrame Reshaping\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[Tidy data](https://vita.had.co.nz/papers/tidy-data.pdf) is about \"linking the structure of a dataset with its semantics (its meaning)\". It is defined by:\n", "1. Each variable forms a column\n", "2. Each observation forms a row\n", "3. Each type of observational unit forms a table\n", "\n", "Often you'll need to reshape a dataframe to make it tidy (or for some other purpose).\n", " \n", "![](img/chapter8/tidy.png)\n", "\n", "Source: [r4ds](https://r4ds.had.co.nz/tidy-data.html#fig:tidy-structure)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Melt and Pivot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Pandas `.melt()`, `.pivot()` and `.pivot_table()` can help reshape dataframes:\n", "- `.melt()`: make wide data long.\n", "- `.pivot()`: make long data wide.\n", "- `.pivot_table()`: same as `.pivot()` but can handle multiple indexes.\n", " \n", "![](img/chapter8/melt_pivot.gif)\n", "\n", "Source: [Garrick Aden-Buie's GitHub](https://github.com/gadenbuie/tidyexplain#spread-and-gather)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The data below shows how many courses different instructors taught across different years. If the question you want to answer is something like \"Does the number of courses taught vary depending on year?\", then the data below would probably not be considered tidy, because each row contains multiple observations of courses taught in a year (i.e., there is data for 2018, 2019 and 2020 in a single row):" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name201820192020
0Tom125
1Mike342
2Tiffany434
3Varada524
4Joel313
\n", "
" ], "text/plain": [ "      Name  2018  2019  2020\n", "0      Tom     1     2     5\n", "1     Mike     3     4     2\n", "2  Tiffany     4     3     4\n", "3   Varada     5     2     4\n", "4     Joel     3     1     3" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame({\"Name\": [\"Tom\", \"Mike\", \"Tiffany\", \"Varada\", \"Joel\"],\n", "                   \"2018\": [1, 3, 4, 5, 3],\n", "                   \"2019\": [2, 4, 3, 2, 1],\n", "                   \"2020\": [5, 2, 4, 4, 3]})\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's make it tidy with `.melt()`. `.melt()` takes a few arguments; the most important is `id_vars`, which indicates which column(s) should be the \"identifier\"." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "text/html": [ "<div>
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameYearCourses
0Tom20181
1Mike20183
2Tiffany20184
3Varada20185
4Joel20183
5Tom20192
6Mike20194
7Tiffany20193
8Varada20192
9Joel20191
10Tom20205
11Mike20202
12Tiffany20204
13Varada20204
14Joel20203
\n", "
" ], "text/plain": [ " Name Year Courses\n", "0 Tom 2018 1\n", "1 Mike 2018 3\n", "2 Tiffany 2018 4\n", "3 Varada 2018 5\n", "4 Joel 2018 3\n", "5 Tom 2019 2\n", "6 Mike 2019 4\n", "7 Tiffany 2019 3\n", "8 Varada 2019 2\n", "9 Joel 2019 1\n", "10 Tom 2020 5\n", "11 Mike 2020 2\n", "12 Tiffany 2020 4\n", "13 Varada 2020 4\n", "14 Joel 2020 3" ] }, "execution_count": 36, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_melt = df.melt(id_vars=\"Name\",\n", " var_name=\"Year\",\n", " value_name=\"Courses\")\n", "df_melt" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `value_vars` argument allows us to select which specific variables we want to \"melt\" (if you don't specify `value_vars`, all non-identifier columns will be used). For example, below I'm omitting the `2018` column:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
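As a quick sanity check on what `.melt()` did above: one identifier column plus three value columns should yield (number of rows) × 3 long-format observations. A small sketch with toy numbers (not the instructor data):

```python
import pandas as pd

# Toy wide table: one identifier column and three "year" value columns
wide = pd.DataFrame({"Name": ["Tom", "Mike"],
                     "2018": [1, 3],
                     "2019": [2, 4],
                     "2020": [5, 2]})

# Each year column contributes one long-format row per person
long = wide.melt(id_vars="Name", var_name="Year", value_name="Courses")
print(long.shape)  # (6, 3): 2 people x 3 years, columns Name/Year/Courses
```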
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameYearCourses
0Tom20192
1Mike20194
2Tiffany20193
3Varada20192
4Joel20191
5Tom20205
6Mike20202
7Tiffany20204
8Varada20204
9Joel20203
\n", "
" ], "text/plain": [ " Name Year Courses\n", "0 Tom 2019 2\n", "1 Mike 2019 4\n", "2 Tiffany 2019 3\n", "3 Varada 2019 2\n", "4 Joel 2019 1\n", "5 Tom 2020 5\n", "6 Mike 2020 2\n", "7 Tiffany 2020 4\n", "8 Varada 2020 4\n", "9 Joel 2020 3" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.melt(id_vars=\"Name\",\n", " value_vars=[\"2019\", \"2020\"],\n", " var_name=\"Year\",\n", " value_name=\"Courses\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sometimes, you want to make long data wide, which we can do with `.pivot()`. When using `.pivot()` we need to specify the `index` to pivot on, and the `columns` that will be used to make the new columns of the wider dataframe:" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Year201820192020
Name
Joel313
Mike342
Tiffany434
Tom125
Varada524
\n", "
" ], "text/plain": [ "Year 2018 2019 2020\n", "Name \n", "Joel 3 1 3\n", "Mike 3 4 2\n", "Tiffany 4 3 4\n", "Tom 1 2 5\n", "Varada 5 2 4" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_pivot = df_melt.pivot(index=\"Name\",\n", " columns=\"Year\",\n", " values=\"Courses\")\n", "df_pivot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You'll notice that Pandas set our specified `index` as the index of the new dataframe and preserved the label of the columns. We can easily remove these names and reset the index to make our dataframe look like it originally did:" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
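Put together, `.melt()` and `.pivot()` (plus the `reset_index()`/`columns.name` clean-up above) form a round trip: melting and then pivoting recovers the original wide dataframe. A sketch with toy data:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Tom", "Mike"],
                   "2018": [1, 3],
                   "2019": [2, 4]})

# Wide -> long -> wide again
long = df.melt(id_vars="Name", var_name="Year", value_name="Courses")
wide = long.pivot(index="Name", columns="Year", values="Courses").reset_index()
wide.columns.name = None  # drop the leftover "Year" columns label

# After aligning row order, we are back where we started
assert wide.sort_values("Name").reset_index(drop=True).equals(
    df.sort_values("Name").reset_index(drop=True))
```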
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name201820192020
0Joel313
1Mike342
2Tiffany434
3Tom125
4Varada524
\n", "
" ], "text/plain": [ " Name 2018 2019 2020\n", "0 Joel 3 1 3\n", "1 Mike 3 4 2\n", "2 Tiffany 4 3 4\n", "3 Tom 1 2 5\n", "4 Varada 5 2 4" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df_pivot = df_pivot.reset_index()\n", "df_pivot.columns.name = None\n", "df_pivot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`.pivot()` will often get you what you want, but it won't work if you want to:\n", "- Use multiple indexes (next chapter), or\n", "- Have duplicate index/column labels\n", "\n", "In these cases you'll have to use `.pivot_table()`. I won't focus on it too much here because I'd rather you learn about `pivot()` first." ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameDepartmentYearCourses
0TomCS20181
1TomSTATS20182
2MikeCS20183
3MikeSTATS20181
4TomCS20192
5TomSTATS20193
6MikeCS20194
7MikeSTATS20192
8TomCS20205
9TomSTATS20201
10MikeCS20202
11MikeSTATS20202
\n", "
" ], "text/plain": [ " Name Department Year Courses\n", "0 Tom CS 2018 1\n", "1 Tom STATS 2018 2\n", "2 Mike CS 2018 3\n", "3 Mike STATS 2018 1\n", "4 Tom CS 2019 2\n", "5 Tom STATS 2019 3\n", "6 Mike CS 2019 4\n", "7 Mike STATS 2019 2\n", "8 Tom CS 2020 5\n", "9 Tom STATS 2020 1\n", "10 Mike CS 2020 2\n", "11 Mike STATS 2020 2" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame({\"Name\": [\"Tom\", \"Tom\", \"Mike\", \"Mike\"],\n", " \"Department\": [\"CS\", \"STATS\", \"CS\", \"STATS\"],\n", " \"2018\": [1, 2, 3, 1],\n", " \"2019\": [2, 3, 4, 2],\n", " \"2020\": [5, 1, 2, 2]}).melt(id_vars=[\"Name\", \"Department\"], var_name=\"Year\", value_name=\"Courses\")\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the above case, we have duplicates in `Name`, so `pivot()` won't work. It will throw us a `ValueError: Index contains duplicate entries, cannot reshape`:" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "tags": [ "raises-exception" ] }, "outputs": [ { "ename": "ValueError", "evalue": "Index contains duplicate entries, cannot reshape", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 1\u001b[0m df.pivot(index=\"Name\",\n\u001b[1;32m 2\u001b[0m \u001b[0mcolumns\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m\"Year\"\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 3\u001b[0;31m values=\"Courses\")\n\u001b[0m", "\u001b[0;32m/opt/miniconda3/lib/python3.7/site-packages/pandas/core/frame.py\u001b[0m in \u001b[0;36mpivot\u001b[0;34m(self, index, columns, values)\u001b[0m\n\u001b[1;32m 6669\u001b[0m \u001b[0;32mfrom\u001b[0m 
\u001b[0mpandas\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcore\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreshape\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpivot\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mpivot\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6670\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 6671\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mpivot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mindex\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mindex\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcolumns\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mvalues\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 6672\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6673\u001b[0m _shared_docs[\n", "\u001b[0;32m/opt/miniconda3/lib/python3.7/site-packages/pandas/core/reshape/pivot.py\u001b[0m in \u001b[0;36mpivot\u001b[0;34m(data, index, columns, values)\u001b[0m\n\u001b[1;32m 475\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 476\u001b[0m \u001b[0mindexed\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mdata\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_constructor_sliced\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mdata\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0mvalues\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_values\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mindex\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mindex\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 477\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0mindexed\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0munstack\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 478\u001b[0m 
\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 479\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/miniconda3/lib/python3.7/site-packages/pandas/core/series.py\u001b[0m in \u001b[0;36munstack\u001b[0;34m(self, level, fill_value)\u001b[0m\n\u001b[1;32m 3888\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0mpandas\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcore\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreshape\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreshape\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0munstack\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3889\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 3890\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0munstack\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlevel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfill_value\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3891\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3892\u001b[0m \u001b[0;31m# ----------------------------------------------------------------------\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/miniconda3/lib/python3.7/site-packages/pandas/core/reshape/reshape.py\u001b[0m in \u001b[0;36munstack\u001b[0;34m(obj, level, fill_value)\u001b[0m\n\u001b[1;32m 423\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0m_unstack_extension_series\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mobj\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlevel\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfill_value\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 424\u001b[0m unstacker = _Unstacker(\n\u001b[0;32m--> 425\u001b[0;31m \u001b[0mobj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mindex\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlevel\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlevel\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mconstructor\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mobj\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_constructor_expanddim\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 426\u001b[0m )\n\u001b[1;32m 427\u001b[0m return unstacker.get_result(\n", "\u001b[0;32m/opt/miniconda3/lib/python3.7/site-packages/pandas/core/reshape/reshape.py\u001b[0m in \u001b[0;36m__init__\u001b[0;34m(self, index, level, constructor)\u001b[0m\n\u001b[1;32m 118\u001b[0m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Unstacked DataFrame is too big, causing int32 overflow\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 119\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 120\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_make_selectors\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 121\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 122\u001b[0m \u001b[0;34m@\u001b[0m\u001b[0mcache_readonly\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m/opt/miniconda3/lib/python3.7/site-packages/pandas/core/reshape/reshape.py\u001b[0m in \u001b[0;36m_make_selectors\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 167\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 168\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mmask\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msum\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m<\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mindex\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 169\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Index contains duplicate entries, cannot 
reshape\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 170\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 171\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgroup_index\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mcomp_index\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mValueError\u001b[0m: Index contains duplicate entries, cannot reshape" ] } ], "source": [ "df.pivot(index=\"Name\",\n", " columns=\"Year\",\n", " values=\"Courses\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In such a case, we'd use `.pivot_table()`. It will apply an aggregation function to our duplicates, in this case, we'll `sum()` them up:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Year201820192020
Name
Mike464
Tom356
\n", "
" ], "text/plain": [ "Year 2018 2019 2020\n", "Name \n", "Mike 4 6 4\n", "Tom 3 5 6" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.pivot_table(index=\"Name\", columns='Year', values='Courses', aggfunc='sum')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If we wanted to keep the numbers per department, we could specify both `Name` and `Department` as multiple indexes:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Year201820192020
NameDepartment
MikeCS342
STATS122
TomCS125
STATS231
\n", "
" ], "text/plain": [ "Year 2018 2019 2020\n", "Name Department \n", "Mike CS 3 4 2\n", " STATS 1 2 2\n", "Tom CS 1 2 5\n", " STATS 2 3 1" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.pivot_table(index=[\"Name\", \"Department\"], columns='Year', values='Courses')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result above is a mutlti-index or \"hierarchically indexed\" dataframe (more on those next chapter). If you ever have a need to use it, you can read more about `pivot_table()` in the [documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/reshaping.html#pivot-tables)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Working with Multiple DataFrames\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Often you'll work with multiple dataframes that you want to stick together or merge. `df.merge()` and `df.concat()` are all you need to know for combining dataframes. The Pandas [documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html) is very helpful for these functions, but they are pretty easy to grasp.\n", "\n", "```{note}\n", "The example joins shown in this section are inspired by [Chapter 15](https://stat545.com/join-cheatsheet.html) of Jenny Bryan's STAT 545 materials.\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Sticking DataFrames Together with `pd.concat()`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can use `pd.concat()` to stick dataframes together:\n", "- Vertically: if they have the same **columns**, OR\n", "- Horizontally: if they have the same **rows**" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [], "source": [ "df1 = pd.DataFrame({'A': [1, 3, 5],\n", " 'B': [2, 4, 6]})\n", "df2 = pd.DataFrame({'A': [7, 9, 11],\n", " 'B': [8, 10, 12]})" ] }, { "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AB
012
134
256
\n", "
" ], "text/plain": [ " A B\n", "0 1 2\n", "1 3 4\n", "2 5 6" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df1" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AB
078
1910
21112
\n", "
" ], "text/plain": [ " A B\n", "0 7 8\n", "1 9 10\n", "2 11 12" ] }, "execution_count": 46, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df2" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AB
012
134
256
078
1910
21112
\n", "
" ], "text/plain": [ " A B\n", "0 1 2\n", "1 3 4\n", "2 5 6\n", "0 7 8\n", "1 9 10\n", "2 11 12" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.concat((df1, df2), axis=0) # axis=0 specifies a vertical stick, i.e., on the columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that the indexes were simply joined together? This may or may not be what you want. To reset the index, you can specify the argument `ignore_index=True`:" ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AB
012
134
256
378
4910
51112
\n", "
" ], "text/plain": [ " A B\n", "0 1 2\n", "1 3 4\n", "2 5 6\n", "3 7 8\n", "4 9 10\n", "5 11 12" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.concat((df1, df2), axis=0, ignore_index=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Use `axis=1` to stick together horizontally:" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
0123
01278
134910
2561112
\n", "
" ], "text/plain": [ " 0 1 2 3\n", "0 1 2 7 8\n", "1 3 4 9 10\n", "2 5 6 11 12" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.concat((df1, df2), axis=1, ignore_index=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You are not limited to just two dataframes, you can concatenate as many as you want:" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AB
012
134
256
378
4910
51112
612
734
856
978
10910
111112
\n", "
" ], "text/plain": [ " A B\n", "0 1 2\n", "1 3 4\n", "2 5 6\n", "3 7 8\n", "4 9 10\n", "5 11 12\n", "6 1 2\n", "7 3 4\n", "8 5 6\n", "9 7 8\n", "10 9 10\n", "11 11 12" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.concat((df1, df2, df1, df2), axis=0, ignore_index=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Joining DataFrames with `pd.merge()`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`pd.merge()` gives you the ability to \"join\" dataframes using different rules (just like with SQL if you're familiar with it). You can use `df.merge()` to join dataframes based on shared `key` columns. Methods include:\n", "- \"inner join\"\n", "- \"outer join\"\n", "- \"left join\"\n", "- \"right join\"\n", "\n", "See this great [cheat sheet](https://pandas.pydata.org/pandas-docs/stable/getting_started/comparison/comparison_with_sql.html#compare-with-sql-join) and [these great animations](https://github.com/gadenbuie/tidyexplain) for more insights." ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "df1 = pd.DataFrame({\"name\": ['Magneto', 'Storm', 'Mystique', 'Batman', 'Joker', 'Catwoman', 'Hellboy'],\n", " 'alignment': ['bad', 'good', 'bad', 'good', 'bad', 'bad', 'good'],\n", " 'gender': ['male', 'female', 'female', 'male', 'male', 'female', 'male'],\n", " 'publisher': ['Marvel', 'Marvel', 'Marvel', 'DC', 'DC', 'DC', 'Dark Horse Comics']})\n", "df2 = pd.DataFrame({'publisher': ['DC', 'Marvel', 'Image'],\n", " 'year_founded': [1934, 1939, 1992]})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](img/chapter8/join.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An \"inner\" join will return all rows of `df1` where matching values for \"publisher\" are found in `df2`:" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namealignmentgenderpublisheryear_founded
0MagnetobadmaleMarvel1939
1StormgoodfemaleMarvel1939
2MystiquebadfemaleMarvel1939
3BatmangoodmaleDC1934
4JokerbadmaleDC1934
5CatwomanbadfemaleDC1934
\n", "
" ], "text/plain": [ " name alignment gender publisher year_founded\n", "0 Magneto bad male Marvel 1939\n", "1 Storm good female Marvel 1939\n", "2 Mystique bad female Marvel 1939\n", "3 Batman good male DC 1934\n", "4 Joker bad male DC 1934\n", "5 Catwoman bad female DC 1934" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(df1, df2, how=\"inner\", on=\"publisher\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](img/chapter8/inner_join.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An \"outer\" join will return all rows of `df1` and `df2`, placing NaNs where information is unavailable:" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namealignmentgenderpublisheryear_founded
0MagnetobadmaleMarvel1939.0
1StormgoodfemaleMarvel1939.0
2MystiquebadfemaleMarvel1939.0
3BatmangoodmaleDC1934.0
4JokerbadmaleDC1934.0
5CatwomanbadfemaleDC1934.0
6HellboygoodmaleDark Horse ComicsNaN
7NaNNaNNaNImage1992.0
\n", "
" ], "text/plain": [ " name alignment gender publisher year_founded\n", "0 Magneto bad male Marvel 1939.0\n", "1 Storm good female Marvel 1939.0\n", "2 Mystique bad female Marvel 1939.0\n", "3 Batman good male DC 1934.0\n", "4 Joker bad male DC 1934.0\n", "5 Catwoman bad female DC 1934.0\n", "6 Hellboy good male Dark Horse Comics NaN\n", "7 NaN NaN NaN Image 1992.0" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(df1, df2, how=\"outer\", on=\"publisher\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](img/chapter8/outer_join.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Return all rows from `df1` and all columns of `df1` and `df2`, populated where matches occur:" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namealignmentgenderpublisheryear_founded
0MagnetobadmaleMarvel1939.0
1StormgoodfemaleMarvel1939.0
2MystiquebadfemaleMarvel1939.0
3BatmangoodmaleDC1934.0
4JokerbadmaleDC1934.0
5CatwomanbadfemaleDC1934.0
6HellboygoodmaleDark Horse ComicsNaN
\n", "
" ], "text/plain": [ " name alignment gender publisher year_founded\n", "0 Magneto bad male Marvel 1939.0\n", "1 Storm good female Marvel 1939.0\n", "2 Mystique bad female Marvel 1939.0\n", "3 Batman good male DC 1934.0\n", "4 Joker bad male DC 1934.0\n", "5 Catwoman bad female DC 1934.0\n", "6 Hellboy good male Dark Horse Comics NaN" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(df1, df2, how=\"left\", on=\"publisher\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](img/chapter8/left_join.png)" ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namealignmentgenderpublisheryear_founded
0BatmangoodmaleDC1934
1JokerbadmaleDC1934
2CatwomanbadfemaleDC1934
3MagnetobadmaleMarvel1939
4StormgoodfemaleMarvel1939
5MystiquebadfemaleMarvel1939
6NaNNaNNaNImage1992
\n", "
" ], "text/plain": [ " name alignment gender publisher year_founded\n", "0 Batman good male DC 1934\n", "1 Joker bad male DC 1934\n", "2 Catwoman bad female DC 1934\n", "3 Magneto bad male Marvel 1939\n", "4 Storm good female Marvel 1939\n", "5 Mystique bad female Marvel 1939\n", "6 NaN NaN NaN Image 1992" ] }, "execution_count": 55, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(df1, df2, how=\"right\", on=\"publisher\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are many ways to specify the `key` to join dataframes on, you can join on index values, different, column names, etc. Another helpful argument is the `indicator` argument which will add a column to the result telling you where matches were found in the dataframes:" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namealignmentgenderpublisheryear_founded_merge
0MagnetobadmaleMarvel1939.0both
1StormgoodfemaleMarvel1939.0both
2MystiquebadfemaleMarvel1939.0both
3BatmangoodmaleDC1934.0both
4JokerbadmaleDC1934.0both
5CatwomanbadfemaleDC1934.0both
6HellboygoodmaleDark Horse ComicsNaNleft_only
7NaNNaNNaNImage1992.0right_only
\n", "
" ], "text/plain": [ " name alignment gender publisher year_founded _merge\n", "0 Magneto bad male Marvel 1939.0 both\n", "1 Storm good female Marvel 1939.0 both\n", "2 Mystique bad female Marvel 1939.0 both\n", "3 Batman good male DC 1934.0 both\n", "4 Joker bad male DC 1934.0 both\n", "5 Catwoman bad female DC 1934.0 both\n", "6 Hellboy good male Dark Horse Comics NaN left_only\n", "7 NaN NaN NaN Image 1992.0 right_only" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pd.merge(df1, df2, how=\"outer\", on=\"publisher\", indicator=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By the way, you can use `pd.concat()` to do a simple \"inner\" or \"outer\" join on multiple datadrames at once. It's less flexible than merge, but can be useful sometimes." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 5. More DataFrame Operations\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Applying Custom Functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There will be times when you want to apply a function that is not built-in to Pandas. For this, we also have methods:\n", "- `df.apply()`, applies a function column-wise or row-wise across a dataframe (the function must be able to accept/return an array)\n", "- `df.applymap()`, applies a function element-wise (for functions that accept/return single values at a time)\n", "- `series.apply()`/`series.map()`, same as above but for Pandas series" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For example, say you want to use a numpy function on a column in your dataframe:" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TimeDistance
0-0.9018660.053604
1-0.9016970.447197
2-0.035549-0.046354
3-0.7390590.270228
4-0.236515-0.086263
.........
28-0.6833720.063586
290.1500560.133232
300.0267020.023627
31-0.0086400.221770
320.897861-0.700695
\n", "

33 rows × 2 columns

\n", "
" ], "text/plain": [ " Time Distance\n", "0 -0.901866 0.053604\n", "1 -0.901697 0.447197\n", "2 -0.035549 -0.046354\n", "3 -0.739059 0.270228\n", "4 -0.236515 -0.086263\n", ".. ... ...\n", "28 -0.683372 0.063586\n", "29 0.150056 0.133232\n", "30 0.026702 0.023627\n", "31 -0.008640 0.221770\n", "32 0.897861 -0.700695\n", "\n", "[33 rows x 2 columns]" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv('data/cycling_data.csv')\n", "df[['Time', 'Distance']].apply(np.sin)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or you may want to apply your own custom function:" ] }, { "cell_type": "code", "execution_count": 58, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Time
00.578889
10.703056
20.517500
30.608889
40.525278
......
280.519444
290.596944
300.511389
310.684167
320.511944
\n", "

33 rows × 1 columns

\n", "
" ], "text/plain": [ " Time\n", "0 0.578889\n", "1 0.703056\n", "2 0.517500\n", "3 0.608889\n", "4 0.525278\n", ".. ...\n", "28 0.519444\n", "29 0.596944\n", "30 0.511389\n", "31 0.684167\n", "32 0.511944\n", "\n", "[33 rows x 1 columns]" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def seconds_to_hours(x):\n", " return x / 3600\n", "\n", "df[['Time']].apply(seconds_to_hours)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This may have been better as a lambda function..." ] }, { "cell_type": "code", "execution_count": 59, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Time
00.578889
10.703056
20.517500
30.608889
40.525278
......
280.519444
290.596944
300.511389
310.684167
320.511944
\n", "

33 rows × 1 columns

\n", "
" ], "text/plain": [ " Time\n", "0 0.578889\n", "1 0.703056\n", "2 0.517500\n", "3 0.608889\n", "4 0.525278\n", ".. ...\n", "28 0.519444\n", "29 0.596944\n", "30 0.511389\n", "31 0.684167\n", "32 0.511944\n", "\n", "[33 rows x 1 columns]" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[['Time']].apply(lambda x: x / 3600)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can even use functions that require additional arguments. Just specify the arguments in `.apply()`:" ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Time
034.733333
142.183333
231.050000
336.533333
431.516667
......
2831.166667
2935.816667
3030.683333
3141.050000
3230.716667
\n", "

33 rows × 1 columns

\n", "
" ], "text/plain": [ " Time\n", "0 34.733333\n", "1 42.183333\n", "2 31.050000\n", "3 36.533333\n", "4 31.516667\n", ".. ...\n", "28 31.166667\n", "29 35.816667\n", "30 30.683333\n", "31 41.050000\n", "32 30.716667\n", "\n", "[33 rows x 1 columns]" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def convert_seconds(x, to=\"hours\"):\n", " if to == \"hours\":\n", " return x / 3600\n", " elif to == \"minutes\":\n", " return x / 60\n", "\n", "df[['Time']].apply(convert_seconds, to=\"minutes\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some functions only accept/return a scalar:" ] }, { "cell_type": "code", "execution_count": 61, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "int(3.141)" ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "tags": [ "raises-exception" ] }, "outputs": [ { "ename": "TypeError", "evalue": "float() argument must be a string or a number, not 'list'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mfloat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m3.141\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m10.345\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mTypeError\u001b[0m: float() argument must be a string or a number, not 'list'" ] } ], "source": [ "float([3.141, 10.345])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For these, we need `.applymap()`:" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Time
02084
12531
21863
32192
41891
......
281870
292149
301841
312463
321843
\n", "

33 rows × 1 columns

\n", "
" ], "text/plain": [ " Time\n", "0 2084\n", "1 2531\n", "2 1863\n", "3 2192\n", "4 1891\n", ".. ...\n", "28 1870\n", "29 2149\n", "30 1841\n", "31 2463\n", "32 1843\n", "\n", "[33 rows x 1 columns]" ] }, "execution_count": 63, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[['Time']].applymap(int)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "However, there are often \"vectorized\" versions of common functions like this already available, which are much faster. In the case above, we can use `.astype()` to change the dtype of a whole column quickly:" ] }, { "cell_type": "code", "execution_count": 64, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "'astype' is 1.98 faster than 'applymap'!\n" ] } ], "source": [ "time_applymap = %timeit -q -o -r 3 df[['Time']].applymap(float)\n", "time_builtin = %timeit -q -o -r 3 df[['Time']].astype(float)\n", "print(f\"'astype' is {time_applymap.average / time_builtin.average:.2f} faster than 'applymap'!\")" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "raises-exception" ] }, "source": [ "### Grouping" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Often we are interested in examining specific groups in our data. `df.groupby()` allows us to group our data based on a variable(s)." ] }, { "cell_type": "code", "execution_count": 65, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateNameTypeTimeDistanceComments
010 Sep 2019, 00:13:04Afternoon RideRide208412.62Rain
110 Sep 2019, 13:52:18Morning RideRide253113.03rain
211 Sep 2019, 00:23:50Afternoon RideRide186312.52Wet road but nice weather
311 Sep 2019, 14:06:19Morning RideRide219212.84Stopped for photo of sunrise
412 Sep 2019, 00:28:05Afternoon RideRide189112.48Tired by the end of the week
.....................
284 Oct 2019, 01:08:08Afternoon RideRide187012.63Very tired, riding into the wind
299 Oct 2019, 13:55:40Morning RideRide214912.70Really cold! But feeling good
3010 Oct 2019, 00:10:31Afternoon RideRide184112.59Feeling good after a holiday break!
3110 Oct 2019, 13:47:14Morning RideRide246312.79Stopped for photo of sunrise
3211 Oct 2019, 00:16:57Afternoon RideRide184311.79Bike feeling tight, needs an oil and pump
\n", "

33 rows × 6 columns

\n", "
" ], "text/plain": [ " Date Name Type Time Distance \\\n", "0 10 Sep 2019, 00:13:04 Afternoon Ride Ride 2084 12.62 \n", "1 10 Sep 2019, 13:52:18 Morning Ride Ride 2531 13.03 \n", "2 11 Sep 2019, 00:23:50 Afternoon Ride Ride 1863 12.52 \n", "3 11 Sep 2019, 14:06:19 Morning Ride Ride 2192 12.84 \n", "4 12 Sep 2019, 00:28:05 Afternoon Ride Ride 1891 12.48 \n", ".. ... ... ... ... ... \n", "28 4 Oct 2019, 01:08:08 Afternoon Ride Ride 1870 12.63 \n", "29 9 Oct 2019, 13:55:40 Morning Ride Ride 2149 12.70 \n", "30 10 Oct 2019, 00:10:31 Afternoon Ride Ride 1841 12.59 \n", "31 10 Oct 2019, 13:47:14 Morning Ride Ride 2463 12.79 \n", "32 11 Oct 2019, 00:16:57 Afternoon Ride Ride 1843 11.79 \n", "\n", " Comments \n", "0 Rain \n", "1 rain \n", "2 Wet road but nice weather \n", "3 Stopped for photo of sunrise \n", "4 Tired by the end of the week \n", ".. ... \n", "28 Very tired, riding into the wind \n", "29 Really cold! But feeling good \n", "30 Feeling good after a holiday break! \n", "31 Stopped for photo of sunrise \n", "32 Bike feeling tight, needs an oil and pump \n", "\n", "[33 rows x 6 columns]" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv('data/cycling_data.csv')\n", "df" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's group this dataframe on the column `Name`:" ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 66, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfg = df.groupby(by='Name')\n", "dfg" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What is a `DataFrameGroupBy` object? 
It contains information about the groups of the dataframe:\n", "\n", "![](img/chapter8/groupby_1.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The groupby object is really just a dictionary of index-mappings, which we could look at if we wanted to:" ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'Afternoon Ride': [0, 2, 4, 6, 9, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32], 'Morning Ride': [1, 3, 5, 7, 8, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31]}" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfg.groups" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also access a group using the `.get_group()` method:" ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateNameTypeTimeDistanceComments
010 Sep 2019, 00:13:04Afternoon RideRide208412.62Rain
211 Sep 2019, 00:23:50Afternoon RideRide186312.52Wet road but nice weather
412 Sep 2019, 00:28:05Afternoon RideRide189112.48Tired by the end of the week
617 Sep 2019, 00:15:47Afternoon RideRide197312.45Legs feeling strong!
918 Sep 2019, 00:15:52Afternoon RideRide210112.48Pumped up tires
1019 Sep 2019, 00:30:01Afternoon RideRide4806212.48Feeling good
1220 Sep 2019, 01:02:05Afternoon RideRide296112.81Feeling good
1424 Sep 2019, 00:35:42Afternoon RideRide207612.47Oiled chain, bike feels smooth
1625 Sep 2019, 00:07:21Afternoon RideRide177512.10Feeling really tired
1826 Sep 2019, 00:13:33Afternoon RideRide186012.52raining
2027 Sep 2019, 01:00:18Afternoon RideRide171212.47Tired by the end of the week
221 Oct 2019, 00:15:07Afternoon RideRide1732NaNLegs feeling strong!
242 Oct 2019, 00:13:09Afternoon RideRide1756NaNA little tired today but good weather
263 Oct 2019, 00:45:22Afternoon RideRide172412.52Feeling good
284 Oct 2019, 01:08:08Afternoon RideRide187012.63Very tired, riding into the wind
3010 Oct 2019, 00:10:31Afternoon RideRide184112.59Feeling good after a holiday break!
3211 Oct 2019, 00:16:57Afternoon RideRide184311.79Bike feeling tight, needs an oil and pump
\n", "
" ], "text/plain": [ " Date Name Type Time Distance \\\n", "0 10 Sep 2019, 00:13:04 Afternoon Ride Ride 2084 12.62 \n", "2 11 Sep 2019, 00:23:50 Afternoon Ride Ride 1863 12.52 \n", "4 12 Sep 2019, 00:28:05 Afternoon Ride Ride 1891 12.48 \n", "6 17 Sep 2019, 00:15:47 Afternoon Ride Ride 1973 12.45 \n", "9 18 Sep 2019, 00:15:52 Afternoon Ride Ride 2101 12.48 \n", "10 19 Sep 2019, 00:30:01 Afternoon Ride Ride 48062 12.48 \n", "12 20 Sep 2019, 01:02:05 Afternoon Ride Ride 2961 12.81 \n", "14 24 Sep 2019, 00:35:42 Afternoon Ride Ride 2076 12.47 \n", "16 25 Sep 2019, 00:07:21 Afternoon Ride Ride 1775 12.10 \n", "18 26 Sep 2019, 00:13:33 Afternoon Ride Ride 1860 12.52 \n", "20 27 Sep 2019, 01:00:18 Afternoon Ride Ride 1712 12.47 \n", "22 1 Oct 2019, 00:15:07 Afternoon Ride Ride 1732 NaN \n", "24 2 Oct 2019, 00:13:09 Afternoon Ride Ride 1756 NaN \n", "26 3 Oct 2019, 00:45:22 Afternoon Ride Ride 1724 12.52 \n", "28 4 Oct 2019, 01:08:08 Afternoon Ride Ride 1870 12.63 \n", "30 10 Oct 2019, 00:10:31 Afternoon Ride Ride 1841 12.59 \n", "32 11 Oct 2019, 00:16:57 Afternoon Ride Ride 1843 11.79 \n", "\n", " Comments \n", "0 Rain \n", "2 Wet road but nice weather \n", "4 Tired by the end of the week \n", "6 Legs feeling strong! \n", "9 Pumped up tires \n", "10 Feeling good \n", "12 Feeling good \n", "14 Oiled chain, bike feels smooth \n", "16 Feeling really tired \n", "18 raining \n", "20 Tired by the end of the week \n", "22 Legs feeling strong! \n", "24 A little tired today but good weather \n", "26 Feeling good \n", "28 Very tired, riding into the wind \n", "30 Feeling good after a holiday break! 
\n", "32 Bike feeling tight, needs an oil and pump " ] }, "execution_count": 68, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfg.get_group('Afternoon Ride')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The usual thing to do however, is to apply aggregate functions to the groupby object:\n", "\n", "![](img/chapter8/groupby_2.png)" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TimeDistance
Name
Afternoon Ride4654.35294112.462
Morning Ride2299.87500012.860
\n", "
" ], "text/plain": [ " Time Distance\n", "Name \n", "Afternoon Ride 4654.352941 12.462\n", "Morning Ride 2299.875000 12.860" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfg.mean()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can apply multiple functions using `.aggregate()`:" ] }, { "cell_type": "code", "execution_count": 70, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TimeDistance
meansumcountmeansumcount
Name
Afternoon Ride4654.352941791241712.462186.9315
Morning Ride2299.875000367981612.860205.7616
\n", "
" ], "text/plain": [ " Time Distance \n", " mean sum count mean sum count\n", "Name \n", "Afternoon Ride 4654.352941 79124 17 12.462 186.93 15\n", "Morning Ride 2299.875000 36798 16 12.860 205.76 16" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dfg.aggregate(['mean', 'sum', 'count'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And even apply different functions to different columns:" ] }, { "cell_type": "code", "execution_count": 71, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
TimeDistance
maxminmeannum_rangesum
Name
Afternoon Ride4806217124654.35294146350186.93
Morning Ride290320902299.875000813205.76
\n", "
" ], "text/plain": [ " Time Distance\n", " max min mean num_range sum\n", "Name \n", "Afternoon Ride 48062 1712 4654.352941 46350 186.93\n", "Morning Ride 2903 2090 2299.875000 813 205.76" ] }, "execution_count": 71, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def num_range(x):\n", " return x.max() - x.min()\n", "\n", "dfg.aggregate({\"Time\": ['max', 'min', 'mean', num_range], \n", " \"Distance\": ['sum']})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By the way, you can use aggregate for non-grouped dataframes too. This is pretty much what `df.describe` does under-the-hood:" ] }, { "cell_type": "code", "execution_count": 72, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateNameTypeTimeDistanceComments
min1 Oct 2019, 00:15:07Afternoon RideRide1712.00000011.790000A little tired today but good weather
count33333333.00000031.00000033
meanNaNNaNNaN3512.78787912.667419NaN
num_rangeNaNNaNNaN46350.0000002.780000NaN
\n", "
" ], "text/plain": [ " Date Name Type Time \\\n", "min 1 Oct 2019, 00:15:07 Afternoon Ride Ride 1712.000000 \n", "count 33 33 33 33.000000 \n", "mean NaN NaN NaN 3512.787879 \n", "num_range NaN NaN NaN 46350.000000 \n", "\n", " Distance Comments \n", "min 11.790000 A little tired today but good weather \n", "count 31.000000 33 \n", "mean 12.667419 NaN \n", "num_range 2.780000 NaN " ] }, "execution_count": 72, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.agg(['mean', 'min', 'count', num_range])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Dealing with Missing Values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Missing values are typically denoted with `NaN`. We can use `df.isnull()` to find missing values in a dataframe. It returns a boolean for each element in the dataframe:" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateNameTypeTimeDistanceComments
0FalseFalseFalseFalseFalseFalse
1FalseFalseFalseFalseFalseFalse
2FalseFalseFalseFalseFalseFalse
3FalseFalseFalseFalseFalseFalse
4FalseFalseFalseFalseFalseFalse
.....................
28FalseFalseFalseFalseFalseFalse
29FalseFalseFalseFalseFalseFalse
30FalseFalseFalseFalseFalseFalse
31FalseFalseFalseFalseFalseFalse
32FalseFalseFalseFalseFalseFalse
\n", "

33 rows × 6 columns

\n", "
" ], "text/plain": [ " Date Name Type Time Distance Comments\n", "0 False False False False False False\n", "1 False False False False False False\n", "2 False False False False False False\n", "3 False False False False False False\n", "4 False False False False False False\n", ".. ... ... ... ... ... ...\n", "28 False False False False False False\n", "29 False False False False False False\n", "30 False False False False False False\n", "31 False False False False False False\n", "32 False False False False False False\n", "\n", "[33 rows x 6 columns]" ] }, "execution_count": 73, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.isnull()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But it's usually more helpful to get this information by row or by column using the `.any()` or `.info()` method:" ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 33 entries, 0 to 32\n", "Data columns (total 6 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 Date 33 non-null object \n", " 1 Name 33 non-null object \n", " 2 Type 33 non-null object \n", " 3 Time 33 non-null int64 \n", " 4 Distance 31 non-null float64\n", " 5 Comments 33 non-null object \n", "dtypes: float64(1), int64(1), object(4)\n", "memory usage: 1.7+ KB\n" ] } ], "source": [ "df.info()" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateNameTypeTimeDistanceComments
221 Oct 2019, 00:15:07Afternoon RideRide1732NaNLegs feeling strong!
242 Oct 2019, 00:13:09Afternoon RideRide1756NaNA little tired today but good weather
\n", "
" ], "text/plain": [ " Date Name Type Time Distance \\\n", "22 1 Oct 2019, 00:15:07 Afternoon Ride Ride 1732 NaN \n", "24 2 Oct 2019, 00:13:09 Afternoon Ride Ride 1756 NaN \n", "\n", " Comments \n", "22 Legs feeling strong! \n", "24 A little tired today but good weather " ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df.isnull().any(axis=1)]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When you have missing values, we usually either drop them or impute them.You can drop missing values with `df.dropna()`:" ] }, { "cell_type": "code", "execution_count": 77, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DateNameTypeTimeDistanceComments
010 Sep 2019, 00:13:04Afternoon RideRide208412.62Rain
110 Sep 2019, 13:52:18Morning RideRide253113.03rain
211 Sep 2019, 00:23:50Afternoon RideRide186312.52Wet road but nice weather
311 Sep 2019, 14:06:19Morning RideRide219212.84Stopped for photo of sunrise
412 Sep 2019, 00:28:05Afternoon RideRide189112.48Tired by the end of the week
.....................
284 Oct 2019, 01:08:08Afternoon RideRide187012.63Very tired, riding into the wind
299 Oct 2019, 13:55:40Morning RideRide214912.70Really cold! But feeling good
3010 Oct 2019, 00:10:31Afternoon RideRide184112.59Feeling good after a holiday break!
3110 Oct 2019, 13:47:14Morning RideRide246312.79Stopped for photo of sunrise
3211 Oct 2019, 00:16:57Afternoon RideRide184311.79Bike feeling tight, needs an oil and pump
\n", "

31 rows × 6 columns

\n", "
" ], "text/plain": [ " Date Name Type Time Distance \\\n", "0 10 Sep 2019, 00:13:04 Afternoon Ride Ride 2084 12.62 \n", "1 10 Sep 2019, 13:52:18 Morning Ride Ride 2531 13.03 \n", "2 11 Sep 2019, 00:23:50 Afternoon Ride Ride 1863 12.52 \n", "3 11 Sep 2019, 14:06:19 Morning Ride Ride 2192 12.84 \n", "4 12 Sep 2019, 00:28:05 Afternoon Ride Ride 1891 12.48 \n", ".. ... ... ... ... ... \n", "28 4 Oct 2019, 01:08:08 Afternoon Ride Ride 1870 12.63 \n", "29 9 Oct 2019, 13:55:40 Morning Ride Ride 2149 12.70 \n", "30 10 Oct 2019, 00:10:31 Afternoon Ride Ride 1841 12.59 \n", "31 10 Oct 2019, 13:47:14 Morning Ride Ride 2463 12.79 \n", "32 11 Oct 2019, 00:16:57 Afternoon Ride Ride 1843 11.79 \n", "\n", " Comments \n", "0 Rain \n", "1 rain \n", "2 Wet road but nice weather \n", "3 Stopped for photo of sunrise \n", "4 Tired by the end of the week \n", ".. ... \n", "28 Very tired, riding into the wind \n", "29 Really cold! But feeling good \n", "30 Feeling good after a holiday break! \n", "31 Stopped for photo of sunrise \n", "32 Bike feeling tight, needs an oil and pump \n", "\n", "[31 rows x 6 columns]" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.dropna()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Or you can impute (\"fill\") them using `.fillna()`. This method has various options for filling, you can use a fixed value, the mean of the column, the previous non-nan value, etc:" ] }, { "cell_type": "code", "execution_count": 78, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
0NaN2.0NaN0
13.04.0NaN1
2NaNNaNNaN5
3NaN3.0NaN4
\n", "
" ], "text/plain": [ " A B C D\n", "0 NaN 2.0 NaN 0\n", "1 3.0 4.0 NaN 1\n", "2 NaN NaN NaN 5\n", "3 NaN 3.0 NaN 4" ] }, "execution_count": 78, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.DataFrame([[np.nan, 2, np.nan, 0],\n", " [3, 4, np.nan, 1],\n", " [np.nan, np.nan, np.nan, 5],\n", " [np.nan, 3, np.nan, 4]],\n", " columns=list('ABCD'))\n", "df" ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
00.02.00.00
13.04.00.01
20.00.00.05
30.03.00.04
\n", "
" ], "text/plain": [ " A B C D\n", "0 0.0 2.0 0.0 0\n", "1 3.0 4.0 0.0 1\n", "2 0.0 0.0 0.0 5\n", "3 0.0 3.0 0.0 4" ] }, "execution_count": 79, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.fillna(0) # fill with 0" ] }, { "cell_type": "code", "execution_count": 80, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
03.02.0NaN0
13.04.0NaN1
23.03.0NaN5
33.03.0NaN4
\n", "
" ], "text/plain": [ " A B C D\n", "0 3.0 2.0 NaN 0\n", "1 3.0 4.0 NaN 1\n", "2 3.0 3.0 NaN 5\n", "3 3.0 3.0 NaN 4" ] }, "execution_count": 80, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.fillna(df.mean()) # fill with the mean" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
03.02.0NaN0
13.04.0NaN1
2NaN3.0NaN5
3NaN3.0NaN4
\n", "
" ], "text/plain": [ " A B C D\n", "0 3.0 2.0 NaN 0\n", "1 3.0 4.0 NaN 1\n", "2 NaN 3.0 NaN 5\n", "3 NaN 3.0 NaN 4" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.fillna(method='bfill') # backward (upwards) fill from non-nan values" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
0NaN2.0NaN0
13.04.0NaN1
23.04.0NaN5
33.03.0NaN4
\n", "
" ], "text/plain": [ " A B C D\n", "0 NaN 2.0 NaN 0\n", "1 3.0 4.0 NaN 1\n", "2 3.0 4.0 NaN 5\n", "3 3.0 3.0 NaN 4" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.fillna(method='ffill') # forward (downward) fill from non-nan values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, sometimes I use visualizations to help identify (patterns in) missing values. One thing I often do is print a heatmap of my dataframe to get a feel for where my missing values are. If you want to run this code, you may need to install `seaborn`:\n", "\n", "```sh\n", "conda install seaborn\n", "```" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [], "source": [ "import seaborn as sns\n", "sns.set(rc={'figure.figsize':(7, 7)})" ] }, { "cell_type": "code", "execution_count": 84, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
ABCD
0NaN2.0NaN0
13.04.0NaN1
2NaNNaNNaN5
3NaN3.0NaN4
\n", "
" ], "text/plain": [ " A B C D\n", "0 NaN 2.0 NaN 0\n", "1 3.0 4.0 NaN 1\n", "2 NaN NaN NaN 5\n", "3 NaN 3.0 NaN 4" ] }, "execution_count": 84, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "code", "execution_count": 85, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAagAAAGeCAYAAADBkZVwAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAM0ElEQVR4nO3dW4iVhf7G8d/SaUyywCwFAytr7w6eQM2MCEEqDykaFJQloSAJNaZ100XRTeBNCR0IkpDIDkqMOkiBdMIkGhOCosQKc5DQkLqJLkydmX0R2z/uvZv8gzPrqfX53Pm+I+9z4/r6rvUuptHf399fABBmWLMHAMD/IlAARBIoACIJFACRBAqASAIFQKS2obxY34//HMrL8T/MGz+t2ROoql1Hvmj2hJbn30KG9/re/sNz7qAAiCRQAEQSKAAiCRQAkQQKgEgCBUAkgQIgkkABEEmgAIgkUABEEigAIgkUAJEECoBIAgVAJIECIJJAARBJoACIJFAARBIoACIJFACRBAqASAIFQCSBAiCSQAEQSaAAiCRQAEQSKAAiCRQAkQQKgEgCBUAkgQIgkkABEEmgAIgkUABEEigAIgkUAJEECoBIAgVAJIECIJJAARBJoACIJFAARBIoACIJFACR2v7sBw4ePFi7du2qH3/8sYYNG1Zjx46tW265paZMmTIU+wBoUQPeQb3xxhv16KOPVlXVlClTatKkSVVV9eSTT9amTZsGfx0ALWvAO6jXXnutduzYUSNHjjzj+IoVK+rOO++slStXDuo4AFrXgHdQbW1tderUqf86fvz48TrvvPMGbRQADHgHtXr16lq6dGnddNNNdemll1aj0ahjx45Vd3d3rVu3bqg2AtCCBgzU4sWLa9asWfXpp5/WsWPHqq+vr2bOnFkdHR01bty4odoIQAv606f4xo0bV0uXLh2CKQDwf3wPCoBIAgVAJIECIJJAARBJoACIJFAARBIoACIJFACRBAqASAIFQCSBAiCSQAEQSaAAiCRQAEQSKAAiCRQAkQQKgEgCBUAkgQIgkkABEEmgAIgkUABEEigAIgkUAJEECoBIAgVAJIECIJJAARBJoACIJFAARBIoACIJFACRBAqASAIFQCSBAiCSQAEQSaAAiCRQAEQSKAAiCRQAkQQKgEgCBUCktqG82Lzx04bycgD8hbmDAiCSQAEQSaAAiCRQAEQSKAAiCRQAkQQKgEgCBUAkgQIgkkABEEmgAIgkUABEEigAIgkUAJEECoBIAgVAJIECIJJAARBJoACIJFAARBIoACIJFACRBAqASAIFQCSBAiCSQAEQSaAAiCRQAEQSKAAiCRQAkQQKgEgCBUAkgQIgkkABEEmgAIgkUABEEigAIgkUAJEECoBIAgVAJIECIJJAARBJoACIJFAARBIoACIJFACRBAqASG0DnTxy5MiAf3n8+PHndAwA/NuAgXrwwQerp6enxo4dW/39/WecazQa9cEHHwzqOABa14CBeuutt2rZsmX11FNP1YwZM4ZqEwAM/BnUqFGj6umnn64dO3YM0RwA+N2Ad1BVVVOnTq2pU6cOxRYAOM1TfABEEigAIgkUAJEECoBIAgVAJIECIJJAARBJoACIJFAARBIoACIJFACRBAqASAIFQCSBAiCSQAEQSaAAiCRQAEQSKAAiCRQAkQQKgEgCBUAk
gQIgkkABEEmgAIgkUABEEigAIgkUAJEECoBIAgVAJIECIJJAARBJoACIJFAARBIoACIJFACRBAqASAIFQCSBAiCSQAEQSaAAiCRQAEQSKAAiCRQAkdqG8mK7jnwxlJcD4C/MHRQAkQQKgEgCBUAkgQIgkkABEEmgAIgkUABEEigAIgkUAJEECoBIAgVAJIECIJJAARBJoACIJFAARBIoACIJFACRBAqASAIFQCSBAiCSQAEQSaAAiCRQAEQSKAAiCRQAkQQKgEgCBUAkgQIgkkABEEmgAIgkUABEEigAIgkUAJEECoBIAgVAJIECIJJAARBJoACIJFAARBIoACIJFACRBAqASAIFQCSBAiDSnwbq/fffr82bN9fhw4fPOL5169ZBGwUAAwbqmWeeqddff716enrq3nvvra6urtPntmzZMujjAGhdbQOd3L17d23fvr3a2tpq+fLltXLlympvb68FCxZUf3//UG0EoAUNGKj+/v5qNBpVVXXFFVfUyy+/XCtWrKiLL7749HEAGAwDvsU3f/78Wr58eX355ZdVVfWPf/yjnnvuuVq7du1/fSYFAOfSgHdQDz/8cM2YMaMuuOCC08dmzJhR27Ztq02bNg36OABaV6N/CD9M6vvxn0N1KYABzRs/rdkTqKr3+t7+w3O+BwVAJIECIJJAARBJoACIJFAARBIoACIJFACRBAqASAIFQCSBAiCSQAEQSaAAiCRQAEQSKAAiCRQAkQQKgEgCBUAkgQIgkkABEEmgAIgkUABEEigAIgkUAJEECoBIAgVAJIECIJJAARBJoACIJFAARBIoACIJFACRBAqASAIFQCSBAiCSQAEQSaAAiCRQAEQSKAAiCRQAkQQKgEgCBUAkgQIgkkABEKmt2QMYWvPGT2v2BKpq15Evmj0B4rmDAiCSQAEQSaAAiCRQAEQSKAAiCRQAkQQKgEgCBUAkgQIgkkABEEmgAIgkUABEEigAIgkUAJEECoBIAgVAJIECIJJAARBJoACIJFAARBIoACIJFACRBAqASAIFQCSBAiCSQAEQSaAAiCRQAEQSKAAiCRQAkQQKgEgCBUAkgQIgkkABEEmgAIgkUABEEigAIgkUAJEECoBIAgVAJIECIJJAARBJoACIJFAARGr7sx/o6empkSNH1rhx4+rtt9+ub775pqZPn14LFy4cin0AtKgBA/Xqq6/W5s2bq6+vr2bPnl1Hjx6t2267rTo7O+vQoUP10EMPDdVOAFrMgIHq7Oysd999t3766adatGhRdXd314gRI+ruu++uu+66S6AAGDQDfgbV19dX7e3tddlll9XKlStrxIgRp8/19vYO+jgAWteAgbr99tvr/vvvr97e3uro6KiqqgMHDtSyZctqwYIFQzIQgNY04Ft8jzzySO3bt6+GDx9++lh7e3t1dHTUnDlzBn0cAK3rT5/iu+GGG87488SJE2vixImDNggAqnwPCoBQAgVAJIECIJJAARBJoACIJFAARBIoACIJFACRBAqASAIFQCSBAiCSQAEQSaAAiCRQAEQSKAAiCRQAkQQKgEgCBUAkgQIgkkABEEmgAIgkUABEEigAIgkUAJEECoBIAgVAJIECIJJAARBJoACIJFAARBIoACIJFACRBAqASAIFQCSBAiCSQAEQSaAAiCRQAEQSKAAiCRQAkQQKgEgCBUAkgQIgUqO/v7+/2SMA4D+5gwIgkkABEEmgAIgkUABEEigAIgkUAJEECoBIAgVAJIECIFJbswf8lXz77be1ePHiev7552vevHnNntNS9u7dW6tXr64JEyZUf39/nTx5su6555564IEHmj2tpfz666/17LPP1r59+2r48OF10UUX1eOPP16TJk1q9rSW8cMPP9T8+fPrqquuqqqq48eP1/Tp0+uxxx6rSy65pMnrzi2B+n/o7Oys+fPn19atWwWqCSZPnlybN2+uqt9fKO+44466+eab6+qrr27ystbQ19dXq1atqhtvvLF27NhRbW1t
1d3dXatWrap33nmnRo8e3eyJLWPs2LHV1dVVVVX9/f21YcOGWrNmTb355ptNXnZueYvvLJ08ebJ27txZa9eura+//roOHz7c7Ekt7bfffqvhw4fXhRde2OwpLWPv3r119OjRWrNmTbW1/f5/29mzZ9f69eurr6+vyetaV6PRqI6Ojvruu+/qwIEDzZ5zTgnUWdq9e3eNHz++rrzyyrr11ltr69atzZ7Ucr766qtasmRJLV68uObOnVuzZs2qsWPHNntWy9i/f39de+21NWzYmS8bc+bMqTFjxjRpFVVV7e3tdfnll9f333/f7CnnlECdpc7Ozlq0aFFVVS1cuLC2bdtWJ06caPKq1jJ58uTq6uqqnTt31ieffFI9PT21cePGZs9qGcOGDasRI0Y0ewZ/oNFo1Pnnn9/sGeeUQJ2Fn3/+ufbs2VObNm2quXPn1hNPPFG//PJLvffee82e1rJGjRpVCxYsqM8//7zZU1rG5MmTa//+/fWfv6Fnw4YN1d3d3aRVVFWdOHGiDh069Lf7PFagzkJXV1fNnj27Pv744/rwww/ro48+qtWrV9eWLVuaPa1l9fb21meffVbXX399s6e0jJkzZ9aYMWPqxRdfrN7e3qqq2rNnT23btu1v98L4V9LX11cvvPBCTZs2rSZMmNDsOeeUp/jOwvbt22vdunVnHLvvvvvqlVdeqYMHD55+3JPB9e/PoBqNRp06daquueaaWrVqVbNntYxGo1EvvfRSrV+/vhYtWlRtbW01evTo2rhx49/u8eZ0x44dqyVLllTV74G67rrrasOGDU1ede75jboARPIWHwCRBAqASAIFQCSBAiCSQAEQSaAAiCRQAEQSKAAi/QvXS9vP2cvFXgAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "sns.heatmap(df.isnull(), cmap='viridis', cbar=False);" ] }, { "cell_type": "code", "execution_count": 86, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAaoAAAGgCAYAAAD7FCOQAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAA14UlEQVR4nO3df1xUdb4/8BfyK4hcrwniVwlbNxavP7AolVDRkhEYxiExBLwgElcoA9Fupiwrrg9F8bqRrj32EQ/RLZEECSVpxQhdr4ir6S3YslzzV3olAilxVmRgON8/vM5VFx04c8544LyefzUH/Mxrfhzenc/nnPexEwRBABERkUL1e9gBiIiIHoSFioiIFI2FioiIFI2FioiIFI2FioiIFI2FioiIFM2qQrV3716EhYVBo9Fgx44dUmUiIiIycxD7DxsaGpCbm4vS0lI4OTkhOjoaEyZMwK9+9Ssp8xERkcqJLlQ1NTWYOHEiBgwYAACYMWMGKioq8Prrr1v8t8H9Xhb7tERE1EdVdu7qcrvoqb8ff/wR7u7u5sceHh5oaGgQOxwREVGXRBeqzs5O2NnZmR8LgnDXYyIiIimInvrz9PTEiRMnzI8bGxvh4eEhSSjqffZfqZVsrBn/z0+ysYio9xN9RPX888/j6NGjaG5uRmtrKz799FNMmTJFymxERETij6gGDx6MxYsXIz4+Hu3t7Zg9ezbGjh0rZTYiIiLxhQoAdDoddDqdVFmIiIj+iVWFimxLyetAXFei3krK/QrgviAHtlAiIiJFY6EiIiJFs6pQbdy4EWFhYdBqtdi2bZtUmYiIiMxEr1EdP34cf/3rX/Hxxx+jo6MDYWFhCAoKwi9/+Usp89EdpJz75rw82ZKSv29q+u4q+XN4ENFHVOPHj8cHH3wABwcHXL16FSaTCa6urlJmIyIism7qz9HREZs2bYJWq0VAQAAGDx4sVS4iIiIAEpxMkZaWhqNHj6K+vh7FxcVSZCIiIjITvUZ19uxZGI1GjBw5Ei4uLtBoNDh9+rSU2Yioj1DTOpCS9dbPQfQR1eXLl5GZmQmj0Qij0Yiqqir4+/tLmY2IiEj8EVVQUBDq6uoQEREBe3t7aDQaaLVaKbMRERHBThAEwdZPyjv8Um8l9em9Uuqt0zpEt0l+h18iIiJbYKEiIiJFs6pQHThwALNmzUJoaChWr14tVSYiIiIz0SdTXLp0CVlZWdi1axcef/xxzJs3D4cOHUJQUJCU+YgURcnrQEpuj8O1vb7JVp+r6EJVWVmJsLAweHp6AgByc3Ph7OwsWTAiIiLAiqm/ixcvwmQyISUlBXq9HoWFhfjFL34hZTYiIiLxhcpkMuHo0aPIzs5GUVER6urqsHv3bimzERERiZ/6GzRoEAICAjBw4EAAwPTp01FXV4dZs2ZJFo7ko+T1DOp7+P1QBqXv95WdXW8XfUQ1bdo0VFdXo6WlBSaTCYcPH8aoUaPEDkdERNQl0UdUfn5+SEpKQmxsLNrb2xEYGIjIyEgpsxEREYkvVAAwe/ZszJ49W6osRERE/4S9/u4h5Rwu5+WJblH62ggpA3v9ERFRr2RVocrLy8OMGTOg0+nwxz/+UapMREREZqILVU1NDfbu3YuPPvoIe/bsQW1tLT799FMpsxEREYk/meLUqVOYNGkS3NzcAACTJ0/GZ599Bo1GI1m4h4H9zciW1LImKnU2tbxvdIvoI6pRo0ahuroaP//8M9ra2nDgwAE0NTVJmY2IiEj8EVVAQABmzZqFuLg4DBgw
AAEBAaitVe4RBBER9U6iT083GAy4du0ahg4dCgDYsmULfvjhB2RmZlr8t1Kens7TXomI+gbJT0+/fPkyXnvtNXR0dOD69esoKSlBaGio6IBERERdET315+vrC41Gg5kzZ8JkMiEhIQH+/v5SZiMiIur9nSk49UdE1Dfcb+rPql5/SsDCQkTUt7GFEhERKRoLFRERKVq3CpXBYEB4eDguX74MACgqKkJ4eDh0Oh2WL18Oo9Eoa0giIlIvi2tUtbW1yMzMxIULFwAA58+fR35+PkpLS/Hoo49i2bJlKCwsREJCgsxRbYOtWai34olFfQ8/01ssHlEVFxcjKysLHh4eAAAnJydkZWXBzc0NdnZ28PHxwZUrV2QPSkRE6mTxiGrNmjV3PR46dKi5G0VzczN27NiBtWvXypOOiIhUT/TJFA0NDZg3bx4iIyMxYcIEKTMRERGZibqO6uzZs0hKSkJcXBwSExOlzvRQ9dY53J5S8tw3sz38sdRETZ+pkl/rg/S4UBkMBrzyyitIT09HRESEDJGIiIj+T4+n/kpKStDU1IRt27ZBr9dDr9dj48aNcmQjIiLq/hHVgQMHAAAJCQl95lR0IiJSvl7f64/EUfJ6hpqy8bo9IsvYQomIiBRNVAul5cuXQ6PRmNeoKisrZQ1JRETq1eMWSgDw1VdfoaCgwNytgkjJU1hKzsbpuodPTZ9Bb32tPW6h1NraiitXriAjIwM6nQ6bNm1CZ2en7EGJiEidLBaqNWvW4NlnnzU/bmpqwsSJE5GdnY3i4mKcOHECJSUlsoYkIiL16vHJFF5eXnj33Xfh4eEBFxcXxMXF4dChQ3JkIyIi6vnp6adPn8aFCxcwY8YMAIAgCHBweHhnuffWliB9jZLfNyVnkxL3BeqrenxEJQgCsrOzce3aNbS3t6OoqAjBwcFyZCMiIur5EZWvry8WLFiAmJgYdHR0QKPRIDw8XI5sREREPW+hBABz587F3LlzZQlERER0p17fQonz6H2PktdalJyN+wL1VWyhREREisZCRUREiiaq1191dTVmzpyJ8PBwLF26FEajUdaQRESkXqJ6/f3mN7/B1q1bMWLECKSlpaGsrAwvv/yynDkJyu5ZJyU19fojIst63OsPAEwmEwwGA0wmE9ra2uDs7CxrSCIiUi+LR1Rr1qz5p20rV65EXFwc3NzcMGzYMISEhMgSjoiIqMcnUzQ2NmLDhg0oLy9HdXU1/Pz8sHbtWjmyERER9fw6qhMnTsDHxwdPPPEEACAqKgrp6elS56IucH1EHL5v9CBKvjZOar31tfb4iMrHxwd1dXVoamoCAFRVVWHMmDGSByMiIgJEHFGNGDECixYtQnx8POzt7eHt7Y1Vq1bJkY2IiAh2giAItn7S4H7qOJVd6sNsKSl5eoKIblH6VJ3U+fp5/r3r7ZI+CxERkcRYqIiISNG6Vag2b94MrVYLrVaL9evXm7e3t7dj3rx5OHbsmGwBiYhI3SyeTFFTU4Pq6mrs3r0bdnZ2SEpKQmVlJUaMGIGMjAycOnXKFjnvS8lzuFwHIqK+TOq/cZWdXW+3eETl7u6OZcuWwcnJCY6OjhgxYgSuXLmCkpISJCUlwc+Pf4yJiEg+Fo+onnrqKfN/X7hwAfv27cOHH36I4cOHAwDef/992cIRERF1+2SKM2fOIDExEUuXLjUXKSIiIrl164LfkydPIi0tDRkZGdBqtXJn6hGuAxFRX8W/b7dYLFT19fVYuHAhcnNzERAQYItMREREZhYLVX5+Ptra2rBu3TrztujoaMTExMgajIiICGALJSIiUojKzl1dbu9xU1rqG5R8/RmRJVJ+f/ndVT62UCIiIkVjoSIiIkWzqtcfABQUFCAuLk6WcERERKJ7/QUHB+O7775DXl4evL29bZGVJMR5ebJEyetA/P6qi+hef0ajEStWrEBaWpotchIRkUqJ7vX3+9//HpGRkRg2bJisAYmISN26fXr6mTNnkJycjKVLl+J//ud/
UF9fj+XLl/NeVEQKoeRLDpScTWpSv1bq5skUJ0+eREJCAt544w289NJLKC8vx5kzZ6DX65GZmYmvvvoK6enpMkclIiI1EtXrb+3ateafHzt2DJs3b8Y777wjW0giIlIv9vojIiJFY68/IpKdmtao1ETqz7Wf59+73i7psxAREUmMhYqIiBStW6enb968Gfv27QMABAUFYcKECXj77bfNP29oaICfnx/ee+89eVISEZFqiWqh9PTTT6OsrAwA0NjYiJiYGCxfvlz2sER0f0peB+KaUt8k9eda2dn1douF6s4WSgDMLZRuW79+PaKjozF8+HBJghIREd3J4hrVU089hXHjxgH4vxZKQUFB5sfHjx9HfHy8rCGJiEi9un0yxZkzZ5CYmIilS5eaj56KiooQGxtrPtoiIiKSWrdOpjh58iTS0tKQkZEBrVZr3l5VVYX8/HzZwhFR93EdiPoqUS2UAKC5uRk3b96El5eXrAGJiEjdRLdQGjVqFDw9PWUNR0RExBZKRESkCJWdu7rc3u37USmVku/9oqY1AyXftpyIeje2UCIiIkXrVqHauHEjwsLCoNVqsW3bNgC3OlbodDpoNBrk5ubKGpKIiNTL4tTf8ePH8de//hUff/wxOjo6EBYWhoCAAGRkZGD79u0YMmQIkpOTcejQIfOFwLbEaSJl4OdARHKxeEQ1fvx4fPDBB3BwcMDVq1dhMpnQ0tICb29veHl5wcHBATqdDhUVFbbIS0REKtOtqT9HR0ds2rQJWq0WAQEB+PHHH+Hu7m7+uYeHBxoaGmQLSURE6tXtkynS0tJw9OhR1NfX48KFC7CzszP/TBCEux4TERFJxeIa1dmzZ2E0GjFy5Ei4uLhAo9GgoqIC9vb25t9pbGyEh4eHrEF7IyXfdoHEUdNnyksOSCksHlFdvnwZmZmZMBqNMBqNqKqqQnR0NM6fP4+LFy/CZDKhvLwcU6ZMsUVeIiJSGYtHVEFBQairq0NERATs7e2h0Wig1WoxcOBApKamoq2tDUFBQQgJCbFFXiIiUpludaZITU1FamrqXdsCAgLw8ccfyxKKiIjotl7f609NawZED6LkfUHJ2Ug8qT/Xfp5/73q7pM9CREQkMRYqIiJSNNG9/goLC6HVahEWFoacnBw8hBlEIiJSAVG9/oKCgvCnP/0Je/bsgbOzM+bOnYsjR45g0qRJtsh8F85VE93CfYH6KlG9/lxdXfHJJ5/A1dUVLS0tMBgM6N+/vy3yEhGRyojq9Td48GA4OjqiuLgY06dPh7u7O3x9feXOSkREKiSq119xcTEAICoqCseOHcOgQYOwefNm2UISEZF6ier1V1tbi1/96lfw9/eHg4MDtFotPvzwQ1vkJYVScl84JWdTCyW/b7zGSzypX2tlZ9fbRfX6GzZsGN588020tLRAEATs378f/v7+kgYmIiICRPb6e+211zBw4EBER0fD3t4ezz77LObPn2+LvEREpDK9voUSEd3CKSzq7So7d3W5nZ0piIhI0VioiIhI0US3UPriiy8QFRUFrVaLJUuWwGg0yhqUiIjUSVQLpYCAAKSmpmLLli3w9fXFkiVLUFJSgtjYWFtklhVPZX74uNYijlpeJ8D9VG0sFqo7Wyg1NDTAZDLhm2++wbhx48zdKDIzM2EymWQPS0RE6iOqhVJjYyNcXV2xePFi6PV6/OEPf2CvPyIikoWoFkpGoxHV1dVYsmQJSktL0drairy8PDlzEhGRSolqoZSTk4PAwEB4eXkBAEJDQ1FQUCB7WFtQy3w114GoN1PL94376S2iWiitWrUKX3/9Nerr6wEABw8exKhRo2QPS0RE6iOqhVJERAQGDBiAlJQUtLW1YeTIkXjrrbdskZeIiFTGYqECgNTUVKSmpt61berUqZg6daocmYiIiMweSq+/zh98JBurt865EhHR3djrj4iIeiUWKiIiUjTRvf5KS0sRFhYGnU6H1atXo6OjQ9agRESkTqJ6/QUFBeGdd95B
SUkJPDw8sHLlSmzfvr3bN0/kuhIREXWXxSOqO3v9Xb16FSaTCXV1dRg3bhw8PDwAANOmTcNnn30me1giIlIfUb3+xo4di9raWtTX18NkMqGiogJNTU1yZyUiIhXq0enpra2tSElJQVhYGFxcXLB161Y88sgjCAkJQUlJCcrLy7s1jlpuRc/2J0S9A28bogyiT08/e/YsvvnmGwAw9/qrq6vD2LFjsWfPHuzcuRODBw829/0jIupNpP4fSpKeqF5/EyZMQEJCAgwGA4xGIwoKChAWFmaLvEREpDKiev3NnDkTbW1tmDNnDjo6OhAeHg6dTmeLvEREpDIPpYWSWtaoiEgeXFMSR+nr5myhREREvRILFRERKVq3C1VOTg6WLVsGACgqKjKvSy1fvhxGo1G2gEREpG7duh/V0aNHsXv3bkydOhXnz59Hfn4+SktL8eijj2LZsmUoLCxEQkKCzFG7pvQ5VyKSHvdTcXrr+2bxiOrnn39Gbm4uUlJSAABOTk7IysqCm5sb7Ozs4OPjgytXrsgelIiI1MlioVqxYgUWL16M/v37AwCGDh2KwMBAAEBzczN27NiBF198Ud6URESkWg8sVLt27cKQIUMQEBDwTz9raGjAvHnzEBkZiQkTJsgWkIiI1O2B11HNnz8fjY2NsLe3x7Vr13Djxg1ERERgzpw5SEpKQlxcHBITE3v8pLyOioiI7nW/66geeDLF7ZskArdulHj8+HGkpaUhPDwc6enpiIiIkDQkERHRvXp8HVVJSQmampqwbds26PV66PV6bNy4UY5sREREbKFERETKIGrqj4hIidTS60/ptyCx1XvHFkpERKRo3T6iysnJwU8//YR169Zh+fLlOHnyJFxcXAAAr7/+OoKDg2ULSURE6tXjFkoA8NVXX6GgoAAeHh5yZiMiIrJcqO5sofTtt9+itbUVV65cQUZGBhoaGhAcHIzXX38d/fo9nFlE9vojS7ieIY6SX6uSsymZ1O+brdbQetxCqampCRMnTkR2djaKi4tx4sQJlJSUyB6UiIjUqcctlLy8vPDuu+/Cw8MDLi4uiIuLw6FDh2QPSkRE6vTAqb8///nPaGxshF6vN7dQWrhwIWbOnIkZM2YAAARBgIPDwzvLnVMAZAm/I9RbKf27K/2t6Lve3uMWSgkJCUhOTsbEiRPh6uqKoqIivPTSS5KGJSIiuq3Hh0K+vr5YsGABYmJi0NHRAY1Gg/DwcDmyERERsYUSUV+hprP+qG9iC6WHgH84iOShlksO6Ba2UCIiIkVjoSIiIkXrdqHKycnBsmXLAADV1dWYOXMmwsPDsXTpUhiNRtkCEhGRuonq9feb3/wGW7duxYgRI5CWloaysjK8/DJPkLgX577JEq61iKOm10rdOKK6s9ffbSaTCQaDASaTCW1tbXB2dpY1JBERqZfFI6rbvf7q6+vN21auXIm4uDi4ublh2LBhCAkJkTUkERGpV497/TU2NmLDhg0oLy9HdXU1/Pz8sHbtWtmDEhGROj3wgt/58+ejsbER9vb25l5/bm5uGD16NDZt2gQAOHPmDNLT0/HJJ590+0k7f/CxPvn/UnLbes6jKwM/U6LeQdQFv131+vv3f/93vPLKK2hqasKgQYNQVVWFMWPGSJuWiIjof/W4M8WIESOwaNEixMfHw97eHt7e3li1apUc2YiIiB5Orz9O/ZEt8TMl6h0U1etPyTu7krNJiX0IichairkVPRER0cPUrUIVFxcHrVYLvV4PvV6P2tpbVbS9vR3z5s3DsWPHZA1JRETqZXHqTxAEXLhwAQcPHrzrlvPnzp1DRkYGTp06JWtAkoeSp+o4LUnUO9jqVvQWj6jOnTsHAEhMTMTMmTNRUFAAACgpKUFSUhL8/PhHgIiI5GPxiKqlpQUBAQH47W9/i/b2dsTHx+PJJ5/E0qVLAQDvv/++7CGJiEi9LBaqp59+Gk8//bT58ezZs3Ho0CEEBgbKGoyIiAjoRqE6ceIE2tvbzf3+
BEG4a62qr7HV6ZZiqGWtRS2vk8TjtXHqYnGN6vr161i/fj3a2tpgMBiwe/duBAcH2yIbERGR5SOqadOmoba2FhEREejs7ERsbOxdU4FERERyeigtlIL7KfduwJz6I1I+Tv31TYpqoaRk/NISKR/3U3VhCyUiIlI0FioiIlK0bk39xcXFobm52Xxa+qpVq8wdKQoKCrB//35s375dvpRERKRaonv9AcB3332HvLw8eHt7yxaQ1EfJJ7RITS1rLezfSNYQ3evPaDRixYoVSEtLkzchERGpmuhef//1X/+FyMhIDBs2zBY5iYhIpUT1+vvP//xPPPHEE1i+fDnvRUVERLIS1etv5MiR+PLLL6HX63Hjxg00NTUhPT0d77zzjtx5VU3JazdSrhlIvf7Ai0OJejdRvf6ioqKwb98+lJWVYfXq1Rg9ejSLFBERyYK9/oiISNHY64+IuqTkKVO1TIMrndSfQz/Pv3e9XdJnISIikhgLFRERKZqoFkrBwcGorKw0/7yhoQF+fn5477335ElJRESqZXGNShAETJkypcsWSgDQ2NiImJgYbNmyBcOHD+/Wk3KN6uFjSxuyJX7fqDvudz8q0S2Ublu/fj2io6O7XaSIiIh6QnQLpcDAQFy4cAHHjx/HmjVrbJGViIhUSFQLpUOHDiEwMBBFRUWIjY2Fk5OTrCGJiEi9RLVQur1WVVVVhfz8fHkTkpmSrx0hehCuKZE1RLVQCg4ORnNzM27evAkvLy9b5CQiIpUS3UKprq4Onp6etshIREQqxhZKvYiSp/44tUNE1rrf6enduuCXlEEtxUBN19wouZ8ePXxq2hcehC2UiIhI0VioiIhI0bpVqA4cOIBZs2YhNDQUq1evBgDU1NRAp9NBo9EgNzdX1pBERKReFk+muHTpEmJjY7Fr1y48/vjjmDdvHpKTk5GVlYXt27djyJAhSE5ORnx8PIKCgrr1pFKeTME5XCJ5cP2MLFHM/agqKysRFhYGT09PODo6Ijc3Fy4uLvD29oaXlxccHByg0+lQUVEhaWAiIiKgG2f9Xbx4EY6OjkhJSUF9fT2mTp2Kp556Cu7u7ubf8fDwQENDg6xBiYhInSwWKpPJhBMnTmD79u1wdXXFq6++ikceeQR2dnbm3xEE4a7HRGR7Sr7Ojvomqad0Kzu73m6xUA0aNAgBAQEYOHAgAGD69OmoqKiAvb29+XcaGxvh4eEhTVIiIqI7WFyjmjZtGqqrq9HS0gKTyYTDhw8jJCQE58+fx8WLF2EymVBeXo4pU6bYIi8REamMxSMqPz8/JCUlITY2Fu3t7QgMDERMTAx++ctfIjU1FW1tbQgKCkJISIgt8hIRkcqw1x8pjpouOeAp4PQgSl93lH6NSuSt6ImIiB4mFioiIlI00S2UCgsLodVqERYWhpycHDyEGUQiIlIBUS2UwsPD8ac//Ql79uyBs7Mz5s6di9dffx2TJk3q1pNyjYpI+bh+Jo6a3jdbtVCyeNbfnS2UACA3NxfOzs54+eWX4ejoiJ9++gkGgwH9+/eXNDARERHQjam/29dKpaSkQK/Xo7CwEL/4xS/g6OiI4uJiTJ8+He7u7vD19bVFXiIiUhmLhcpkMuHo0aPIzs5GUVER6urqsHv3bgBAVFQUjh07hkGDBmHz5s2yhyUiIvUR1ULp888/h7e3N/z9/eHg4ACtVosPP/xQ9rC9jZquB1IyNa0ZSElNr1VKUr5vSv8bYqtef6JaKPn6+uLNN99ES0sLBEHA/v374e/vL2lgIiIiQGQLpbi4ODg7OyM6Ohr29vZ49tlnMX/+fFvkJSIilbFYqABg9uzZmD179l3boqOjER0dLUsoIiKi27pVqEgczvFTb8a1PXH4vkmPLZSIiEjRRLdQ+uKLLxAVFQWtVoslS5bAaDTKGpSIiNTJ4tTfpUuXkJWVdVcLpc8++wwrV67Eli1b4OvriyVLlqCkpASxsbG2yCwrHrb3PWr5HJR+KjP13P4rtfwcILKF
0pdffolx48aZu1FkZmbCZDLJm5SISGVYpG6xWKguXrwIR0dHpKSkoL6+HlOnTsWjjz4KV1dXLF68GOfOncMzzzyDZcuW2SIvERGpjKgWSiaTCdXV1ViyZAlKS0vR2tqKvLw8W+QlIiKVEdVCKScnB4GBgfDy8gIAhIaGoqCgQN6k98F5eaJbpP7ucr1WHDW9VlsR1UJpwYIF+Prrr1FfXw8AOHjwIEaNGiV7WCIiUh9RLZRee+01jB49GikpKWhra8PIkSPx1ltv2SIvERGpjOgWSlOnTsXUqVPlyERERGTW61socT6YSB7ct0gp2EKJiIgUjYWKiIgUTXSvv9LSUoSFhUGn02H16tXo6OiQNSgREamTnSAIwoN+4dKlS4iNjb2r19+MGTOQn5+PkpISeHh4YOXKlfD29u72zRM7f/CRJLwclDwvz+taiKgvq+zc1eV2i0dUd/b6c3R0RG5uLtzd3TFu3Dh4eHgAuHWt1WeffSZtYiIiInSjUF28eBEmkwkpKSnQ6/UoLCyEr68vamtrUV9fD5PJhIqKCjQ1NdkiLxERqYzF09NNJhNOnDiB7du3w9XVFa+++iq8vb3xxhtv4NVXX8UjjzyCkJAQ/O1vf7NFXiIiUhlRvf4+//xzJCcnY8+ePQCAffv2mfv+dQfXR8Th+0YkPfYLVT5Rvf6eeuopJCQkwGAwwGg0oqCgAGFhYbbIS0REKiOq119CQgIee+wxzJkzBx0dHQgPD4dOp7NFXiIiUhmLp6fLIbjfy7Z+SiLqQ3ipRt8k+vR0IiKih4mFioiIFM3iGtWuXbvuunvv5cuXodfr8etf/xrbt2+HnZ0dRo8ejd/97ndwcnKSNSwREalPj9aozpw5g4ULF2LDhg34j//4D5SWluLRRx/FsmXLMHLkSCQkJHRrHK5RkS0peT1Dydno4ZP61HmpSf2du98aVY/uR7Vy5UosXrwYjz/+OLKysuDm5gYA8PHxwZUrV6xPSUREdI9uF6qamhrcvHkToaGhAIChQ4cCAJqbm7Fjxw6sXbtWnoRERKRq3T6ZYufOnf/UHb2hoQHz5s1DZGQkJkyYIHk4IiKibq1RGY1GBAUFoaqqCq6urgCAs2fPIikpCXFxcUhMTOzRk3KNioiI7mXVGtXp06cxfPhwc5EyGAx45ZVXkJ6ejoiICMlCEhER3atbU3+XLl2Cp6en+XFJSQmampqwbds26PV66PV6bNy4UbaQRESkXmyhREREiiDJ6en0cPGaGyJSI7ZQIiIiRWOhIiIiRRPd66+1tRUnT56Ei4sLAOD1119HcHCwfEmJiEiVRPX627lzJ+bNm4f8/Hx4eHj0+Ek7f/Dp8b+5HzWttXCNioj6MknuR3W715+LiwuuXLmCjIwM6HQ6bNq0CZ2dnZIEJSIiulO3C9Wdvf6ampowceJEZGdno7i4GCdOnEBJSYmcOYmISKW6PfWXlpYGjUaD8PDwf/pZZWUl9uzZg3fffbdbTyrldVRKb4MvJSmn66R+3ziVSLbE72/fZNXUn9FoxOeff44XXngBwK2WSvv37zf/XBAEODjwkiwi6n1YpJSvW4Xq3l5/giAgOzsb165dQ3t7O4qKinjGHxERyaJbh0H39vrz9fXFggULEBMTg46OjvtOCRIREVmLvf56ESWvx3H6hGyJ+0LfJMnp6URERLbGQkVERIomuoXSCy+8gPXr16OzsxP/+q//itWrV8PJyUnWsEREpD6iWyi99NJL2Lp1K0aMGIG0tDRMnjwZL7/cvbUnrlEREdG9JG2hNHDgQJhMJhgMBphMJrS1tcHZ2VmSoERERHfq9lW6d7ZQAm4Vrbi4OLi5uWHYsGEICQmRLSQREalXt4+odu7cifnz5wMAGhsbsWHDBpSXl6O6uhp+fn5Yu3atbCGJiEi9unVEdbuF0rp16wAAJ06cgI+PD5544gkAQFRUFNLT02ULSdRX8dYtRJaJaqHk4+ODuro6NDU1AQCqqqowZswY+VIS
EZFqiWqhNGLECCxatAjx8fGwt7eHt7c3Vq1aJVtIIiJSL7ZQInqIOPVH9H/ud3o6781B9BCxuBBZxhZKRESkaN0qVGVlZdBqtdBqtcjJyTFvb29vx7x583Ds2DHZAhIRkbpZnPprbW3FmjVrUFFRgf79+yMmJgY1NTXw9PRERkYGTp06ZYucRESkUhYLlclkQmdnJ1pbW+Hq6oqOjg44OzujpKQESUlJeP/9922Rk1RE6nsNcR2IqHezWKjc3NywaNEihIaGwsXFBc899xyeeeYZ+Pv7AwALFRERycriGtW3336Ljz76CAcPHsThw4fRr18/5Ofn2yIbERGR5SOq6upqBAQE4PHHHwcAzJo1C4WFhUhKSpI9HKkTp+qI6E4Wj6h8fX1RU1ODGzduQBAEHDhwgO2SiIjIZiweUU2aNAmnTp3CrFmz4OjoiDFjxmDBggW2yEZERMQWSkREpAxsodQHsC+cOFKf7i4lNX0ORGKxhRIRESkaCxURESmaVb3+AKCgoABxcXGyhCMiIhLd6+/555/Hd999h7y8PHh7e9siq+pJuZ6hpjZFSs5GRJZZPKK6s9dfR0eHudef0WjEihUrkJaWZoucRESkUqJ7/a1btw6RkZEYNmyYLXISEZFKie71V19fj8jISFtkJCIiFRPV6y8/Px9NTU3Q6/W4ceMGmpqakJ6ejnfeeUfuvCQRJa/b8LoncZS87sjPVBwlf6a2JKrX3/jx47Fv3z6UlZVh9erVGD16NIsUERHJgr3+iIhI0brVQmnBggX3LU4TJkzAhAkTJA1FRER0G5vSEhGRItyvKS1bKBERkaJ1a+qvrKwMeXl5AIApU6Zg4sSJePvtt80/b2hogJ+fH9577z15UhIRkWqJaqE0efJklJWVAQAaGxsRExOD5cuXyx5W7dRymw8ln5Kr5GxEtmaryw5Et1C6bf369YiOjsbw4cPlzElERCplsVDd2UIpKCgIQ4cOxTPPPAMAuHDhAo4fP474+HjZgxIRkTqJbqEEAEVFRYiNjYWTk5PsQYmISJ1EtVAqLCxEUlISqqqqzEWL5Kfk9Qwlt8jh2p44anmtSn6dSif1e1fZ2fV2US2UxowZg+bmZty8eRNeXl6SBiUiIrqT6BZKp0+fhqenpy0yEhGRioluoTR27FgUFxfLEoqIiOi2bhUqUgYlz8sreZ5fyetnUlLyZyA1Nb1WYgslIiJSOBYqIiJStG4Vqry8PMyYMQM6nQ5//OMfAQA1NTXQ6XTQaDTIzc2VNSQREamXxTWqmpoa7N27Fx999BFcXFywcOFCfPzxx3j77bexfft2DBkyBMnJyTh06BCCgoJskfkuSr52RMnZlIzvmzh836ivsnhEderUKUyaNAlubm6wt7fH5MmTsWvXLnh7e8PLywsODg7Q6XSoqKiwRV4iIlIZi4Vq1KhRqK6uxs8//4y2tjYcOHAA//3f/w13d3fz73h4eKChoUHWoEREpE4Wp/4CAgIwa9YsxMXFYcCAAQgICEB1dTXs7OzMvyMIwl2PiYiIpGKxUBkMBmg0GsyfPx8AsGXLFowfPx6NjY3m32lsbISHh4d8KR9AyfPoSs6mZHzfxOH7RrammPtRXb58Ga+99ho6Ojpw/fp1lJSUID09HefPn8fFixdhMplQXl6OKVOm2CIvERGpjMUjKl9fX2g0GsycORMmkwkJCQnw9/fHunXrkJqaira2NgQFBSEkJMQWeYmISGXsBEEQbP2kwf1etvVTEtFDpOQ2VlJPmSq51ZnSVXbu6nI7O1MQEZGisVAREZGiiW6hVFhYCK1Wi7CwMOTk5OAhzCASEZEKiGqhtHXrVuzcuRN79uyBs7Mz5s6diyNHjmDSpEm2yNxrsKUNkTyUvC8oOZvUFHN6elctlL799lt88skncHV1RUtLCwwGA/r372+LvEREpDKiWig1NTXB0dERxcXFmD59Otzd3eHr62uLvEREpDIWC9WdLZSSkpLg7+8PR0dHAEBU
VBSOHTuGQYMGYfPmzbKHJSIi9bF4HZXBYMC1a9cwdOhQALdaKJ07dw6RkZHw9/cHAPzlL3/Bhx9+iPfee69bT8rrqOhBuLZHlvBapb5J9HVUXbVQioiIwJtvvomWlhYIgoD9+/ebixYREZGURLVQGj9+PBYsWIDo6GjY29vj2WefNTetJSIikhJbKJHicOqPLOHUX990v6k/i0dURLbGPxzisMBTX8UWSkREpGiiWyh98cUXiIqKglarxZIlS2A0GmUNSkRE6mSxUN3ZQmnPnj2ora3Fnj17kJqailWrVuGTTz4BAJSUlMgeloiI1MfiGtWdLZQAYPLkyfjd736HwMBAczeKzMxMmEwmeZOSanCtRRy1vE5A2tfK75vyiWqhdOPGDbi6umLx4sXQ6/X4wx/+wF5/REQkC1EtlACguroaS5YsQWlpKVpbW5GXlyd7WCIiUh+LU38GgwEajcZ8Qe+WLVvg6uoKPz8/eHl5AQBCQ0NRUFAgb9L74GF738PPgCzhdVTqIqqF0nvvvYevv/4a9fX1AICDBw9i1KhRsoclIiL1Ed1CadWqVUhJSUFbWxtGjhyJt956yxZ5iYhIZXp9CyVO/RGpD6f++qY+20KJXzKiW9T0P21KzsYiKj22UCIiIkVjoSIiIkUT3euvtLQUYWFh0Ol0WL16NTo6OmQNSkRE6mRxjerOXn8uLi5YuHAh8vLyUFBQgJKSEnh4eGDlypXYvn17n7h5IueXxeH79vDxfaO+yuIR1Z29/uzt7TF58mRs3rwZ48aNg4eHBwBg2rRp+Oyzz2QPS0RE6iOq19+4ceNQW1uL+vp6mEwmVFRUoKmpyRZ5iYhIZSxO/d3Z62/AgAEICAhAbW0t3njjDbz66qt45JFHEBISgr/97W+2yEtERCpj8YJfg8GAa9euYejQoQBu9fr7/vvvkZiYiOHDhwMA9u3bh48//th8ogUREZFURPX60+v1SEhIgMFggNFoREFBAcLCwmyRl4iIVEZUrz9/f38sXLgQc+bMQUdHB8LDw6HT6WyRl4iIVOah9PojIiLqLnamICIiRWOhIiIiRWOhIiIiRWOhIiIiRWOhIiIiRWOhIiIiRVNkodq7dy/CwsKg0WiwY8cOq8czGAwIDw/H5cuXrR5r8+bN0Gq10Gq1WL9+vdXjbdy4EWFhYdBqtdi2bZvV4wFATk4Oli1bZvU4cXFx0Gq10Ov10Ov1qK0V3yH9wIEDmDVrFkJDQ7F69Wqrcu3atcucSa/Xw9/fH6tWrRI9XllZmfkzzcnJsSob0PVtcXrq3u9sTU0NdDodNBoNcnNzrR4PAJYuXYrS0lKrxyoqKjJfS7l8+XIYjUarxissLIRWq0VYWBhycnLQkyto7revFxQUIC4urke5uhpv+fLl0Gg05u9eZWWl6LG++OILREVFQavVYsmSJVa9b4cOHbprn5g4cSKSk5NFZ6uursbMmTMRHh6OpUuXWv2ZWn1bKEFhfvjhB2HatGnCTz/9JPzjH/8QdDqdcObMGdHjffnll0J4eLgwatQo4dKlS1ZlO3LkiDBnzhyhra1NMBqNQnx8vPDpp5+KHu/YsWNCdHS00N7eLrS2tgrTpk0Tzp49a1XGmpoaYcKECcJbb71l1TidnZ3CpEmThPb2dqvGEQRB+P7774VJkyYJ9fX1gtFoFGJiYoS//OUvVo8rCILw97//XQgODhauXr0q6t/fuHFDeO6554SrV68K7e3twuzZs4UjR46IznPkyBEhPDxcuH79utDR0SEkJycL+/fv79EY935nW1tbhaCgIOH7778X2tvbhcTExB69f/eO98MPPwjJycnC2LFjhY8++siqbOfOnROCg4OF69evC52dncLSpUuFbdu2iR7v+++/F4KDg4V//OMfQkdHhzBnzhzh8OHDosa67cyZM8LkyZOFf/u3f7PqtQqCIISHhwsNDQ09Gqersa5fvy4EBgYK33zzjSAIgrB48WJhx44dVmW77ccffxRe
fPFF4fz586LHmjJlivDdd98JgiAIqampQnFxsehsZ8+eFSZPnmx+37KysoStW7d2ezxBEATFHVHV1NRg4sSJGDBgAFxdXTFjxgxUVFSIHq+4uBhZWVnmW5JYw93dHcuWLYOTkxMcHR0xYsQIXLlyRfR448ePxwcffAAHBwdcvXoVJpMJrq6uosf7+eefkZubi5SUFNFj3Hbu3DkAQGJiImbOnImCggLRY1VWViIsLAyenp5wdHREbm4u/PykuXfSypUrsXjxYgwcOFDUvzeZTOjs7ERrays6OjrQ0dEBZ2dn0Xm6ui1OT2+Bc+93tq6uDt7e3vDy8oKDgwN0Ol2P9ol7x9u7dy9efPFFhIaG9ihXV2M5OTkhKysLbm5usLOzg4+PT4/2iXvH8/LywieffAJXV1e0tLTAYDCgf//+osYCAKPRiBUrViAtLa0Hr7Lr8VpbW3HlyhVkZGRAp9Nh06ZN6OzsFDXWkSNHMG7cOPj6+gIAMjMzERwcLDrbndavX4/o6GhzL1YxY5lMJhgMBphMJrS1tfVon7h3vNOnT1t9WyiLLZRs7ccff4S7u7v5sYeHB+rq6kSPt2bNGiliAQCeeuop839fuHAB+/btw4cffmjVmI6Ojti0aRO2bt2KkJAQDB48WPRYK1aswOLFi1FfX29VJgBoaWlBQEAAfvvb36K9vR3x8fF48sknERgY2OOxLl68CEdHR6SkpKC+vh5Tp05Fenq61Rlrampw8+ZNUX9wb3Nzc8OiRYsQGhoKFxcXPPfcc3jmmWdEjzdq1ChkZ2cjOTkZLi4uOHDgQI+mroB//s52tU80NDSIHi8pKQkAcPLkyR7l6mqsoUOHmhtWNzc3Y8eOHVi7dq3o8YBb+0RxcTFycnIwduxY8x9zMWP9/ve/R2RkJIYNG9btTPcbr6mpCRMnTkRWVhYee+wxJCcno6SkBFFRUT0e6+LFi3B1dcXixYtx7tw5PPPMMz2arr/f37ULFy7g+PHjPfq719Xvrly5EnFxcXBzc8OwYcMQEhIiejxfX1+sW7cO9fX18PDwEHVbKMUdUXV2dsLOzs78WBCEux4rwZkzZ5CYmIilS5d2+/9aHiQtLQ1Hjx5FfX09iouLRY2xa9cuDBkyBAEBAVbnAYCnn34a69evx2OPPYaBAwdi9uzZOHTokKixTCYTjh49iuzsbBQVFaGurg67d++2OuPOnTutvqv0t99+i48++ggHDx7E4cOH0a9fP+Tn54se787b4iQlJcHf3x+Ojo5WZewN+0RDQwPmzZuHyMhITJgwwerxoqKicOzYMQwaNAibN28WNcaRI0dQX1+PyMhIq/MAt4723n33XXh4eMDFxQVxcXFW7RPV1dVYsmQJSktL0drairy8PKszFhUVITY2Fk5OTqLHaGxsxIYNG1BeXo7q6mr4+fn16H8+7vXkk0+abws1d+5c/PrXv+7xPqG4QuXp6YnGxkbz48bGRkmm7aRy8uRJJCQk4I033sBLL71k1Vhnz57FN998AwBwcXGBRqPB6dOnRY315z//GUeOHIFer8emTZtw4MABZGdni8524sQJHD161PxYEAQ4OIg7AB80aBACAgIwcOBAPPLII5g+fbpVR8nArSmdzz//HC+88IJV41RXVyMgIACPP/44nJycMGvWLBw/flz0eAaDARqNBnv37sX27dvh5OQELy8vqzIqfZ84e/YsoqOj8dJLL2HhwoVWjVVfX28+0nNwcIBWqxW9T5SXl+PMmTPQ6/XIzMzEV199ZdWR/OnTp7F//37zY2v3CT8/P3h5ecHe3h6hoaFW7xMAUFVVZfWdLE6cOAEfHx888cQT6NevH6KioqzaJ9ra2jB27Fjs2bMHO3fuxODBg3u8TyiuUD3//PM4evQompub0draik8//RRTpkx52LEA3NqJFi5ciA0bNkCr1Vo93uXLl5GZmQmj0Qij0Yiqqir4+/uLGmvbtm0oLy9HWVkZ0tLS8MILLyAjI0N0tuvX
r2P9+vVoa2uDwWDA7t27ezSHfqdp06ahuroaLS0tMJlMOHz4MEaNGiU6G3Drj8bw4cOtWtMDbk1L1NTU4MaNGxAEAQcOHMCYMWNEj9fVbXGsmZoEAD8/P5w/fx4XL16EyWRCeXm5YvYJg8GAV155BYsWLUJiYqLV412/fh1vvvkmWlpaIAgC9u/fL3qfWLt2Lfbt24eysjKsXr0ao0ePxjvvvCM6myAIyM7OxrVr19De3o6ioiLR+8SkSZPw9ddfm6fpDx48aPU+0dzcjJs3b1r9P0Y+Pj6oq6szT89VVVVZtU/cuHHD6ttCKW6NavDgwVi8eDHi4+PR3t6O2bNnY+zYsQ87FgAgPz8fbW1tWLdunXlbdHQ0YmJiRI0XFBSEuro6REREwN7eHhqNRpICKIVp06ahtrYWERER6OzsRGxsLJ5++mlRY/n5+SEpKQmxsbFob29HYGCg1dMxly5dgqenp1VjALf+YJw6dQqzZs2Co6MjxowZgwULFoge7363xbGGs7Mz1q1bh9TUVLS1tSEoKKhHawZyKikpQVNTE7Zt22a+vOKFF17AokWLRI3n4+ODBQsWIDo6Gvb29nj22Wetnt6Viq+vLxYsWICYmBh0dHRAo9EgPDxc1FhDhgzBqlWrkJKSgra2NowcORJvvfWWVfkuX74syT4xYsQILFq0CPHx8bC3t4e3t7dVl3/8y7/8i9W3heJtPoiISNEUN/VHRER0JxYqIiJSNBYqIiJSNBYqIiJSNBYqIiJSNBYqIiJSNBYqIiJSNBYqIiJStP8PqrwjibPfrbkAAAAASUVORK5CYII=\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Generate a larger synthetic dataset for demonstration\n", "np.random.seed(2020)\n", "npx = np.zeros((100,20))\n", "mask = np.random.choice([True, False], npx.shape, p=[.1, .9])\n", "npx[mask] = np.nan\n", "sns.heatmap(pd.DataFrame(npx).isnull(), cmap='viridis', cbar=False);" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": true, "title_cell": "Lecture Outline", "title_sidebar": "Contents", "toc_cell": true, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "305.797px" }, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }