{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](../docs/banner.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Chapter 7: Introduction to Pandas"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "toc": true
   },
   "source": [
    "<h2>Chapter Outline<span class=\"tocSkip\"></span></h2>\n",
    "<hr>\n",
    "<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#1.-Introduction-to-Pandas\" data-toc-modified-id=\"1.-Introduction-to-Pandas-1\">1. Introduction to Pandas</a></span></li><li><span><a href=\"#2.-Pandas-Series\" data-toc-modified-id=\"2.-Pandas-Series-2\">2. Pandas Series</a></span></li><li><span><a href=\"#3.-Pandas-DataFrames\" data-toc-modified-id=\"3.-Pandas-DataFrames-3\">3. Pandas DataFrames</a></span></li><li><span><a href=\"#4.-Why-ndarrays-and-Series-and-DataFrames?\" data-toc-modified-id=\"4.-Why-ndarrays-and-Series-and-DataFrames?-4\">4. Why ndarrays and Series and DataFrames?</a></span></li></ul></div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Chapter Learning Objectives\n",
    "<hr>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Create Pandas series with `pd.Series()` and Pandas dataframe with `pd.DataFrame()`\n",
    "- Be able to access values from a Series/DataFrame by indexing, slicing and boolean indexing using notation such as `df[]`, `df.loc[]`, `df.iloc[]`, `df.query[]`\n",
    "- Perform basic arithmetic operations between two series and anticipate the result.\n",
    "- Describe how Pandas assigns dtypes to Series and what the `object` dtype is\n",
    "- Read a standard .csv file from a local path or url using Pandas `pd.read_csv()`.\n",
    "- Explain the relationship and differences between `np.ndarray`, `pd.Series` and `pd.DataFrame` objects in Python."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Introduction to Pandas\n",
    "<hr>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](img/chapter7/pandas.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pandas is most popular Python library for tabular data structures. You can think of Pandas as an extremely powerful version of Excel (but free and with a lot more features!) \n",
    "\n",
    "Pandas can be installed using `conda`:\n",
    "\n",
    "```\n",
    "conda install pandas\n",
    "```\n",
    "\n",
    "We usually import pandas with the alias `pd`. You'll see these two imports at the top of most data science workflows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Pandas Series\n",
    "<hr>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### What are Series?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A Series is like a NumPy array but with labels. They are strictly 1-dimensional and can contain any data type (integers, strings, floats, objects, etc), including a mix of them. Series can be created from a scalar, a list, ndarray or dictionary using `pd.Series()` (**note the captial \"S\"**). Here are some example series:\n",
    "\n",
    "![](img/chapter7/series.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating Series"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default, series are labelled with indices starting from 0. For example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    -5.0\n",
       "1     1.3\n",
       "2    21.0\n",
       "3     6.0\n",
       "4     3.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.Series(data = [-5, 1.3, 21, 6, 3])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "But you can add a custom index:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    -5.0\n",
       "b     1.3\n",
       "c    21.0\n",
       "d     6.0\n",
       "e     3.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.Series(data = [-5, 1.3, 21, 6, 3],\n",
    "          index = ['a', 'b', 'c', 'd', 'e'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can create a Series from a dictionary:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    10\n",
       "b    20\n",
       "c    30\n",
       "dtype: int64"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.Series(data = {'a': 10, 'b': 20, 'c': 30})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or from an ndarray:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0   -0.428301\n",
       "1   -0.104959\n",
       "2    0.170835\n",
       "dtype: float64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.Series(data = np.random.randn(3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Or even a scalar:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    3.141\n",
       "dtype: float64"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.Series(3.141)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "a    3.141\n",
       "b    3.141\n",
       "c    3.141\n",
       "dtype: float64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.Series(data=3.141, index=['a', 'b', 'c'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Series Characteristics"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Series can be given a `name` attribute. I almost never use this but it might come up sometimes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0   -1.363190\n",
       "1    0.425801\n",
       "2   -0.048966\n",
       "3   -0.298172\n",
       "4    1.899199\n",
       "Name: random_series, dtype: float64"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = pd.Series(data = np.random.randn(5), name='random_series')\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'random_series'"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.name"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0   -1.363190\n",
       "1    0.425801\n",
       "2   -0.048966\n",
       "3   -0.298172\n",
       "4    1.899199\n",
       "Name: another_name, dtype: float64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.rename(\"another_name\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can access the index labels of your series using the `.index` attribute:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RangeIndex(start=0, stop=5, step=1)"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.index"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can access the underlying data array using `.to_numpy()`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([-1.36319006,  0.42580052, -0.04896627, -0.29817227,  1.89919866])"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.to_numpy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([list([1, 2, 3]), 'b', 1], dtype=object)"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.Series([[1, 2, 3], \"b\", 1]).to_numpy()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Indexing and Slicing Series"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Series are very much like ndarrays (in fact, series can be passed to most NumPy functions!). They can be indexed using square brackets `[ ]` and sliced using colon `:` notation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "A    0\n",
       "B    1\n",
       "C    2\n",
       "D    3\n",
       "E    4\n",
       "dtype: int64"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = pd.Series(data = range(5),\n",
    "              index = ['A', 'B', 'C', 'D', 'E'])\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "B    1\n",
       "C    2\n",
       "D    3\n",
       "dtype: int64"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s[[1, 2, 3]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "A    0\n",
       "B    1\n",
       "C    2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s[0:3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note above how array-based indexing and slicing also returns the series index.\n",
    "\n",
    "Series are also like dictionaries, in that we can access values using index labels:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s[\"A\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "B    1\n",
       "D    3\n",
       "C    2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s[[\"B\", \"D\", \"C\"]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "A    0\n",
       "B    1\n",
       "C    2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s[\"A\":\"C\"]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"A\" in s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"Z\" in s"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Series do allow for non-unique indexing, but **be careful** because indexing operations won't return unique values:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "A    0\n",
       "A    1\n",
       "A    2\n",
       "B    3\n",
       "C    4\n",
       "dtype: int64"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = pd.Series(data = range(5),\n",
    "              index = [\"A\", \"A\", \"A\", \"B\", \"C\"])\n",
    "x"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "A    0\n",
       "A    1\n",
       "A    2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x[\"A\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we can also do boolean indexing with series:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "B    1\n",
       "C    2\n",
       "D    3\n",
       "E    4\n",
       "dtype: int64"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s[s >= 1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "D    3\n",
       "E    4\n",
       "dtype: int64"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s[s > s.mean()]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "A     True\n",
       "B    False\n",
       "C     True\n",
       "D     True\n",
       "E     True\n",
       "dtype: bool"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(s != 1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Series Operations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Unlike ndarrays operations between Series (+, -, /, \\*) align values based on their **LABELS** (not their position in the structure). The resulting index will be the __*sorted union*__ of the two indexes. This gives you the flexibility to run operations on series regardless of their labels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "A    0\n",
       "B    1\n",
       "C    2\n",
       "D    3\n",
       "dtype: int64"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s1 = pd.Series(data = range(4),\n",
    "               index = [\"A\", \"B\", \"C\", \"D\"])\n",
    "s1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "B    10\n",
       "C    11\n",
       "D    12\n",
       "E    13\n",
       "dtype: int64"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s2 = pd.Series(data = range(10, 14),\n",
    "               index = [\"B\", \"C\", \"D\", \"E\"])\n",
    "s2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "A     NaN\n",
       "B    11.0\n",
       "C    13.0\n",
       "D    15.0\n",
       "E     NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s1 + s2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see above, indices that match will be operated on. Indices that don't match will appear in the product but with `NaN` values:\n",
    "\n",
    "![](img/chapter7/series_addition.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also perform standard operations on a series, like multiplying or squaring. NumPy also accepts series as an argument to most functions because series are built off numpy arrays (more on that later):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "A    0\n",
       "B    1\n",
       "C    4\n",
       "D    9\n",
       "dtype: int64"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s1 ** 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "A     1.000000\n",
       "B     2.718282\n",
       "C     7.389056\n",
       "D    20.085537\n",
       "dtype: float64"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "np.exp(s1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, just like arrays, series have many built-in methods for various operations. You can find them all by running `help(pd.Series)`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "['T', 'abs', 'add', 'add_prefix', 'add_suffix', 'agg', 'aggregate', 'align', 'all', 'any', 'append', 'apply', 'argmax', 'argmin', 'argsort', 'array', 'asfreq', 'asof', 'astype', 'at', 'at_time', 'attrs', 'autocorr', 'axes', 'backfill', 'between', 'between_time', 'bfill', 'bool', 'cat', 'clip', 'combine', 'combine_first', 'compare', 'convert_dtypes', 'copy', 'corr', 'count', 'cov', 'cummax', 'cummin', 'cumprod', 'cumsum', 'describe', 'diff', 'div', 'divide', 'divmod', 'dot', 'drop', 'drop_duplicates', 'droplevel', 'dropna', 'dt', 'dtype', 'dtypes', 'duplicated', 'empty', 'eq', 'equals', 'ewm', 'expanding', 'explode', 'factorize', 'ffill', 'fillna', 'filter', 'first', 'first_valid_index', 'floordiv', 'ge', 'get', 'groupby', 'gt', 'hasnans', 'head', 'hist', 'iat', 'idxmax', 'idxmin', 'iloc', 'index', 'infer_objects', 'interpolate', 'is_monotonic', 'is_monotonic_decreasing', 'is_monotonic_increasing', 'is_unique', 'isin', 'isna', 'isnull', 'item', 'items', 'iteritems', 'keys', 'kurt', 'kurtosis', 'last', 'last_valid_index', 'le', 'loc', 'lt', 'mad', 'map', 'mask', 'max', 'mean', 'median', 'memory_usage', 'min', 'mod', 'mode', 'mul', 'multiply', 'name', 'nbytes', 'ndim', 'ne', 'nlargest', 'notna', 'notnull', 'nsmallest', 'nunique', 'pad', 'pct_change', 'pipe', 'plot', 'pop', 'pow', 'prod', 'product', 'quantile', 'radd', 'rank', 'ravel', 'rdiv', 'rdivmod', 'reindex', 'reindex_like', 'rename', 'rename_axis', 'reorder_levels', 'repeat', 'replace', 'resample', 'reset_index', 'rfloordiv', 'rmod', 'rmul', 'rolling', 'round', 'rpow', 'rsub', 'rtruediv', 'sample', 'searchsorted', 'sem', 'set_axis', 'shape', 'shift', 'size', 'skew', 'slice_shift', 'sort_index', 'sort_values', 'sparse', 'squeeze', 'std', 'str', 'sub', 'subtract', 'sum', 'swapaxes', 'swaplevel', 'tail', 'take', 'to_clipboard', 'to_csv', 'to_dict', 'to_excel', 'to_frame', 'to_hdf', 'to_json', 'to_latex', 'to_list', 'to_markdown', 'to_numpy', 'to_period', 'to_pickle', 'to_sql', 'to_string', 'to_timestamp', 'to_xarray', 'tolist', 'transform', 'transpose', 'truediv', 'truncate', 'tshift', 'tz_convert', 'tz_localize', 'unique', 'unstack', 'update', 'value_counts', 'values', 'var', 'view', 'where', 'xs']\n"
     ]
    }
   ],
   "source": [
    "print([_ for _ in dir(pd.Series) if not _.startswith(\"_\")])  # print all common methods"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "A    0\n",
       "B    1\n",
       "C    2\n",
       "D    3\n",
       "dtype: int64"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1.5"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s1.mean()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "6"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s1.sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "A    0.0\n",
       "B    1.0\n",
       "C    2.0\n",
       "D    3.0\n",
       "dtype: float64"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s1.astype(float)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**\"Chaining\"** operations together is also common with pandas:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "22"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s1.add(3.141).pow(2).mean().astype(int)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Data Types"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Series can hold all the data types (`dtypes`) you're used to, e.g., `int`, `float`, `bool`, etc. There are a few other special data types too (`object`, `DateTime` and `Categorical`) which we'll talk about in this and later chapters. You can always read more about pandas dtypes [in the documentation too](https://pandas.pydata.org/pandas-docs/stable/user_guide/basics.html#dtypes). For example, here's a series of `dtype` int64:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "dtype('int64')"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = pd.Series(range(5))\n",
    "x.dtype"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The dtype \"`object`\" is used for series of strings or mixed data. Pandas is [currently experimenting](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.StringDtype.html#pandas.StringDtype) with a dedicated string dtype `StringDtype`, but it is still in testing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    A\n",
       "1    B\n",
       "dtype: object"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = pd.Series(['A', 'B'])\n",
    "x"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0                   A\n",
       "1                   1\n",
       "2    [I, AM, A, LIST]\n",
       "dtype: object"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x = pd.Series(['A', 1, [\"I\", \"AM\", \"A\", \"LIST\"]])\n",
    "x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "While flexible, it is recommended to avoid the use of `object` dtypes because of higher memory requirements. Essentially, in an `object` dtype series, every single element stores information about its individual dtype. We can inspect the dtypes of all the elements in a mixed series in several ways, below I'll use the `map` method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0     <class 'str'>\n",
       "1     <class 'int'>\n",
       "2    <class 'list'>\n",
       "dtype: object"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "x.map(type)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see that each object in our series has a different dtype. This comes at a cost. Compare the [memory usage](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.memory_usage.html) of the series below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "x1 dtype: int64\n",
      "x1 memory usage: 152 bytes\n",
      "\n",
      "x2 dtype: object\n",
      "x2 memory usage: 258 bytes\n",
      "\n",
      "x3 dtype: int8\n",
      "x3 memory usage: 131 bytes\n"
     ]
    }
   ],
   "source": [
    "x1 = pd.Series([1, 2, 3])\n",
    "print(f\"x1 dtype: {x1.dtype}\")\n",
    "print(f\"x1 memory usage: {x1.memory_usage(deep=True)} bytes\")\n",
    "print(\"\")\n",
    "x2 = pd.Series([1, 2, \"3\"])\n",
    "print(f\"x2 dtype: {x2.dtype}\")\n",
    "print(f\"x2 memory usage: {x2.memory_usage(deep=True)} bytes\")\n",
    "print(\"\")\n",
    "x3 = pd.Series([1, 2, \"3\"]).astype('int8')  # coerce the object series to int8\n",
    "print(f\"x3 dtype: {x3.dtype}\")\n",
    "print(f\"x3 memory usage: {x3.memory_usage(deep=True)} bytes\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In summary, try to use uniform dtypes where possible - they are more memory efficient!\n",
    "\n",
    "One more gotcha, `NaN` (frequently used to represent missing values in data) is a float:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "float"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(np.NaN)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This can be problematic if you have a series of integers and one missing value, because Pandas will cast the whole series to a float:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    1.0\n",
       "1    2.0\n",
       "2    3.0\n",
       "3    NaN\n",
       "dtype: float64"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.Series([1, 2, 3, np.NaN])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Only recently, Pandas has implemented a \"[nullable integer dtype](https://pandas.pydata.org/pandas-docs/stable/user_guide/integer_na.html)\", which can handle `NaN` in an integer series without affecting the `dtype`. Note the captial \"I\" in the type below, differentiating it from numpy's `int64` dtype:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0       1\n",
       "1       2\n",
       "2       3\n",
       "3    <NA>\n",
       "dtype: Int64"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.Series([1, 2, 3, np.NaN]).astype('Int64')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is not the default in Pandas yet and functionality of this new feature is still subject to change."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Pandas DataFrames\n",
    "<hr>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### What are DataFrames?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pandas DataFrames are you're new best friend. They are like the Excel spreadsheets you may be used to. DataFrames are really just Series stuck together! Think of a DataFrame as a dictionary of series, with the \"keys\" being the column labels and the \"values\" being the series data:\n",
    "\n",
    "![](img/chapter7/dataframe.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating DataFrames"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Dataframes can be created using `pd.DataFrame()` (note the capital \"D\" and \"F\"). Like series, index and column labels of dataframes are labelled starting from 0 by default:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   0  1  2\n",
       "0  1  2  3\n",
       "1  4  5  6\n",
       "2  7  8  9"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame([[1, 2, 3],\n",
    "              [4, 5, 6],\n",
    "              [7, 8, 9]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use the `index` and `columns` arguments to give them labels:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>C1</th>\n",
       "      <th>C2</th>\n",
       "      <th>C3</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>R1</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R2</th>\n",
       "      <td>4</td>\n",
       "      <td>5</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R3</th>\n",
       "      <td>7</td>\n",
       "      <td>8</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    C1  C2  C3\n",
       "R1   1   2   3\n",
       "R2   4   5   6\n",
       "R3   7   8   9"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame([[1, 2, 3],\n",
    "              [4, 5, 6],\n",
    "              [7, 8, 9]],\n",
    "             index = [\"R1\", \"R2\", \"R3\"],\n",
    "             columns = [\"C1\", \"C2\", \"C3\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are so many ways to create dataframes. I most often create them from dictionaries or ndarrays:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>C1</th>\n",
       "      <th>C2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>R1</th>\n",
       "      <td>1</td>\n",
       "      <td>A</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R2</th>\n",
       "      <td>2</td>\n",
       "      <td>B</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>R3</th>\n",
       "      <td>3</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    C1 C2\n",
       "R1   1  A\n",
       "R2   2  B\n",
       "R3   3  C"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame({\"C1\": [1, 2, 3],\n",
    "              \"C2\": ['A', 'B', 'C']},\n",
    "             index=[\"R1\", \"R2\", \"R3\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>col_1</th>\n",
       "      <th>col_2</th>\n",
       "      <th>col_3</th>\n",
       "      <th>col_4</th>\n",
       "      <th>col_5</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>row_1</th>\n",
       "      <td>-1.511598</td>\n",
       "      <td>-1.073875</td>\n",
       "      <td>2.990474</td>\n",
       "      <td>2.408082</td>\n",
       "      <td>0.101569</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>row_2</th>\n",
       "      <td>0.767246</td>\n",
       "      <td>0.423030</td>\n",
       "      <td>-0.135450</td>\n",
       "      <td>0.369545</td>\n",
       "      <td>0.761417</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>row_3</th>\n",
       "      <td>0.714677</td>\n",
       "      <td>1.489430</td>\n",
       "      <td>0.843088</td>\n",
       "      <td>-1.284666</td>\n",
       "      <td>1.310033</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>row_4</th>\n",
       "      <td>-0.513656</td>\n",
       "      <td>0.539531</td>\n",
       "      <td>0.207057</td>\n",
       "      <td>0.425888</td>\n",
       "      <td>0.481794</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>row_5</th>\n",
       "      <td>-1.361988</td>\n",
       "      <td>-0.479158</td>\n",
       "      <td>0.158281</td>\n",
       "      <td>-0.196813</td>\n",
       "      <td>0.136745</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          col_1     col_2     col_3     col_4     col_5\n",
       "row_1 -1.511598 -1.073875  2.990474  2.408082  0.101569\n",
       "row_2  0.767246  0.423030 -0.135450  0.369545  0.761417\n",
       "row_3  0.714677  1.489430  0.843088 -1.284666  1.310033\n",
       "row_4 -0.513656  0.539531  0.207057  0.425888  0.481794\n",
       "row_5 -1.361988 -0.479158  0.158281 -0.196813  0.136745"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame(np.random.randn(5, 5),\n",
    "             index=[f\"row_{_}\" for _ in range(1, 6)],\n",
    "             columns=[f\"col_{_}\" for _ in range(1, 6)])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Tom</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Mike</td>\n",
       "      <td>15</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Tiffany</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         0   1\n",
       "0      Tom   7\n",
       "1     Mike  15\n",
       "2  Tiffany   3"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.DataFrame(np.array([['Tom', 7], ['Mike', 15], ['Tiffany', 3]]))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here's a table of the main ways you can create dataframes (see the [Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html#dataframe) for more):\n",
    "\n",
    "|Create DataFrame from|Code|\n",
    "|---|---|\n",
    "|Lists of lists|`pd.DataFrame([['Tom', 7], ['Mike', 15], ['Tiffany', 3]])`|\n",
    "|ndarray|`pd.DataFrame(np.array([['Tom', 7], ['Mike', 15], ['Tiffany', 3]]))`|\n",
    "|Dictionary|`pd.DataFrame({\"Name\": ['Tom', 'Mike', 'Tiffany'], \"Number\": [7, 15, 3]})`|\n",
    "|List of tuples|`pd.DataFrame(zip(['Tom', 'Mike', 'Tiffany'], [7, 15, 3]))`|\n",
    "|Series|`pd.DataFrame({\"Name\": pd.Series(['Tom', 'Mike', 'Tiffany']), \"Number\": pd.Series([7, 15, 3])})`|\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Indexing and Slicing DataFrames"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are several main ways to select data from a DataFrame:\n",
    "1. `[]`\n",
    "2. `.loc[]`\n",
    "3. `.iloc[]`\n",
    "4. Boolean indexing\n",
    "5. `.query()`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Language</th>\n",
       "      <th>Courses</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Tom</td>\n",
       "      <td>Python</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Mike</td>\n",
       "      <td>Python</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Tiffany</td>\n",
       "      <td>R</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Name Language  Courses\n",
       "0      Tom   Python        5\n",
       "1     Mike   Python        4\n",
       "2  Tiffany        R        7"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.DataFrame({\"Name\": [\"Tom\", \"Mike\", \"Tiffany\"],\n",
    "                   \"Language\": [\"Python\", \"Python\", \"R\"],\n",
    "                   \"Courses\": [5, 4, 7]})\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Indexing with `[]`\n",
    "Select columns by single labels, lists of labels, or slices:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0        Tom\n",
       "1       Mike\n",
       "2    Tiffany\n",
       "Name: Name, dtype: object"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['Name']  # returns a series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Tom</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Mike</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Tiffany</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Name\n",
       "0      Tom\n",
       "1     Mike\n",
       "2  Tiffany"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[['Name']]  # returns a dataframe!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Language</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Tom</td>\n",
       "      <td>Python</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Mike</td>\n",
       "      <td>Python</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Tiffany</td>\n",
       "      <td>R</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Name Language\n",
       "0      Tom   Python\n",
       "1     Mike   Python\n",
       "2  Tiffany        R"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[['Name', 'Language']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can only index rows by using slices, not single values (but not recommended, see preferred methods below)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {
    "tags": [
     "raises-exception"
    ]
   },
   "outputs": [
    {
     "ename": "KeyError",
     "evalue": "0",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mKeyError\u001b[0m                                  Traceback (most recent call last)",
      "\u001b[0;32m/opt/miniconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py\u001b[0m in \u001b[0;36mget_loc\u001b[0;34m(self, key, method, tolerance)\u001b[0m\n\u001b[1;32m   2888\u001b[0m             \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2889\u001b[0;31m                 \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcasted_key\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   2890\u001b[0m             \u001b[0;32mexcept\u001b[0m \u001b[0mKeyError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0merr\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32mpandas/_libs/index.pyx\u001b[0m in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[0;34m()\u001b[0m\n",
      "\u001b[0;32mpandas/_libs/index.pyx\u001b[0m in \u001b[0;36mpandas._libs.index.IndexEngine.get_loc\u001b[0;34m()\u001b[0m\n",
      "\u001b[0;32mpandas/_libs/hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable.get_item\u001b[0;34m()\u001b[0m\n",
      "\u001b[0;32mpandas/_libs/hashtable_class_helper.pxi\u001b[0m in \u001b[0;36mpandas._libs.hashtable.PyObjectHashTable.get_item\u001b[0;34m()\u001b[0m\n",
      "\u001b[0;31mKeyError\u001b[0m: 0",
      "\nThe above exception was the direct cause of the following exception:\n",
      "\u001b[0;31mKeyError\u001b[0m                                  Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-57-feb9bd85061b>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdf\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m \u001b[0;31m# doesn't work\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;32m/opt/miniconda3/lib/python3.7/site-packages/pandas/core/frame.py\u001b[0m in \u001b[0;36m__getitem__\u001b[0;34m(self, key)\u001b[0m\n\u001b[1;32m   2900\u001b[0m             \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnlevels\u001b[0m \u001b[0;34m>\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   2901\u001b[0m                 \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_getitem_multilevel\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2902\u001b[0;31m             \u001b[0mindexer\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcolumns\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   2903\u001b[0m             \u001b[0;32mif\u001b[0m \u001b[0mis_integer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mindexer\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   2904\u001b[0m                 \u001b[0mindexer\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m[\u001b[0m\u001b[0mindexer\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m/opt/miniconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py\u001b[0m in \u001b[0;36mget_loc\u001b[0;34m(self, key, method, tolerance)\u001b[0m\n\u001b[1;32m   2889\u001b[0m                 \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_engine\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mget_loc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcasted_key\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   2890\u001b[0m             \u001b[0;32mexcept\u001b[0m \u001b[0mKeyError\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0merr\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2891\u001b[0;31m                 \u001b[0;32mraise\u001b[0m \u001b[0mKeyError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mkey\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0merr\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   2892\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   2893\u001b[0m         \u001b[0;32mif\u001b[0m \u001b[0mtolerance\u001b[0m \u001b[0;32mis\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mKeyError\u001b[0m: 0"
     ]
    }
   ],
   "source": [
    "df[0] # doesn't work"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Language</th>\n",
       "      <th>Courses</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Tom</td>\n",
       "      <td>Python</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Name Language  Courses\n",
       "0  Tom   Python        5"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[0:1] # does work"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Language</th>\n",
       "      <th>Courses</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Mike</td>\n",
       "      <td>Python</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Tiffany</td>\n",
       "      <td>R</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Name Language  Courses\n",
       "1     Mike   Python        4\n",
       "2  Tiffany        R        7"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[1:] # does work"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Indexing with `.loc` and `.iloc`\n",
    "Pandas created the methods `.loc[]` and `.iloc[]` as more flexible alternatives for accessing data from a dataframe. Use `df.iloc[]` for indexing with integers and `df.loc[]` for indexing with labels. These are typically the [recommended methods of indexing](https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#ix-indexer-is-deprecated) in Pandas."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Language</th>\n",
       "      <th>Courses</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Tom</td>\n",
       "      <td>Python</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Mike</td>\n",
       "      <td>Python</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Tiffany</td>\n",
       "      <td>R</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Name Language  Courses\n",
       "0      Tom   Python        5\n",
       "1     Mike   Python        4\n",
       "2  Tiffany        R        7"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First we'll try out `.iloc` which accepts *integers* as references to rows/columns:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Name           Tom\n",
       "Language    Python\n",
       "Courses          5\n",
       "Name: 0, dtype: object"
      ]
     },
     "execution_count": 61,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[0]  # returns a series"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Language</th>\n",
       "      <th>Courses</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Tom</td>\n",
       "      <td>Python</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Mike</td>\n",
       "      <td>Python</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Name Language  Courses\n",
       "0   Tom   Python        5\n",
       "1  Mike   Python        4"
      ]
     },
     "execution_count": 62,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[0:2]  # slicing returns a dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'R'"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[2, 1]  # returns the indexed object"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Language</th>\n",
       "      <th>Courses</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Python</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Python</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Language  Courses\n",
       "0   Python        5\n",
       "1   Python        4"
      ]
     },
     "execution_count": 64,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[[0, 1], [1, 2]]  # returns a dataframe"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's look at `.loc` which accepts *labels* as references to rows/columns:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0        Tom\n",
       "1       Mike\n",
       "2    Tiffany\n",
       "Name: Name, dtype: object"
      ]
     },
     "execution_count": 65,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[:, 'Name']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Language</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Tom</td>\n",
       "      <td>Python</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Mike</td>\n",
       "      <td>Python</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Tiffany</td>\n",
       "      <td>R</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Name Language\n",
       "0      Tom   Python\n",
       "1     Mike   Python\n",
       "2  Tiffany        R"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[:, 'Name':'Language']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Language</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Python</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>R</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Language\n",
       "0   Python\n",
       "2        R"
      ]
     },
     "execution_count": 67,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[[0, 2], ['Language']]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sometimes we want to use a mix of integers and labels to reference data in a dataframe. The easiest way to do this is to use `.loc[]` with a label then use an integer in combinations with `.index` or `.columns`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "RangeIndex(start=0, stop=3, step=1)"
      ]
     },
     "execution_count": 68,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['Name', 'Language', 'Courses'], dtype='object')"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5"
      ]
     },
     "execution_count": 70,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[df.index[0], 'Courses']  # I want to reference the first row and the column named \"Courses\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'R'"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[2, df.columns[1]]  # I want to reference row \"2\" and the second column"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Boolean indexing\n",
    "Just like with series, we can select data based on boolean masks:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Language</th>\n",
       "      <th>Courses</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Tom</td>\n",
       "      <td>Python</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Mike</td>\n",
       "      <td>Python</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Tiffany</td>\n",
       "      <td>R</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Name Language  Courses\n",
       "0      Tom   Python        5\n",
       "1     Mike   Python        4\n",
       "2  Tiffany        R        7"
      ]
     },
     "execution_count": 72,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Language</th>\n",
       "      <th>Courses</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Tiffany</td>\n",
       "      <td>R</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Name Language  Courses\n",
       "2  Tiffany        R        7"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df['Courses'] > 5]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Language</th>\n",
       "      <th>Courses</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Tom</td>\n",
       "      <td>Python</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Name Language  Courses\n",
       "0  Tom   Python        5"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df['Name'] == \"Tom\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Indexing with `.query()`\n",
    "Boolean masks work fine, but I prefer to use the `.query()` method for selecting data. `df.query()` is a powerful tool for filtering data. It has an odd syntax, one of the strangest I've seen in Python, it is more like SQL - `df.query()` accepts a string expression to evaluate and it \"knows\" the names of the columns in your dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Language</th>\n",
       "      <th>Courses</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Tom</td>\n",
       "      <td>Python</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Name Language  Courses\n",
       "0  Tom   Python        5"
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.query(\"Courses > 4 & Language == 'Python'\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note the use of single quotes AND double quotes above, lucky we have both in Python! Compare this to the equivalent boolean indexing operation and you can see that `.query()` is much more readable, especially as the query gets bigger!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Language</th>\n",
       "      <th>Courses</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Tom</td>\n",
       "      <td>Python</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  Name Language  Courses\n",
       "0  Tom   Python        5"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[(df['Courses'] > 4) & (df['Language'] == 'Python')]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Query also allows you to reference variable in the current workspace using the `@` symbol:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Language</th>\n",
       "      <th>Courses</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Tom</td>\n",
       "      <td>Python</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Tiffany</td>\n",
       "      <td>R</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Name Language  Courses\n",
       "0      Tom   Python        5\n",
       "2  Tiffany        R        7"
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "course_threshold = 4\n",
    "df.query(\"Courses > @course_threshold\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Indexing cheatsheet\n",
    "\n",
    "|Method|Syntax|Output|\n",
    "|---|---|---|\n",
    "|Select column|`df[col_label]`|Series|\n",
    "|Select row slice|`df[row_1_int:row_2_int]`|DataFrame|\n",
    "|Select row/column by label|`df.loc[row_label(s), col_label(s)]`|Object for single selection, Series for one row/column, otherwise DataFrame|\n",
    "|Select row/column by integer|`df.iloc[row_int(s), col_int(s)]`|Object for single selection, Series for one row/column, otherwise DataFrame|\n",
    "|Select by row integer & column label|`df.loc[df.index[row_int], col_label]`|Object for single selection, Series for one row/column, otherwise DataFrame|\n",
    "|Select by row label & column integer|`df.loc[row_label, df.columns[col_int]]`|Object for single selection, Series for one row/column, otherwise DataFrame|\n",
    "|Select by boolean|`df[bool_vec]`|Object for single selection, Series for one row/column, otherwise DataFrame|\n",
    "|Select by boolean expression|`df.query(\"expression\")`|Object for single selection, Series for one row/column, otherwise DataFrame|"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Reading/Writing Data From External Sources"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### .csv files"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A lot of the time you will be loading .csv files for use in pandas. You can use the `pd.read_csv()` function for this. In the following chapters we'll use a real dataset of my cycling commutes to the University of British Columbia. There are so many arguments that can be used to help read in your .csv file in an efficient and appropriate manner, feel free to check them out now (by using `shift + tab` in Jupyter, or typing `help(pd.read_csv)`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>Type</th>\n",
       "      <th>Time</th>\n",
       "      <th>Distance</th>\n",
       "      <th>Comments</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Date</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2019-09-10 00:13:04</th>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2084</td>\n",
       "      <td>12.62</td>\n",
       "      <td>Rain</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-09-10 13:52:18</th>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2531</td>\n",
       "      <td>13.03</td>\n",
       "      <td>rain</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-09-11 00:23:50</th>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1863</td>\n",
       "      <td>12.52</td>\n",
       "      <td>Wet road but nice weather</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-09-11 14:06:19</th>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2192</td>\n",
       "      <td>12.84</td>\n",
       "      <td>Stopped for photo of sunrise</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-09-12 00:28:05</th>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1891</td>\n",
       "      <td>12.48</td>\n",
       "      <td>Tired by the end of the week</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-09-16 13:57:48</th>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2272</td>\n",
       "      <td>12.45</td>\n",
       "      <td>Rested after the weekend!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-09-17 00:15:47</th>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1973</td>\n",
       "      <td>12.45</td>\n",
       "      <td>Legs feeling strong!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-09-17 13:43:34</th>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2285</td>\n",
       "      <td>12.60</td>\n",
       "      <td>Raining</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-09-18 13:49:53</th>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2903</td>\n",
       "      <td>14.57</td>\n",
       "      <td>Raining today</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-09-18 00:15:52</th>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2101</td>\n",
       "      <td>12.48</td>\n",
       "      <td>Pumped up tires</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-09-19 00:30:01</th>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>48062</td>\n",
       "      <td>12.48</td>\n",
       "      <td>Feeling good</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-09-19 13:52:09</th>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2090</td>\n",
       "      <td>12.59</td>\n",
       "      <td>Getting colder which is nice</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-09-20 01:02:05</th>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2961</td>\n",
       "      <td>12.81</td>\n",
       "      <td>Feeling good</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-09-23 13:50:41</th>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2462</td>\n",
       "      <td>12.68</td>\n",
       "      <td>Rested after the weekend!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-09-24 00:35:42</th>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2076</td>\n",
       "      <td>12.47</td>\n",
       "      <td>Oiled chain, bike feels smooth</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-09-24 13:41:24</th>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2321</td>\n",
       "      <td>12.68</td>\n",
       "      <td>Bike feeling much smoother</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-09-25 00:07:21</th>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1775</td>\n",
       "      <td>12.10</td>\n",
       "      <td>Feeling really tired</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-09-25 13:35:41</th>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2124</td>\n",
       "      <td>12.65</td>\n",
       "      <td>Stopped for photo of sunrise</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-09-26 00:13:33</th>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1860</td>\n",
       "      <td>12.52</td>\n",
       "      <td>raining</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-09-26 13:42:43</th>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2350</td>\n",
       "      <td>12.91</td>\n",
       "      <td>Detour around trucks at Jericho</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-09-27 01:00:18</th>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1712</td>\n",
       "      <td>12.47</td>\n",
       "      <td>Tired by the end of the week</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-09-30 13:53:52</th>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2118</td>\n",
       "      <td>12.71</td>\n",
       "      <td>Rested after the weekend!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-10-01 00:15:07</th>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1732</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Legs feeling strong!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-10-01 13:45:55</th>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2222</td>\n",
       "      <td>12.82</td>\n",
       "      <td>Beautiful morning! Feeling fit</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-10-02 00:13:09</th>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1756</td>\n",
       "      <td>NaN</td>\n",
       "      <td>A little tired today but good weather</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-10-02 13:46:06</th>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2134</td>\n",
       "      <td>13.06</td>\n",
       "      <td>Bit tired today but good weather</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-10-03 00:45:22</th>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1724</td>\n",
       "      <td>12.52</td>\n",
       "      <td>Feeling good</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-10-03 13:47:36</th>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2182</td>\n",
       "      <td>12.68</td>\n",
       "      <td>Wet road</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-10-04 01:08:08</th>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1870</td>\n",
       "      <td>12.63</td>\n",
       "      <td>Very tired, riding into the wind</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-10-09 13:55:40</th>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2149</td>\n",
       "      <td>12.70</td>\n",
       "      <td>Really cold! But feeling good</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-10-10 00:10:31</th>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1841</td>\n",
       "      <td>12.59</td>\n",
       "      <td>Feeling good after a holiday break!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-10-10 13:47:14</th>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2463</td>\n",
       "      <td>12.79</td>\n",
       "      <td>Stopped for photo of sunrise</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2019-10-11 00:16:57</th>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1843</td>\n",
       "      <td>11.79</td>\n",
       "      <td>Bike feeling tight, needs an oil and pump</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                               Name  Type   Time  Distance  \\\n",
       "Date                                                         \n",
       "2019-09-10 00:13:04  Afternoon Ride  Ride   2084     12.62   \n",
       "2019-09-10 13:52:18    Morning Ride  Ride   2531     13.03   \n",
       "2019-09-11 00:23:50  Afternoon Ride  Ride   1863     12.52   \n",
       "2019-09-11 14:06:19    Morning Ride  Ride   2192     12.84   \n",
       "2019-09-12 00:28:05  Afternoon Ride  Ride   1891     12.48   \n",
       "2019-09-16 13:57:48    Morning Ride  Ride   2272     12.45   \n",
       "2019-09-17 00:15:47  Afternoon Ride  Ride   1973     12.45   \n",
       "2019-09-17 13:43:34    Morning Ride  Ride   2285     12.60   \n",
       "2019-09-18 13:49:53    Morning Ride  Ride   2903     14.57   \n",
       "2019-09-18 00:15:52  Afternoon Ride  Ride   2101     12.48   \n",
       "2019-09-19 00:30:01  Afternoon Ride  Ride  48062     12.48   \n",
       "2019-09-19 13:52:09    Morning Ride  Ride   2090     12.59   \n",
       "2019-09-20 01:02:05  Afternoon Ride  Ride   2961     12.81   \n",
       "2019-09-23 13:50:41    Morning Ride  Ride   2462     12.68   \n",
       "2019-09-24 00:35:42  Afternoon Ride  Ride   2076     12.47   \n",
       "2019-09-24 13:41:24    Morning Ride  Ride   2321     12.68   \n",
       "2019-09-25 00:07:21  Afternoon Ride  Ride   1775     12.10   \n",
       "2019-09-25 13:35:41    Morning Ride  Ride   2124     12.65   \n",
       "2019-09-26 00:13:33  Afternoon Ride  Ride   1860     12.52   \n",
       "2019-09-26 13:42:43    Morning Ride  Ride   2350     12.91   \n",
       "2019-09-27 01:00:18  Afternoon Ride  Ride   1712     12.47   \n",
       "2019-09-30 13:53:52    Morning Ride  Ride   2118     12.71   \n",
       "2019-10-01 00:15:07  Afternoon Ride  Ride   1732       NaN   \n",
       "2019-10-01 13:45:55    Morning Ride  Ride   2222     12.82   \n",
       "2019-10-02 00:13:09  Afternoon Ride  Ride   1756       NaN   \n",
       "2019-10-02 13:46:06    Morning Ride  Ride   2134     13.06   \n",
       "2019-10-03 00:45:22  Afternoon Ride  Ride   1724     12.52   \n",
       "2019-10-03 13:47:36    Morning Ride  Ride   2182     12.68   \n",
       "2019-10-04 01:08:08  Afternoon Ride  Ride   1870     12.63   \n",
       "2019-10-09 13:55:40    Morning Ride  Ride   2149     12.70   \n",
       "2019-10-10 00:10:31  Afternoon Ride  Ride   1841     12.59   \n",
       "2019-10-10 13:47:14    Morning Ride  Ride   2463     12.79   \n",
       "2019-10-11 00:16:57  Afternoon Ride  Ride   1843     11.79   \n",
       "\n",
       "                                                      Comments  \n",
       "Date                                                            \n",
       "2019-09-10 00:13:04                                       Rain  \n",
       "2019-09-10 13:52:18                                       rain  \n",
       "2019-09-11 00:23:50                  Wet road but nice weather  \n",
       "2019-09-11 14:06:19               Stopped for photo of sunrise  \n",
       "2019-09-12 00:28:05               Tired by the end of the week  \n",
       "2019-09-16 13:57:48                  Rested after the weekend!  \n",
       "2019-09-17 00:15:47                       Legs feeling strong!  \n",
       "2019-09-17 13:43:34                                    Raining  \n",
       "2019-09-18 13:49:53                              Raining today  \n",
       "2019-09-18 00:15:52                            Pumped up tires  \n",
       "2019-09-19 00:30:01                               Feeling good  \n",
       "2019-09-19 13:52:09               Getting colder which is nice  \n",
       "2019-09-20 01:02:05                               Feeling good  \n",
       "2019-09-23 13:50:41                  Rested after the weekend!  \n",
       "2019-09-24 00:35:42             Oiled chain, bike feels smooth  \n",
       "2019-09-24 13:41:24                 Bike feeling much smoother  \n",
       "2019-09-25 00:07:21                       Feeling really tired  \n",
       "2019-09-25 13:35:41               Stopped for photo of sunrise  \n",
       "2019-09-26 00:13:33                                    raining  \n",
       "2019-09-26 13:42:43            Detour around trucks at Jericho  \n",
       "2019-09-27 01:00:18               Tired by the end of the week  \n",
       "2019-09-30 13:53:52                  Rested after the weekend!  \n",
       "2019-10-01 00:15:07                       Legs feeling strong!  \n",
       "2019-10-01 13:45:55             Beautiful morning! Feeling fit  \n",
       "2019-10-02 00:13:09      A little tired today but good weather  \n",
       "2019-10-02 13:46:06           Bit tired today but good weather  \n",
       "2019-10-03 00:45:22                               Feeling good  \n",
       "2019-10-03 13:47:36                                   Wet road  \n",
       "2019-10-04 01:08:08           Very tired, riding into the wind  \n",
       "2019-10-09 13:55:40              Really cold! But feeling good  \n",
       "2019-10-10 00:10:31        Feeling good after a holiday break!  \n",
       "2019-10-10 13:47:14               Stopped for photo of sunrise  \n",
       "2019-10-11 00:16:57  Bike feeling tight, needs an oil and pump  "
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "path = 'data/cycling_data.csv'\n",
    "df = pd.read_csv(path, index_col=0, parse_dates=True)\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can print a dataframe to .csv using `df.to_csv()`. Be sure to check out all of the possible arguments to write your dataframe exactly how you want it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### url"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Pandas also facilitates reading directly from a url - `pd.read_csv()` accepts urls as input:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Bottle</th>\n",
       "      <th>Grape</th>\n",
       "      <th>Origin</th>\n",
       "      <th>Alcohol</th>\n",
       "      <th>pH</th>\n",
       "      <th>Colour</th>\n",
       "      <th>Aroma</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>Chardonnay</td>\n",
       "      <td>Australia</td>\n",
       "      <td>14.23</td>\n",
       "      <td>3.51</td>\n",
       "      <td>White</td>\n",
       "      <td>Floral</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>Pinot Grigio</td>\n",
       "      <td>Italy</td>\n",
       "      <td>13.20</td>\n",
       "      <td>3.30</td>\n",
       "      <td>White</td>\n",
       "      <td>Fruity</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>Pinot Blanc</td>\n",
       "      <td>France</td>\n",
       "      <td>13.16</td>\n",
       "      <td>3.16</td>\n",
       "      <td>White</td>\n",
       "      <td>Citrus</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>Shiraz</td>\n",
       "      <td>Chile</td>\n",
       "      <td>14.91</td>\n",
       "      <td>3.39</td>\n",
       "      <td>Red</td>\n",
       "      <td>Berry</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>Malbec</td>\n",
       "      <td>Argentina</td>\n",
       "      <td>13.83</td>\n",
       "      <td>3.28</td>\n",
       "      <td>Red</td>\n",
       "      <td>Fruity</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   Bottle         Grape     Origin  Alcohol    pH Colour   Aroma\n",
       "0       1    Chardonnay  Australia    14.23  3.51  White  Floral\n",
       "1       2  Pinot Grigio      Italy    13.20  3.30  White  Fruity\n",
       "2       3   Pinot Blanc     France    13.16  3.16  White  Citrus\n",
       "3       4        Shiraz      Chile    14.91  3.39    Red   Berry\n",
       "4       5        Malbec  Argentina    13.83  3.28    Red  Fruity"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "url = 'https://raw.githubusercontent.com/TomasBeuzen/toy-datasets/master/wine_1.csv'\n",
    "pd.read_csv(url)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Other\n",
    "Pandas can read data from all sorts of other file types including HTML, JSON, Excel, Parquet, Feather, etc. There are generally dedicated functions for reading these file types, see the [Pandas documentation here](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-tools-text-csv-hdf5)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Common DataFrame Operations"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "DataFrames have built-in functions for performing most common operations, e.g., `.min()`, `idxmin()`, `sort_values()`, etc. They're all documented in the [Pandas documentation here](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) but I'll demonstrate a few below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Date</th>\n",
       "      <th>Name</th>\n",
       "      <th>Type</th>\n",
       "      <th>Time</th>\n",
       "      <th>Distance</th>\n",
       "      <th>Comments</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>10 Sep 2019, 00:13:04</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2084</td>\n",
       "      <td>12.62</td>\n",
       "      <td>Rain</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>10 Sep 2019, 13:52:18</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2531</td>\n",
       "      <td>13.03</td>\n",
       "      <td>rain</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>11 Sep 2019, 00:23:50</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1863</td>\n",
       "      <td>12.52</td>\n",
       "      <td>Wet road but nice weather</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>11 Sep 2019, 14:06:19</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2192</td>\n",
       "      <td>12.84</td>\n",
       "      <td>Stopped for photo of sunrise</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>12 Sep 2019, 00:28:05</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1891</td>\n",
       "      <td>12.48</td>\n",
       "      <td>Tired by the end of the week</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>16 Sep 2019, 13:57:48</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2272</td>\n",
       "      <td>12.45</td>\n",
       "      <td>Rested after the weekend!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>17 Sep 2019, 00:15:47</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1973</td>\n",
       "      <td>12.45</td>\n",
       "      <td>Legs feeling strong!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>17 Sep 2019, 13:43:34</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2285</td>\n",
       "      <td>12.60</td>\n",
       "      <td>Raining</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>18 Sep 2019, 13:49:53</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2903</td>\n",
       "      <td>14.57</td>\n",
       "      <td>Raining today</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>18 Sep 2019, 00:15:52</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2101</td>\n",
       "      <td>12.48</td>\n",
       "      <td>Pumped up tires</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>19 Sep 2019, 00:30:01</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>48062</td>\n",
       "      <td>12.48</td>\n",
       "      <td>Feeling good</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>19 Sep 2019, 13:52:09</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2090</td>\n",
       "      <td>12.59</td>\n",
       "      <td>Getting colder which is nice</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>20 Sep 2019, 01:02:05</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2961</td>\n",
       "      <td>12.81</td>\n",
       "      <td>Feeling good</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>23 Sep 2019, 13:50:41</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2462</td>\n",
       "      <td>12.68</td>\n",
       "      <td>Rested after the weekend!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>24 Sep 2019, 00:35:42</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2076</td>\n",
       "      <td>12.47</td>\n",
       "      <td>Oiled chain, bike feels smooth</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>24 Sep 2019, 13:41:24</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2321</td>\n",
       "      <td>12.68</td>\n",
       "      <td>Bike feeling much smoother</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>25 Sep 2019, 00:07:21</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1775</td>\n",
       "      <td>12.10</td>\n",
       "      <td>Feeling really tired</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>25 Sep 2019, 13:35:41</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2124</td>\n",
       "      <td>12.65</td>\n",
       "      <td>Stopped for photo of sunrise</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>26 Sep 2019, 00:13:33</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1860</td>\n",
       "      <td>12.52</td>\n",
       "      <td>raining</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>26 Sep 2019, 13:42:43</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2350</td>\n",
       "      <td>12.91</td>\n",
       "      <td>Detour around trucks at Jericho</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>27 Sep 2019, 01:00:18</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1712</td>\n",
       "      <td>12.47</td>\n",
       "      <td>Tired by the end of the week</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>30 Sep 2019, 13:53:52</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2118</td>\n",
       "      <td>12.71</td>\n",
       "      <td>Rested after the weekend!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>1 Oct 2019, 00:15:07</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1732</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Legs feeling strong!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>1 Oct 2019, 13:45:55</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2222</td>\n",
       "      <td>12.82</td>\n",
       "      <td>Beautiful morning! Feeling fit</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>2 Oct 2019, 00:13:09</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1756</td>\n",
       "      <td>NaN</td>\n",
       "      <td>A little tired today but good weather</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>2 Oct 2019, 13:46:06</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2134</td>\n",
       "      <td>13.06</td>\n",
       "      <td>Bit tired today but good weather</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>3 Oct 2019, 00:45:22</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1724</td>\n",
       "      <td>12.52</td>\n",
       "      <td>Feeling good</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>3 Oct 2019, 13:47:36</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2182</td>\n",
       "      <td>12.68</td>\n",
       "      <td>Wet road</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>4 Oct 2019, 01:08:08</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1870</td>\n",
       "      <td>12.63</td>\n",
       "      <td>Very tired, riding into the wind</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>9 Oct 2019, 13:55:40</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2149</td>\n",
       "      <td>12.70</td>\n",
       "      <td>Really cold! But feeling good</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>10 Oct 2019, 00:10:31</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1841</td>\n",
       "      <td>12.59</td>\n",
       "      <td>Feeling good after a holiday break!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>10 Oct 2019, 13:47:14</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2463</td>\n",
       "      <td>12.79</td>\n",
       "      <td>Stopped for photo of sunrise</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>11 Oct 2019, 00:16:57</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1843</td>\n",
       "      <td>11.79</td>\n",
       "      <td>Bike feeling tight, needs an oil and pump</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                     Date            Name  Type   Time  Distance  \\\n",
       "0   10 Sep 2019, 00:13:04  Afternoon Ride  Ride   2084     12.62   \n",
       "1   10 Sep 2019, 13:52:18    Morning Ride  Ride   2531     13.03   \n",
       "2   11 Sep 2019, 00:23:50  Afternoon Ride  Ride   1863     12.52   \n",
       "3   11 Sep 2019, 14:06:19    Morning Ride  Ride   2192     12.84   \n",
       "4   12 Sep 2019, 00:28:05  Afternoon Ride  Ride   1891     12.48   \n",
       "5   16 Sep 2019, 13:57:48    Morning Ride  Ride   2272     12.45   \n",
       "6   17 Sep 2019, 00:15:47  Afternoon Ride  Ride   1973     12.45   \n",
       "7   17 Sep 2019, 13:43:34    Morning Ride  Ride   2285     12.60   \n",
       "8   18 Sep 2019, 13:49:53    Morning Ride  Ride   2903     14.57   \n",
       "9   18 Sep 2019, 00:15:52  Afternoon Ride  Ride   2101     12.48   \n",
       "10  19 Sep 2019, 00:30:01  Afternoon Ride  Ride  48062     12.48   \n",
       "11  19 Sep 2019, 13:52:09    Morning Ride  Ride   2090     12.59   \n",
       "12  20 Sep 2019, 01:02:05  Afternoon Ride  Ride   2961     12.81   \n",
       "13  23 Sep 2019, 13:50:41    Morning Ride  Ride   2462     12.68   \n",
       "14  24 Sep 2019, 00:35:42  Afternoon Ride  Ride   2076     12.47   \n",
       "15  24 Sep 2019, 13:41:24    Morning Ride  Ride   2321     12.68   \n",
       "16  25 Sep 2019, 00:07:21  Afternoon Ride  Ride   1775     12.10   \n",
       "17  25 Sep 2019, 13:35:41    Morning Ride  Ride   2124     12.65   \n",
       "18  26 Sep 2019, 00:13:33  Afternoon Ride  Ride   1860     12.52   \n",
       "19  26 Sep 2019, 13:42:43    Morning Ride  Ride   2350     12.91   \n",
       "20  27 Sep 2019, 01:00:18  Afternoon Ride  Ride   1712     12.47   \n",
       "21  30 Sep 2019, 13:53:52    Morning Ride  Ride   2118     12.71   \n",
       "22   1 Oct 2019, 00:15:07  Afternoon Ride  Ride   1732       NaN   \n",
       "23   1 Oct 2019, 13:45:55    Morning Ride  Ride   2222     12.82   \n",
       "24   2 Oct 2019, 00:13:09  Afternoon Ride  Ride   1756       NaN   \n",
       "25   2 Oct 2019, 13:46:06    Morning Ride  Ride   2134     13.06   \n",
       "26   3 Oct 2019, 00:45:22  Afternoon Ride  Ride   1724     12.52   \n",
       "27   3 Oct 2019, 13:47:36    Morning Ride  Ride   2182     12.68   \n",
       "28   4 Oct 2019, 01:08:08  Afternoon Ride  Ride   1870     12.63   \n",
       "29   9 Oct 2019, 13:55:40    Morning Ride  Ride   2149     12.70   \n",
       "30  10 Oct 2019, 00:10:31  Afternoon Ride  Ride   1841     12.59   \n",
       "31  10 Oct 2019, 13:47:14    Morning Ride  Ride   2463     12.79   \n",
       "32  11 Oct 2019, 00:16:57  Afternoon Ride  Ride   1843     11.79   \n",
       "\n",
       "                                     Comments  \n",
       "0                                        Rain  \n",
       "1                                        rain  \n",
       "2                   Wet road but nice weather  \n",
       "3                Stopped for photo of sunrise  \n",
       "4                Tired by the end of the week  \n",
       "5                   Rested after the weekend!  \n",
       "6                        Legs feeling strong!  \n",
       "7                                     Raining  \n",
       "8                               Raining today  \n",
       "9                             Pumped up tires  \n",
       "10                               Feeling good  \n",
       "11               Getting colder which is nice  \n",
       "12                               Feeling good  \n",
       "13                  Rested after the weekend!  \n",
       "14             Oiled chain, bike feels smooth  \n",
       "15                 Bike feeling much smoother  \n",
       "16                       Feeling really tired  \n",
       "17               Stopped for photo of sunrise  \n",
       "18                                    raining  \n",
       "19            Detour around trucks at Jericho  \n",
       "20               Tired by the end of the week  \n",
       "21                  Rested after the weekend!  \n",
       "22                       Legs feeling strong!  \n",
       "23             Beautiful morning! Feeling fit  \n",
       "24      A little tired today but good weather  \n",
       "25           Bit tired today but good weather  \n",
       "26                               Feeling good  \n",
       "27                                   Wet road  \n",
       "28           Very tired, riding into the wind  \n",
       "29              Really cold! But feeling good  \n",
       "30        Feeling good after a holiday break!  \n",
       "31               Stopped for photo of sunrise  \n",
       "32  Bike feeling tight, needs an oil and pump  "
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('data/cycling_data.csv')\n",
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Date                         1 Oct 2019, 00:15:07\n",
       "Name                               Afternoon Ride\n",
       "Type                                         Ride\n",
       "Time                                         1712\n",
       "Distance                                    11.79\n",
       "Comments    A little tired today but good weather\n",
       "dtype: object"
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.min()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "1712"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['Time'].min()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "20"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['Time'].idxmin()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Date               27 Sep 2019, 01:00:18\n",
       "Name                      Afternoon Ride\n",
       "Type                                Ride\n",
       "Time                                1712\n",
       "Distance                           12.47\n",
       "Comments    Tired by the end of the week\n",
       "Name: 20, dtype: object"
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[20]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Date        10 Sep 2019, 00:13:0410 Sep 2019, 13:52:1811 S...\n",
       "Name        Afternoon RideMorning RideAfternoon RideMornin...\n",
       "Type        RideRideRideRideRideRideRideRideRideRideRideRi...\n",
       "Time                                                   115922\n",
       "Distance                                               392.69\n",
       "Comments    RainrainWet road but nice weatherStopped for p...\n",
       "dtype: object"
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Some methods like `.mean()` will only operate on numeric columns:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Time        3512.787879\n",
       "Distance      12.667419\n",
       "dtype: float64"
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Some methods require arguments to be specified, like `.sort_values()`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Date</th>\n",
       "      <th>Name</th>\n",
       "      <th>Type</th>\n",
       "      <th>Time</th>\n",
       "      <th>Distance</th>\n",
       "      <th>Comments</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>27 Sep 2019, 01:00:18</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1712</td>\n",
       "      <td>12.47</td>\n",
       "      <td>Tired by the end of the week</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>3 Oct 2019, 00:45:22</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1724</td>\n",
       "      <td>12.52</td>\n",
       "      <td>Feeling good</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>1 Oct 2019, 00:15:07</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1732</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Legs feeling strong!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>2 Oct 2019, 00:13:09</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1756</td>\n",
       "      <td>NaN</td>\n",
       "      <td>A little tired today but good weather</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>25 Sep 2019, 00:07:21</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1775</td>\n",
       "      <td>12.10</td>\n",
       "      <td>Feeling really tired</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>10 Oct 2019, 00:10:31</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1841</td>\n",
       "      <td>12.59</td>\n",
       "      <td>Feeling good after a holiday break!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>11 Oct 2019, 00:16:57</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1843</td>\n",
       "      <td>11.79</td>\n",
       "      <td>Bike feeling tight, needs an oil and pump</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>26 Sep 2019, 00:13:33</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1860</td>\n",
       "      <td>12.52</td>\n",
       "      <td>raining</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>11 Sep 2019, 00:23:50</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1863</td>\n",
       "      <td>12.52</td>\n",
       "      <td>Wet road but nice weather</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>4 Oct 2019, 01:08:08</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1870</td>\n",
       "      <td>12.63</td>\n",
       "      <td>Very tired, riding into the wind</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>12 Sep 2019, 00:28:05</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1891</td>\n",
       "      <td>12.48</td>\n",
       "      <td>Tired by the end of the week</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>17 Sep 2019, 00:15:47</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1973</td>\n",
       "      <td>12.45</td>\n",
       "      <td>Legs feeling strong!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>24 Sep 2019, 00:35:42</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2076</td>\n",
       "      <td>12.47</td>\n",
       "      <td>Oiled chain, bike feels smooth</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>10 Sep 2019, 00:13:04</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2084</td>\n",
       "      <td>12.62</td>\n",
       "      <td>Rain</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>19 Sep 2019, 13:52:09</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2090</td>\n",
       "      <td>12.59</td>\n",
       "      <td>Getting colder which is nice</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>18 Sep 2019, 00:15:52</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2101</td>\n",
       "      <td>12.48</td>\n",
       "      <td>Pumped up tires</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>30 Sep 2019, 13:53:52</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2118</td>\n",
       "      <td>12.71</td>\n",
       "      <td>Rested after the weekend!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>25 Sep 2019, 13:35:41</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2124</td>\n",
       "      <td>12.65</td>\n",
       "      <td>Stopped for photo of sunrise</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>2 Oct 2019, 13:46:06</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2134</td>\n",
       "      <td>13.06</td>\n",
       "      <td>Bit tired today but good weather</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>9 Oct 2019, 13:55:40</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2149</td>\n",
       "      <td>12.70</td>\n",
       "      <td>Really cold! But feeling good</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>3 Oct 2019, 13:47:36</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2182</td>\n",
       "      <td>12.68</td>\n",
       "      <td>Wet road</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>11 Sep 2019, 14:06:19</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2192</td>\n",
       "      <td>12.84</td>\n",
       "      <td>Stopped for photo of sunrise</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>1 Oct 2019, 13:45:55</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2222</td>\n",
       "      <td>12.82</td>\n",
       "      <td>Beautiful morning! Feeling fit</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>16 Sep 2019, 13:57:48</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2272</td>\n",
       "      <td>12.45</td>\n",
       "      <td>Rested after the weekend!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>17 Sep 2019, 13:43:34</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2285</td>\n",
       "      <td>12.60</td>\n",
       "      <td>Raining</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>24 Sep 2019, 13:41:24</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2321</td>\n",
       "      <td>12.68</td>\n",
       "      <td>Bike feeling much smoother</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>26 Sep 2019, 13:42:43</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2350</td>\n",
       "      <td>12.91</td>\n",
       "      <td>Detour around trucks at Jericho</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>23 Sep 2019, 13:50:41</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2462</td>\n",
       "      <td>12.68</td>\n",
       "      <td>Rested after the weekend!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>10 Oct 2019, 13:47:14</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2463</td>\n",
       "      <td>12.79</td>\n",
       "      <td>Stopped for photo of sunrise</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>10 Sep 2019, 13:52:18</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2531</td>\n",
       "      <td>13.03</td>\n",
       "      <td>rain</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>18 Sep 2019, 13:49:53</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2903</td>\n",
       "      <td>14.57</td>\n",
       "      <td>Raining today</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>20 Sep 2019, 01:02:05</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2961</td>\n",
       "      <td>12.81</td>\n",
       "      <td>Feeling good</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>19 Sep 2019, 00:30:01</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>48062</td>\n",
       "      <td>12.48</td>\n",
       "      <td>Feeling good</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                     Date            Name  Type   Time  Distance  \\\n",
       "20  27 Sep 2019, 01:00:18  Afternoon Ride  Ride   1712     12.47   \n",
       "26   3 Oct 2019, 00:45:22  Afternoon Ride  Ride   1724     12.52   \n",
       "22   1 Oct 2019, 00:15:07  Afternoon Ride  Ride   1732       NaN   \n",
       "24   2 Oct 2019, 00:13:09  Afternoon Ride  Ride   1756       NaN   \n",
       "16  25 Sep 2019, 00:07:21  Afternoon Ride  Ride   1775     12.10   \n",
       "30  10 Oct 2019, 00:10:31  Afternoon Ride  Ride   1841     12.59   \n",
       "32  11 Oct 2019, 00:16:57  Afternoon Ride  Ride   1843     11.79   \n",
       "18  26 Sep 2019, 00:13:33  Afternoon Ride  Ride   1860     12.52   \n",
       "2   11 Sep 2019, 00:23:50  Afternoon Ride  Ride   1863     12.52   \n",
       "28   4 Oct 2019, 01:08:08  Afternoon Ride  Ride   1870     12.63   \n",
       "4   12 Sep 2019, 00:28:05  Afternoon Ride  Ride   1891     12.48   \n",
       "6   17 Sep 2019, 00:15:47  Afternoon Ride  Ride   1973     12.45   \n",
       "14  24 Sep 2019, 00:35:42  Afternoon Ride  Ride   2076     12.47   \n",
       "0   10 Sep 2019, 00:13:04  Afternoon Ride  Ride   2084     12.62   \n",
       "11  19 Sep 2019, 13:52:09    Morning Ride  Ride   2090     12.59   \n",
       "9   18 Sep 2019, 00:15:52  Afternoon Ride  Ride   2101     12.48   \n",
       "21  30 Sep 2019, 13:53:52    Morning Ride  Ride   2118     12.71   \n",
       "17  25 Sep 2019, 13:35:41    Morning Ride  Ride   2124     12.65   \n",
       "25   2 Oct 2019, 13:46:06    Morning Ride  Ride   2134     13.06   \n",
       "29   9 Oct 2019, 13:55:40    Morning Ride  Ride   2149     12.70   \n",
       "27   3 Oct 2019, 13:47:36    Morning Ride  Ride   2182     12.68   \n",
       "3   11 Sep 2019, 14:06:19    Morning Ride  Ride   2192     12.84   \n",
       "23   1 Oct 2019, 13:45:55    Morning Ride  Ride   2222     12.82   \n",
       "5   16 Sep 2019, 13:57:48    Morning Ride  Ride   2272     12.45   \n",
       "7   17 Sep 2019, 13:43:34    Morning Ride  Ride   2285     12.60   \n",
       "15  24 Sep 2019, 13:41:24    Morning Ride  Ride   2321     12.68   \n",
       "19  26 Sep 2019, 13:42:43    Morning Ride  Ride   2350     12.91   \n",
       "13  23 Sep 2019, 13:50:41    Morning Ride  Ride   2462     12.68   \n",
       "31  10 Oct 2019, 13:47:14    Morning Ride  Ride   2463     12.79   \n",
       "1   10 Sep 2019, 13:52:18    Morning Ride  Ride   2531     13.03   \n",
       "8   18 Sep 2019, 13:49:53    Morning Ride  Ride   2903     14.57   \n",
       "12  20 Sep 2019, 01:02:05  Afternoon Ride  Ride   2961     12.81   \n",
       "10  19 Sep 2019, 00:30:01  Afternoon Ride  Ride  48062     12.48   \n",
       "\n",
       "                                     Comments  \n",
       "20               Tired by the end of the week  \n",
       "26                               Feeling good  \n",
       "22                       Legs feeling strong!  \n",
       "24      A little tired today but good weather  \n",
       "16                       Feeling really tired  \n",
       "30        Feeling good after a holiday break!  \n",
       "32  Bike feeling tight, needs an oil and pump  \n",
       "18                                    raining  \n",
       "2                   Wet road but nice weather  \n",
       "28           Very tired, riding into the wind  \n",
       "4                Tired by the end of the week  \n",
       "6                        Legs feeling strong!  \n",
       "14             Oiled chain, bike feels smooth  \n",
       "0                                        Rain  \n",
       "11               Getting colder which is nice  \n",
       "9                             Pumped up tires  \n",
       "21                  Rested after the weekend!  \n",
       "17               Stopped for photo of sunrise  \n",
       "25           Bit tired today but good weather  \n",
       "29              Really cold! But feeling good  \n",
       "27                                   Wet road  \n",
       "3                Stopped for photo of sunrise  \n",
       "23             Beautiful morning! Feeling fit  \n",
       "5                   Rested after the weekend!  \n",
       "7                                     Raining  \n",
       "15                 Bike feeling much smoother  \n",
       "19            Detour around trucks at Jericho  \n",
       "13                  Rested after the weekend!  \n",
       "31               Stopped for photo of sunrise  \n",
       "1                                        rain  \n",
       "8                               Raining today  \n",
       "12                               Feeling good  \n",
       "10                               Feeling good  "
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sort_values(by='Time')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Date</th>\n",
       "      <th>Name</th>\n",
       "      <th>Type</th>\n",
       "      <th>Time</th>\n",
       "      <th>Distance</th>\n",
       "      <th>Comments</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>19 Sep 2019, 00:30:01</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>48062</td>\n",
       "      <td>12.48</td>\n",
       "      <td>Feeling good</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>20 Sep 2019, 01:02:05</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2961</td>\n",
       "      <td>12.81</td>\n",
       "      <td>Feeling good</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>18 Sep 2019, 13:49:53</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2903</td>\n",
       "      <td>14.57</td>\n",
       "      <td>Raining today</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>10 Sep 2019, 13:52:18</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2531</td>\n",
       "      <td>13.03</td>\n",
       "      <td>rain</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>10 Oct 2019, 13:47:14</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2463</td>\n",
       "      <td>12.79</td>\n",
       "      <td>Stopped for photo of sunrise</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>23 Sep 2019, 13:50:41</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2462</td>\n",
       "      <td>12.68</td>\n",
       "      <td>Rested after the weekend!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>26 Sep 2019, 13:42:43</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2350</td>\n",
       "      <td>12.91</td>\n",
       "      <td>Detour around trucks at Jericho</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>24 Sep 2019, 13:41:24</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2321</td>\n",
       "      <td>12.68</td>\n",
       "      <td>Bike feeling much smoother</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>17 Sep 2019, 13:43:34</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2285</td>\n",
       "      <td>12.60</td>\n",
       "      <td>Raining</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>16 Sep 2019, 13:57:48</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2272</td>\n",
       "      <td>12.45</td>\n",
       "      <td>Rested after the weekend!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>1 Oct 2019, 13:45:55</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2222</td>\n",
       "      <td>12.82</td>\n",
       "      <td>Beautiful morning! Feeling fit</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>11 Sep 2019, 14:06:19</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2192</td>\n",
       "      <td>12.84</td>\n",
       "      <td>Stopped for photo of sunrise</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>3 Oct 2019, 13:47:36</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2182</td>\n",
       "      <td>12.68</td>\n",
       "      <td>Wet road</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>9 Oct 2019, 13:55:40</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2149</td>\n",
       "      <td>12.70</td>\n",
       "      <td>Really cold! But feeling good</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>2 Oct 2019, 13:46:06</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2134</td>\n",
       "      <td>13.06</td>\n",
       "      <td>Bit tired today but good weather</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>25 Sep 2019, 13:35:41</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2124</td>\n",
       "      <td>12.65</td>\n",
       "      <td>Stopped for photo of sunrise</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>30 Sep 2019, 13:53:52</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2118</td>\n",
       "      <td>12.71</td>\n",
       "      <td>Rested after the weekend!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>18 Sep 2019, 00:15:52</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2101</td>\n",
       "      <td>12.48</td>\n",
       "      <td>Pumped up tires</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>19 Sep 2019, 13:52:09</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2090</td>\n",
       "      <td>12.59</td>\n",
       "      <td>Getting colder which is nice</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>10 Sep 2019, 00:13:04</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2084</td>\n",
       "      <td>12.62</td>\n",
       "      <td>Rain</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>24 Sep 2019, 00:35:42</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2076</td>\n",
       "      <td>12.47</td>\n",
       "      <td>Oiled chain, bike feels smooth</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>17 Sep 2019, 00:15:47</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1973</td>\n",
       "      <td>12.45</td>\n",
       "      <td>Legs feeling strong!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>12 Sep 2019, 00:28:05</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1891</td>\n",
       "      <td>12.48</td>\n",
       "      <td>Tired by the end of the week</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>4 Oct 2019, 01:08:08</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1870</td>\n",
       "      <td>12.63</td>\n",
       "      <td>Very tired, riding into the wind</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>11 Sep 2019, 00:23:50</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1863</td>\n",
       "      <td>12.52</td>\n",
       "      <td>Wet road but nice weather</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>26 Sep 2019, 00:13:33</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1860</td>\n",
       "      <td>12.52</td>\n",
       "      <td>raining</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>11 Oct 2019, 00:16:57</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1843</td>\n",
       "      <td>11.79</td>\n",
       "      <td>Bike feeling tight, needs an oil and pump</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>10 Oct 2019, 00:10:31</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1841</td>\n",
       "      <td>12.59</td>\n",
       "      <td>Feeling good after a holiday break!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>25 Sep 2019, 00:07:21</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1775</td>\n",
       "      <td>12.10</td>\n",
       "      <td>Feeling really tired</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>2 Oct 2019, 00:13:09</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1756</td>\n",
       "      <td>NaN</td>\n",
       "      <td>A little tired today but good weather</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>1 Oct 2019, 00:15:07</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1732</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Legs feeling strong!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>3 Oct 2019, 00:45:22</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1724</td>\n",
       "      <td>12.52</td>\n",
       "      <td>Feeling good</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>27 Sep 2019, 01:00:18</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1712</td>\n",
       "      <td>12.47</td>\n",
       "      <td>Tired by the end of the week</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                     Date            Name  Type   Time  Distance  \\\n",
       "10  19 Sep 2019, 00:30:01  Afternoon Ride  Ride  48062     12.48   \n",
       "12  20 Sep 2019, 01:02:05  Afternoon Ride  Ride   2961     12.81   \n",
       "8   18 Sep 2019, 13:49:53    Morning Ride  Ride   2903     14.57   \n",
       "1   10 Sep 2019, 13:52:18    Morning Ride  Ride   2531     13.03   \n",
       "31  10 Oct 2019, 13:47:14    Morning Ride  Ride   2463     12.79   \n",
       "13  23 Sep 2019, 13:50:41    Morning Ride  Ride   2462     12.68   \n",
       "19  26 Sep 2019, 13:42:43    Morning Ride  Ride   2350     12.91   \n",
       "15  24 Sep 2019, 13:41:24    Morning Ride  Ride   2321     12.68   \n",
       "7   17 Sep 2019, 13:43:34    Morning Ride  Ride   2285     12.60   \n",
       "5   16 Sep 2019, 13:57:48    Morning Ride  Ride   2272     12.45   \n",
       "23   1 Oct 2019, 13:45:55    Morning Ride  Ride   2222     12.82   \n",
       "3   11 Sep 2019, 14:06:19    Morning Ride  Ride   2192     12.84   \n",
       "27   3 Oct 2019, 13:47:36    Morning Ride  Ride   2182     12.68   \n",
       "29   9 Oct 2019, 13:55:40    Morning Ride  Ride   2149     12.70   \n",
       "25   2 Oct 2019, 13:46:06    Morning Ride  Ride   2134     13.06   \n",
       "17  25 Sep 2019, 13:35:41    Morning Ride  Ride   2124     12.65   \n",
       "21  30 Sep 2019, 13:53:52    Morning Ride  Ride   2118     12.71   \n",
       "9   18 Sep 2019, 00:15:52  Afternoon Ride  Ride   2101     12.48   \n",
       "11  19 Sep 2019, 13:52:09    Morning Ride  Ride   2090     12.59   \n",
       "0   10 Sep 2019, 00:13:04  Afternoon Ride  Ride   2084     12.62   \n",
       "14  24 Sep 2019, 00:35:42  Afternoon Ride  Ride   2076     12.47   \n",
       "6   17 Sep 2019, 00:15:47  Afternoon Ride  Ride   1973     12.45   \n",
       "4   12 Sep 2019, 00:28:05  Afternoon Ride  Ride   1891     12.48   \n",
       "28   4 Oct 2019, 01:08:08  Afternoon Ride  Ride   1870     12.63   \n",
       "2   11 Sep 2019, 00:23:50  Afternoon Ride  Ride   1863     12.52   \n",
       "18  26 Sep 2019, 00:13:33  Afternoon Ride  Ride   1860     12.52   \n",
       "32  11 Oct 2019, 00:16:57  Afternoon Ride  Ride   1843     11.79   \n",
       "30  10 Oct 2019, 00:10:31  Afternoon Ride  Ride   1841     12.59   \n",
       "16  25 Sep 2019, 00:07:21  Afternoon Ride  Ride   1775     12.10   \n",
       "24   2 Oct 2019, 00:13:09  Afternoon Ride  Ride   1756       NaN   \n",
       "22   1 Oct 2019, 00:15:07  Afternoon Ride  Ride   1732       NaN   \n",
       "26   3 Oct 2019, 00:45:22  Afternoon Ride  Ride   1724     12.52   \n",
       "20  27 Sep 2019, 01:00:18  Afternoon Ride  Ride   1712     12.47   \n",
       "\n",
       "                                     Comments  \n",
       "10                               Feeling good  \n",
       "12                               Feeling good  \n",
       "8                               Raining today  \n",
       "1                                        rain  \n",
       "31               Stopped for photo of sunrise  \n",
       "13                  Rested after the weekend!  \n",
       "19            Detour around trucks at Jericho  \n",
       "15                 Bike feeling much smoother  \n",
       "7                                     Raining  \n",
       "5                   Rested after the weekend!  \n",
       "23             Beautiful morning! Feeling fit  \n",
       "3                Stopped for photo of sunrise  \n",
       "27                                   Wet road  \n",
       "29              Really cold! But feeling good  \n",
       "25           Bit tired today but good weather  \n",
       "17               Stopped for photo of sunrise  \n",
       "21                  Rested after the weekend!  \n",
       "9                             Pumped up tires  \n",
       "11               Getting colder which is nice  \n",
       "0                                        Rain  \n",
       "14             Oiled chain, bike feels smooth  \n",
       "6                        Legs feeling strong!  \n",
       "4                Tired by the end of the week  \n",
       "28           Very tired, riding into the wind  \n",
       "2                   Wet road but nice weather  \n",
       "18                                    raining  \n",
       "32  Bike feeling tight, needs an oil and pump  \n",
       "30        Feeling good after a holiday break!  \n",
       "16                       Feeling really tired  \n",
       "24      A little tired today but good weather  \n",
       "22                       Legs feeling strong!  \n",
       "26                               Feeling good  \n",
       "20               Tired by the end of the week  "
      ]
     },
     "execution_count": 90,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sort_values(by='Time', ascending=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Some methods will operate on the index/columns, like `.sort_index()`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Date</th>\n",
       "      <th>Name</th>\n",
       "      <th>Type</th>\n",
       "      <th>Time</th>\n",
       "      <th>Distance</th>\n",
       "      <th>Comments</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>11 Oct 2019, 00:16:57</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1843</td>\n",
       "      <td>11.79</td>\n",
       "      <td>Bike feeling tight, needs an oil and pump</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>10 Oct 2019, 13:47:14</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2463</td>\n",
       "      <td>12.79</td>\n",
       "      <td>Stopped for photo of sunrise</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>10 Oct 2019, 00:10:31</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1841</td>\n",
       "      <td>12.59</td>\n",
       "      <td>Feeling good after a holiday break!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>9 Oct 2019, 13:55:40</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2149</td>\n",
       "      <td>12.70</td>\n",
       "      <td>Really cold! But feeling good</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>4 Oct 2019, 01:08:08</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1870</td>\n",
       "      <td>12.63</td>\n",
       "      <td>Very tired, riding into the wind</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>3 Oct 2019, 13:47:36</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2182</td>\n",
       "      <td>12.68</td>\n",
       "      <td>Wet road</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>3 Oct 2019, 00:45:22</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1724</td>\n",
       "      <td>12.52</td>\n",
       "      <td>Feeling good</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>2 Oct 2019, 13:46:06</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2134</td>\n",
       "      <td>13.06</td>\n",
       "      <td>Bit tired today but good weather</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>2 Oct 2019, 00:13:09</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1756</td>\n",
       "      <td>NaN</td>\n",
       "      <td>A little tired today but good weather</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>1 Oct 2019, 13:45:55</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2222</td>\n",
       "      <td>12.82</td>\n",
       "      <td>Beautiful morning! Feeling fit</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>1 Oct 2019, 00:15:07</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1732</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Legs feeling strong!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>30 Sep 2019, 13:53:52</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2118</td>\n",
       "      <td>12.71</td>\n",
       "      <td>Rested after the weekend!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>27 Sep 2019, 01:00:18</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1712</td>\n",
       "      <td>12.47</td>\n",
       "      <td>Tired by the end of the week</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>26 Sep 2019, 13:42:43</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2350</td>\n",
       "      <td>12.91</td>\n",
       "      <td>Detour around trucks at Jericho</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>26 Sep 2019, 00:13:33</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1860</td>\n",
       "      <td>12.52</td>\n",
       "      <td>raining</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>25 Sep 2019, 13:35:41</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2124</td>\n",
       "      <td>12.65</td>\n",
       "      <td>Stopped for photo of sunrise</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>25 Sep 2019, 00:07:21</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1775</td>\n",
       "      <td>12.10</td>\n",
       "      <td>Feeling really tired</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>24 Sep 2019, 13:41:24</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2321</td>\n",
       "      <td>12.68</td>\n",
       "      <td>Bike feeling much smoother</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>24 Sep 2019, 00:35:42</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2076</td>\n",
       "      <td>12.47</td>\n",
       "      <td>Oiled chain, bike feels smooth</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>23 Sep 2019, 13:50:41</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2462</td>\n",
       "      <td>12.68</td>\n",
       "      <td>Rested after the weekend!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>20 Sep 2019, 01:02:05</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2961</td>\n",
       "      <td>12.81</td>\n",
       "      <td>Feeling good</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>19 Sep 2019, 13:52:09</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2090</td>\n",
       "      <td>12.59</td>\n",
       "      <td>Getting colder which is nice</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>19 Sep 2019, 00:30:01</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>48062</td>\n",
       "      <td>12.48</td>\n",
       "      <td>Feeling good</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>18 Sep 2019, 00:15:52</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2101</td>\n",
       "      <td>12.48</td>\n",
       "      <td>Pumped up tires</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>18 Sep 2019, 13:49:53</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2903</td>\n",
       "      <td>14.57</td>\n",
       "      <td>Raining today</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>17 Sep 2019, 13:43:34</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2285</td>\n",
       "      <td>12.60</td>\n",
       "      <td>Raining</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>17 Sep 2019, 00:15:47</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1973</td>\n",
       "      <td>12.45</td>\n",
       "      <td>Legs feeling strong!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>16 Sep 2019, 13:57:48</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2272</td>\n",
       "      <td>12.45</td>\n",
       "      <td>Rested after the weekend!</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>12 Sep 2019, 00:28:05</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1891</td>\n",
       "      <td>12.48</td>\n",
       "      <td>Tired by the end of the week</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>11 Sep 2019, 14:06:19</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2192</td>\n",
       "      <td>12.84</td>\n",
       "      <td>Stopped for photo of sunrise</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>11 Sep 2019, 00:23:50</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>1863</td>\n",
       "      <td>12.52</td>\n",
       "      <td>Wet road but nice weather</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>10 Sep 2019, 13:52:18</td>\n",
       "      <td>Morning Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2531</td>\n",
       "      <td>13.03</td>\n",
       "      <td>rain</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>10 Sep 2019, 00:13:04</td>\n",
       "      <td>Afternoon Ride</td>\n",
       "      <td>Ride</td>\n",
       "      <td>2084</td>\n",
       "      <td>12.62</td>\n",
       "      <td>Rain</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                     Date            Name  Type   Time  Distance  \\\n",
       "32  11 Oct 2019, 00:16:57  Afternoon Ride  Ride   1843     11.79   \n",
       "31  10 Oct 2019, 13:47:14    Morning Ride  Ride   2463     12.79   \n",
       "30  10 Oct 2019, 00:10:31  Afternoon Ride  Ride   1841     12.59   \n",
       "29   9 Oct 2019, 13:55:40    Morning Ride  Ride   2149     12.70   \n",
       "28   4 Oct 2019, 01:08:08  Afternoon Ride  Ride   1870     12.63   \n",
       "27   3 Oct 2019, 13:47:36    Morning Ride  Ride   2182     12.68   \n",
       "26   3 Oct 2019, 00:45:22  Afternoon Ride  Ride   1724     12.52   \n",
       "25   2 Oct 2019, 13:46:06    Morning Ride  Ride   2134     13.06   \n",
       "24   2 Oct 2019, 00:13:09  Afternoon Ride  Ride   1756       NaN   \n",
       "23   1 Oct 2019, 13:45:55    Morning Ride  Ride   2222     12.82   \n",
       "22   1 Oct 2019, 00:15:07  Afternoon Ride  Ride   1732       NaN   \n",
       "21  30 Sep 2019, 13:53:52    Morning Ride  Ride   2118     12.71   \n",
       "20  27 Sep 2019, 01:00:18  Afternoon Ride  Ride   1712     12.47   \n",
       "19  26 Sep 2019, 13:42:43    Morning Ride  Ride   2350     12.91   \n",
       "18  26 Sep 2019, 00:13:33  Afternoon Ride  Ride   1860     12.52   \n",
       "17  25 Sep 2019, 13:35:41    Morning Ride  Ride   2124     12.65   \n",
       "16  25 Sep 2019, 00:07:21  Afternoon Ride  Ride   1775     12.10   \n",
       "15  24 Sep 2019, 13:41:24    Morning Ride  Ride   2321     12.68   \n",
       "14  24 Sep 2019, 00:35:42  Afternoon Ride  Ride   2076     12.47   \n",
       "13  23 Sep 2019, 13:50:41    Morning Ride  Ride   2462     12.68   \n",
       "12  20 Sep 2019, 01:02:05  Afternoon Ride  Ride   2961     12.81   \n",
       "11  19 Sep 2019, 13:52:09    Morning Ride  Ride   2090     12.59   \n",
       "10  19 Sep 2019, 00:30:01  Afternoon Ride  Ride  48062     12.48   \n",
       "9   18 Sep 2019, 00:15:52  Afternoon Ride  Ride   2101     12.48   \n",
       "8   18 Sep 2019, 13:49:53    Morning Ride  Ride   2903     14.57   \n",
       "7   17 Sep 2019, 13:43:34    Morning Ride  Ride   2285     12.60   \n",
       "6   17 Sep 2019, 00:15:47  Afternoon Ride  Ride   1973     12.45   \n",
       "5   16 Sep 2019, 13:57:48    Morning Ride  Ride   2272     12.45   \n",
       "4   12 Sep 2019, 00:28:05  Afternoon Ride  Ride   1891     12.48   \n",
       "3   11 Sep 2019, 14:06:19    Morning Ride  Ride   2192     12.84   \n",
       "2   11 Sep 2019, 00:23:50  Afternoon Ride  Ride   1863     12.52   \n",
       "1   10 Sep 2019, 13:52:18    Morning Ride  Ride   2531     13.03   \n",
       "0   10 Sep 2019, 00:13:04  Afternoon Ride  Ride   2084     12.62   \n",
       "\n",
       "                                     Comments  \n",
       "32  Bike feeling tight, needs an oil and pump  \n",
       "31               Stopped for photo of sunrise  \n",
       "30        Feeling good after a holiday break!  \n",
       "29              Really cold! But feeling good  \n",
       "28           Very tired, riding into the wind  \n",
       "27                                   Wet road  \n",
       "26                               Feeling good  \n",
       "25           Bit tired today but good weather  \n",
       "24      A little tired today but good weather  \n",
       "23             Beautiful morning! Feeling fit  \n",
       "22                       Legs feeling strong!  \n",
       "21                  Rested after the weekend!  \n",
       "20               Tired by the end of the week  \n",
       "19            Detour around trucks at Jericho  \n",
       "18                                    raining  \n",
       "17               Stopped for photo of sunrise  \n",
       "16                       Feeling really tired  \n",
       "15                 Bike feeling much smoother  \n",
       "14             Oiled chain, bike feels smooth  \n",
       "13                  Rested after the weekend!  \n",
       "12                               Feeling good  \n",
       "11               Getting colder which is nice  \n",
       "10                               Feeling good  \n",
       "9                             Pumped up tires  \n",
       "8                               Raining today  \n",
       "7                                     Raining  \n",
       "6                        Legs feeling strong!  \n",
       "5                   Rested after the weekend!  \n",
       "4                Tired by the end of the week  \n",
       "3                Stopped for photo of sunrise  \n",
       "2                   Wet road but nice weather  \n",
       "1                                        rain  \n",
       "0                                        Rain  "
      ]
     },
     "execution_count": 91,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sort_index(ascending=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Why ndarrays and Series and DataFrames?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "At this point, you might be asking why we need all these different data structures. Well, they all serve different purposes and are suited to different tasks. For example:\n",
    "- NumPy is typically faster/uses less memory than Pandas;\n",
    "- not all Python packages are compatible with NumPy & Pandas;\n",
    "- the ability to add labels to data can be useful (e.g., for time series);\n",
    "- NumPy and Pandas have different built-in functions available.\n",
    "\n",
    "My advice: use the simplest data structure that fulfills your needs!\n",
    "\n",
    "Finally, we've seen how to go from: ndarray (`np.array()`) -> series (`pd.series()`) -> dataframe (`pd.DataFrame()`). Remember that we can also go the other way: dataframe/series -> ndarray using `df.to_numpy()`."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.8"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": true,
   "title_cell": "Lecture Outline",
   "title_sidebar": "Contents",
   "toc_cell": true,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "305.797px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}