{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "![](../docs/banner.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Chapter 4: Style Guides, Scripts, Imports" ] }, { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "

Chapter Outline

\n", "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Chapter Learning Objectives\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Describe why code style is important.\n", "- Differentiate between the role of a linter like `flake8` and an autoformatter like `black`.\n", "- Implement linting and formatting from the command line or within Jupyter or another IDE.\n", "- Write a Python module (`.py` file) in VSCode or other IDE of your choice.\n", "- Import installed or custom packages using the `import` syntax.\n", "- Explain the notion of a reference in Python.\n", "- Explain the notion of scoping in Python.\n", "- Anticipate whether changing one variable will change another in Python.\n", "- Anticipate whether a function changes the caller's version of an argument variable in Python.\n", "- Select the appropriate choice between `==` and `is` in Python." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Style Guide\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](img/chapter4/xkcd.png)\n", "\n", "Source: [xkcd.com](https://imgs.xkcd.com/comics/code_quality.png)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is incorrect to think that if your code works then you are done. Code has two \"users\" - the computer (which turns it into machine instructions) and humans, who will likely read and/or modify the code in the future. This section is about how to make your code suitable to that second audience, humans.\n", "\n", "Styling is particularly important for sharing your code to other users (including your future self!). Remember: \"Code is read much more often than it is written\". [PEP8](https://www.python.org/dev/peps/pep-0008/) is the Python Style Guide. It is worth skimming through PEP 8, but here are some highlights:\n", "- Indent using 4 spaces\n", "- Have whitespace around operators, e.g. `x = 1` not `x=1`\n", "- But avoid extra whitespace, e.g. `f(1)` not `f (1)`\n", "- Single and double quotes both fine for strings, but use `\"\"\"`triple double quotes`\"\"\"` for docstrings, not `'''`triple single quotes`'''`\n", "- Variable and function names use `underscores_between_words`\n", "- And much more...\n", "\n", "There's lots to remember but luckily **linters** & **formatters** can help you adhere to uniform styling!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Linters\n", "\n", "Linting refers to highlighting programmatic and stylistic problems in your Python source code. Think of it like \"spell check\" in word processing software. Common linters include `pycodestyle`, `pylint`, `pyflakes`, `flake8`, etc. We'll use [flake8](https://flake8.pycqa.org/en/latest/) in this chapter, which, if you don't have it, you can install with:\n", "\n", "```\n", "conda install -c anaconda flake8\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`flake8` can be used from the command line to check files:\n", "\n", "```\n", "flake8 path/file_to_check.py\n", "```\n", "\n", "You can execute shell commands in Jupyter by prepending a command with an exclamation mark `!`. I've included an example script called [`bad_style.py`](https://github.com/TomasBeuzen/python-programming-for-data-science/blob/main/chapters/data/bad_style.py) in the data sub-directory of this directory, let's use `flake8` on that now:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "data/bad_style.py:1:6: E201 whitespace after '{'\n", "data/bad_style.py:1:11: E231 missing whitespace after ':'\n", "data/bad_style.py:1:14: E231 missing whitespace after ','\n", "data/bad_style.py:1:18: E231 missing whitespace after ':'\n", "data/bad_style.py:2:1: E128 continuation line under-indented for visual indent\n", "data/bad_style.py:2:4: E231 missing whitespace after ':'\n", "data/bad_style.py:4:25: E128 continuation line under-indented for visual indent\n", "data/bad_style.py:5:5: E225 missing whitespace around operator\n", "data/bad_style.py:7:80: E501 line too long (119 > 79 characters)\n", "data/bad_style.py:8:2: E111 indentation is not a multiple of four\n", "data/bad_style.py:10:2: E111 indentation is not a multiple of four\n", "data/bad_style.py:11:2: E111 indentation is not a multiple of four\n", "data/bad_style.py:12:2: E111 indentation is not a multiple of four\n", "data/bad_style.py:13:10: E701 multiple statements on one line (colon)\n", "data/bad_style.py:13:31: E261 at least two spaces before inline comment\n", "data/bad_style.py:13:31: E262 inline comment should start with '# '\n", "data/bad_style.py:14:1: E302 expected 2 blank lines, found 0\n", "data/bad_style.py:14:13: E201 whitespace after '('\n", "data/bad_style.py:14:25: E202 whitespace before ')'\n", "data/bad_style.py:15:3: E111 indentation is not a multiple of four\n", "data/bad_style.py:15:8: E211 whitespace before '('\n", "data/bad_style.py:15:19: E202 whitespace before ')'\n", "data/bad_style.py:16:11: E271 multiple spaces after keyword\n", "data/bad_style.py:17:3: E301 expected 1 blank line, found 0\n", "data/bad_style.py:17:3: E111 indentation is not a multiple of four\n", "data/bad_style.py:17:16: E231 missing whitespace after ','\n", "data/bad_style.py:18:7: E111 indentation is not a multiple of four\n", "data/bad_style.py:20:1: E305 expected 2 blank lines after class or function definition, found 0\n", "data/bad_style.py:28:2: W292 no newline at end of file\n" ] } ], "source": [ "!flake8 data/bad_style.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Formatters\n", "\n", "Formatting refers to restructuring how code appears by applying specific rules for line spacing, indents, line length, etc. Common formatters include `autopep8`, `black`, `yapf`, etc. We'll use [black](https://black.readthedocs.io/en/stable/?badge=stable) in this chapter, which, if you don't have it, you can install with:\n", "\n", "```\n", "conda install -c conda-forge black\n", "```\n", "\n", "`black` can also be used from the command line to format your files:\n", "\n", "```\n", "black path/file_to_check.py --check\n", "```\n", "\n", "The `--check` argument just checks if your code conforms to black style but doesn't reformat it in place, if you want your file reformatted, remove the argument." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1mwould reformat data/bad_style.py\u001b[0m\n", "\u001b[1mOh no! 💥 💔 💥\u001b[0m\n", "\u001b[1m1 file would be reformatted\u001b[0m.\u001b[0m\n" ] } ], "source": [ "!black data/bad_style.py --check" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here's the file content before formatting:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "x = { 'a':37,'b':42,\n", "'c':927}\n", "very_long_variable_name = {'field': 1,\n", " 'is_debug': True}\n", "this=True\n", "\n", "if very_long_variable_name is not None and very_long_variable_name[\"field\"] > 0 or very_long_variable_name['is_debug']:\n", " z = 'hello '+'world'\n", "else:\n", " world = 'world'\n", " a = 'hello {}'.format(world)\n", " f = rf'hello {world}'\n", "if (this): y = 'hello ''world'#FIXME: https://github.com/python/black/issues/26\n", "class Foo ( object ):\n", " def f (self ):\n", " return 37*-2\n", " def g(self, x,y=42):\n", " return y\n", "# fmt: off\n", "custom_formatting = [\n", " 0, 1, 2,\n", " 3, 4, 5\n", "]\n", "# fmt: on\n", "regular_formatting = [\n", " 0, 1, 2,\n", " 3, 4, 5\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "ANd here it is after formatting (note how you can toggle formatting on or off in your code using the `# fmt: off`/`# fmt: on` tags):" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "x = {\"a\": 37, \"b\": 42, \"c\": 927}\n", "very_long_variable_name = {\"field\": 1, \"is_debug\": True}\n", "this = True\n", "\n", "if (\n", " very_long_variable_name is not None\n", " and very_long_variable_name[\"field\"] > 0\n", " or very_long_variable_name[\"is_debug\"]\n", "):\n", " z = \"hello \" + \"world\"\n", "else:\n", " world = \"world\"\n", " a = \"hello {}\".format(world)\n", " f = rf\"hello {world}\"\n", "if this:\n", " y = \"hello \" \"world\" # FIXME: https://github.com/python/black/issues/26\n", "\n", "\n", "class Foo(object):\n", " def f(self):\n", " return 37 * -2\n", "\n", " def g(self, x, y=42):\n", " return y\n", "\n", "\n", "# fmt: off\n", "custom_formatting = [\n", " 0, 1, 2,\n", " 3, 4, 5\n", "]\n", "# fmt: on\n", "regular_formatting = [0, 1, 2, 3, 4, 5]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Comments\n", "\n", "Comments are important for understanding your code. While docstrings cover what a function _does_, your comments will help document _how_ your code achieves its goal. There are PEP 8 guidelines on the length, spacing, etc of comments.\n", "- **Comments**: should start with a `#` followed by a single space and be preceded by at least two spaces.\n", "- **Block Comments**: each line of a block comment should start with a `#` followed by a single space and should be indented to the same level as the code it precedes.\n", "- Generally, comments should not be unnecessarily verbose or just state the obvious, as this can be distracting and can actually make your code more difficult to read!\n", "\n", "Here is an example of a reasonable comment:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def random_walker(T):\n", " x = 0\n", " y = 0\n", "\n", " for i in range(T): \n", " \n", " # Generate a random number between 0 and 1.\n", " # Then, go right, left, up or down if the number\n", " # is in the interval [0,0.25), [0.25,0.5),\n", " # [0.5,0.75) or [0.75,1) respectively.\n", " \n", " r = random() \n", " if r < 0.25:\n", " x += 1 # Go right\n", " elif r < 0.5:\n", " x -= 1 # Go left\n", " elif r < 0.75:\n", " y += 1 # Go up\n", " else:\n", " y -= 1 # Go down\n", "\n", " print((x,y))\n", "\n", " return x**2 + y**2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are some **bad examples** of comments, because they are unnecessary or poorly formatted:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def random_walker(T):\n", " # intalize coords\n", " x = 0\n", " y = 0\n", "\n", " for i in range(T):# loop T times\n", " r = random()\n", " if r < 0.25:\n", " x += 1 # go right\n", " elif r < 0.5:\n", " x -= 1 # go left\n", " elif r < 0.75:\n", " y += 1 # go up\n", " else:\n", " y -= 1\n", "\n", " # Print the location\n", " print((x, y))\n", "\n", " # In Python, the ** operator means exponentiation.\n", " return x ** 2 + y ** 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Python Scripts\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Jupyter is a fantastic data science tool which allows you to code and create visualisations alongside text and images to create narratives. However, as your project grows, eventually you're going to need to create python scripts, `.py` files `.py` files are also called \"modules\" in Python and may contain functions, classes, variables, and/or runnable code. I typically start my projects in Jupyter, and then begin to offload my functions, classes, scripts, etc to `.py` files as I formalise, structure and streamline my code." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### IDEs\n", "\n", "You don't need any special software to write Python modules, you can write your code using any text editor and just save your file with a `.py` extension. But software exists to make your life much easier!\n", "\n", "IDE stands for \"integrated development environment\" and refers to software that provides comprehensive functionality for code development (e.g., compiling, debugging, formatting, testing, linting, etc). In my experience the most popular out-of-the-box Python IDEs are [PyCharm](https://www.jetbrains.com/pycharm/) and [Spyder](https://www.spyder-ide.org/). There are also many editors available that can be customized with extensions to act as Python IDEs, e.g., [VSCode](https://code.visualstudio.com/), [Atom](https://atom.io/), [Sublime Text](https://www.sublimetext.com/). The benefit of these customisable editors is that they are light-weight and you can choose only the extensions you really need (as opposed to downloading a big, full-blown IDE like PyCharm).\n", "\n", "VSCode is my favourite editor at the moment and they have a [great Python tutorial online](https://code.visualstudio.com/docs/languages/python) which I'd highly recommend if you're interested! We'll do some importing of custom `.py` files in these chapters but won't do any work in an IDE." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Importing\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Python can access code in another module by importing it. This is done using the `import` statement, which you've probably seen a few times already. We'll discuss importing more in DSCI 524 and you can read all about it in the [Python documentation](https://docs.python.org/3/reference/import.html) but for now, it's easiest to see it in action." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Ways of Importing Things\n", "\n", "I've written a `.py` file called `wallet.py` that contains a class `Wallet` that can be used to store, spend, and earn cash. I recommend taking a look at that file on GitHub before moving on.\n", "\n", "Let's `import` the code from `wallet.py` . We can import our `.py` file (our module) simply by:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "import wallet" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can take a look at all the useable parts of that module by typing `dir(wallet)`:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['InsufficientCashError',\n", " 'Wallet',\n", " '__builtins__',\n", " '__cached__',\n", " '__doc__',\n", " '__file__',\n", " '__loader__',\n", " '__name__',\n", " '__package__',\n", " '__spec__']" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dir(wallet)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can import a package using an alias with the `as` keyword:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "import wallet as w" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "w.Wallet(100)" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "wallet.InsufficientCashError()" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "w.InsufficientCashError()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And we can import just a specific function/class/variable from our module:" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "from wallet import Wallet" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Wallet(100) # now I can refer to it without the module name prefix" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can even mix up all these methods:" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "from wallet import Wallet as w" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 21, "metadata": {}, "output_type": "execute_result" } ], "source": [ "w(100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's also possible to import everything in a module, though this is generally not recommended:" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "from wallet import *" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "Wallet(100)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "wallet.InsufficientCashError()" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "InsufficientCashError()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Importing Functions from Outside your Working Directory" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "I could do `import wallet` above because `wallet.py` is in my current working directory. But there are a few extra steps needed if it is in a different location. I've included a script called `hello.py` in a `data/` sub-directory of the directory housing this notebook. All it has in it is:\n", "\n", "```python\n", "PLANET = \"Earth\"\n", "\n", "\n", "def hello_world():\n", " print(f\"Hello {PLANET}!\")\n", "```\n", "\n", "Unfortunately I can't do this:" ] }, { "cell_type": "code", "execution_count": 25, "metadata": { "tags": [ "raises-exception" ] }, "outputs": [ { "ename": "ModuleNotFoundError", "evalue": "No module named 'hello'", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mModuleNotFoundError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0mhello\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mhello_world\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'hello'" ] } ], "source": [ "from hello import hello_world" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What I need to do is add this directory location to the paths that Python searches through when looking to import something. I usually do this using the `sys` module:" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['/Users/tbeuzen/GitHub/online-courses/python-programming-for-data-science/chapters',\n", " '/opt/miniconda3/lib/python37.zip',\n", " '/opt/miniconda3/lib/python3.7',\n", " '/opt/miniconda3/lib/python3.7/lib-dynload',\n", " '',\n", " '/Users/tbeuzen/.local/lib/python3.7/site-packages',\n", " '/opt/miniconda3/lib/python3.7/site-packages',\n", " '/opt/miniconda3/lib/python3.7/site-packages/IPython/extensions',\n", " '/Users/tbeuzen/.ipython',\n", " 'data/']" ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import sys\n", "sys.path.append('data/')\n", "sys.path # display the current paths Python is looking through" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "See that `data/` is now a valid path. So now I can import from `hello.py`:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "from hello import hello_world, PLANET" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'Earth'" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "PLANET # note that I can import variable defined in a .py file!" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hello Earth!\n" ] } ], "source": [ "hello_world()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Packages" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As your code gets more complex, grows in modules, and you wish to share it, you'll want to turn it into a Python package. Packages are logical collections of modules that can be easily imported. If you're interested in creating your own packages, take a look at the [py-pkgs book](https://ubc-mds.github.io/py-pkgs/). For now, we'll be using other people's popular data science packages, specifically, next chapter we'll look at `numpy`: \"the fundamental package for scientific computing with Python\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Importing Installed Packages" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the next few chapters we'll be using the `numpy` and `pandas` packages, which are probably the most popular for data science. When you install those packages, they are put in a location on your computer that Python already knows about, so we can simply import them at will." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([1, 2, 3])" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.array([1, 2, 3])" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([2, 6, 7])" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "np.random.randint(0, 10, 3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "There are plenty of packages that come with the [Python Standard Library](https://docs.python.org/3/library/) - these do not require installation with `conda` and you'll come across them throughout your data science journey, I'll show one example, `random`, below. But for more advanced stuff you'll to install and use packages like `numpy`, `pandas` and others. If you need some specific functionality, make sure you check if there's a package for it (there often is!). For example, one functionality I often want is a progress bar when looping over a `for` loop. This is available in the [tqdm package](https://github.com/tqdm/tqdm):" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "100%|██████████| 1000000/1000000 [00:00<00:00, 2063659.83it/s]\n" ] } ], "source": [ "from tqdm import tqdm\n", "for i in tqdm(range(int(10e5))):\n", " i ** 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Intriguing Behaviour in Python\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### References" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What do you think the code below will print?" ] }, { "cell_type": "code", "execution_count": 41, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "1" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = 1\n", "y = x\n", "x = 2\n", "y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And how about the next one?" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[2]" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = [1]\n", "y = x\n", "x[0] = 2\n", "y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In Python, the list `x` is a **reference** to an object in the computer's memory. When you set `y = x` these two variables now refer to the same object in memory - the one that `x` referred to. Setting `x[0] = 2` modifies the object in memory. So `x` and `y` are both modified (it makes no different if you set `x[0] = 2` or `y[0] = 2`, both modify the same memory).\n", " \n", "Here's an analogy that might help understand what's going on:\n", "- I share a Dropbox folder (or git repo) with you, and you modify it -- I sent you _the location of the stuff_ (this is like the list case)\n", "- I send you an email with a file attached, you download it and modify the file -- I sent you _the stuff itself_ (this is like the integer case)\n", "\n", "Okay, what do you think will happen here:" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1]" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = [1]\n", "y = x\n", "x = [2] # before we had x[0] = 2\n", "y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here we are not modifying the contents of `x`, we are setting `x` to refer to a new list `[2]`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Additional Weirdness" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use `id()` to return the unique id of an object in memory." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x has the value: [ 6 7 8 9 10], id: 5830667440\n", "y has the value: [1 2 3 4 5], id: 5830666800\n" ] } ], "source": [ "x = np.array([1, 2, 3, 4, 5]) # this is a numpy array which we'll learn more about next chapter\n", "y = x\n", "x = x + 5\n", "\n", "print(f\"x has the value: {x}, id: {id(x)}\")\n", "print(f\"y has the value: {y}, id: {id(y)}\")" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "x has the value: [ 6 7 8 9 10], id: 5830669856\n", "y has the value: [ 6 7 8 9 10], id: 5830669856\n" ] } ], "source": [ "x = np.array([1, 2, 3, 4, 5])\n", "y = x\n", "x += 5\n", "\n", "print(f\"x has the value: {x}, id: {id(x)}\")\n", "print(f\"y has the value: {y}, id: {id(y)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, it turns out `x += 5` is not identical `x = x + 5`. The former modifies the contents of `x`. The latter first evaluates `x + 5` to a new array of the same size, and then overwrites the name `x` with a reference to this new array.\n", "\n", "But there's good news - we don't need to memorize special rules for calling functions. Copying happens with `int`, `float`, `bool`, (maybe some other ones I'm forgetting?), the rest is \"by reference\". Now you see why we care if objects are mutable or immutable... passing around a reference can be dangerous! General rule - if you do `x = ...` then you're not modifying the original, but if you do `x.SOMETHING = y` or `x[SOMETHING] = y` or `x *= y` then you probably are." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### `copy` and `deepcopy`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can force the certain copying behaviour using the `copy` library if we want to:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [], "source": [ "import copy # part of the standard library" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/plain": [ "[2]" ] }, "execution_count": 37, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = [1]\n", "y = x\n", "x[0] = 2\n", "y" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1]" ] }, "execution_count": 38, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = [1]\n", "y = copy.copy(x) # We \"copied\" x and saved that new object as y\n", "x[0] = 2\n", "y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Ok, so what do you think will happen here?" ] }, { "cell_type": "code", "execution_count": 39, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "After copy.copy():\n", "[[1], [2, 99], [3, 'hi']]\n", "[[1], [2, 99], [3, 'hi']]\n", "\n", "After modifying x:\n", "[['pikachu'], [2, 99], [3, 'hi']]\n", "[['pikachu'], [2, 99], [3, 'hi']]\n" ] } ], "source": [ "x = [[1], [2, 99], [3, \"hi\"]] # a list of lists\n", "\n", "y = copy.copy(x)\n", "print(\"After copy.copy():\")\n", "print(x)\n", "print(y)\n", "\n", "x[0][0] = \"pikachu\"\n", "print(\"\")\n", "print(\"After modifying x:\")\n", "print(x)\n", "print(y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But wait.. we used `copy`, why are `x` and `y` both changed in the latter example? `copy` makes the _containers_ different, i.e., only the outer list. But the outer lists contain references to objects which were not copied! This is what happens after `y = copy.copy(x)`:\n", "\n", "![](img/chapter4/copy.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use `is` to tell apart these scenarios (as opposed to `==`). `is` tells us if two objects are referring to the same object in memory, while `==` tells us if their contents are the same:" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x == y # they are both lists containing the same lists" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x is y # but they are not the *same* lists of lists" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So, by that logic we should be able to append to `y` without affecting `x`:" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[['pikachu'], [2, 99], [3, 'hi']]\n", "[['pikachu'], [2, 99], [3, 'hi'], 5]\n" ] } ], "source": [ "y.append(5)\n", "\n", "print(x)\n", "print(y)" ] }, { "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 43, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x == y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That makes sense, as weird as it seems:\n", "\n", "![](img/chapter4/copy-append.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In short, `copy` only copies one level down. What if we want to copy everything? i.e., even the inner lists in our outer list... Enter our friend `deepcopy`:" ] }, { "cell_type": "code", "execution_count": 44, "metadata": { "collapsed": false, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[['pikachu'], [2, 99], [3, 'hi']]\n", "[[1], [2, 99], [3, 'hi']]\n" ] } ], "source": [ "x = [[1], [2, 99], [3, \"hi\"]]\n", "\n", "y = copy.deepcopy(x)\n", "\n", "x[0][0] = \"pikachu\"\n", "print(x)\n", "print(y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![](img/chapter4/deep-copy.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{tip}\n", "If you're interested, you can find a whole compilation of more intriguing behaviour in Python [here](https://github.com/satwikkansal/wtfpython/blob/master/README.md)!\n", "```" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.8" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": true, "title_cell": "Lecture Outline", "title_sidebar": "Contents", "toc_cell": true, "toc_position": {}, "toc_section_display": true, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 4 }