Chapter 3: Introduction to PyTorch & Neural Networks

By Tomas Beuzen 🚀

Chapter Learning Objectives

  • Describe the difference between NumPy and torch arrays (np.array vs. torch.Tensor).

  • Explain fundamental concepts of neural networks such as layers, nodes, activation functions, etc.

  • Create a simple neural network in PyTorch for regression or classification.


import sys
import numpy as np
import pandas as pd
import torch
from torchsummary import summary
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.datasets import make_regression, make_circles, make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from utils.plotting import *

1. Introduction

PyTorch is a Python-based tool for scientific computing that provides several main features:

  • torch.Tensor, an n-dimensional array similar to that of NumPy, but which can run on GPUs

  • Computational graphs and an automatic differentiation engine for building and training neural networks

You can install PyTorch by following the instructions at pytorch.org.

2. PyTorch’s Tensor

In PyTorch a tensor is just like NumPy’s ndarray, which most readers will already be familiar with (if not, check out Chapter 5 and Chapter 6 of my Python Programming for Data Science course).

A key difference between PyTorch’s torch.Tensor and NumPy’s np.array is that torch.Tensor was constructed to integrate with GPUs and PyTorch’s computational graphs (more on that next chapter though).

2.1. ndarray vs tensor

Creating and working with tensors is much the same as with NumPy ndarrays. You can create a tensor with torch.tensor():

tensor_1 = torch.tensor([1, 2, 3])
tensor_2 = torch.tensor([1, 2, 3], dtype=torch.float32)
tensor_3 = torch.tensor(np.array([1, 2, 3]))

for t in [tensor_1, tensor_2, tensor_3]:
    print(f"{t}, dtype: {t.dtype}")
tensor([1, 2, 3]), dtype: torch.int64
tensor([1., 2., 3.]), dtype: torch.float32
tensor([1, 2, 3]), dtype: torch.int64

PyTorch also comes with most of the NumPy functions you’re probably already familiar with:

torch.zeros(2, 2)  # zeroes
tensor([[0., 0.],
        [0., 0.]])
torch.ones(2, 2)  # ones
tensor([[1., 1.],
        [1., 1.]])
torch.randn(3, 2)  # random normal
tensor([[-1.1988, -0.7157],
        [-0.1942, -1.7273],
        [-1.0674,  0.4149]])
torch.rand(2, 3, 2)  # rand uniform
tensor([[[0.0583, 0.3669],
         [0.0315, 0.9852],
         [0.1880, 0.5039]],

        [[0.0234, 0.7198],
         [0.5472, 0.1252],
         [0.1728, 0.3510]]])

Just like in NumPy we can look at the shape of a tensor with the .shape attribute:

x = torch.rand(2, 3, 2, 2)
x.shape
torch.Size([2, 3, 2, 2])

2.2. Tensors and Data Types

Different data types have different memory and computational implications (see Chapter 6 of Python Programming for Data Science for more). In PyTorch we’ll be building networks that require thousands or even millions of floating point calculations! In such cases, using a smaller dtype like float32 can significantly speed up computations and reduce memory requirements. The default float dtype in PyTorch is float32, as opposed to NumPy’s float64. In fact, some operations in PyTorch will even throw an error if you pass a high-memory dtype!
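A quick way to see this difference in defaults (and the memory it saves) for yourself:

```python
import numpy as np
import torch

# NumPy defaults to 64-bit floats, PyTorch to 32-bit:
print(np.array([1.0]).dtype)      # float64
print(torch.tensor([1.0]).dtype)  # torch.float32

# A float32 element takes half the memory of a float64 element:
print(torch.ones(1000, dtype=torch.float32).element_size())  # 4 (bytes)
print(torch.ones(1000, dtype=torch.float64).element_size())  # 8 (bytes)
```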


But just like in NumPy, you can always specify the particular dtype you want using the dtype argument:

print(torch.tensor([3.14159], dtype=torch.float64).dtype)
torch.float64

2.3. Operations on Tensors

Tensors operate just like ndarrays and have a variety of familiar methods that can be called off them:

a = torch.rand(1, 3)
b = torch.rand(3, 1)

a + b  # broadcasting between a 1 x 3 and 3 x 1 tensor
tensor([[1.3773, 1.5033, 1.1765],
        [0.9496, 1.0756, 0.7488],
        [1.3639, 1.4899, 1.1631]])
a * b
tensor([[0.4183, 0.5349, 0.2325],
        [0.2249, 0.2876, 0.1250],
        [0.4122, 0.5271, 0.2292]])
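Beyond elementwise arithmetic, other familiar NumPy operations like matrix multiplication and aggregations carry over as well. A small sketch:

```python
import torch

a = torch.rand(1, 3)
b = torch.rand(3, 1)

print(a @ b)                  # matrix multiplication: (1, 3) @ (3, 1) -> (1, 1)
print(a.sum())                # aggregation methods use the same names as NumPy
print(a.mean(), a.max())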

2.4. Indexing

Once again, same as NumPy!

X = torch.rand(5, 2)
X
tensor([[0.2803, 0.1461],
        [0.1740, 0.9460],
        [0.1257, 0.3427],
        [0.7001, 0.3810],
        [0.6504, 0.6580]])
print(X[0])
print(X[0, :])
print(X[:, 0])
tensor([0.2803, 0.1461])
tensor([0.2803, 0.1461])
tensor([0.2803, 0.1740, 0.1257, 0.7001, 0.6504])
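Boolean masks and fancy indexing also behave just as they do in NumPy. A small sketch:

```python
import torch

X = torch.tensor([[1., -2.], [3., -4.], [-5., 6.]])
print(X[X > 0])      # boolean mask: keeps positive elements, flattened
print(X[[0, 2], :])  # fancy indexing: select rows 0 and 2
```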

2.5. NumPy Bridge

Sometimes we might want to convert a tensor back to a NumPy array. We can do that using the .numpy() method:

X = torch.rand(3, 3)
X_numpy = X.numpy()
print(type(X))
print(type(X_numpy))
<class 'torch.Tensor'>
<class 'numpy.ndarray'>
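One thing worth knowing: on CPU, the tensor and the converted array share the same underlying memory, so in-place changes to one show up in the other. The bridge also works in reverse via torch.from_numpy(). A small sketch:

```python
import numpy as np
import torch

X = torch.ones(3)
X_numpy = X.numpy()  # shares memory with X (on CPU)
X += 1               # in-place change to the tensor...
print(X_numpy)       # ...is reflected in the NumPy array: [2. 2. 2.]

Y = torch.from_numpy(np.zeros(3))  # converting the other way
print(Y.dtype)                     # torch.float64 (inherits NumPy's default dtype)
```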

2.6. GPU and CUDA Tensors

GPU stands for “graphics processing unit” (as opposed to a CPU: central processing unit). GPUs were originally developed for gaming; they are very fast at performing operations on large amounts of data by performing them in parallel (think about updating the value of all pixels on a screen very quickly as a player moves around in a game). More recently, GPUs have been adapted for more general-purpose programming. Neural networks can typically be broken into smaller computations that can be performed in parallel on a GPU. PyTorch is tightly integrated with CUDA, a software layer that facilitates interactions with a GPU (if you have one). You can check if you have GPU capability using:

torch.cuda.is_available()  # my MacBook Pro does not have a GPU
False

When training on a machine that has a GPU, you need to tell PyTorch you want to use it. You’ll see the following at the top of most PyTorch code:

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

You can then use the device argument when creating tensors to specify whether you wish to use a CPU or GPU. Or if you want to move a tensor between the CPU and GPU, you can use the .to() method:

X = torch.rand(2, 2, 2, device=device)
# X.to('cuda')  # this would give me an error as I don't have a GPU, so I'm commenting it out

We’ll revisit GPUs later in the course when we are working with bigger datasets and more complex networks. For now, we can work on the CPU just fine.

3. Neural Network Basics

You’ve probably already learned about several machine learning algorithms (kNN, random forest, SVM, etc.). Neural networks are simply another algorithm and actually one of the simplest in my opinion! As we’ll see, a neural network is just a sequence of linear and non-linear transformations. Often you see something like this when learning about/using neural networks:

So what on Earth does that all mean? Well, we’re going to build up some intuition one step at a time.

3.1. Simple Linear Regression with a Neural Network

Let’s create a simple regression dataset with 500 observations:

X, y = make_regression(n_samples=500, n_features=1, random_state=0, noise=10.0)
plot_regression(X, y)