# Appendix B: Logistic Loss

## Imports

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
import plotly.express as px
```


## 1. Logistic Regression Refresher

Logistic Regression is a classification model where we calculate the probability of an observation belonging to a class as:

$z=w^Tx$
$\hat{y} = \frac{1}{1+\exp(-z)}$

And then assign that observation to a class based on some threshold (usually 0.5):

$\text{Class }\hat{y}=\begin{cases} 0, & \hat{y}\le0.5 \\ 1, & \hat{y}>0.5 \end{cases}$
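The two steps above (squash the raw score through the sigmoid, then threshold at 0.5) can be sketched in a few lines. The scores below are made-up values, just for illustration:

```python
import numpy as np

def sigmoid(z):
    """Map a raw model score z = w^T x to a probability in (0, 1)."""
    return 1 / (1 + np.exp(-z))

# Hypothetical raw scores z = w^T x for three observations
z = np.array([-2.0, 0.0, 3.0])
probs = sigmoid(z)                   # probabilities ≈ 0.119, 0.5, 0.953
classes = (probs > 0.5).astype(int)  # threshold at 0.5 -> classes 0, 0, 1
```

Note that a score of exactly 0 gives a probability of exactly 0.5, which the strict inequality above assigns to class 0.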

## 2. Motivating the Loss Function

- Below is the mean squared error, the loss function we used for optimizing linear regression:

$f(w)=\frac{1}{n}\sum^{n}_{i=1}(\hat{y}_i-y_i)^2$

- That won't work for logistic regression classification problems because plugging the sigmoid into the mean squared error makes the loss "non-convex" (which basically means it can have multiple minima, so gradient descent may get stuck in a local one)

- Instead we use the following loss function:

$f(w)=-\frac{1}{n}\sum_{i=1}^n\left[y_i\log\left(\frac{1}{1 + \exp(-w^Tx_i)}\right) + (1 - y_i)\log\left(1 - \frac{1}{1 + \exp(-w^Tx_i)}\right)\right]$

- This function is called the "log loss" or "binary cross entropy"

- I want to visually show you the differences between these two functions, and then we'll discuss why that loss function works

- Recall the Pokemon dataset from Chapter 1. I'm going to load that in again (and standardize the data while I'm at it):

```python
df = pd.read_csv("data/pokemon.csv", usecols=['name', 'defense', 'attack', 'speed', 'capture_rt', 'legendary'])
x = StandardScaler().fit_transform(df.drop(columns=["name", "legendary"]))
X = np.hstack((np.ones((len(x), 1)), x))  # add a column of ones for the intercept
y = df['legendary'].to_numpy()
```

|   | name       | attack | defense | speed | capture_rt | legendary |
|---|------------|--------|---------|-------|------------|-----------|
| 0 | Bulbasaur  | 49     | 49      | 45    | 45         | 0         |
| 1 | Ivysaur    | 62     | 63      | 60    | 45         | 0         |
| 2 | Venusaur   | 100    | 123     | 80    | 45         | 0         |
| 3 | Charmander | 52     | 43      | 65    | 45         | 0         |
| 4 | Charmeleon | 64     | 58      | 80    | 45         | 0         |
- The goal here is to use the features (but not "name", which is just there for illustration purposes) to predict the target "legendary" (which takes values of 0/No and 1/Yes)

- So we have 4 features, meaning that our logistic regression model will have 5 parameters that need to be estimated (4 feature coefficients and 1 intercept)

- At this point let's define our loss functions:

```python
def sigmoid(w, x):
    """Sigmoid function (i.e., logistic regression predictions)."""
    return 1 / (1 + np.exp(-x @ w))


def mse(w, x, y):
    """Mean squared error."""
    return np.mean((sigmoid(w, x) - y) ** 2)


def logistic_loss(w, x, y):
    """Logistic loss."""
    return -np.mean(y * np.log(sigmoid(w, x)) + (1 - y) * np.log(1 - sigmoid(w, x)))
```
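Before plotting, it's worth seeing what the log loss actually rewards and punishes. A quick sanity check on made-up probabilities (not the Pokemon data) shows that confident wrong predictions are penalized far more heavily than confident correct ones:

```python
import numpy as np

y_true = np.array([1, 0])
p_good = np.array([0.9, 0.1])  # confident and correct
p_bad  = np.array([0.1, 0.9])  # confident and wrong

def bce(y, p):
    """Binary cross entropy on predicted probabilities p."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

print(bce(y_true, p_good))  # ≈ 0.105
print(bce(y_true, p_bad))   # ≈ 2.303
```

Pushing a prediction toward the wrong extreme sends the loss toward infinity (since $\log(0) \to -\infty$), which is exactly the behavior we want from a classification loss.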

- For a moment, let's assume a value for all the parameters except for $w_1$

- We will then calculate the mean squared error (and the log loss) for different values of $w_1$ as in the code below

```python
w1_arr = np.arange(-3, 6.1, 0.1)
losses = pd.DataFrame({
    "w1": w1_arr,
    "mse": [mse([0.5, w1, -0.5, 0.5, -2], X, y) for w1 in w1_arr],
    "log": [logistic_loss([0.5, w1, -0.5, 0.5, -2], X, y) for w1 in w1_arr],
})

fig = px.line(losses.melt(id_vars="w1", var_name="loss"), x="w1", y="value", color="loss", facet_col="loss", facet_col_spacing=0.1)
```
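The convexity claim can also be checked numerically rather than just by eye: for a convex curve sampled on a uniform grid, every discrete second difference is non-negative. Here is a minimal, self-contained sketch of that check on synthetic data (not the Pokemon set, so the exact curve differs from the plot above):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy design matrix: intercept column plus one standardized feature
X_toy = np.hstack((np.ones((100, 1)), rng.normal(size=(100, 1))))
y_toy = (X_toy[:, 1] + rng.normal(scale=0.5, size=100) > 0).astype(int)

def sigmoid(w, x):
    return 1 / (1 + np.exp(-x @ w))

def logistic_loss(w, x, y):
    return -np.mean(y * np.log(sigmoid(w, x)) + (1 - y) * np.log(1 - sigmoid(w, x)))

# Sweep w1 over a uniform grid, holding the intercept fixed at 0
w1_grid = np.arange(-3, 6.1, 0.1)
log_curve = np.array([logistic_loss(np.array([0.0, w1]), X_toy, y_toy) for w1 in w1_grid])

# Non-negative second differences everywhere => the sampled curve is convex
second_diffs = np.diff(log_curve, n=2)
print(np.all(second_diffs >= -1e-8))
```

The same check applied to the MSE curve on real data will typically fail at some grid points, which is the non-convexity the plot makes visible.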