Introduction

  • MNIST is a database of handwritten digits with a training set of 60,000 examples and a test set of 10,000 examples. The digits have been size-normalized and centered in a fixed-size image.
  • This code will help you learn image classification with PyTorch.
  • The goal of this example is to build a neural network that classifies images of handwritten digits as accurately as possible.
import torch
import torchvision
from torchvision.datasets import MNIST
from torchvision.transforms import transforms
import matplotlib.pyplot as plt
from torch.utils.data import random_split
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.nn.functional as F
from torchvision.utils import make_grid
  • This line of code downloads the 60,000 training images into the root folder defined below
  • download = True downloads the data on the first run; later runs will not re-download the images if they are already in the root path
  • train = True selects the training set
dataset = MNIST(root='Deep_Learning_Explorations/data/',train = True, download=True,transform=transforms.ToTensor())
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to Deep_Learning_Explorations/data/MNIST/raw/train-images-idx3-ubyte.gz
Extracting Deep_Learning_Explorations/data/MNIST/raw/train-images-idx3-ubyte.gz to Deep_Learning_Explorations/data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to Deep_Learning_Explorations/data/MNIST/raw/train-labels-idx1-ubyte.gz
Extracting Deep_Learning_Explorations/data/MNIST/raw/train-labels-idx1-ubyte.gz to Deep_Learning_Explorations/data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to Deep_Learning_Explorations/data/MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting Deep_Learning_Explorations/data/MNIST/raw/t10k-images-idx3-ubyte.gz to Deep_Learning_Explorations/data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to Deep_Learning_Explorations/data/MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting Deep_Learning_Explorations/data/MNIST/raw/t10k-labels-idx1-ubyte.gz to Deep_Learning_Explorations/data/MNIST/raw

  • Each item of the dataset is a tuple of an image and its label
  • Indexing the dataset shows this
  • The shape of the image is 1 x 28 x 28
  • The 28 x 28 pixel values range from 0 to 1
  • The image has just one channel because it is grayscale
img,label = dataset[89]
# permute moves the channel dimension to the end, giving the (height, width, channel) layout that plt.imshow expects
img_viz = img.permute(1,2,0)
plt.imshow(img_viz, cmap = 'gray');
img_viz = img.permute(1,2,0)
print(img_viz.shape)
# plt.imshow(img_viz, cmap = 'gray');
torch.Size([28, 28, 1])
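
As a small self-contained check of the permute step, the sketch below uses a random tensor as a stand-in for an MNIST image (an assumption, so that nothing has to be downloaded). For a single-channel image, squeeze(0) is an alternative that drops the channel dimension entirely, a layout plt.imshow also accepts.

```python
import torch

# Stand-in for one ToTensor() image: 1 channel, 28 x 28 pixels, values in [0, 1)
img = torch.rand(1, 28, 28)

# Move the channel dimension to the end: (C, H, W) -> (H, W, C)
img_hwc = img.permute(1, 2, 0)

# For a single-channel image, dropping the channel dimension also works
img_hw = img.squeeze(0)

print(img_hwc.shape)  # torch.Size([28, 28, 1])
print(img_hw.shape)   # torch.Size([28, 28])
```

Both versions hold the same pixel values; only the shape differs.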

Split Dataset - Training and Validation

# Split the dataset into a training and the validation set
tr_data, val_data = random_split(dataset,[50000,10000])
len(tr_data), len(val_data)

(50000, 10000)
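
random_split draws a different split on every run unless a seeded generator is passed. The sketch below shows one way to make the split reproducible; the tiny TensorDataset and the seed value 42 are illustrative assumptions, not part of the original notebook.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Tiny stand-in dataset: 10 samples with labels 0..9
toy = TensorDataset(torch.arange(10).float().unsqueeze(1), torch.arange(10))

def split(seed):
    # A seeded generator fixes which samples land in each part
    gen = torch.Generator().manual_seed(seed)
    return random_split(toy, [8, 2], generator=gen)

train_a, val_a = split(42)
train_b, val_b = split(42)

# Same seed, same split
print(list(val_a.indices) == list(val_b.indices))  # True
# The two parts never overlap and together cover the dataset
print(sorted(list(train_a.indices) + list(val_a.indices)) == list(range(10)))  # True
```

The same generator argument could be passed to the 50,000 / 10,000 split above so that validation scores are comparable between runs.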

Training and Validation Dataloader

# DataLoader serves the dataset in batches whose size is set by batch_size
tr_loader = DataLoader(tr_data,batch_size=200,shuffle=True)
val_loader = DataLoader(val_data,batch_size=200)
# Each batch holds 200 images and 200 labels, matching the batch_size set above
for data,label in tr_loader:
    print(data.shape)
    print(len(label))
    break

torch.Size([200, 1, 28, 28])
200
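
The number of batches per epoch follows directly from the dataset size and batch_size. A minimal sketch of that arithmetic, assuming the DataLoader default drop_last=False (the helper name num_batches is hypothetical):

```python
import math

def num_batches(n_samples, batch_size):
    # With drop_last=False, a final partial batch is kept, hence the ceiling
    return math.ceil(n_samples / batch_size)

print(num_batches(50000, 200))  # 250
print(num_batches(10000, 200))  # 50
print(num_batches(50000, 128))  # 391
```

So each training epoch here performs 250 weight updates, and each validation pass averages over 50 batches.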
# nn.Linear applies the equation x @ w.t() + bias
model = nn.Linear(in_features= 784,out_features=10)

# The layer randomly initializes its parameters (weights and biases)
model.weight.shape, model.bias.shape

(torch.Size([10, 784]), torch.Size([10]))
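
To see why the weight has shape [10, 784] (one row per output), the equation can be written out by hand. This is a plain-Python sketch on a toy example with 3 inputs and 2 outputs; the function name linear and the numbers are illustrative assumptions.

```python
def linear(x, weight, bias):
    # weight has one row per output feature, matching nn.Linear's [out, in] layout
    return [sum(xi * wi for xi, wi in zip(x, row)) + b
            for row, b in zip(weight, bias)]

x = [1.0, 2.0, 3.0]            # 3 input features
weight = [[1.0, 0.0, 0.0],     # 2 output features x 3 input features
          [0.0, 1.0, 1.0]]
bias = [0.5, -1.0]

print(linear(x, weight, bias))  # [1.5, 4.0]
```

For MNIST the same computation runs with 784 inputs and 10 outputs, once per image in the batch.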
for img,label in tr_loader:
    # shape of the batch at the beginning
    print(img.shape)
    # this reshaping can be moved inside the model's forward method
    img = img.reshape(-1,28*28)
    # shape after flattening each image into a vector of 784 elements
    print(img.shape)
    # apply the model to the reshaped batch
    pred = model(img)
    print(pred.shape)
    # The model outputs 10 raw scores (logits) per image, one for each digit 0-9; they become probabilities only after softmax
    break

torch.Size([200, 1, 28, 28])
torch.Size([200, 784])
torch.Size([200, 10])
print(pred)
pred.shape

tensor([[ 0.3057,  0.0996,  0.1668,  ...,  0.1199, -0.0800, -0.5246],
        [ 0.0253,  0.1133,  0.0480,  ..., -0.1510,  0.0446, -0.1409],
        [ 0.0031,  0.1342,  0.2246,  ..., -0.1099, -0.1087, -0.2773],
        ...,
        [ 0.0612,  0.0829, -0.2208,  ..., -0.0552, -0.0889,  0.3556],
        [-0.1947,  0.0550,  0.1493,  ...,  0.0518, -0.1461,  0.0480],
        [ 0.2934, -0.0702,  0.2726,  ...,  0.0271, -0.1594, -0.1882]],
       grad_fn=<AddmmBackward0>)
torch.Size([200, 10])

Additional functionality to our NN model

To add functionality to the model, we create a MnistModel class that inherits from nn.Module. The added functionality here is reshaping each batch that passes through the model into vectors of 784 pixels.

# Add the reshaping step to the model
# Subclassing nn.Module lets us put the reshaping of a batch of images inside the model. The forward method is what gets applied to each batch

class MnistModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(784, 10)
        
    def forward(self, xb):
        xb = xb.reshape(-1, 784)
        out = self.linear(xb)
        return out
    
model = MnistModel()
# shape of wts and bias
model.linear.weight.shape, model.linear.bias.shape

(torch.Size([10, 784]), torch.Size([10]))

Softmax

Softmax converts a vector of K real numbers into a probability distribution over K possible outcomes.

It is computed by exponentiating each prediction and dividing by the sum of the exponentials, so that the results sum to 1.
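
The calculation can be written out in a few lines of plain Python. This is a minimal sketch rather than how PyTorch implements it; subtracting the maximum score first is a common stability trick that leaves the result unchanged.

```python
import math

def softmax(scores):
    # Subtracting the max does not change the result but avoids overflow in exp
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
print(round(sum(probs), 6))          # 1.0
```

The largest score gets the largest probability, and the outputs always sum to 1.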

# We can see that the model from the above class works
for dta,label in tr_loader:
    pred = model(dta)
    print(pred.shape)
    print(label.shape)
    break

# Apply softmax, which maps the scores to probabilities between 0 and 1 that sum to 1
torch.sum(F.softmax(pred[0], dim=0)).item()

# Applying softmax on the whole batch
pred_s = F.softmax(pred,dim=1)

# torch.max returns the max probability of each row as well as its index
prob,index_prob = torch.max(pred_s,dim=1)

index_prob.shape,prob.shape

torch.Size([200, 10])
torch.Size([200])
(torch.Size([200]), torch.Size([200]))

Accuracy

The predictions are converted into probabilities, and the highest probability is found using the max function.

The index of the highest probability is then compared to the actual label, and the accuracy is calculated by dividing the number of correct predictions by the total number of images.

def metric_acc(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

# We get the same accuracy even without applying softmax, because
# e^x is an increasing function, i.e., if y1 > y2, then e^y1 > e^y2. The ordering is preserved after dividing by the sum to get the softmax.
metric_acc(pred_s,label)

tensor(0.0650)
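
The claim that softmax does not change the accuracy can be checked on a toy batch. The sketch below is plain Python with made-up scores and labels (assumptions for illustration); it mirrors what metric_acc does with torch.max.

```python
import math

def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def accuracy(batch_scores, labels):
    # Predicted class = index of the largest score in each row
    preds = [argmax(row) for row in batch_scores]
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)

logits = [[0.3, -0.1, 1.2], [2.0, 0.5, 0.1], [-0.4, 0.9, 0.2], [0.1, 0.0, -0.3]]
labels = [2, 0, 2, 1]

print(accuracy(logits, labels))                        # 0.5
print(accuracy([softmax(r) for r in logits], labels))  # 0.5
```

Both calls agree because softmax preserves the ordering of the scores within each row.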
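# Cross-entropy is the negative log of the probability predicted for the correct class
# It is the loss function used here; F.cross_entropy applies softmax internally, so it takes the raw model outputs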
loss = F.cross_entropy(pred, label)
print(loss)

tensor(2.3333, grad_fn=<NllLossBackward0>)
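
A loss near 2.3 is what an untrained 10-class model should produce: a uniform guess assigns probability 1/10 to the correct digit, and -ln(1/10) is about 2.3026. The plain-Python sketch below works this out for a single example (the function is an illustrative stand-in, not PyTorch's implementation).

```python
import math

def cross_entropy(scores, label):
    # log-sum-exp computed stably, then minus the score of the true class;
    # this equals the negative log of the softmax probability of that class
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[label]

# Ten equal scores mean a uniform guess over ten digits
print(round(cross_entropy([0.0] * 10, 3), 4))          # 2.3026
# A confident, correct prediction gives a loss near zero
print(round(cross_entropy([0.0] * 9 + [10.0], 9), 4))  # 0.0004
```

Training pushes the loss down from this starting point by raising the probability assigned to the correct class.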

Model Training

This fit function is the training step. It involves running the model on the training dataloader, calculating the loss, computing the gradients, updating the weights, and resetting the gradients at the end of each batch.

In the second part of the loop, we validate the model on the validation dataloader: the loss and accuracy are calculated after each epoch and printed. We can see that the loss and the accuracy on the validation set improve after each epoch.
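
The gradient reset matters because PyTorch adds each new gradient to whatever is already stored. The sketch below shows this on a single scalar parameter (a toy assumption, unrelated to the MNIST model).

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

# First backward pass: d(3w)/dw = 3
(3 * w).backward()
g1 = w.grad.item()
print(g1)  # 3.0

# Second backward pass without resetting: the new gradient is added to the old one
(3 * w).backward()
g2 = w.grad.item()
print(g2)  # 6.0

# Clearing by hand; optimizer.zero_grad() clears the gradients of every parameter it manages
w.grad.zero_()
(3 * w).backward()
g3 = w.grad.item()
print(g3)  # 3.0
```

Without the reset, each optimizer step would use the sum of the gradients from all previous batches.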

class MnistModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(784, 10)
        
    def forward(self, xb):
        xb = xb.reshape(-1, 784)
        out = self.linear(xb)
        return out
    
def results_epoch(out_lst):
    val_loss_epoch = torch.stack([dct['val_loss'] for dct in out_lst]).mean()
    val_acc_epoch = torch.stack([dct['val_acc'] for dct in out_lst]).mean()
    return {'val_loss': val_loss_epoch.item(), 'val_acc': val_acc_epoch.item()}

def final_output(dct,epoch):
    print("Epoch [{}], val_loss: {:.4f}, val_acc: {:.4f}".format(epoch, dct['val_loss'], dct['val_acc']))
   
def validation(batch_val,model):
    img_val,label_val = batch_val
    pred_val = model(img_val)
    # loss is computed
    loss_val = F.cross_entropy(pred_val,label_val)
    # Accuracy is computed
    acc_val = metric_acc(pred_val,label_val)
    return {'val_loss': loss_val, 'val_acc': acc_val}
    
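def check_scores(val_loader,model):
    # no gradients are needed for evaluation, so skip building the autograd graph
    with torch.no_grad():
        out_lst = [validation(batch_val,model) for batch_val in val_loader]
    return results_epoch(out_lst)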

def fit(epochs,learning_rate,model,train_loader,val_loader,opt_func=torch.optim.SGD):
    # momentum=0.9 assumes an SGD-style optimizer such as the default
    optimizer = opt_func(model.parameters(), lr= learning_rate , momentum=0.9)
    out_lst = []
    for epoch in range(epochs):
        
        # Training step on the training dataloader
        for batch in train_loader:
            # extract a batch of images and labels
            img,label = batch
            # calculate predictions using the MnistModel instance passed in
            pred = model(img)
            # Since this is a multi-class classification model, the loss function is cross entropy
            loss = F.cross_entropy(pred,label)
            # Compute the gradient of the loss with respect to the model parameters (weights and biases)
            loss.backward()
            # Update the weights
            optimizer.step()
            # Reset the gradients to zero; otherwise they would accumulate across batches
            optimizer.zero_grad()
         
        # Validation on the validation dataloader   
        output = check_scores(val_loader,model)
        out_lst.append(output)
        final_output(output,epoch)
        
    return out_lst
model = MnistModel()
starting =  check_scores(val_loader,model)
print(starting)
out_lst = fit(20,learning_rate=0.002,model= model,train_loader = tr_loader,val_loader = val_loader)

{'val_loss': 2.3184239864349365, 'val_acc': 0.08229999244213104}
Epoch [0], val_loss: 0.7917, val_acc: 0.8452
Epoch [1], val_loss: 0.6040, val_acc: 0.8643
Epoch [2], val_loss: 0.5287, val_acc: 0.8742
Epoch [3], val_loss: 0.4856, val_acc: 0.8790
Epoch [4], val_loss: 0.4578, val_acc: 0.8844
Epoch [5], val_loss: 0.4373, val_acc: 0.8865
Epoch [6], val_loss: 0.4217, val_acc: 0.8912
Epoch [7], val_loss: 0.4095, val_acc: 0.8921
Epoch [8], val_loss: 0.3993, val_acc: 0.8951
Epoch [9], val_loss: 0.3909, val_acc: 0.8964
Epoch [10], val_loss: 0.3836, val_acc: 0.8978
Epoch [11], val_loss: 0.3774, val_acc: 0.8984
Epoch [12], val_loss: 0.3717, val_acc: 0.8993
Epoch [13], val_loss: 0.3669, val_acc: 0.9010
Epoch [14], val_loss: 0.3626, val_acc: 0.9017
Epoch [15], val_loss: 0.3588, val_acc: 0.9026
Epoch [16], val_loss: 0.3550, val_acc: 0.9033
Epoch [17], val_loss: 0.3519, val_acc: 0.9051
Epoch [18], val_loss: 0.3489, val_acc: 0.9056
Epoch [19], val_loss: 0.3464, val_acc: 0.9059
history = [starting] + out_lst
accuracies = [result['val_acc'] for result in history]
print(accuracies)

[0.14190000295639038, 0.8405000567436218, 0.8615000247955322, 0.8730000257492065, 0.8790000081062317, 0.8833000063896179, 0.8880000114440918, 0.8910999894142151, 0.8929999470710754, 0.8945000171661377, 0.8960000872612, 0.897599995136261, 0.8993000984191895, 0.9000999927520752, 0.901699960231781, 0.9024999737739563, 0.9034000635147095, 0.9047000408172607, 0.9054000377655029, 0.9062000513076782, 0.9076000452041626]

Visualization

This visualization shows us how much the model learns with each epoch

plt.plot(accuracies,'-X')
plt.xlabel('epoch')
plt.ylabel('accuracy_on_the_validation_set')

Text(0, 0.5, 'accuracy_on_the_validation_set')

We can clearly see that the loss and accuracy improve with each epoch. You can try a different number of epochs and a different learning rate to see if you can improve the accuracy.

Testing on the test set

img_test = MNIST(root='Deep_Learning_Explorations/data/',train = False,transform=transforms.ToTensor())

Prediction Function on the test set

def pred_function(img,model):
    # add a batch dimension: [1, 28, 28] -> [1, 1, 28, 28]
    inp = img.unsqueeze(0)
    out = model(inp)
    # the index of the largest output is the predicted digit
    _ , preds = torch.max(out,dim=1)
    return preds[0].item()
img, label = img_test[200]
plt.imshow(img[0], cmap='gray')
print('Label:', label, ', Predicted:', pred_function(img, model))

Label: 3 , Predicted: 3
img, label = img_test[123]
plt.imshow(img[0], cmap='gray')
print('Label:', label, ', Predicted:', pred_function(img, model))

Label: 6 , Predicted: 6
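
A prediction can also be reported together with the probability the model assigns to it. The sketch below is one possible variant, not part of the original notebook: predict_with_confidence is a hypothetical helper, and an untrained stand-in model and a random image are used so that the block runs on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def predict_with_confidence(img, model):
    # Disable gradient tracking, which is not needed for inference
    with torch.no_grad():
        out = model(img.unsqueeze(0))      # add a batch dimension
        probs = F.softmax(out, dim=1)      # logits -> probabilities
    conf, pred = torch.max(probs, dim=1)
    return pred.item(), conf.item()

# Stand-ins (assumptions): an untrained model and a random image
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
toy_img = torch.rand(1, 28, 28)

digit, confidence = predict_with_confidence(toy_img, toy_model)
print(digit, round(confidence, 3))
```

With the trained model and a real test image in place of the stand-ins, a low confidence value can help flag digits the model is unsure about.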

Testing on all images - the test set

# define the test dataloader
test_loader = DataLoader(img_test, batch_size=20)
# Accuracy and loss on the test data set
check_scores(test_loader,model)

{'val_loss': 0.33200451731681824, 'val_acc': 0.9089999198913574}

Feed-Forward NN - Adding non-linearity to further improve the model

class MnistModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(784, 90)
        self.linear2 = nn.Linear(90,10)
        
    def forward(self, xb):
        xb =  xb.view(xb.size(0),-1)
        out = self.linear1(xb)
        out = F.relu(out)
        out = self.linear2(out)
        return out
    
model = MnistModel()

We can inspect the weights and biases of the different layers using the parameters method, as shown below

for param in model.parameters():
    print(param.shape)

torch.Size([90, 784])
torch.Size([90])
torch.Size([10, 90])
torch.Size([10])
model.linear1.weight.shape,model.linear2.weight.shape

(torch.Size([90, 784]), torch.Size([10, 90]))
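
From these shapes the total number of trainable parameters can be worked out by hand. The helper below is a hypothetical illustration of that arithmetic, comparing the single-layer model with the two-layer one.

```python
def linear_params(in_features, out_features):
    # one weight per (input, output) pair plus one bias per output
    return in_features * out_features + out_features

single_layer = linear_params(784, 10)
two_layer = linear_params(784, 90) + linear_params(90, 10)

print(single_layer)  # 7850
print(two_layer)     # 71560
```

The hidden layer of 90 units therefore makes the model roughly nine times larger than the single linear layer.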

Model Validation Feed-Forward Neural Network

We can observe that just by adding a second linear layer and a non-linearity, the accuracy improves on both the validation set (0.9059 to 0.9281 after 20 epochs) and the test set (0.9090 to 0.9311)
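
The non-linearity is essential: two linear layers stacked without an activation collapse into a single linear layer and add no expressive power. The scalar sketch below illustrates this with made-up weights (assumptions for illustration only).

```python
def relu(x):
    return max(0.0, x)

# Two scalar "layers" of the form y = a * x + b
a1, b1 = 2.0, -1.0
a2, b2 = -3.0, 0.5

def stacked_linear(x):
    return a2 * (a1 * x + b1) + b2

def collapsed(x):
    # The same function written as a single layer
    return (a2 * a1) * x + (a2 * b1 + b2)

def stacked_relu(x):
    # With ReLU in between, no single linear layer reproduces the output
    return a2 * relu(a1 * x + b1) + b2

for x in [-1.0, 0.0, 2.0]:
    print(x, stacked_linear(x), collapsed(x), stacked_relu(x))
```

The first two columns of output always match, while the ReLU version differs wherever the hidden value is negative.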

for images, _ in tr_loader:
    print('images.shape:', images.shape)
    plt.figure(figsize=(8,8))
    plt.axis('off')
    plt.imshow(make_grid(images, nrow=14).permute((1, 2, 0)))
    break

images.shape: torch.Size([200, 1, 28, 28])
model = MnistModel()
starting =  check_scores(val_loader,model)
print(starting)
out_lst = fit(20,learning_rate=0.002,model= model,train_loader = tr_loader,val_loader = val_loader)

{'val_loss': 2.3049933910369873, 'val_acc': 0.11880001425743103}
Epoch [0], val_loss: 1.0729, val_acc: 0.8041
Epoch [1], val_loss: 0.5940, val_acc: 0.8636
Epoch [2], val_loss: 0.4683, val_acc: 0.8818
Epoch [3], val_loss: 0.4118, val_acc: 0.8909
Epoch [4], val_loss: 0.3793, val_acc: 0.8971
Epoch [5], val_loss: 0.3575, val_acc: 0.8999
Epoch [6], val_loss: 0.3414, val_acc: 0.9040
Epoch [7], val_loss: 0.3293, val_acc: 0.9055
Epoch [8], val_loss: 0.3176, val_acc: 0.9098
Epoch [9], val_loss: 0.3091, val_acc: 0.9119
Epoch [10], val_loss: 0.3005, val_acc: 0.9149
Epoch [11], val_loss: 0.2932, val_acc: 0.9175
Epoch [12], val_loss: 0.2865, val_acc: 0.9191
Epoch [13], val_loss: 0.2810, val_acc: 0.9203
Epoch [14], val_loss: 0.2752, val_acc: 0.9239
Epoch [15], val_loss: 0.2695, val_acc: 0.9248
Epoch [16], val_loss: 0.2647, val_acc: 0.9251
Epoch [17], val_loss: 0.2601, val_acc: 0.9260
Epoch [18], val_loss: 0.2558, val_acc: 0.9272
Epoch [19], val_loss: 0.2517, val_acc: 0.9281

Result on the test set

check_scores(test_loader,model)

{'val_loss': 0.24423734843730927, 'val_acc': 0.9310998916625977}