MNIST dataset - NN example - Gradient Descent - Feedforward NN - Pytorch.
This is a tutorial on image classification with the MNIST dataset
- Introduction
- Split Dataset - Training and Validation
- Training and Validation Dataloader
- Additional functionality to our NN model
- Softmax
- Accuracy
- Model Training
- Visualization
- Testing on the test set
- Prediction Function on the test set
- Testing on all images - the test set
- Feed-Forward NN - Adding non-linearity to further improve the model
- Model Validation Feed-Forward Neural Network
Introduction
- MNIST is a database of handwritten digits. It has a training set of 60,000 examples and a test set of 10,000 examples. The digits have been size-normalized and centered in a fixed-size image.
- This code will help you learn the technique of image classification using PyTorch
- The problem we are trying to solve in this example is to build a neural network that classifies images of handwritten digits as accurately as possible
import torch
import torchvision
from torchvision.datasets import MNIST
from torchvision.transforms import transforms
import matplotlib.pyplot as plt
from torch.utils.data import random_split
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.nn.functional as F
from torchvision.utils import make_grid
- this line of code downloads the 60,000 training images into the given root folder
- download = True downloads the images on the first run and does not re-download them if it finds them in the root path
- train = True selects the training set
dataset = MNIST(root='Deep_Learning_Explorations/data/',train = True, download=True,transform=transforms.ToTensor())
- Each item in the dataset is a tuple of an image and its label
- indexing the dataset shows this
- the shape of the image is 1 x 28 x 28
- the 28 x 28 pixel values range from 0 to 1
- the image has just one channel because it is a grayscale image
img,label = dataset[89]
# permute moves the channel dimension to the last position, which helps when visualizing the image with plt.imshow
img_viz = img.permute(1,2,0)
plt.imshow(img_viz, cmap = 'gray');
img_viz = img.permute(1,2,0)
print(img_viz.shape)
# plt.imshow(img_viz, cmap = 'gray');
# Split the dataset into a training and the validation set
tr_data, val_data = random_split(dataset,[50000,10000])
len(tr_data), len(val_data)
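The split above is random, so the 50,000/10,000 partition changes on every run. A minimal sketch of a reproducible split, assuming a fixed seed is acceptable. The toy `TensorDataset` and the seed value 42 are illustrative and not part of the original notebook:

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy stand-in for MNIST: 10 fake "images" with labels 0..9 (illustrative only)
toy_data = TensorDataset(torch.rand(10, 1, 28, 28), torch.arange(10))

# A seeded generator makes random_split return the same partition every time
gen_a = torch.Generator().manual_seed(42)
train_a, val_a = random_split(toy_data, [8, 2], generator=gen_a)

gen_b = torch.Generator().manual_seed(42)
train_b, val_b = random_split(toy_data, [8, 2], generator=gen_b)

print(len(train_a), len(val_a))
# Same seed, same indices in both splits
print(train_a.indices == train_b.indices)
```

The same `generator` argument can be passed to the `random_split(dataset,[50000,10000])` call if the validation set should stay fixed between runs.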
# DataLoader splits the dataset into batches of the given batch_size
tr_loader = DataLoader(tr_data,batch_size=200,shuffle=True)
val_loader = DataLoader(val_data,batch_size=200)
# We can see that each batch holds 200 images and 200 labels
for data,label in tr_loader:
    print(data.shape)
    print(len(label))
    break
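To make the batching concrete, here is a small sketch on a toy dataset of 50 samples with a batch size of 20 (both numbers are illustrative). The loader yields ceil(50 / 20) = 3 batches, and the last one is smaller. By the same arithmetic, 50,000 training images with a batch size of 200 give 250 batches per epoch.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for the training set: 50 fake images with random labels
toy_data = TensorDataset(torch.rand(50, 1, 28, 28), torch.randint(0, 10, (50,)))
toy_loader = DataLoader(toy_data, batch_size=20)

# Collect the size of every batch the loader yields
batch_sizes = [len(labels) for _, labels in toy_loader]
print(batch_sizes)  # [20, 20, 10]

# len(loader) is the number of batches, not the number of samples
print(len(toy_loader) == math.ceil(50 / 20))
```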
# nn.Linear computes x @ w.t() + bias
model = nn.Linear(in_features= 784,out_features=10)
# The model randomly initializes its parameters (weights and biases)
model.weight.shape, model.bias.shape
for img,label in tr_loader:
    # shape of the batch at the beginning
    print(img.shape)
    # this functionality can be added inside the model in the forward method
    img = img.reshape(-1,28*28)
    # shape after reshaping each image into a vector of 784 elements
    print(img.shape)
    # applying the model to the reshaped img
    pred = model(img)
    print(pred.shape)
    # The model outputs 10 raw scores (logits) per image, one for each digit from 0-9; they are not yet probabilities
    break
print(pred)
pred.shape
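As a quick check of the `x @ w.t() + bias` equation, the sketch below compares the output of `nn.Linear` with the same computation written out by hand. The random input batch of 4 vectors is an assumption made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(in_features=784, out_features=10)

# Four fake flattened images (values in [0, 1), like ToTensor output)
x = torch.rand(4, 784)

# weight has shape (10, 784), so it is transposed before the matrix product
manual = x @ layer.weight.t() + layer.bias

print(layer.weight.shape, layer.bias.shape)
print(torch.allclose(layer(x), manual, atol=1e-5))
```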
# Add additional functionality to the NN model
# To add functionality to a NN model (here, reshaping a batch of images) we have to use the concepts of OOP and inheritance. The forward method is the one that is applied to the batch of images
class MnistModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(784, 10)
    def forward(self, xb):
        xb = xb.reshape(-1, 784)
        out = self.linear(xb)
        return out
model = MnistModel()
# shape of wts and bias
model.linear.weight.shape, model.linear.bias.shape
# We can see that the model from the above class works
for dta,label in tr_loader:
    pred = model(dta)
    print(pred.shape)
    print(label.shape)
    break
# We will apply softmax now - it converts the model outputs (logits) into probabilities between 0 and 1 that sum to 1
torch.sum(F.softmax(pred[0], dim=0)).item()
# Applying softmax on the whole batch
pred_s = F.softmax(pred,dim=1)
# torch.max returns the max probability in each row first, followed by its index
prob,index_prob = torch.max(pred_s,dim=1)
index_prob.shape,prob.shape
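The softmax step can also be written out by hand: exponentiate every score and divide by the row sum. The sketch below uses two hand-picked rows of scores (illustrative values, not model output) and checks the result against `F.softmax`. It also shows the order in which `torch.max` returns its two results: values first, then indices.

```python
import torch
import torch.nn.functional as F

# Two rows of raw scores (logits) for a 3-class toy problem
logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 0.5, 3.0]])

# softmax(x)_i = exp(x_i) / sum_j exp(x_j), applied row by row
manual = torch.exp(logits) / torch.exp(logits).sum(dim=1, keepdim=True)
probs = F.softmax(logits, dim=1)

print(torch.allclose(manual, probs))
print(probs.sum(dim=1))  # every row sums to 1

# torch.max with dim returns (values, indices)
values, indices = torch.max(probs, dim=1)
print(indices)  # tensor([0, 2])
```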
def metric_acc(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))
# We will get the same accuracy even if we do not apply the softmax, because
# e^x is an increasing function, i.e., if y1 > y2, then e^y1 > e^y2. The ordering also survives dividing every value by the same sum, which is how softmax normalizes the values.
metric_acc(pred_s,label)
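A small sketch of this point, using hand-picked scores and labels (illustrative values): the predicted class, and therefore the accuracy, is the same whether it is computed from the raw scores or from the softmax probabilities. The helper name `accuracy` is hypothetical and mirrors `metric_acc`.

```python
import torch
import torch.nn.functional as F

def accuracy(outputs, labels):
    # Predicted class is the index of the largest value in each row
    _, preds = torch.max(outputs, dim=1)
    return (preds == labels).sum().item() / len(preds)

logits = torch.tensor([[1.0, 3.0, 2.0],
                       [0.2, 0.1, -1.0],
                       [-2.0, -0.5, -0.7]])
labels = torch.tensor([1, 0, 2])

acc_from_logits = accuracy(logits, labels)
acc_from_probs = accuracy(F.softmax(logits, dim=1), labels)

# Softmax keeps the ordering within each row, so the argmax does not change
print(acc_from_logits == acc_from_probs)
```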
# cross entropy is the negative log of the probability predicted for the correct label, averaged over the batch
# the loss function will be cross_entropy
loss = F.cross_entropy(pred, label)
print(loss)
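To connect the comment to the function call, the sketch below computes the loss by hand on two illustrative rows: apply softmax, pick the probability assigned to the correct label, take the negative log, and average. Note that `F.cross_entropy` expects raw scores and applies the softmax internally, which is why `pred` rather than `pred_s` is passed to it.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 1.0, 0.1],
                       [0.5, 0.5, 3.0]])
labels = torch.tensor([0, 2])

# Probability the model assigns to the correct class of each sample
probs = F.softmax(logits, dim=1)
picked = probs[torch.arange(len(labels)), labels]

# Negative log of those probabilities, averaged over the batch
manual_loss = -torch.log(picked).mean()
builtin_loss = F.cross_entropy(logits, labels)

print(manual_loss.item(), builtin_loss.item())
print(torch.allclose(manual_loss, builtin_loss))
```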
Model Training
This fit function is the training step. This training step involves running the model on the training dataloader, calculating the loss, calculating the gradients for each batch, updating the weights, and resetting the gradients at the end.
In the second part of the loop, we validate the model on the validation dataloader. The steps include calculating the loss and accuracy after each epoch and printing them. We can see that the loss and the accuracy on the validation set improve after each epoch.
class MnistModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(784, 10)
    def forward(self, xb):
        xb = xb.reshape(-1, 784)
        out = self.linear(xb)
        return out
def results_epoch(out_lst):
    val_loss_epoch = torch.stack([dct['val_loss'] for dct in out_lst]).mean()
    val_acc_epoch = torch.stack([dct['val_acc'] for dct in out_lst]).mean()
    return {'val_loss': val_loss_epoch.item(), 'val_acc': val_acc_epoch.item()}
def final_output(dct,epoch):
    print("Epoch [{}], val_loss: {:.4f}, val_acc: {:.4f}".format(epoch, dct['val_loss'], dct['val_acc']))
def validation(batch_val,model):
    img_val,label_val = batch_val
    pred_val = model(img_val)
    # loss is computed
    loss_val = F.cross_entropy(pred_val,label_val)
    # Accuracy is computed
    acc_val = metric_acc(pred_val,label_val)
    return {'val_loss': loss_val, 'val_acc': acc_val}
def check_scores(val_loader,model):
    out_lst = [validation(batch_val,model) for batch_val in val_loader]
    return results_epoch(out_lst)
def fit(epochs,learning_rate,model,train_loader,val_loader,opt_func=torch.optim.SGD):
    # Build the optimizer from opt_func; the momentum argument assumes an SGD-style optimizer
    optimizer = opt_func(model.parameters(), lr= learning_rate , momentum=0.9)
    out_lst = []
    for epoch in range(epochs):
        # Training step on the training dataloader passed in as train_loader
        for batch in train_loader:
            # extract a batch of images and labels
            img,label = batch
            # calculate predictions using the MnistModel class defined above
            pred = model(img)
            # Since this is a multi-class image classification model, the loss function is cross entropy
            loss = F.cross_entropy(pred,label)
            # Calculate the gradient of the loss with respect to the model parameters (weights and biases)
            loss.backward()
            # Update the weights
            optimizer.step()
            # Reset the gradients to zero so they do not accumulate into the next batch
            optimizer.zero_grad()
        # Validation on the validation dataloader
        output = check_scores(val_loader,model)
        out_lst.append(output)
        final_output(output,epoch)
    return out_lst
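The `optimizer.zero_grad()` call in the training loop matters because PyTorch adds each new gradient to whatever is already stored in `.grad`. A minimal sketch of that behavior, using a hand-made two-element parameter `w` (an illustrative tensor, not part of the model):

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)
x = torch.tensor([3.0, 4.0])

# The gradient of sum(w * x) with respect to w is x
(w * x).sum().backward()
first = w.grad.clone()

# A second backward pass without zeroing adds to the stored gradient
(w * x).sum().backward()
accumulated = w.grad.clone()

# Zeroing the gradient first gives the plain gradient again
w.grad.zero_()
(w * x).sum().backward()
after_reset = w.grad.clone()

print(first)        # tensor([3., 4.])
print(accumulated)  # tensor([6., 8.])
print(after_reset)  # tensor([3., 4.])
```

Without the reset, every batch would update the weights using the sum of all gradients seen so far rather than the gradient of the current batch.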
model = MnistModel()
starting = check_scores(val_loader,model)
print(starting)
out_lst = fit(20,learning_rate=0.002,model= model,train_loader = tr_loader,val_loader = val_loader)
history = [starting] + out_lst
accuracies = [result['val_acc'] for result in history]
print(accuracies)
plt.plot(accuracies,'-X')
plt.xlabel('epoch')
plt.ylabel('accuracy_on_the_validation_set')
We can see that the loss and accuracy on the validation set improve with each epoch. You can try a different number of epochs and a different learning rate to see if you can improve the accuracy
img_test = MNIST(root='Deep_Learning_Explorations/data/',train = False,transform=transforms.ToTensor())
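def pred_function(img,model):
    # add a batch dimension: 1 x 28 x 28 becomes 1 x 1 x 28 x 28
    inp = img.unsqueeze(0)
    out = model(inp)
    # the index of the largest output is the predicted digit
    _ , preds = torch.max(out,dim=1)
    return preds[0].item()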
img, label = img_test[200]
plt.imshow(img[0], cmap='gray')
print('Label:', label, ', Predicted:', pred_function(img, model))
img, label = img_test[123]
plt.imshow(img[0], cmap='gray')
print('Label:', label, ', Predicted:', pred_function(img, model))
# define the dataloader for the test set
test_loader = DataLoader(img_test, batch_size=20)
# Accuracy and loss on the test data set
check_scores(test_loader,model)
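Evaluation does not need gradients, so it can be wrapped in `torch.no_grad()` to save memory and time. The sketch below is one possible way to do that, not the notebook's own method. The helper name `evaluate`, the toy model, and the toy loader are all hypothetical. Unlike `check_scores`, it weights every sample equally instead of averaging per-batch means, which only makes a difference when the last batch is smaller than the others.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader):
    """Hypothetical helper: mean loss and accuracy without tracking gradients."""
    model.eval()
    total_loss, total_correct, total_count = 0.0, 0, 0
    with torch.no_grad():
        for images, labels in loader:
            out = model(images)
            # Sum the per-sample losses so the final mean is over samples
            total_loss += F.cross_entropy(out, labels, reduction='sum').item()
            total_correct += (out.argmax(dim=1) == labels).sum().item()
            total_count += len(labels)
    return {'loss': total_loss / total_count, 'acc': total_correct / total_count}

torch.manual_seed(0)
# Toy stand-ins for the trained model and the test loader (illustrative only)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
toy_set = TensorDataset(torch.rand(50, 1, 28, 28), torch.randint(0, 10, (50,)))
toy_loader = DataLoader(toy_set, batch_size=20)

scores = evaluate(toy_model, toy_loader)
print(scores)
```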
class MnistModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(784, 90)
        self.linear2 = nn.Linear(90,10)
    def forward(self, xb):
        xb = xb.view(xb.size(0),-1)
        out = self.linear1(xb)
        out = F.relu(out)
        out = self.linear2(out)
        return out
model = MnistModel()
We can inspect the weights and biases of the different layers using the parameters method, as shown below
for param in model.parameters():
    print(param.shape)
model.linear1.weight.shape,model.linear2.weight.shape
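The shapes translate directly into a parameter count. The sketch below rebuilds two layers of the same sizes as `linear1` and `linear2` (the variable names `hidden` and `output` are illustrative) and counts the elements in each weight matrix and bias vector.

```python
import torch.nn as nn

# Same layer sizes as the feed-forward model: 784 -> 90 -> 10
hidden = nn.Linear(784, 90)
output = nn.Linear(90, 10)

# numel() is the number of elements in a tensor; parameters() yields weight, then bias
counts = [p.numel() for layer in (hidden, output) for p in layer.parameters()]

print(counts)       # [70560, 90, 900, 10]
print(sum(counts))  # 71560
```

For comparison, the single-layer model has 784 * 10 + 10 = 7,850 parameters.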
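Adding a second linear layer with a non-linearity (ReLU) in between lets the model learn non-linear relationships between the pixels and the labels, which should improve the accuracy on the validation and test sets compared with the single linear layer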
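One way to see why the ReLU between the two layers matters: without it, two stacked linear layers are equivalent to a single linear layer, because W2(W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2). The sketch below checks this numerically on random inputs. The layer names `lin1` and `lin2` are illustrative, and double precision is used only to keep the comparison free of rounding noise.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
lin1 = nn.Linear(784, 90).double()
lin2 = nn.Linear(90, 10).double()
x = torch.rand(5, 784, dtype=torch.float64)

# Two linear layers applied back to back, with no activation in between
stacked = lin2(lin1(x))

# The same map collapsed into one weight matrix and one bias vector
W = lin2.weight @ lin1.weight            # shape (10, 784)
b = lin2.weight @ lin1.bias + lin2.bias  # shape (10,)
collapsed = x @ W.t() + b

print(torch.allclose(stacked, collapsed))

# With ReLU in between, the output is no longer this single linear map
with_relu = lin2(F.relu(lin1(x)))
print(torch.allclose(with_relu, collapsed))
```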
for images, _ in tr_loader:
    print('images.shape:', images.shape)
    plt.figure(figsize=(8,8))
    plt.axis('off')
    plt.imshow(make_grid(images, nrow=14).permute((1, 2, 0)))
    break
model = MnistModel()
starting = check_scores(val_loader,model)
print(starting)
out_lst = fit(20,learning_rate=0.002,model= model,train_loader = tr_loader,val_loader = val_loader)
Result on the test set
check_scores(test_loader,model)