-------------------------------------------------------------------
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
# Define preprocessing for the MNIST dataset
# MNIST images are grayscale, so the number of channels is 1, and the image size is 28x28.
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])
# Download the MNIST dataset
# Training dataset
train_dataset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)
# Test dataset
test_dataset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)
# Create data loaders
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)
# Print information about the downloaded datasets and data loaders
train_dataset, test_dataset, train_loader, test_loader
The provided code utilizes PyTorch and the torchvision library to download a dataset and prepare it for training and testing a model. The dataset used here is MNIST.
* Library Imports:
datasets and transforms are imported from the torchvision library. These are used for handling datasets and performing data preprocessing.
DataLoader is a class from PyTorch responsible for loading data.
* Definition of Data Preprocessing:
transforms.Compose chains together multiple transformation steps. In this case, it consists of two steps.
transforms.ToTensor(): Converts the image to a PyTorch tensor.
transforms.Normalize((0.5,), (0.5,)): Normalizes the tensor values with a mean and standard deviation of 0.5, which maps pixel values from [0, 1] to [-1, 1].
* Downloading the MNIST Dataset:
The datasets.MNIST is used to download the training and test MNIST datasets.
root='./data': Specifies the path where the dataset will be stored.
train=True or False: Specifies whether it's the training or test dataset.
download=True: Downloads the dataset if it's not present at the specified path.
transform=transform: Applies the defined preprocessing to the dataset.
* Creating Data Loaders:
The DataLoader class is used to create data loaders for the dataset.
batch_size=64: Specifies the number of data samples to load in each batch.
shuffle=True or False: Determines whether to shuffle the data. Training data is shuffled, while test data is kept in order.
* Print Information about Datasets and Data Loaders:
When run in a notebook, the last line displays the training and test datasets together with their data loaders, so their summary representations (number of samples, root path, transforms, and so on) can be inspected.
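As an additional sanity check that is not part of the original code, one batch can be pulled from train_loader to confirm the tensor shapes and the normalized value range (the numbers in the comments assume the batch size of 64 set above):

images, labels = next(iter(train_loader))
print(images.shape)                               # torch.Size([64, 1, 28, 28]) -- 64 grayscale 28x28 images
print(labels.shape)                               # torch.Size([64])
print(images.min().item(), images.max().item())   # roughly -1.0 and 1.0 after Normalize((0.5,), (0.5,))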
-------------------------------------------------------------------
import torch
import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super(AlexNet, self).__init__()
        self.features = nn.Sequential(
            # 1st Convolutional Layer
            nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=0.0001, beta=0.75, k=2),
            nn.MaxPool2d(kernel_size=3, stride=2),
            # 2nd Convolutional Layer
            nn.Conv2d(64, 192, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=0.0001, beta=0.75, k=2),
            nn.MaxPool2d(kernel_size=3, stride=2),
            # 3rd Convolutional Layer
            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # 4th Convolutional Layer
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            # 5th Convolutional Layer
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.LocalResponseNorm(size=5, alpha=0.0001, beta=0.75, k=2),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))
        self.classifier = nn.Sequential(
            # 1st Fully Connected Layer
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            # 2nd Fully Connected Layer
            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            # 3rd Fully Connected Layer
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x

# Create an instance of the AlexNet model
model = AlexNet()
model
Considering hardware constraints, execution speed, and memory usage, the original architecture has been modified in this implementation.
Original architecture from the paper:
- 1st Convolutional Layer: 96 kernels of size 11x11x3, stride 4.
- 2nd Convolutional Layer: 256 kernels of size 5x5x48.
- 3rd Convolutional Layer: 384 kernels of size 3x3x256.
- 4th Convolutional Layer: 384 kernels of size 3x3x192.
- 5th Convolutional Layer: 256 kernels of size 3x3x192.
- Fully Connected Layers: 4096 neurons each.
Architecture used in this code:
- 1st Convolutional Layer: 64 kernels of size 11x11x3, stride 4.
- 2nd Convolutional Layer: 192 kernels of size 5x5.
- 3rd Convolutional Layer: 384 kernels of size 3x3.
- 4th Convolutional Layer: 256 kernels of size 3x3.
- 5th Convolutional Layer: 256 kernels of size 3x3.
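To make the memory and hardware considerations concrete, the parameter count of the model instance created above can be checked with a short sketch (an illustrative addition, not part of the original code):

num_params = sum(p.numel() for p in model.parameters())
print(f'{num_params:,} parameters')   # on the order of 60 million, most of them in the fully connected layers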
* Class Definition
Inherits from nn.Module to define the model. This class defines the structure of the model and the forward pass method.
* Convolutional Layers (self.features)
1st Convolutional Layer: 3 input channels, 64 output channels, kernel size 11x11, stride 4, padding 2. Uses ReLU activation, Local Response Normalization (LRN), and Max Pooling.
2nd Convolutional Layer: 64 input channels, 192 output channels, kernel size 5x5. Uses ReLU, LRN, and Max Pooling.
3rd Convolutional Layer: 192 input channels, 384 output channels, kernel size 3x3. Uses ReLU activation.
4th Convolutional Layer: 384 input channels, 256 output channels, kernel size 3x3. Uses ReLU activation.
5th Convolutional Layer: 256 input channels, 256 output channels, kernel size 3x3. Uses ReLU activation, followed by LRN and Max Pooling.
* Average Pooling Layer (self.avgpool)
nn.AdaptiveAvgPool2d((6, 6)): Fixes the size of the feature map to 6x6.
* Fully Connected Layers (self.classifier)
1st Fully Connected Layer: Takes 256 * 6 * 6 input features, produces 4096 output neurons. Uses ReLU activation and Dropout.
2nd Fully Connected Layer: Takes 4096 input neurons, produces 4096 output neurons. Uses ReLU and Dropout.
3rd Fully Connected Layer: Takes 4096 input neurons, produces num_classes output neurons. This layer performs the final classification task.
* Forward Pass Definition
The forward method defines how input data x passes through the model: it goes through the convolutional layers and adaptive average pooling, is flattened, and is then fed through the fully connected layers.
* Model Instance Creation
Creates an instance of the AlexNet class. This instance can be used for actual image classification tasks.
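The shapes described above can be verified with a dummy forward pass. The following check is an illustrative addition rather than part of the original code and assumes a standard ImageNet-sized 3x224x224 input:

x = torch.randn(1, 3, 224, 224)   # one dummy RGB image
feats = model.features(x)
print(feats.shape)                # torch.Size([1, 256, 6, 6])
pooled = model.avgpool(feats)
print(pooled.shape)               # torch.Size([1, 256, 6, 6]) -- fixed to 6x6 regardless of input size
out = model(x)
print(out.shape)                  # torch.Size([1, 1000]) -- one score per class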
-------------------------------------------------------------------
import torch
import torch.nn as nn
import torch.optim as optim
# Define the neural network model
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.fc1 = nn.Linear(28*28, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Initialize the model, loss function, and optimizer
model = SimpleNet()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
def train(model, train_loader, criterion, optimizer, epochs):
    for epoch in range(epochs):
        for images, labels in train_loader:
            # Forward pass
            outputs = model(images)
            loss = criterion(outputs, labels)
            # Backward pass and optimization
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}')

# Execute training
train(model, train_loader, criterion, optimizer, epochs=5)
This code demonstrates the process of training the model.
* Neural Network Model Definition (SimpleNet class)
Inherits from nn.Module to define a neural network model in PyTorch.
self.fc1: The first fully connected (linear) layer with input size 28*28 = 784 (a flattened MNIST image) and output size 500.
self.fc2: The second fully connected layer with input size 500 and output size 10 (there are 10 classes in the MNIST dataset).
forward method: Defines how the model processes input data x. It flattens the input and passes it through two fully connected layers, returning the result.
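As with AlexNet above, the shapes can be confirmed with a dummy batch; this is a supplementary sketch, not part of the original code:

dummy = torch.randn(4, 1, 28, 28)   # a fake batch of four MNIST-sized images
print(SimpleNet()(dummy).shape)     # torch.Size([4, 10]) -- one logit per class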
* Initializing Model, Loss Function, and Optimizer
Creates an instance of SimpleNet to initialize the model.
nn.CrossEntropyLoss(): Uses cross-entropy loss suitable for multi-class classification.
optim.Adam: Uses the Adam optimizer with a learning rate (lr) of 0.001.
* Training Loop
The train function trains the model, iterating over the specified number of epochs.
In each epoch, it iterates through batches via the data loader (train_loader).
Forward Pass: Uses the model to compute the output for each batch and calculates the loss using the loss function.
Backward Pass and Optimization: Resets the gradients to zero (optimizer.zero_grad()), computes gradients of the loss via backpropagation (loss.backward()), and updates the model's weights with the optimizer (optimizer.step()).
* Executing Training
Calls the train function to actually train the model. In this case, it performs training for 5 epochs.
* Features and Considerations
SimpleNet uses a straightforward fully connected neural network for MNIST data, which might have limited performance on more complex images.
optimizer.zero_grad() is called to zero out gradients before processing each batch to prevent gradient accumulation.
Parameters like learning rate (lr) and the number of epochs significantly impact training efficiency and performance and can be adjusted experimentally.
train_loader is a DataLoader previously created for loading and preprocessing the MNIST dataset.
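One common variation, sketched below as an illustration rather than taken from the original code, reports the average loss over all batches in an epoch instead of the loss of the final batch, which gives a smoother view of training progress:

def train_with_avg_loss(model, train_loader, criterion, optimizer, epochs):
    for epoch in range(epochs):
        running_loss = 0.0
        for images, labels in train_loader:
            outputs = model(images)
            loss = criterion(outputs, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            running_loss += loss.item()   # accumulate the per-batch loss
        print(f'Epoch [{epoch+1}/{epochs}], Avg loss: {running_loss / len(train_loader):.4f}')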
-------------------------------------------------------------------
def evaluate(model, test_loader, criterion):
    model.eval()  # Set the model to evaluation mode
    total_loss = 0
    correct = 0
    total = 0
    with torch.no_grad():  # Disable gradient calculation
        for images, labels in test_loader:
            outputs = model(images)
            loss = criterion(outputs, labels)
            total_loss += loss.item()
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    avg_loss = total_loss / len(test_loader)
    accuracy = 100 * correct / total
    print(f'Average loss: {avg_loss:.4f}, Accuracy: {accuracy:.2f}%')

# Evaluate the model on the test dataset
evaluate(model, test_loader, criterion)
This code defines and executes the evaluation process of a trained neural network model on the test dataset.
* Function Definition (evaluate)
model: The model to be evaluated.
test_loader: DataLoader for the test dataset, used during the model evaluation.
criterion: The loss function used to calculate the loss between the model's output and the actual labels.
* Setting the Model to Evaluation Mode
model.eval(): Sets the model to evaluation mode. This is necessary when certain layers (e.g., dropout, batch normalization) behave differently during training and evaluation.
* Disabling Gradient Calculation
torch.no_grad(): Disables gradient calculation during evaluation, reducing memory usage and improving computational speed.
* Evaluation on the Test Dataset
Fetches batches of data from the test_loader and computes the model's output.
criterion(outputs, labels): Calculates the loss between the computed output and the actual labels.
torch.max(outputs.data, 1): Finds the index with the highest value in the model's output to determine the predicted class.
Accuracy Calculation: Computes accuracy as the number of correctly predicted labels divided by the total number of labels, multiplied by 100.
* Result Output
Calculates and prints the average loss (avg_loss) and accuracy (accuracy).
* Execution of the Evaluation Function
evaluate(model, test_loader, criterion): Uses the defined evaluation function to evaluate the trained model on the test dataset.
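To see the prediction step in isolation, the following snippet (an illustrative addition, not part of the original code) applies the same logic to a single batch from test_loader:

images, labels = next(iter(test_loader))
with torch.no_grad():
    outputs = model(images)             # shape: [batch_size, 10], one score per class
predicted = outputs.argmax(dim=1)       # index of the highest score = predicted class
batch_accuracy = 100 * (predicted == labels).float().mean().item()
print(f'Accuracy on this batch: {batch_accuracy:.2f}%')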