Materials for the Applied Data Science profile course INFOMDA2: Battling the curse of dimensionality.
In this practical, we will create a feed-forward neural network as well as a convolutional neural network to analyze the famous MNIST dataset.
library(tidyverse)
library(keras)
In this section, we will develop a deep feed-forward neural network for MNIST.
1. Load the built-in MNIST dataset by running the following code. Then, describe the structure and contents of the mnist object.
mnist <- dataset_mnist()
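If you are unsure where to begin, a minimal inspection sketch (base R only):

str(mnist, max.level = 2) # a list with train and test sets, each holding images (x) and labels (y)
dim(mnist$train$x)        # 60000 x 28 x 28: 60,000 images of 28 x 28 pixels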
Plotting is very important when working with image data. We have defined a convenient plotting function for you.
2. Use the plot_img() function below to plot the first training image. The img parameter has to be a matrix with dimensions (28, 28). NB: indexing in 3-dimensional arrays works the same as indexing in matrices, but you need an extra comma: x[, , ].
plot_img <- function(img, col = gray.colors(255, start = 1, end = 0), ...) {
  # transpose and flip the y-axis so the matrix is drawn in reading orientation
  image(t(img), asp = 1, ylim = c(1.1, -0.1), col = col, bty = "n", axes = FALSE, ...)
}
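For example, a minimal usage sketch: the first image is the first slice of the 3-dimensional training array.

plot_img(mnist$train$x[1, , ]) # the extra comma selects the full 28 x 28 matrix for example 1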
It is usually a good idea to normalize your features to have a manageable, standard range before entering them in neural networks.
3. As a preprocessing step, ensure the brightness values of the images in the training and test set are in the range (0, 1).
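Since MNIST stores 8-bit brightness values between 0 and 255, one minimal way to do this (a sketch) is to divide by the maximum value:

mnist$train$x <- mnist$train$x / 255 # scale pixel values from 0-255 to 0-1
mnist$test$x  <- mnist$test$x / 255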
The simplest model is a multinomial logistic regression model, where we have no hidden layers and 10 outputs (one for each digit, 0-9). That model is shown below.
4. Display a summary of the multinomial model using the summary() function. Describe why this model has 7850 parameters.
multinom <-
  keras_model_sequential(input_shape = c(28, 28)) %>% # initialize a sequential model
  layer_flatten() %>%                                 # flatten 28*28 matrix into single vector
  layer_dense(10, activation = "softmax")             # softmax outcome == logistic regression for each of 10 outputs

multinom$compile(
  loss = "sparse_categorical_crossentropy", # loss function for multinomial outcome
  optimizer = "adam",                       # we use this optimizer because it works well
  metrics = list("accuracy")                # we want to know training accuracy in the end
)
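To check the parameter count, a quick sketch of the arithmetic:

summary(multinom)
# each of the 10 outputs has 28 * 28 = 784 weights plus 1 bias: 10 * (784 + 1) = 7850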
5. Train the model for 5 epochs using the code below. What accuracy do we obtain in the validation set? (NB: the multinom object is changed “in-place”, which means you don’t have to assign it to another variable)
multinom %>% fit(x = mnist$train$x, y = mnist$train$y, epochs = 5, validation_split = 0.2, verbose = 1)
6. Train the model for another 5 epochs. What accuracy do we obtain in the validation set?
7. Create and compile a feed-forward neural network with the following properties, ensuring that the model has 50890 parameters. You may reuse code from the multinomial model; one architecture that matches this parameter count is sketched below.
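A minimal sketch, assuming a single hidden dense layer of 64 ReLU units (784 * 64 + 64 = 50240 hidden parameters plus 64 * 10 + 10 = 650 output parameters, totalling 50890):

ffnn <-
  keras_model_sequential(input_shape = c(28, 28)) %>%
  layer_flatten() %>%                              # 28 * 28 = 784 inputs
  layer_dense(units = 64, activation = "relu") %>% # hidden layer: 784 * 64 + 64 = 50240 parameters
  layer_dense(10, activation = "softmax")          # output layer: 64 * 10 + 10 = 650 parameters

ffnn %>% compile(
  loss = "sparse_categorical_crossentropy",
  optimizer = "adam",
  metrics = c("accuracy")
)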
8. Train the model for 10 epochs. What do you see in terms of validation accuracy, also compared to the multinomial model?
9. Create predictions for the test data with the two trained models (using the function below). Create a confusion matrix and compute the test accuracy for each of these two models. Write down any observations you have.
class_predict <- function(model, x_train) predict(model, x = x_train) %>% apply(1, which.max) - 1
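A minimal usage sketch (the object names are ours):

preds_multinom <- class_predict(multinom, mnist$test$x) # predicted digit classes
table(predicted = preds_multinom, true = mnist$test$y)  # confusion matrix
mean(preds_multinom == mnist$test$y)                    # test accuracy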
10. OPTIONAL: if you have time, create and estimate (10 epochs) a deep feed-forward model with the following properties. Compare this model to the previous models on the test data.
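One possible deep architecture (a sketch; the layer sizes are our assumption, not a prescribed solution):

deep_ffnn <-
  keras_model_sequential(input_shape = c(28, 28)) %>%
  layer_flatten() %>%
  layer_dense(units = 128, activation = "relu") %>% # first hidden layer
  layer_dense(units = 64, activation = "relu") %>%  # second hidden layer
  layer_dense(10, activation = "softmax")

deep_ffnn %>% compile(
  loss = "sparse_categorical_crossentropy",
  optimizer = "adam",
  metrics = c("accuracy")
)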
Convolution layers in Keras need a specific form of data input. For each example, they need a (width, height, channels) array (tensor). For a colour image with 28*28 dimensions, that shape is usually (28, 28, 3), where the channels indicate red, green, and blue. MNIST has no colour info, but we still need the channel dimension to enter the data into a convolution layer, giving shape (28, 28, 1). The training dataset x_train should thus have shape (60000, 28, 28, 1).
11. Add a “channel” dimension to the training and test data using the following code. Plot an image using the first channel of the 314th training example (this is a 9).
# add channel dimension to input (required for convolution layers)
dim(mnist$train$x) <- c(dim(mnist$train$x), 1)
dim(mnist$test$x) <- c(dim(mnist$test$x), 1)
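For the plotting part, a minimal sketch (note the extra comma now that the array is 4-dimensional):

plot_img(mnist$train$x[314, , , 1]) # first channel of the 314th training example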
12. Create and compile a convolutional neural network using the following code. Describe the different layers in your own words.
cnn <-
  keras_model_sequential(input_shape = c(28, 28, 1)) %>%
  layer_conv_2d(filters = 6, kernel_size = c(5, 5)) %>%
  layer_max_pooling_2d(pool_size = c(4, 4)) %>%
  layer_flatten() %>%
  layer_dense(units = 32, activation = "relu") %>%
  layer_dense(10, activation = "softmax")

cnn %>%
  compile(
    loss = "sparse_categorical_crossentropy",
    optimizer = "adam",
    metrics = c("accuracy")
  )
13. Fit this model on the training data (10 epochs) and compare it to the previous models.
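A minimal sketch, reusing the reshaped arrays from step 11:

cnn %>% fit(x = mnist$train$x, y = mnist$train$y, epochs = 10, validation_split = 0.2, verbose = 1)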
14. Create another CNN which has better validation performance within 10 epochs. Compare your validation accuracy to that of your peers.
Here are some things you could try: more (or smaller) convolution filters, an extra convolution block, ReLU activations in the convolution layers, or dropout for regularization. One such variant is sketched below.
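A sketch of one possible improved architecture (an illustration under our assumptions, not a definitive solution):

cnn_2 <-
  keras_model_sequential(input_shape = c(28, 28, 1)) %>%
  layer_conv_2d(filters = 16, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dropout(rate = 0.5) %>% # regularization to reduce overfitting
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(10, activation = "softmax")

cnn_2 %>% compile(
  loss = "sparse_categorical_crossentropy",
  optimizer = "adam",
  metrics = c("accuracy")
)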