Introduction to Neural Networks
By the end of this practical you will know how to:
- use Keras to build deep feedforward neural networks.
- use Keras to fit and evaluate deep feedforward neural networks.
- use Keras to optimize the predictive performance of deep feedforward neural networks.
Open your neuralnets R project.
Open a new R script and save it as a new file called deep_feedforward_practical.R in the 2_Code folder.
Using library(), load the packages tidyverse and keras.
# install.packages("tidyverse")
# install.packages("keras")
# Load packages necessary for this exercise
library(tidyverse)
library(keras)
Use readRDS() to load the fashion.RDS dataset as a new object.
# MNIST fashion data
fashion <- readRDS(file = "1_Data/fashion.RDS")
Inspect the fashion object using str(). You will see a list with two elements named train and test, each of which consists of two elements x (the images) and y (the item depicted).
# Inspect contents
str(fashion)
Use source() to run the helper.R file in your 2_Code folder.
# Load helper.R
source("2_Code/helper.R")
Split the data into images and items, separately for training and test. Use the code below.
# split fashion train
c(fashion_train_images, fashion_train_items) %<-% fashion$train
# split fashion test
c(fashion_test_images, fashion_test_items) %<-% fashion$test
Use the array_reshape() function to serialize the images of both training and test sets, such that every image becomes a vector of 28 * 28 = 784 elements (and the resulting object a matrix with that many columns). Use the code below.
# reshape images
fashion_train_images_serialized <- array_reshape(fashion_train_images, c(nrow(fashion_train_images), 784))
fashion_test_images_serialized <- array_reshape(fashion_test_images, c(nrow(fashion_test_images), 784))
Next, rescale the images so that every pixel value lies between 0 and 1 by dividing by 255.
# rescale images
fashion_train_images_serialized <- fashion_train_images_serialized / 255
fashion_test_images_serialized <- fashion_test_images_serialized / 255
Use to_categorical() to expand the criterion (the items) into a one-hot representation, i.e., a matrix with ten columns containing a 1 sitting in the position of the integer and 0s otherwise.
# expand criterion
fashion_train_items_onehot <- to_categorical(fashion_train_items, 10)
fashion_test_items_onehot <- to_categorical(fashion_test_items, 10)
Use head(fashion_train_items_onehot) to inspect the first few rows and compare them to head(fashion_train_items). Do things line up?
Run the code below to create a vector of labels for the ten fashion items.
# fashion items
fashion_labels <- c('T-shirt/top',
'Trouser',
'Pullover',
'Dress',
'Coat',
'Sandal',
'Shirt',
'Sneaker',
'Bag',
'Ankle boot')
Use the plt_imgs() function, which you loaded earlier with the helper.R script, to illustrate the images. You have to add a 1 because the indices in fashion_train_items start at 0.
# plot images
plt_imgs(fashion_train_images[1:25,,],fashion_labels[fashion_train_items[1:25]+1])
Use keras_model_sequential() to start building a network.
# begin building network
net <- keras_model_sequential()
Add the output layer to the network using layer_dense(). Inside the function you will have to specify three arguments: input_shape, units, and activation. For a moment, think about what values to use for these three given the kind of data that you wish to model. The answers come in a sec.
# add layer
net %>% layer_dense(
input_shape = XX,
units = XX,
activation = "XX"
)
The correct solutions are input_shape = 784 to specify that there must be 784 input nodes, one for each pixel, units = 10 to specify that there must be 10 different output nodes, and activation = 'softmax' to specify that the final activation should be a probability that sums to 1 across all output nodes. After you have entered these values, use summary(net) to see the model information.
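For reference, the layer call with these values filled in would look like this (run it instead of, not in addition to, the template above, since every call adds another layer to net):
# add the output layer with the values just discussed
net %>% layer_dense(
  input_shape = 784,
  units = 10,
  activation = "softmax"
)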
Take a look at the Param # column in the printout. Why is the number 7850 rather than 7840 = 784 * 10? Any ideas?
Yes, keras automatically adds a bias to each node in a layer.
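You can check the arithmetic directly in R:
# 784 weights into each of the 10 output nodes, plus one bias per output node
784 * 10 + 10  # 7850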
Use compile() to compile the network. You will need to specify at least two arguments: optimizer and loss. Think about what we've used in the presentation. Would the same values make sense here?
# loss, optimizers, & metrics
net %>% compile(
optimizer = 'XX',
loss = 'XX',
metrics = c('accuracy')
)
The correct values are optimizer = 'adam' and loss = 'categorical_crossentropy'. Enter them and run the chunk. You see I've also added 'accuracy' as an additional metric, which can be useful to track during fitting, as it is much easier to interpret than the cross-entropy loss.
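Filled in, the compile step could therefore look like this:
# compile with the values discussed above
net %>% compile(
  optimizer = 'adam',
  loss = 'categorical_crossentropy',
  metrics = c('accuracy')
)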
Now fit the network to the training data using fit(). Specify the arguments x, y, batch_size, and epochs. Think for a moment about what appropriate values for these arguments could be.
# fit the network
history <- net %>% fit(
x = XX,
y = XX,
batch_size = XX,
epochs = XX
)
The arguments x and y specify the training features and training criterion, respectively, so x = fashion_train_images_serialized and y = fashion_train_items_onehot. The arguments batch_size and epochs control how often the weights are updated and for how many passes through the data set. Useful (and somewhat arbitrary) values are batch_size = 32 and epochs = 10. Use these values and then run the chunk.
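With these values, the completed chunk could read:
# fit the network to the serialized images and one-hot items
history <- net %>% fit(
  x = fashion_train_images_serialized,
  y = fashion_train_items_onehot,
  batch_size = 32,
  epochs = 10
)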
The fit() function automatically provides useful information on the progression of the fit indices. You can additionally use the history object to create the same illustration as a ggplot. Run plot(history) + theme_minimal(). When you inspect the plot, what tells you that the network has indeed learned? And how good is the final performance of the network?
The network has learned, which can be gathered from the decrease in loss and the increase in accuracy. The non-linear pattern is very characteristic; almost always, most of the gains are achieved in the first epoch or two. To get exact values for the final performance you can simply print history. What do you think: how well will this network perform in predicting fashion items out-of-sample? Find out in the next section.
Evaluate the network's out-of-sample performance using evaluate(), supplying the function with the test images and items.
# evaluate
net %>% evaluate(XX, XX, verbose = 0)
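Filled in with the serialized test images and the one-hot test items, the call could read:
# evaluate out-of-sample performance
net %>% evaluate(fashion_test_images_serialized, fashion_test_items_onehot, verbose = 0)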
The network is slightly worse than in training, but still pretty good given that guessing performance is only 10% accuracy. This, again, is very characteristic. In machine learning, "simple" models often get a long way towards the desired level of performance. Though one might question whether a model with 7,850 parameters can still be considered "simple".
Compare the predictions made by the network with the actual fashion labels. Do you note any patterns? Can you make sense of the errors that the network has made?
# compare predictions to truth
pred <- net %>% predict_classes(fashion_test_images_serialized)
table(fashion_labels[fashion_test_items+1], fashion_labels[pred+1])
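If you want a single summary number alongside the table, the overall test accuracy can be computed directly from the predictions, for example:
# share of correctly classified test items
mean(pred == fashion_test_items)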
Now build a deeper network called deepnet that adds hidden layers with the 'relu' activation function. See the template below. The final layer will again be the output layer and must be supplied with the same values as before. Print the summary at the end.
# initialize deepnet
deepnet <- keras_model_sequential()
# add layers
deepnet %>%
layer_dense(input_shape = 784, units = XX, activation = "XX") %>%
layer_dense(units = XX, activation = "XX") %>%
layer_dense(units = XX, activation = "XX")
# model information
summary(deepnet)
How many parameters are there now in the network?
A whole lot more. There are more than 300 times as many parameters as before. Let's see what this network can achieve. Compile and train the network using the exact same steps as before and then evaluate it on the test data. The only change you need to make in the code is to replace net with deepnet.
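As a sketch, the full sequence for deepnet, assuming the same settings as before (the object name history_deep is just an illustrative choice):
# compile, fit, and evaluate deepnet with the same settings as for net
deepnet %>% compile(
  optimizer = 'adam',
  loss = 'categorical_crossentropy',
  metrics = c('accuracy')
)
history_deep <- deepnet %>% fit(
  x = fashion_train_images_serialized,
  y = fashion_train_items_onehot,
  batch_size = 32,
  epochs = 10
)
deepnet %>% evaluate(fashion_test_images_serialized, fashion_test_items_onehot, verbose = 0)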
The test performance has improved by about 7 percentage points. Not bad. Also, the drop from fitting to testing performance is small, suggesting minimal overfitting of the data. Let's see how far we can take it. Let's build a really deep network.
# initialize realdeepnet
realdeepnet <- keras_model_sequential()
# add layers
realdeepnet %>%
layer_dense(input_shape = 784, units = XX, activation = "XX") %>%
layer_dense(units = XX, activation = "XX") %>%
layer_dense(units = XX, activation = "XX") %>%
layer_dense(units = XX, activation = "XX") %>%
layer_dense(units = XX, activation = "XX") %>%
layer_dense(units = XX, activation = "XX") %>%
layer_dense(units = XX, activation = "XX") %>%
layer_dense(units = XX, activation = "XX") %>%
layer_dense(units = XX, activation = "XX") %>%
layer_dense(units = XX, activation = "XX")
# model information
summary(realdeepnet)
The number of parameters in the network only increased by a factor of about 2, because the lion's share stems from the weights between the input and the first hidden layer, which remained unchanged. Nonetheless, there is reason to believe that this network fares differently. Try it out using the exact same steps as before.
The model did not fare a whole lot better. The fit performance increased a bit, but the predictive accuracy remained more or less constant. Fit the model again. To do this you can simply run the fit function again, which will continue training from where it ended before. After training has finished, evaluate the predictive accuracy again.
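For instance, re-running the fit call for realdeepnet with the same (assumed) settings as before picks up from the current weights rather than starting from scratch:
# continue training realdeepnet from its current weights
history <- realdeepnet %>% fit(
  x = fashion_train_images_serialized,
  y = fashion_train_items_onehot,
  batch_size = 32,
  epochs = 10
)
# re-evaluate on the test data
realdeepnet %>% evaluate(fashion_test_images_serialized, fashion_test_items_onehot, verbose = 0)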
Not all too much is happening. Though both fitting and prediction performance went up by another notch. This is again characteristic of neural networks: to really max out on performance, many epochs of training will often be necessary. At the same time, however, this risks giving the model the opportunity to overfit the data.
Now build a network called crazycomplexnet, which should have twice as many nodes in the first hidden layer as there are input nodes. Add as many hidden layers as you like, then train the network for 20 epochs with a batch size of 100 and evaluate its performance.
# initialize crazycomplexnet
crazycomplexnet <- keras_model_sequential()
# add layers
crazycomplexnet %>%
layer_dense(input_shape = 784, units = 1568, activation = "relu") %>%
layer_dense(units = XX, activation = "XX") %>%
layer_dense(units = 10, activation = "softmax")
# model information
summary(crazycomplexnet)
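Compiling and training crazycomplexnet could then look as follows, using the 20 epochs and batch size of 100 mentioned above:
# compile, fit, and evaluate crazycomplexnet
crazycomplexnet %>% compile(
  optimizer = 'adam',
  loss = 'categorical_crossentropy',
  metrics = c('accuracy')
)
history <- crazycomplexnet %>% fit(
  x = fashion_train_images_serialized,
  y = fashion_train_items_onehot,
  batch_size = 100,
  epochs = 20
)
crazycomplexnet %>% evaluate(fashion_test_images_serialized, fashion_test_items_onehot, verbose = 0)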
Finally, try adding layer_dropout(rate = .3) in between the layer_dense() calls; this helps preserve some more of the fitting performance for the test data. Layer dropout sets the activation of a random subset of nodes to zero, which momentarily eliminates the weights emitted by those nodes and, thus, constrains the model's flexibility.
# initialize crazycomplexnet
crazycomplexnet <- keras_model_sequential()
# add layers
crazycomplexnet %>%
layer_dense(input_shape = 784, units = 1568, activation = "relu") %>%
layer_dropout(rate = XX) %>%
layer_dense(units = XX, activation = "XX") %>%
layer_dropout(rate = XX) %>%
layer_dense(units = 10, activation = "softmax")
# model information
summary(crazycomplexnet)
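One possible completion of this template, using the rate of .3 mentioned above and a hypothetical 784 units for the middle hidden layer:
# crazycomplexnet with dropout after each hidden layer
crazycomplexnet <- keras_model_sequential()
crazycomplexnet %>%
  layer_dense(input_shape = 784, units = 1568, activation = "relu") %>%
  layer_dropout(rate = .3) %>%
  layer_dense(units = 784, activation = "relu") %>%
  layer_dropout(rate = .3) %>%
  layer_dense(units = 10, activation = "softmax")
summary(crazycomplexnet)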