Classification Example with Keras Deep Learning API in R

   Keras is a high-level neural networks API for building deep learning models. In this tutorial, we'll learn how to build a Keras deep learning classification model in R. TensorFlow is the backend engine of the Keras R interface. For more information about the library, please refer to this link. To install the 'keras' library, we need to run the commands below in RStudio.

> devtools::install_github("rstudio/keras")
> library(keras)
> install_keras()

First, we'll generate a sample dataset for this tutorial and split it into train and test parts.

library(keras)
set.seed(123)
n <- 2000  # number of samples
a <- sample(1:20, n, replace = TRUE)
b <- sample(1:50, n, replace = TRUE)
c <- sample(1:100, n, replace = TRUE)
flag <- ifelse(a > 15 & b > 30 & c > 60, "red",
               ifelse(a <= 9 & b < 25 & c <= 35, "yellow", "green"))
df <- data.frame(a = a,
                 b = b,
                 c = c,
                 flag = as.factor(flag))
> tail(df,15)
      a  b  c   flag
1986  3 50 91  green
1987  9 12 56  green
1988 10 21 14  green
1989 13  6 22  green
1990  6 14  9 yellow
1991 10 27 86  green
1992  4 16  6 yellow
1993 18 31 33  green
1994  4 50 51  green
1995  2 31 34  green
1996 18  8 88  green
1997  7 36 89  green
1998 16 34 91    red
1999  9 17 80  green
2000  9 22 91  green

indexes <- sample(1:nrow(df), size = 0.95 * nrow(df))

train <- df[indexes, ]
test <- df[-indexes, ]

Next, we'll convert the X input data into matrix type and the Y output labels into one-hot encoded categories.
 
train.x <- as.matrix(train[, 1:3])
train.y <- to_categorical(as.numeric(train[, 4]) - 1)

test.x <- as.matrix(test[, 1:3])
test.y <- to_categorical(as.numeric(test[, 4]) - 1)


Building a model
 
Here, input_shape is 3 (the a, b, c feature count), the output layer's units is 3 (the red, green, and yellow label count), and its activation is 'softmax' (for multi-class classification). The hidden layer uses 64 units with 'relu' activation.

model <- keras_model_sequential()

model %>%
  layer_dense(units = 64, activation = "relu", input_shape = c(3)) %>%
  layer_dense(units = 3, activation = "softmax")

model %>% compile(optimizer = "rmsprop",
                  loss = "categorical_crossentropy",
                  metrics = c("accuracy"))
 
> print(model)
Model
_________________________________________________________________________
Layer (type)                    Output Shape                  Param #    
=========================================================================
dense_356 (Dense)               (None, 64)                    256        
_________________________________________________________________________
dense_357 (Dense)               (None, 3)                     195        
=========================================================================
Total params: 451
Trainable params: 451
Non-trainable params: 0
_________________________________________________________________________


We'll fit the model with the train data and then predict the test data with the fitted model.

model %>% fit(train.x, train.y,
               epochs = 50, 
               batch_size = 50)

pred <- model %>% predict(test.x) 

To make the results readable, I'll change the format of the output.

pred <- round(pred, 2)
result <- data.frame("green" = pred[, 1], "red" = pred[, 2], "yellow" = pred[, 3],
          "predicted" = ifelse(max.col(pred[, 1:3]) == 1, "green",
                        ifelse(max.col(pred[, 1:3]) == 2, "red", "yellow")),
          original = test[, 4])

>  head(result,20)
   green  red yellow predicted original
1   1.00 0.00   0.00     green    green
2   1.00 0.00   0.00     green    green
3   1.00 0.00   0.00     green    green
4   1.00 0.00   0.00     green    green
5   0.45 0.55   0.00       red      red
6   1.00 0.00   0.00     green    green
7   0.93 0.00   0.07     green    green
8   0.52 0.36   0.12     green    green
9   0.96 0.04   0.00     green    green
10  1.00 0.00   0.00     green    green
11  0.28 0.04   0.68    yellow   yellow
12  1.00 0.00   0.00     green    green
13  1.00 0.00   0.00     green    green
14  1.00 0.00   0.00     green    green
15  1.00 0.00   0.00     green    green
16  1.00 0.00   0.00     green    green
17  0.73 0.27   0.00     green    green
18  1.00 0.00   0.00     green    green
19  0.52 0.38   0.10     green    green
20  0.34 0.00   0.66    yellow   yellow

Evaluating the model accuracy and loss.

scores <- model %>% evaluate(test.x, test.y)
> print(scores)
$loss
[1] 0.08444449

$acc
[1] 0.99

Finally, we'll check the confusion matrix with the caret package.

> cfm <- caret::confusionMatrix(as.factor(result$predicted), result$original)
> print(cfm)
Confusion Matrix and Statistics

          Reference
Prediction green red yellow
    green     89   0      0
    red        1   2      0
    yellow     0   0      8

Overall Statistics
                                          
               Accuracy : 0.99            
                 95% CI : (0.9455, 0.9997)
    No Information Rate : 0.9             
    P-Value [Acc > NIR] : 0.0003217       
                                          
                  Kappa : 0.9479          
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: green Class: red Class: yellow
Sensitivity                0.9889     1.0000          1.00
Specificity                1.0000     0.9898          1.00
Pos Pred Value             1.0000     0.6667          1.00
Neg Pred Value             0.9091     1.0000          1.00
Prevalence                 0.9000     0.0200          0.08
Detection Rate             0.8900     0.0200          0.08
Detection Prevalence       0.8900     0.0300          0.08
Balanced Accuracy          0.9944     0.9949          1.00

The full source code is listed below.
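Assembled from the snippets above (the caret package is required for the confusion matrix check):

```r
library(keras)

# generate sample data
set.seed(123)
n <- 2000
a <- sample(1:20, n, replace = TRUE)
b <- sample(1:50, n, replace = TRUE)
c <- sample(1:100, n, replace = TRUE)
flag <- ifelse(a > 15 & b > 30 & c > 60, "red",
               ifelse(a <= 9 & b < 25 & c <= 35, "yellow", "green"))
df <- data.frame(a = a, b = b, c = c, flag = as.factor(flag))

# split into train and test parts
indexes <- sample(1:nrow(df), size = 0.95 * nrow(df))
train <- df[indexes, ]
test <- df[-indexes, ]

# convert X to matrix type, Y to one-hot categories
train.x <- as.matrix(train[, 1:3])
train.y <- to_categorical(as.numeric(train[, 4]) - 1)
test.x <- as.matrix(test[, 1:3])
test.y <- to_categorical(as.numeric(test[, 4]) - 1)

# build and compile the model
model <- keras_model_sequential()
model %>%
  layer_dense(units = 64, activation = "relu", input_shape = c(3)) %>%
  layer_dense(units = 3, activation = "softmax")
model %>% compile(optimizer = "rmsprop",
                  loss = "categorical_crossentropy",
                  metrics = c("accuracy"))

# fit the model and predict the test data
model %>% fit(train.x, train.y, epochs = 50, batch_size = 50)
pred <- model %>% predict(test.x)

# collect the results in a readable data frame
pred <- round(pred, 2)
result <- data.frame("green" = pred[, 1], "red" = pred[, 2], "yellow" = pred[, 3],
          "predicted" = ifelse(max.col(pred[, 1:3]) == 1, "green",
                        ifelse(max.col(pred[, 1:3]) == 2, "red", "yellow")),
          original = test[, 4])

# evaluate the model and check the confusion matrix
scores <- model %>% evaluate(test.x, test.y)
print(scores)
cfm <- caret::confusionMatrix(as.factor(result$predicted), result$original)
print(cfm)
```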

If you have any comments about the post, please leave them below. Thank you for reading!

2 comments:

  1. Hi,
    I do not understand this part of your code.
    "# collecting everything in data frame to read it easily
    result <- data.frame("green" = pred[,1],
    "red" = pred[,2],
    "yellow" = pred[,3],
    "predicted" = ifelse(max.col(pred[ ,1:3]) == 1, "green",
    ifelse(max.col(pred[ ,1:3]) == "2", "red", "yellow")),
    original = test[ ,4])"

    My database includes 60 columns, of which the last column is the label. Also, I have 11 classes in my label column. Could you please help me with this issue?

    1. It is just there to print the original and predicted values, with the probabilities and the decided label, side by side. In the "predicted" column we convert the probability values to labels: the column with the highest value is selected as the final output.

      In your case, you will have 11 probability columns in your prediction. Your job is to pick the column with the highest predicted value as the final result. You can use the same method above or apply some other method to convert probability values to labels. Hope this helps!
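      To make that concrete, here is a small base-R sketch; the 11-class probability matrix and the label names are made up here, standing in for your model's predict() output and your factor levels:

```r
set.seed(1)
# toy stand-in for a model's prediction output: 5 test rows x 11 class probabilities
pred <- matrix(runif(5 * 11), nrow = 5)
pred <- pred / rowSums(pred)           # normalize each row to sum to 1
class.labels <- paste0("class", 1:11)  # hypothetical names, in factor-level order
# max.col() returns the index of the largest value in each row,
# which indexes directly into the label vector
predicted <- class.labels[max.col(pred)]
print(predicted)
```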
