Outlier detection with boxplot.stats function in R

   An outlier is an observation that lies far from the majority of the data.
There are many ways to detect outliers in a dataset. In this post, we'll learn how to detect them with the boxplot.stats() function in R. You can find more information about this function by running the ?boxplot.stats command in R.
   We'll start by generating a sample dataset for this tutorial.

> m <- rnorm(100)
 
> head(m, 20)
 [1]  1.07766774 -0.26540253  0.74722125 -1.46459965
 [5]  0.56082679  0.24564791 -0.53357662 -0.62622695
 [9]  0.77265093 -2.99791152  0.76610916  0.06503116
[13]  0.60389088  0.87890446  1.73781867  1.56105272
[17]  0.61592582 -0.86839875  1.51704497  1.58302684
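
Because rnorm() draws random numbers, the values printed above will differ on every run. A minimal sketch of making the sample reproducible, assuming an arbitrary seed of 42 (the seed and the variable names a and b are not part of the original example):

```r
# Fixing the seed makes rnorm() return the same draws each time.
set.seed(42)          # arbitrary seed, chosen only for illustration
a <- rnorm(100)

set.seed(42)          # resetting the same seed replays the same draws
b <- rnorm(100)

identical(a, b)       # both vectors hold exactly the same values
```

With a fixed seed, the outliers reported later stay the same between sessions, which makes the tutorial easier to follow along with.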

Next, we'll compute the box plot statistics of m with the boxplot.stats() function.

> st <- boxplot.stats(m)
 
> st
$stats
[1] -2.0254878 -0.6059728  0.1234828  0.7693800
[5]  2.3504506

$n
[1] 100

$conf
[1] -0.09382298  0.34078853

$out
[1] -2.997912 -2.673720 -2.981618 
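
The whisker ends and outliers in this output can be reproduced by hand. boxplot.stats() takes the five-number summary from fivenum() and flags every value lying more than coef (1.5 by default) times the hinge spread beyond the hinges. The sketch below assumes a seeded sample x with two planted extreme values; the data and the names five, spread, lower, upper, and manual_out are illustrative only.

```r
set.seed(123)                       # assumed seed, for reproducibility only
x <- c(rnorm(100), -4, 5)           # hypothetical data with two planted extremes

five   <- fivenum(x)                # min, lower hinge, median, upper hinge, max
spread <- five[4] - five[2]         # hinge spread (close to the IQR)
lower  <- five[2] - 1.5 * spread    # lower fence
upper  <- five[4] + 1.5 * spread    # upper fence

# Values outside the fences are the outliers
manual_out <- x[x < lower | x > upper]

# The whiskers end at the most extreme values inside the fences
whiskers <- range(x[x >= lower & x <= upper])

identical(manual_out, boxplot.stats(x)$out)
```

The whisker ends computed this way correspond to the first and last elements of the stats component.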

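The stats component holds the lower whisker, lower hinge, median, upper hinge, and upper whisker; n is the number of non-missing observations, and conf gives the lower and upper extremes of the notch. The outliers are stored in the out component of the st list: by default (coef = 1.5), these are the values lying more than 1.5 times the box length beyond the hinges. Next, we'll find the indexes of those elements in m.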

> st$out
[1] -2.997912 -2.673720 -2.981618

> out_index <- which(m %in% st$out)
 
> m[out_index]
[1] -2.997912 -2.673720 -2.981618
 
> out_index
[1]  10  85 100
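
How many points are reported as outliers depends on the coef argument of boxplot.stats(). A minimal sketch, assuming a seeded sample with a few planted extreme values (the data and the names out_default, out_strict, and idx are illustrative, not part of the original example):

```r
set.seed(123)                        # assumed seed, for reproducibility only
x <- c(rnorm(100), -3.2, -4, 5)      # hypothetical data with planted extremes

out_default <- boxplot.stats(x)$out            # default fences, coef = 1.5
out_strict  <- boxplot.stats(x, coef = 3)$out  # wider fences, fewer outliers

# Positions of the default outliers in x
idx <- which(x %in% out_default)

length(out_default)
length(out_strict)
```

A larger coef widens the fences, so the stricter result is always a subset of the default one.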

Finally, we'll plot the m vector and highlight the outliers in red.

> plot(m, type = "l", col = "blue")
 
> points(x = out_index, y = m[out_index], pch = 19, col = "red")
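
The same outliers can also be viewed as a box plot. boxplot() relies on boxplot.stats() for its computation and invisibly returns the resulting components, so the points drawn beyond the whiskers are the values in out. This is a sketch on assumed data; the sample x, the colour, and the title are illustrative choices.

```r
set.seed(123)                        # assumed seed, for reproducibility only
x <- c(rnorm(100), -4, 5)            # hypothetical data with two planted extremes

# Draw the box plot and keep the statistics it returns invisibly
b <- boxplot(x, horizontal = TRUE, col = "lightblue",
             main = "Box plot of x")

b$out                                # points drawn beyond the whiskers
```

The line plot shows where in the sequence the outliers occur, while the box plot shows how far they sit from the bulk of the data.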
 

In this post, we have learned how to detect outliers with the boxplot.stats() function in R. Thank you for reading!
