How to Forecast Time-series Data in R

    Forecasting the future direction of a time series, such as a price, a sales figure, or another trend, is a common task in data analysis. Time series forecasting estimates future values from historical observations.
    In this tutorial, we'll briefly learn how to forecast time series data in R with the forecast package and plot the result. The tutorial covers:
  1. Preparing the data
  2. Forecasting data
  3. Source code listing
    We'll start by loading the required libraries. You can install them by running install.packages("forecast") and install.packages("ggplot2") at the R command prompt.

library(forecast)
library(ggplot2)


Preparing the data
 
    We use simple simulated time-series data in this tutorial. First, we'll generate a daily price series running from 2017/01/01 to 2018/04/01. Based on that period, we will forecast the price for the following 30 days.

# Dates of the observed (historical) period
actual_days = seq.Date(from = as.Date("2017/01/01"),
                       to = as.Date("2018/04/01"),
                       by = "day")
 
# Dates of the 30-day period to forecast
forecast_days = seq.Date(from = as.Date("2018/04/02"),
                         to = as.Date("2018/05/01"),
                         by = "day")
 
n = length(actual_days)
s = seq(.1, n/10, .1)
# Simulated price: slowly rising trend plus random noise
price = 10 + s*sin(s/500) + rnorm(n) + runif(n)
df = data.frame(date = actual_days, price = price)
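
Before moving on, it can help to check that the simulated inputs have the lengths we expect. The following is a minimal sketch in base R, not part of the original code: the seed value and the use of seq_len(n) / 10 in place of seq(.1, n/10, .1) are our own assumptions, chosen so the run is repeatable and the length of s is guaranteed to equal n.

```r
# Sketch: sanity-check the simulated inputs (base R only).
set.seed(123)  # assumption: any fixed seed works; the tutorial itself sets none

actual_days = seq.Date(from = as.Date("2017/01/01"),
                       to = as.Date("2018/04/01"),
                       by = "day")
forecast_days = seq.Date(from = as.Date("2018/04/02"),
                         to = as.Date("2018/05/01"),
                         by = "day")

n = length(actual_days)     # number of observed days
h = length(forecast_days)   # number of days to forecast

# seq_len(n) / 10 gives 0.1, 0.2, ..., n/10 with exactly n elements
s = seq_len(n) / 10
price = 10 + s * sin(s / 500) + rnorm(n) + runif(n)

# Stop early if any vector has an unexpected length
stopifnot(length(s) == n, length(price) == n)
print(c(n = n, h = h))
```

With these dates, n is 456 and h is 30, which matches the horizon we pass to the forecast later.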


We can visualize the simulated data and check the trend.

ggplot(df, aes(x = date, y = price)) + 
   geom_line(color="blue") +
   scale_x_date(date_labels="%Y-%m", date_breaks="months") +
   theme(axis.text.x = element_text(angle=70, hjust=1))



Forecasting data

   First, we need to create a time-series object. The ts() function creates a time-series object from a given vector or matrix of observations. Here, price is the observation data, and we set the frequency parameter to 7 so that each seasonal period covers one week of daily samples.

ts_price = ts(price, frequency = 7)
str(ts_price)
 Time-Series [1:456] from 1 to 66: 12.3 10.2 10.3 9.7 10.8 ... 
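
The "from 1 to 66" in the output above follows from the frequency: time advances by 1/7 per observation, so 456 daily values span 65 full periods after the first one. A small base R sketch makes this visible; the placeholder values in x are an assumption and simply stand in for price.

```r
# Sketch: how frequency = 7 shapes the time index of a ts object (base R).
x = ts(seq_len(456), frequency = 7)   # placeholder values, same length as price

tsp(x)          # start, end and frequency of the series
frequency(x)    # observations per seasonal period
cycle(x)[1:8]   # position of each observation within its period
time(x)[1:3]    # time index of the first three observations
```

Here the end time is 1 + (456 - 1) / 7 = 66, and cycle() counts from 1 to 7 before wrapping back to 1.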

Now we can forecast the ts_price object with the forecast() function. When given a ts object directly, forecast() selects and fits a model automatically. The h parameter sets the forecast horizon; we set it to 30 to forecast 30 days ahead.

fc = forecast(ts_price, h=30)
names(fc)
 [1] "model"     "mean"      "level"     "x"         "upper"    
 [6] "lower"     "fitted"    "method"    "series"    "residuals"


You can inspect these elements of the fc object to learn more about the forecast.
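
As an illustration, the sketch below pulls out the elements we find most useful: the point forecasts and the interval bounds. It rebuilds the simulated series so that it runs on its own; the seed and the names fc_demo and out are our own assumptions, and the model that forecast() selects may differ from run to run with other data.

```r
library(forecast)

# Rebuild the simulated series (assumption: fixed seed for repeatability)
set.seed(123)
n = 456
s = seq_len(n) / 10
price = 10 + s * sin(s / 500) + rnorm(n) + runif(n)

fc_demo = forecast(ts(price, frequency = 7), h = 30)

fc_demo$method        # description of the automatically selected model
fc_demo$level         # levels of the prediction intervals
head(fc_demo$mean)    # point forecasts
head(fc_demo$lower)   # lower interval bounds, one column per level
head(fc_demo$upper)   # upper interval bounds, one column per level

# Collect the point forecasts and the widest interval in one data frame
k = ncol(fc_demo$upper)
out = data.frame(mean  = as.numeric(fc_demo$mean),
                 lower = as.numeric(fc_demo$lower[, k]),
                 upper = as.numeric(fc_demo$upper[, k]))
head(out)
```

The mean element holds one value per forecast step, so out has 30 rows here.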

Next, we'll visualize the forecasted data in a plot.

plot(fc, ylab = "price", xaxt = "n")
lines(fc$fitted, col = "red", lwd = 2)


   The plot shows the forecast in the default view, without dates on the x-axis. To show dates, we can combine the observed and forecast data into one data frame and draw it again with ggplot.

fc_df = rbind(df, data.frame(date=forecast_days, price = NA))
fc_df = cbind(fc_df, fitted = c(fc$fitted, fc$mean))
 
ggplot() + 
  geom_line(aes(date, price), fc_df,color = "blue") +  
  geom_line(aes(date, fitted), fc_df, color = "red", lwd = 1) +
  scale_x_date(date_labels = "%Y-%m", date_breaks = "months")+
  theme(axis.text.x = element_text(angle = 70, hjust = 1))
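
To make the rbind and cbind steps easier to follow, here is a toy version with made-up numbers in base R. All names and values are hypothetical: fitted_vals stands in for fc$fitted and forecast_vals for fc$mean.

```r
# Sketch with made-up numbers: how observed and forecast rows line up.
obs_days    = seq.Date(as.Date("2018/03/28"), by = "day", length.out = 5)
future_days = seq.Date(as.Date("2018/04/02"), by = "day", length.out = 3)

obs_df = data.frame(date = obs_days, price = c(10.1, 10.4, 9.8, 10.9, 11.2))
fitted_vals   = c(10.0, 10.3, 10.1, 10.6, 11.0)  # stands in for fc$fitted
forecast_vals = c(11.1, 11.3, 11.2)              # stands in for fc$mean

# Future rows have no observed price, so price is NA there
toy_df = rbind(obs_df, data.frame(date = future_days, price = NA))

# One value per row: fitted values first, then the point forecasts
toy_df = cbind(toy_df, fitted = c(fitted_vals, forecast_vals))

print(toy_df)
```

The price column ends in NA for the future dates, while the fitted column runs through both periods. This is why the red line extends past the blue one in the plot.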



   The plot shows the observed period and the forecast trend. In this post, we've briefly learned how to forecast time-series data with the forecast package and plot it with ggplot2 in R. The full source code is listed below.


Source code listing


 
library(forecast)
library(ggplot2)
 
# Dates of the observed (historical) period
actual_days = seq.Date(from = as.Date("2017/01/01"),
                       to = as.Date("2018/04/01"),
                       by = "day")
 
# Dates of the 30-day period to forecast
forecast_days = seq.Date(from = as.Date("2018/04/02"),
                         to = as.Date("2018/05/01"),
                         by = "day")
 
n = length(actual_days)
s = seq(.1, n/10, .1)
# Simulated price: slowly rising trend plus random noise
price = 10 + s*sin(s/500) + rnorm(n) + runif(n)
df = data.frame(date = actual_days, price = price)
 
ggplot(df, aes(x = date, y = price)) + 
  geom_line(color="blue") +
  scale_x_date(date_labels="%Y-%m", date_breaks="months") +
  theme(axis.text.x = element_text(angle=70, hjust=1))
 
ts_price = ts(price, frequency = 7)
str(ts_price) 
 
fc = forecast(ts_price, h=30)
names(fc)
 
plot(fc, ylab = "price", xaxt = "n")
lines(fc$fitted, col = "red", lwd = 2)
 
fc_df = rbind(df, data.frame(date=forecast_days, price = NA))
fc_df = cbind(fc_df, fitted = c(fc$fitted, fc$mean))
 
ggplot() + 
   geom_line(aes(date, price), fc_df,color = "blue") +  
   geom_line(aes(date, fitted), fc_df, color = "red", lwd = 1) +
   scale_x_date(date_labels = "%Y-%m", date_breaks = "months")+
   theme(axis.text.x = element_text(angle = 70, hjust = 1))
 
