Fitting Polynomial Regression Data in R

   Polynomial regression is a nonlinear relationship between independent x and dependent y variables. Fitting such type of regression is essential when we analyze fluctuated data with some bends. 

    In this post, we'll learn how to fit and plot polynomial regression data in R. We use an lm() function in this regression model. Although it is a linear regression model function, lm() works well for polynomial models by changing the target formula type. The tutorial covers:

  1. Preparing the data
  2. Fitting the model
  3. Finding the best fit
  4. Source code listing

 
  Preparing the data


    We'll start by preparing test data for this tutorial as below.

peq = function(x) x^3+2*x^2+5
 
x = seq(-0.99, 1, by = .01)
y = peq(x) + runif(200)
 
df = data.frame(x = x, y = y)
head(df)
 
      x        y
1 -0.99 6.635701
2 -0.98 6.290250
3 -0.97 6.063431
4 -0.96 6.632796
5 -0.95 6.634153
6 -0.94 6.896084

We can visualize the the 'df' data to check visually in a plot. Our task is to fit this data with the best curve.

plot(df$x, df$y, pch=20, col="gray")
 


Fitting the model


   We build a model with lm() function with a formula. I(x^2) represents x2 in a formula. We can also use poly(x,2) function and it is the same with the expression of I(x^2).

model = lm(y~x+I(x^3)+I(x^2), data = df)
summary(model)

Call:
lm(formula = y ~ x + I(x^3) + I(x^2), data = df)

Residuals:
        Min          1Q      Median          3Q         Max 
-0.49598082 -0.21488892 -0.01301059  0.18515573  0.58048188 

Coefficients:
              Estimate Std. Error  t value
(Intercept)  4.3634157  0.1091087 39.99144
x           -0.1078152  0.9309088 -0.11582
I(x^3)      -0.5925309  1.3905638 -0.42611
I(x^2)       3.6462591  2.1359770  1.70707
                        Pr(>|t|)    
(Intercept) < 0.0000000000000002 ***
x                       0.908039    
I(x^3)                  0.670983    
I(x^2)                  0.091042 .  
---
Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2626079 on 96 degrees of freedom
Multiple R-squared:  0.9243076, Adjusted R-squared:  0.9219422 
F-statistic: 390.7635 on 3 and 96 DF,  p-value: < 0.00000000000000022204

Next, we'll predict data with a trained model.

pred = predict(model,data=df)
 


Finding the best fit

   Finding the best-fitted curve is important. We check the model with various possible functions. Here, we apply four types of function to fit and check their performance.

    The orange line (linear regression) and yellow curve are the wrong choices for this data. The pink curve is close, but the blue curve is the best match for our data trend. Thus, I use the y~x3+x2 formula to build our polynomial regression model.
   You may find the best-fit formula for your data by visualizing them in a plot.



The source code of a plot is listed below.

windows(width=8, height=6)
plot(x=df$x, y=df$y, pch=20, col="grey")

lines(df$x, predict(lm(y~x, data=df)), type="l", col="orange1", lwd=2)
lines(df$x, predict(lm(y~I(x^2), data=df)), type="l", col="pink1", lwd=2)
lines(df$x, predict(lm(y~I(x^3), data=df)), type="l", col="yellow2", lwd=2)
lines(df$x, predict(lm(y~poly(x,3)+poly(x,2), data=df)), type="l", col="blue", lwd=2)
 
legend("topleft", 
        legend = c("y~x,  - linear","y~x^2", "y~x^3", "y~x^3+x^2"), 
        col = c("orange","pink","yellow","blue"),
        lty = 1, lwd=3
 ) 


Plotting the result

1. Plotting with a plot() function.

pred = predict(model,data = df)
lines(df$x, pred, lwd = 3, col = "blue")



2. Plotting with a ggplot().

Polynomial regression data can be easily fitted and plotted with ggplot().

library(ggplot2)
ggplot(data=df, aes(x,y)) +
       geom_point() + 
       geom_smooth(method="lm", formula=y~I(x^3)+I(x^2))



   In this tutorial, we have briefly learned how to fit polynomial regression data and plot the results with a plot() and ggplot() functions in R. The full source code is listed below.

 

Source code listing

peq = function(x) x^3+2*x^2+5
 
x = seq(-0.99, 1, by = .01)
y = peq(x) + runif(200)
 
df = data.frame(x = x, y = y)
head(df)
 
plot(df$x, df$y, pch=20, col="gray")
 
model = lm(y~x+I(x^3)+I(x^2), data = df)
summary(model)
 
pred = predict(model,data=df)  
 
windows(width=8, height=6)
plot(x=df$x, y=df$y, pch=20, col="grey")

lines(df$x, predict(lm(y~x, data=df)), type="l", col="orange1", lwd=2)
lines(df$x, predict(lm(y~I(x^2), data=df)), type="l", col="pink1", lwd=2)
lines(df$x, predict(lm(y~I(x^3), data=df)), type="l", col="yellow2", lwd=2)
lines(df$x, predict(lm(y~poly(x,3)+poly(x,2), data=df)), type="l", col="blue", lwd=2)
 
legend("topleft", 
        legend = c("y~x,  - linear","y~x^2", "y~x^3", "y~x^3+x^2"), 
        col = c("orange","pink","yellow","blue"),
        lty = 1, lwd=3
 
 
pred = predict(model,data = df)
lines(df$x, pred, lwd = 3, col = "blue")  

library(ggplot2) 
ggplot(data=df, aes(x,y)) + geom_point() 
       + geom_smooth(method="lm", formula=y~I(x^3)+I(x^2))


Polynomial Regression Fitting in Python

No comments:

Post a Comment