1 Recap of the Former Post

Recall the example of using a polynomial to approximate the sine function. The order of the polynomial we choose is vital to getting a proper fit. In effect, every parameter corresponds to a feature, and we want to know whether each feature is actually related to the output.
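As a reminder, here is a minimal sketch of that kind of experiment, assuming noisy samples of \(\sin x\) fitted with NumPy's polyfit (the exact data and orders in the former post may differ):

```python
import numpy as np

# Noisy samples of sin(x); illustrative data, not the exact data of the former post.
rng = np.random.default_rng(0)
x = np.linspace(0, 2 * np.pi, 20)
y = np.sin(x) + rng.normal(0, 0.2, size=x.shape)

# Fit polynomials of increasing order; each coefficient is one parameter/"feature".
for order in (1, 3, 9):
    coeffs = np.polyfit(x, y, deg=order)
    y_hat = np.polyval(coeffs, x)
    sse = np.sum((y - y_hat) ** 2)   # residual error for this order
    print(f"order {order}: SSE = {sse:.3f}")
```

A low order underfits and leaves a large residual error, while a very high order chases the noise. The lack-of-fit test below gives a principled way to judge whether the residual error of a chosen model is larger than the noise alone can explain.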

In statistics, a lack-of-fit test is any of a number of tests of the null hypothesis that a proposed statistical model fits well.

2 Decomposing the Error

Again, suppose there is a true model \(f(x)\), and we build a prediction function from data points drawn from \(y = f(x) + \epsilon\), where \(\epsilon \sim \mathcal{N}(0, \sigma^2)\). Suppose the noise is normally distributed and has the same \(\sigma\) at every value of \(x\), with no drift. For a single value \(x_i\) we may actually observe several different responses \(y_{ij}\), where \(i \in \{1, 2, \dots, C\}\) and \(j \in \{1, 2, \dots, N_i\}\). Here \(N_i\) is the number of responses recorded at \(x_i\), and \(C\) is the number of distinct values of \(x\) in the data set.
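In symbols, the setup described above is

\[
y_{ij} = f(x_i) + \epsilon_{ij}, \qquad \epsilon_{ij} \sim \mathcal{N}(0, \sigma^2), \qquad i = 1, \dots, C, \quad j = 1, \dots, N_i .
\]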

According to Wikipedia, we break down the residual error ("error sum of squares", denoted SSE) into two components:

- a component that is due to lack of model fit ("lack of fit sum of squares", denoted SSLF);
- a component that is due to pure random error ("pure error sum of squares", denoted SSPE).

If the lack of fit sum of squares is a large component of the residual error, it suggests that a linear function is inadequate.
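Writing \(\hat y_i\) for the model's prediction at \(x_i\) and \(\bar y_i = \frac{1}{N_i}\sum_{j=1}^{N_i} y_{ij}\) for the mean of the replicates at \(x_i\), the standard decomposition is

\[
\underbrace{\sum_{i=1}^{C}\sum_{j=1}^{N_i} \left(y_{ij} - \hat y_i\right)^2}_{\text{SSE}}
= \underbrace{\sum_{i=1}^{C} N_i \left(\bar y_i - \hat y_i\right)^2}_{\text{SSLF}}
+ \underbrace{\sum_{i=1}^{C}\sum_{j=1}^{N_i} \left(y_{ij} - \bar y_i\right)^2}_{\text{SSPE}} .
\]

The cross term vanishes because the within-group residuals \(y_{ij} - \bar y_i\) sum to zero in every group, so the identity is exact. As a quick illustration, here is a small sketch that computes the three quantities for a toy data set with replicated \(x\) values (hypothetical numbers, assuming NumPy and a straight-line fit):

```python
import numpy as np

# Toy data with replicates: C = 4 distinct x values, several responses at each.
x = np.array([1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 4.0])
y = np.array([2.1, 1.9, 3.8, 4.2, 4.0, 6.3, 5.9, 7.7])

# Fit the model whose adequacy we want to test: here, a straight line.
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept

# SSE: squared residuals of every observation from the fitted line.
sse = np.sum((y - y_hat) ** 2)

# SSPE: squared deviations of each observation from its replicate-group mean.
group_mean = {xi: y[x == xi].mean() for xi in np.unique(x)}
y_bar = np.array([group_mean[xi] for xi in x])
sspe = np.sum((y - y_bar) ** 2)

# SSLF: distance of the group means from the fitted line (equals SSE - SSPE).
sslf = np.sum((y_bar - y_hat) ** 2)

print(f"SSE = {sse:.3f}, SSLF = {sslf:.3f}, SSPE = {sspe:.3f}")
```

If SSLF dominates SSE, the straight line is missing real structure in \(f(x)\); if SSPE dominates, the residual error is mostly noise and the fit is adequate.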