# Spline Regression

High-degree polynomials can capture complicated nonlinear relationships in the data, but that extra flexibility also makes them more likely to overfit the training set.
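To make the trade-off concrete, here is a small R sketch on simulated data. The sine signal, noise level and 70/30 split are assumptions for illustration only. It fits polynomials of increasing degree with `poly()` and reports training and test mean squared error. Training error can only fall as the degree rises, because each model contains the previous one. Test error typically stops improving once the fit starts chasing noise.

```r
# Simulated data (hypothetical): a smooth nonlinear signal plus noise
set.seed(1)
x <- runif(100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.3)
train <- data.frame(x = x[1:70], y = y[1:70])
test <- data.frame(x = x[71:100], y = y[71:100])

# Fit polynomials of increasing degree; record training and test MSE
degrees <- c(1, 3, 6, 12)
mse <- t(sapply(degrees, function(d) {
  fit <- lm(y ~ poly(x, d), data = train)
  c(degree = d,
    train = mean(residuals(fit)^2),
    test = mean((test$y - predict(fit, newdata = test))^2))
}))
mse
```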

A spline is a piecewise polynomial function: it splits the range of the predictor variable into regions and fits a separate polynomial within each region. Adjacent regions meet at points called knots.
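One way to see that a cubic spline is just a set of cubics joined at the knots is the truncated power basis. It consists of a global cubic plus one term `pmax(x - k, 0)^3` per knot `k`. Each extra term is zero to the left of its knot, so the fitted curve follows a different cubic in each region. The sketch below uses simulated data and hypothetical knots at 2.5, 5 and 7.5. It builds that basis by hand and checks that it gives the same fitted values as `splines::bs()` with the same knots.

```r
library(splines)  # ships with R; provides bs()

# Simulated data and knot locations (hypothetical)
set.seed(2)
x <- sort(runif(200, 0, 10))
y <- sin(x) + rnorm(200, sd = 0.3)
knots <- c(2.5, 5, 7.5)

# Truncated power basis: x, x^2, x^3, plus (x - k)_+^3 for each knot.
# (x - k)_+^3 has continuous value, slope and curvature at k, so the
# pieces join smoothly.
X_tp <- cbind(x, x^2, x^3, sapply(knots, function(k) pmax(x - k, 0)^3))
fit_tp <- lm(y ~ X_tp)

# bs() spans the same space of cubic splines (with better numerical
# conditioning), so the fitted values should agree
fit_bs <- lm(y ~ bs(x, knots = knots))
all.equal(unname(fitted(fit_tp)), unname(fitted(fit_bs)))
```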

Restrictions need to be imposed at each knot so that the spline is continuous (i.e., there is no gap in the spline curve) and “smooth” there (for a cubic spline, the first and second derivatives are also continuous). A restricted cubic spline has the additional property that the curve is linear before the first knot and after the last knot.
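In R, a restricted cubic spline corresponds to a natural cubic spline, available as `splines::ns()`, which is linear beyond its boundary knots. The sketch below uses simulated data. Knots at the 5th, 50th and 95th percentiles are an assumption. The sketch passes the outer knots as `Boundary.knots` and checks that predictions beyond the last knot lie on a straight line. The same holds below the first knot.

```r
library(splines)  # ships with R; provides ns()

# Simulated data (hypothetical)
set.seed(3)
x <- runif(300, 0, 10)
y <- log1p(x) + rnorm(300, sd = 0.2)

# Outer knots become boundary knots; ns() forces linearity outside them
k <- quantile(x, probs = c(0.05, 0.5, 0.95))
fit <- lm(y ~ ns(x, knots = k[2], Boundary.knots = k[c(1, 3)]))

# Equally spaced points past the last knot: on a straight line, the
# second differences of the predictions are zero up to rounding error
grid <- data.frame(x = seq(k[3] + 0.1, k[3] + 0.5, by = 0.1))
pred <- predict(fit, newdata = grid)
max(abs(diff(pred, differences = 2)))
```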

The number of knots is chosen by the user, but in practice we have found that five or fewer knots are generally sufficient. The knot locations also need to be specified by the user. It is common to place the smallest knot relatively close to the smallest value of the variable being modelled (e.g., its 5th percentile) and the largest knot near its largest value (e.g., its 95th percentile).
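The knot-placement rule can be written as a small helper. The percentile sets below are one commonly used choice and are an assumption here. The helper `fit_rcs()` is hypothetical. It places 3, 4 or 5 knots at those percentiles, fits a natural (restricted) cubic spline on simulated data, and compares the fits by AIC.

```r
library(splines)  # ships with R; provides ns()

# Percentiles for the knots, by number of knots (an assumed, common choice):
# the outer knots sit near the low and high tails of the predictor
knot_probs <- list(
  `3` = c(0.10, 0.50, 0.90),
  `4` = c(0.05, 0.35, 0.65, 0.95),
  `5` = c(0.05, 0.275, 0.50, 0.725, 0.95)
)

# Hypothetical helper: restricted cubic spline with n_knots knots
fit_rcs <- function(x, y, n_knots = 5) {
  k <- quantile(x, probs = knot_probs[[as.character(n_knots)]])
  inner <- k[-c(1, length(k))]   # interior knots
  outer <- k[c(1, length(k))]    # first and last knot: linear beyond these
  lm(y ~ ns(x, knots = inner, Boundary.knots = outer))
}

# Simulated, skewed predictor (hypothetical)
set.seed(4)
x <- rexp(500)
y <- sqrt(x) + rnorm(500, sd = 0.1)
fits <- lapply(3:5, function(m) fit_rcs(x, y, m))
sapply(fits, AIC)  # lower AIC suggests a better trade-off
```

A restricted cubic spline with k knots uses k - 1 basis terms, so each fit has as many coefficients (including the intercept) as it has knots.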

# Generalized Additive Models

GAMs automatically learn a nonlinear relationship between each predictor variable and the outcome variable, and then add these effects together linearly, along with the intercept.
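Here is a minimal sketch of this additive structure. It assumes the mgcv package and uses simulated data with two hypothetical predictors, `x1` and `x2`. `predict(..., type = "terms")` returns each learned function separately. Adding the columns back to the intercept reproduces the model's predictions.

```r
library(mgcv)  # recommended package shipped with R; provides gam() and s()

# Simulated data (hypothetical)
set.seed(5)
n <- 400
dat <- data.frame(x1 = runif(n), x2 = runif(n))
dat$y <- sin(2 * pi * dat$x1) + 2 * dat$x2 + rnorm(n, sd = 0.3)

# One smooth function per predictor, added to an intercept
fit <- gam(y ~ s(x1) + s(x2), data = dat, method = "REML")

# Each column is one learned function f_j(x_j); the intercept is stored
# in the "constant" attribute
parts <- predict(fit, type = "terms")
recombined <- rowSums(parts) + attr(parts, "constant")
all.equal(as.numeric(recombined), as.numeric(predict(fit)))
```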

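We can then interpret the effect of x2 as follows: holding the other variables constant, the relationship between x2 and the outcome is linear. Likewise, xp increases the outcome on the left-hand side roughly linearly up to a certain point, after which xp has essentially no further effect on the outcome. This kind of interpretability is possible only because the model adds the individual effects together.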
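The same kind of reading can be checked on simulated data where the truth is known. This sketch assumes mgcv, and the data-generating functions are hypothetical. `x2` has a linear effect. `xp` raises the outcome up to 0.5 and has no effect beyond it. `plot()` draws one partial-effect curve per predictor. The helper `eff()`, also hypothetical, reads off the change in one fitted function between two values of its predictor.

```r
library(mgcv)  # recommended package shipped with R

# Hypothetical truth: x2 acts linearly; xp raises y until 0.5, then is flat
set.seed(6)
n <- 1000
dat <- data.frame(x2 = runif(n), xp = runif(n))
dat$y <- 2 * dat$x2 + 3 * pmin(dat$xp, 0.5) + rnorm(n, sd = 0.1)

fit <- gam(y ~ s(x2) + s(xp), data = dat, method = "REML")
plot(fit, pages = 1)  # one partial-effect curve per predictor

# Change in the fitted function of `var` from a to b, other predictor fixed
eff <- function(var, a, b) {
  nd <- data.frame(x2 = c(0.5, 0.5), xp = c(0.5, 0.5))
  nd[[var]] <- c(a, b)
  te <- predict(fit, newdata = nd, type = "terms")
  col <- paste0("s(", var, ")")
  te[2, col] - te[1, col]
}
eff("x2", 0.2, 0.8)  # true change: 2 * 0.6 = 1.2
eff("xp", 0.1, 0.3)  # true change: 3 * 0.2 = 0.6
eff("xp", 0.7, 0.9)  # true change: 0 (flat region)
```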

The level of smoothness is controlled by the smoothing parameter, which we denote by λ: the higher the value of λ, the smoother the fitted curve.
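In mgcv, which is assumed here, λ corresponds to the `sp` argument. When `sp` is supplied, it is used as given instead of being estimated. This sketch fits the same P-spline (`bs = "ps"`) to simulated data for several fixed values. It reports the effective degrees of freedom (edf), which shrink as λ grows.

```r
library(mgcv)  # recommended package shipped with R

# Simulated data (hypothetical)
set.seed(7)
dat <- data.frame(x = runif(200))
dat$y <- sin(2 * pi * dat$x) + rnorm(200, sd = 0.3)

# Fix lambda via `sp`: a heavier penalty on wiggliness means fewer
# effective degrees of freedom and a smoother curve
lambdas <- c(1e-4, 1e-1, 1e2, 1e5)
edf <- sapply(lambdas, function(l) {
  fit <- gam(y ~ s(x, bs = "ps"), data = dat, sp = l)
  sum(fit$edf)  # total edf, including the intercept
})
data.frame(lambda = lambdas, edf = edf)
```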

# Hands-on Examples

```r
library(ggplot2)

# train.data: assumed training split of the Boston data (columns lstat, medv)
ggplot(train.data, aes(lstat, medv)) +
  geom_point() +
  stat_smooth()
```

```r
# Quadratic (degree-2 polynomial) fit of medv on lstat
lm(medv ~ lstat + I(lstat^2), data = train.data)
```

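```r
library(splines)  # provides bs()

# Place the knots at the quartiles of lstat, then fit a cubic B-spline
knots <- quantile(train.data$lstat, probs = c(0.25, 0.5, 0.75))
model <- lm(medv ~ bs(lstat, knots = knots), data = train.data)
```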
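Here is a quick check on what `bs()` builds. With the default cubic degree and three interior knots, the basis has 3 + 3 = 6 columns, and the intercept is left to `lm()`. The sketch uses hypothetical `lstat`-like values on a grid rather than the real data. Adding back the dropped basis function (`intercept = TRUE`) gives B-splines that sum to one at every point.

```r
library(splines)  # ships with R; provides bs()

# Hypothetical lstat-like values on a grid
x <- seq(0, 40, length.out = 101)
knots <- quantile(x, probs = c(0.25, 0.5, 0.75))

B <- bs(x, knots = knots)  # cubic (degree = 3) by default
dim(B)                     # 101 rows, 3 knots + 3 = 6 columns

# With the intercept column included, the basis functions sum to 1
range(rowSums(bs(x, knots = knots, intercept = TRUE)))
```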

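```r
library(mgcv)  # provides gam() and s()

# dat: assumed data frame with outcome y and predictors x1, x2, x3
b1 <- gam(y ~ s(x1, bs = "ps", sp = 0.6) + s(x2, bs = "ps", sp = 0.6) + x3, data = dat)
summary(b1)
```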
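The sketch below compares these approaches end to end. It assumes that `train.data` is an 80% random split of the `Boston` data from the MASS package; the seed and split ratio are also assumptions. It computes the test-set RMSE of a linear, quadratic, B-spline and GAM fit of `medv` on `lstat`. No particular ranking is claimed, because the numbers depend on the split.

```r
library(splines)  # bs()
library(mgcv)     # gam(), s()
library(MASS)     # Boston data (assumed source of train.data)

# Assumed 80/20 train/test split
set.seed(123)
idx <- sample(nrow(Boston), size = round(0.8 * nrow(Boston)))
train.data <- Boston[idx, ]
test.data <- Boston[-idx, ]
knots <- quantile(train.data$lstat, probs = c(0.25, 0.5, 0.75))

models <- list(
  linear    = lm(medv ~ lstat, data = train.data),
  quadratic = lm(medv ~ lstat + I(lstat^2), data = train.data),
  spline    = lm(medv ~ bs(lstat, knots = knots), data = train.data),
  gam       = gam(medv ~ s(lstat), data = train.data, method = "REML")
)

# Test-set root mean squared error for each model
rmse <- sapply(models, function(m) {
  sqrt(mean((test.data$medv - predict(m, newdata = test.data))^2))
})
round(rmse, 3)
```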