数学模型8
1 回归分析Ⅰ
解:首先我们做如下假设:
Y01X1
即血压与年龄成一元线性关系。
编写R程序如下:
x=matrix(c(27,73,21,66,22,63,24,75,25,71,23,70,20,65,20,70,29,79,24,72,25,68,28,67,26,79,38,91,32,76,33,69,31,66,34,73,37,78,38,87,33,76,35,79,30,73,31,80,37,68,39,75,46,89,49,101,40,70,42,72,43,80,46,83,43,75,44,71,46,80,47,96,45,92,49,80,48,70,4
0,90,42,85,55,76,54,71,57,99,52,86,53,79,56,92,52,85,50,71,59,90,50,91,52,100,58,80,57,109),nrow=54,ncol=2,byrow=T,dimnames=list(1:54,c(\"C\ outputcost=as.data.frame(x) plot(outputcost$C,outputcost$E)
下面在之前数据录入的基础上做回归分析(程序接前文,下同) 可见X与Y的关系是明显的线性的,做回归,并检视回归结果。
lm.sol = lm(E~C,data = outputcost) summary(lm.sol)
运行结果:
Call:
lm(formula = E ~ C, data = outputcost)
Residuals:
Min 1Q Median 3Q Max -16.4786 -5.7877 -0.0784 5.6117 19.7813
Coefficients:
Estimate Std. Error t value Pr(>|t|) (Intercept) 56.15693 3.99367 14.061 < 2e-16 *** C 0.58003 0.09695 5.983 2.05e-07 *** ---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 8.146 on 52 degrees of freedom
Multiple R-squared: 0.4077, Adjusted R-squared: 0.3963 F-statistic: 35.79 on 1 and 52 DF, p-value: 2.05e-07
常数项0=56.157,变量的系数1=0.580 得到回归方程:y=56.157+0.580x
在上面的结果中sd(0)=3.99367 sd(1)=0.09695。而对应于两个系数的P值2e-16,2.05e-07,故是非常显著的。
关于方程的检验,残差的标准差=8.146。相关系数的平方R2=0.4077。关于F分布的P值为2.05e-07,也是非常显著的。 将得到的直线方程画在散点图上,程序如下:
abline(lm.sol)
下面分析残差:
y.res=residuals(lm.sol); plot(y.res)
得到残差图:
用残差分析剔除异常点
plot(lm.sol,which=1:4)
得到的四个图依次为: 普通残差与拟合值的残差图
正态QQ的残差图(若残差是来自正态总体分布的样本,则QQ图中的点应该在一条直线上)
标准化残差开方与拟合值的残差图(对于近似服从正态分布的标准化残差,应该有95%的样本点落在[-2,2]的区间内。这也是判断异常点的直观方法)
cook统计量的残差图(cook统计量值越大的点越可能是异常值,但具体阀值是多少较难判别)
从图中可见,28,43,54三个样本存在异常,需要剔除。 因此方差满足正态性要求。 检验异方差:
res.test<-residuals(lm.sol) library(lm.sol)
gqtest(lm.sol)
(运行不出来,不知道为什么老是说lm.sol不存在) 去除掉28,43,54三个点之后,编程如下:
i=1:54;
outputcost2=as.data.frame(x[i!=28,43,54]) lm2=lm(E~C,data=outputcost2) summary(lm2)
去除掉28,43,54三个点之后,总程序如下:
x=matrix(c(27,73,21,66,22,63,24,75,25,71,23,70,20,65,20,70,29,79,24,72,25,68,28,67,26,79,38,91,32,76,33,69,31,66,34,73,37,78,38,87,33,76,35,79,30,73,31,80,37,68,39,75,46,89,40,70,42,72,43,80,46,83,43,75,44,71,46,80,47,96,45,92,49,80,48,70,40,90,42,85,55,76,57,99,52,86,53,79,56,92,52,85,50,71,59,90,50,91,52,100,58,80),nrow=51,ncol=2,byrow=T,dimnames=list(1:51,c(\"C\ outputcost=as.data.frame(x) plot(outputcost$C,outputcost$E) lm.sol = lm(E~C,data = outputcost) summary(lm.sol)
运行结果:
Call:
lm(formula = E ~ C, data = outputcost)
Residuals:
Min 1Q Median 3Q Max -13.2860 -5.5024 0.2077 4.8396 14.6403
Coefficients:
Estimate Std. Error t value Pr(>|t|) (Intercept) 57.44586 3.59495 15.980 < 2e-16 *** C 0.53680 0.08908 6.026 2.13e-07 *** ---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.14 on 49 degrees of freedom
Multiple R-squared: 0.4257, Adjusted R-squared: 0.4139 F-statistic: 36.31 on 1 and 49 DF, p-value: 2.133e-07
abline(lm.sol)
得到最后的散点图和回归直线:
得到回归方程:
y=57.44586+0.53680x
2 回归分析Ⅱ
解:首先我们做如下假设:
Y01X12X23X3
即土壤所含可给态磷的浓度与土壤中所含无机磷的浓度、土壤内溶于K2CO3溶液并受溴化物水解的有机磷的浓度、以及土壤内溶于K2CO3溶液但不溶于溴化物水解的有机磷的浓度成多元线性关系。
(1)要建立多元线性回归方程模型,可以用函数lm()求解,并用函数summary()提取信息。
编写R程序如下:
phosphorus<-data.frame(
X1=c( 0.4, 0.4, 3.1, 0.6, 4.7, 1.7, 9.4, 10.1, 11.6, 12.6,10.9,23.1,23.1,21.6,23.1, 1.9, 26.8, 29.9), X2=c( 52, 23, 19, 34, 24, 65, 44, 31, 29, 58, 37, 46, 50, 44, 56, 36, 58, 51),
X3=c( 158, 163, 37, 157, 59, 123, 46, 117, 173, 112, 111, 114, 134, 73, 168, 143, 202, 124), Y= c( 64, 60, 71, 61, 54, 77, 81, 93, 93, 51, 76, 96, 77, 93, 95, 54, 168, 99) )
lm.sol<-lm(Y ~ 1+X1+X2+X3, data=phosphorus) summary(lm.sol)
运行程序可得到如下结果:
Call:
lm(formula = Y ~ 1 + X1 + X2 + X3, data = phosphorus)
Residuals:
Min 1Q Median 3Q Max -28.349 -11.383 -2.659 12.095 48.807
Coefficients:
Estimate Std. Error t value Pr(>|t|) (Intercept) 43.65007 18.05442 2.418 0.02984 * X1 1.78534 0.53977 3.308 0.00518 ** X2 -0.08329 0.42037 -0.198 0.84579 X3 0.16102 0.11158 1.443 0.17098 ---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 19.97 on 14 degrees of freedom
Multiple R-squared: 0.5493, Adjusted R-squared: 0.4527 F-statistic: 5.688 on 3 and 14 DF, p-value: 0.009227
ˆ18.05442和ˆ1.78534是显著的,而从结果可以得到,回归系数01ˆ0.08329和ˆ0.16102是不显著的。回归方程为: 23Y18.0541.785X10.083X20.161X3
(2)用函数step()做逐步回归,编写R程序并得到运行结果如下:
> lm.step<-step(lm.sol) Start: AIC=25.01 Y ~ X1 + X2 + X3
Df Sum of Sq RSS AIC > summary(lm.step) Call: lm(formula = Y ~ X1 + X2 + X3, data = cement) Residuals: Min 1Q Median 3Q Max -3.2543 -1.4726 0.1755 1.5409 3.9711 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 48.19363 3.91330 12.315 6.17e-07 *** X1 1.69589 0.20458 8.290 1.66e-05 *** X2 0.65691 0.04423 14.851 1.23e-07 *** X3 0.25002 0.18471 1.354 0.209 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 2.312 on 9 degrees of freedom Multiple R-squared: 0.9823, Adjusted R-squared: 0.9764 F-statistic: 166.3 on 3 and 9 DF, p-value: 3.367e-08 ˆ和ˆ更加显著了,ˆ也显著了。从运行结果可以看出,回归系数而且系数102可以得到新的回归方程如下: Y48.1931.696X10.657X20.25X3 因篇幅问题不能全部显示,请点此查看更多更全内容