您的当前位置:首页正文

数学模型8

来源:华佗小知识
八.多元分析实验

1 回归分析Ⅰ

解:首先我们做如下假设:

Y01X1

即血压与年龄成一元线性关系。

编写R程序如下:

x=matrix(c(27,73,21,66,22,63,24,75,25,71,23,70,20,65,20,70,29,79,24,72,25,68,28,67,26,79,38,91,32,76,33,69,31,66,34,73,37,78,38,87,33,76,35,79,30,73,31,80,37,68,39,75,46,89,49,101,40,70,42,72,43,80,46,83,43,75,44,71,46,80,47,96,45,92,49,80,48,70,4

0,90,42,85,55,76,54,71,57,99,52,86,53,79,56,92,52,85,50,71,59,90,50,91,52,100,58,80,57,109),nrow=54,ncol=2,byrow=T,dimnames=list(1:54,c(\"C\ outputcost=as.data.frame(x) plot(outputcost$C,outputcost$E)

下面在之前数据录入的基础上做回归分析(程序接前文,下同) 可见X与Y的关系是明显的线性的,做回归,并检视回归结果。

lm.sol = lm(E~C,data = outputcost) summary(lm.sol)

运行结果:

Call:

lm(formula = E ~ C, data = outputcost)

Residuals:

Min 1Q Median 3Q Max -16.4786 -5.7877 -0.0784 5.6117 19.7813

Coefficients:

Estimate Std. Error t value Pr(>|t|) (Intercept) 56.15693 3.99367 14.061 < 2e-16 *** C 0.58003 0.09695 5.983 2.05e-07 *** ---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.146 on 52 degrees of freedom

Multiple R-squared: 0.4077, Adjusted R-squared: 0.3963 F-statistic: 35.79 on 1 and 52 DF, p-value: 2.05e-07

常数项0=56.157,变量的系数1=0.580 得到回归方程:y=56.157+0.580x

在上面的结果中sd(0)=3.99367 sd(1)=0.09695。而对应于两个系数的P值2e-16,2.05e-07,故是非常显著的。

关于方程的检验,残差的标准差=8.146。相关系数的平方R2=0.4077。关于F分布的P值为2.05e-07,也是非常显著的。 将得到的直线方程画在散点图上,程序如下:

abline(lm.sol)



下面分析残差:

y.res=residuals(lm.sol); plot(y.res)

得到残差图:

用残差分析剔除异常点

plot(lm.sol,which=1:4)

得到的四个图依次为: 普通残差与拟合值的残差图

正态QQ的残差图(若残差是来自正态总体分布的样本,则QQ图中的点应该在一条直线上)

标准化残差开方与拟合值的残差图(对于近似服从正态分布的标准化残差,应该有95%的样本点落在[-2,2]的区间内。这也是判断异常点的直观方法)

cook统计量的残差图(cook统计量值越大的点越可能是异常值,但具体阀值是多少较难判别)

从图中可见,28,43,54三个样本存在异常,需要剔除。 因此方差满足正态性要求。 检验异方差:

res.test<-residuals(lm.sol) library(lm.sol)

gqtest(lm.sol)

(运行不出来,不知道为什么老是说lm.sol不存在) 去除掉28,43,54三个点之后,编程如下:

i=1:54;

outputcost2=as.data.frame(x[i!=28,43,54]) lm2=lm(E~C,data=outputcost2) summary(lm2)

去除掉28,43,54三个点之后,总程序如下:

x=matrix(c(27,73,21,66,22,63,24,75,25,71,23,70,20,65,20,70,29,79,24,72,25,68,28,67,26,79,38,91,32,76,33,69,31,66,34,73,37,78,38,87,33,76,35,79,30,73,31,80,37,68,39,75,46,89,40,70,42,72,43,80,46,83,43,75,44,71,46,80,47,96,45,92,49,80,48,70,40,90,42,85,55,76,57,99,52,86,53,79,56,92,52,85,50,71,59,90,50,91,52,100,58,80),nrow=51,ncol=2,byrow=T,dimnames=list(1:51,c(\"C\ outputcost=as.data.frame(x) plot(outputcost$C,outputcost$E) lm.sol = lm(E~C,data = outputcost) summary(lm.sol)

运行结果:

Call:

lm(formula = E ~ C, data = outputcost)

Residuals:

Min 1Q Median 3Q Max -13.2860 -5.5024 0.2077 4.8396 14.6403

Coefficients:

Estimate Std. Error t value Pr(>|t|) (Intercept) 57.44586 3.59495 15.980 < 2e-16 *** C 0.53680 0.08908 6.026 2.13e-07 *** ---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.14 on 49 degrees of freedom

Multiple R-squared: 0.4257, Adjusted R-squared: 0.4139 F-statistic: 36.31 on 1 and 49 DF, p-value: 2.133e-07

abline(lm.sol)

得到最后的散点图和回归直线:

得到回归方程:

y=57.44586+0.53680x

2 回归分析Ⅱ

解:首先我们做如下假设:

Y01X12X23X3

即土壤所含可给态磷的浓度与土壤中所含无机磷的浓度、土壤内溶于K2CO3溶液并受溴化物水解的有机磷的浓度、以及土壤内溶于K2CO3溶液但不溶于溴化物水解的有机磷的浓度成多元线性关系。

(1)要建立多元线性回归方程模型,可以用函数lm()求解,并用函数summary()提取信息。

编写R程序如下:

phosphorus<-data.frame(

X1=c( 0.4, 0.4, 3.1, 0.6, 4.7, 1.7, 9.4, 10.1, 11.6, 12.6,10.9,23.1,23.1,21.6,23.1, 1.9, 26.8, 29.9), X2=c( 52, 23, 19, 34, 24, 65, 44, 31, 29, 58, 37, 46, 50, 44, 56, 36, 58, 51),

X3=c( 158, 163, 37, 157, 59, 123, 46, 117, 173, 112, 111, 114, 134, 73, 168, 143, 202, 124), Y= c( 64, 60, 71, 61, 54, 77, 81, 93, 93, 51, 76, 96, 77, 93, 95, 54, 168, 99) )

lm.sol<-lm(Y ~ 1+X1+X2+X3, data=phosphorus) summary(lm.sol)

运行程序可得到如下结果:

Call:

lm(formula = Y ~ 1 + X1 + X2 + X3, data = phosphorus)

Residuals:

Min 1Q Median 3Q Max -28.349 -11.383 -2.659 12.095 48.807

Coefficients:

Estimate Std. Error t value Pr(>|t|) (Intercept) 43.65007 18.05442 2.418 0.02984 * X1 1.78534 0.53977 3.308 0.00518 ** X2 -0.08329 0.42037 -0.198 0.84579 X3 0.16102 0.11158 1.443 0.17098 ---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 19.97 on 14 degrees of freedom

Multiple R-squared: 0.5493, Adjusted R-squared: 0.4527 F-statistic: 5.688 on 3 and 14 DF, p-value: 0.009227

ˆ18.05442和ˆ1.78534是显著的,而从结果可以得到,回归系数01ˆ0.08329和ˆ0.16102是不显著的。回归方程为: 23Y18.0541.785X10.083X20.161X3

(2)用函数step()做逐步回归,编写R程序并得到运行结果如下:

> lm.step<-step(lm.sol) Start: AIC=25.01 Y ~ X1 + X2 + X3

Df Sum of Sq RSS AIC 48.11 25.01 - X3 1 9.79 57.90 25.42 - X1 1 367.33 415.44 51.04 - X2 1 1178.96 1227.07 65.12 >

> summary(lm.step) Call:

lm(formula = Y ~ X1 + X2 + X3, data = cement)

Residuals:

Min 1Q Median 3Q Max -3.2543 -1.4726 0.1755 1.5409 3.9711

Coefficients:

Estimate Std. Error t value Pr(>|t|) (Intercept) 48.19363 3.91330 12.315 6.17e-07 *** X1 1.69589 0.20458 8.290 1.66e-05 *** X2 0.65691 0.04423 14.851 1.23e-07 *** X3 0.25002 0.18471 1.354 0.209 ---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.312 on 9 degrees of freedom

Multiple R-squared: 0.9823, Adjusted R-squared: 0.9764 F-statistic: 166.3 on 3 and 9 DF, p-value: 3.367e-08

ˆ和ˆ更加显著了,ˆ也显著了。从运行结果可以看出,回归系数而且系数102可以得到新的回归方程如下:

Y48.1931.696X10.657X20.25X3

因篇幅问题不能全部显示,请点此查看更多更全内容