It's straightforward to calculate these quantities in SAS and R. We'll demonstrate with data from the HELP study, modeling PCS as a function of MCS and homelessness among female subjects.

**SAS**

In SAS, standardized coefficients are available via the `stb` option for the `model` statement in `proc reg`.

proc reg data="c:\book\help";

where female eq 1;

model pcs = mcs homeless / stb;

run;

The REG Procedure

Model: MODEL1

Dependent Variable: PCS

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|---|
| Intercept | 1 | 39.62619 | 2.49830 | 15.86 | <.0001 |
| MCS | 1 | 0.21945 | 0.07644 | 2.87 | 0.0050 |
| HOMELESS | 1 | -2.56907 | 1.95079 | -1.32 | 0.1908 |

Parameter Estimates

| Variable | DF | Standardized Estimate |
|---|---|---|
| Intercept | 1 | 0 |
| MCS | 1 | 0.26919 |
| HOMELESS | 1 | -0.12348 |

**R**

In R we demonstrate the use of the `lm.beta()` function in the `QuantPsyc` package (due to Thomas D. Fletcher of State Farm). The function is short and sweet, and takes a linear model object as its argument:

> lm.beta

function (MOD)

{

b <- summary(MOD)$coef[-1, 1]

sx <- sd(MOD$model[-1])

sy <- sd(MOD$model[1])

beta <- b * sx/sy

return(beta)

}
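The printed function calls `sd()` on a data frame, which current versions of R reject. A minimal sketch of the same calculation in base R follows, assuming all predictors are numeric with no factors or interactions. `std_beta()` is a hypothetical helper name, not part of `QuantPsyc`, and the built-in `mtcars` data stand in for the HELP data so that the example is self-contained.

```r
# Hypothetical helper (not from QuantPsyc): standardized coefficients
# computed as b * sd(x) / sd(y), column by column.
std_beta <- function(mod) {
  b  <- coef(mod)[-1]            # slopes, dropping the intercept
  mf <- model.frame(mod)         # outcome in column 1, predictors after
  sx <- sapply(mf[-1], sd)       # SD of each predictor
  sy <- sd(mf[[1]])              # SD of the outcome
  b * sx / sy
}

# Assumption: mtcars replaces the HELP data for illustration only.
fit <- lm(mpg ~ wt + hp, data = mtcars)
std_beta(fit)

# Cross-check: refit after z-scoring every variable; the slopes of
# this model should match the standardized coefficients above.
z     <- as.data.frame(scale(mtcars[c("mpg", "wt", "hp")]))
fit_z <- lm(mpg ~ wt + hp, data = z)
coef(fit_z)[-1]
```

The cross-check works because z-scoring the outcome and the predictors before fitting is algebraically the same as rescaling the raw slopes afterwards. The rest of this entry uses `lm.beta()` itself.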

Here we apply the function to data from the HELP study.

ds = read.csv("http://www.math.smith.edu/r/data/help.csv")

female = subset(ds, female==1)

lm1 = lm(pcs ~ mcs + homeless, data=female)

The results, in terms of unstandardized regression parameters, are the same as in SAS:

> summary(lm1)

Call:

lm(formula = pcs ~ mcs + homeless, data = female)

Residuals:

Min 1Q Median 3Q Max

-28.163 -5.821 -1.017 6.775 29.979

Coefficients:

| | Estimate | Std. Error | t value | Pr(>\|t\|) | |
|---|---|---|---|---|---|
| (Intercept) | 39.62619 | 2.49830 | 15.861 | < 2e-16 | *** |
| mcs | 0.21945 | 0.07644 | 2.871 | 0.00496 | ** |
| homeless | -2.56907 | 1.95079 | -1.317 | 0.19075 | |

---

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.761 on 104 degrees of freedom

Multiple R-squared: 0.0862, Adjusted R-squared: 0.06862

F-statistic: 4.905 on 2 and 104 DF, p-value: 0.009212

To generate the standardized parameter estimates, we use the `lm.beta()` function.

library(QuantPsyc)

lm.beta(lm1)

This generates the following output:

mcs homeless

0.2691888 -0.1234776

A change of 1 standard deviation in MCS has more than twice the impact on PCS of a 1 standard deviation change in the HOMELESS variable. This example points up another potential weakness of standardized regression coefficients, however: the homeless variable can take on only the values 0 or 1, so a 1 standard deviation change is hard to interpret.
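To make the scale issue concrete, the rescaling performed by `lm.beta()` can be written as follows, where $s$ denotes a sample standard deviation and $p$ (a symbol introduced here for illustration) is the proportion of the $n$ subjects with `homeless` equal to 1:

$$
\beta_j^{*} = \hat{\beta}_j \, \frac{s_{x_j}}{s_y},
\qquad
s_{\text{homeless}} = \sqrt{\frac{n}{n-1}\, p(1-p)} \approx \sqrt{p(1-p)} \le 0.5 .
$$

A 1 standard deviation change in a 0/1 indicator is therefore at most about half of the only change the variable can actually make, from 0 to 1.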

## 6 comments:

Regarding the interpretation problem at the end, Andrew Gelman makes a compelling argument for standardizing variables by 2 standard deviations so that the variance is similar to a binary variable (provided p is not too far from 0.5):

http://onlinelibrary.wiley.com/doi/10.1002/sim.3107/abstract

The arm package implements a standardize() function that appears to work similarly to lm.beta.

I think it would make more sense to standardize only the continuous ones; 2 SD makes sense for them. I would leave the categorical covariates as is, and also would not touch the outcome.

Just curious, what is the rationale/support for the statement: "such an assessment ignores the confidence limits associated with each pairwise association"? Cheers!

May I ask how to get a 95% confidence interval for standardized coefficients obtained from linear regression?

You can run "Make.Z()" in the QuantPsyc package to convert your data (then lm() would do this for you automatically).
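A minimal sketch of that approach, assuming base R's `scale()` as a stand-in for `Make.Z()` and the built-in `mtcars` data in place of the HELP data: standardize the outcome and the predictors, refit, and call `confint()`. The resulting intervals treat the sample standard deviations as fixed, so they should be read as an approximation.

```r
# Assumption: scale() is used instead of QuantPsyc's Make.Z(), and
# mtcars replaces the HELP data so the example runs on its own.
z <- as.data.frame(scale(mtcars[c("mpg", "wt", "hp")]))

# With every variable z-scored, the fitted slopes are the
# standardized coefficients and the intercept is zero up to rounding.
fit_z <- lm(mpg ~ wt + hp, data = z)

# 95% confidence intervals; drop the intercept row.
ci <- confint(fit_z, level = 0.95)
ci[-1, ]
```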

I have a question that is both an R question and a statistical question:

I am analysing sales of a retailer. These sales are related to some vars: var1, var2, var3.., varN

Most of the vars are continuous.

I want to analyze the relationship between sales and the vars. I have made a linear regression with R:

rg<-lm(sales ~ var1 + var2 + var3 + var4, data=sales_2017)

summary(rg)

Now I want to know which is the most important variable in sales, and to know the percent of importance of each var. I am doing this (caret package):

varImp(rg, scale = FALSE)

rsimp <- varImp(rg, scale = FALSE)

plot(rsimp)

Is this a good method to obtain variable importance? Is it a good way to do it in R?

Thanks in advance. Any advice will be greatly appreciated.

Juan
