Monday, November 15, 2010

Example 8.14: generating standardized regression coefficients

Standardized (or beta) coefficients from a linear regression model are the parameter estimates obtained when the predictors and the outcome have been standardized to have variance 1. Alternatively, the model can be fit on the original scale and the coefficients standardized post-hoc using the appropriate standard deviations. Either way, the parameters are interpreted as the change in the outcome, in standard deviations, per standard deviation change in the predictor. However they are calculated, standardized coefficients facilitate an assessment of which variables have the greatest association with the outcome (or response) variable, though such an assessment ignores the confidence limits associated with each pairwise association.
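
In notation: if b_j is the unstandardized slope for predictor x_j, s_xj is the standard deviation of x_j, and s_y is the standard deviation of the outcome, then the post-hoc standardized coefficient is simply beta_j = b_j * s_xj / s_y.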

It's straightforward to calculate these quantities in both SAS and R. We'll demonstrate with data from the HELP study, modeling PCS (the SF-36 physical component score) as a function of MCS (the mental component score) and homelessness among female subjects.

SAS

In SAS, standardized coefficients are available via the stb option to the model statement in proc reg.

proc reg data="c:\book\help";
  where female eq 1;                 /* restrict to female subjects */
  model pcs = mcs homeless / stb;    /* stb requests standardized estimates */
run;
The REG Procedure
Model: MODEL1
Dependent Variable: PCS

                    Parameter Estimates

                     Parameter      Standard
Variable     DF       Estimate         Error    t Value    Pr > |t|

Intercept     1       39.62619       2.49830      15.86      <.0001
MCS           1        0.21945       0.07644       2.87      0.0050
HOMELESS      1       -2.56907       1.95079      -1.32      0.1908

                    Parameter Estimates

                  Standardized
Variable     DF       Estimate

Intercept     1              0
MCS           1        0.26919
HOMELESS      1       -0.12348


R
In R we demonstrate the use of the lm.beta() function from the QuantPsyc package (due to Thomas D. Fletcher of State Farm). The function is short and sweet, and takes a linear model object as its argument:

> lm.beta
function (MOD)
{
    b <- summary(MOD)$coef[-1, 1]    # slopes, dropping the intercept
    sx <- sd(MOD$model[-1])          # standard deviations of the predictors
    sy <- sd(MOD$model[1])           # standard deviation of the outcome
    beta <- b * sx/sy                # rescale each slope
    return(beta)
}

Here we apply the function to data from the HELP study.

ds = read.csv("http://www.math.smith.edu/r/data/help.csv")
female = subset(ds, female==1)
lm1 = lm(pcs ~ mcs + homeless, data=female)

The results, in terms of the unstandardized regression parameters, are the same as in SAS:

> summary(lm1)

Call:
lm(formula = pcs ~ mcs + homeless, data = female)

Residuals:
    Min      1Q  Median      3Q     Max
-28.163  -5.821  -1.017   6.775  29.979

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 39.62619    2.49830  15.861  < 2e-16 ***
mcs          0.21945    0.07644   2.871  0.00496 **
homeless    -2.56907    1.95079  -1.317  0.19075
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.761 on 104 degrees of freedom
Multiple R-squared: 0.0862, Adjusted R-squared: 0.06862
F-statistic: 4.905 on 2 and 104 DF, p-value: 0.009212

To generate the standardized parameter estimates, we use the lm.beta() function.

library(QuantPsyc)
lm.beta(lm1)

This generates the following output:

      mcs   homeless
0.2691888 -0.1234776
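
As a check, we can reproduce these values by hand, following the post-hoc standardization described above. This is a minimal sketch using base R and the lm1 object fit earlier; lm1$model is the model frame, with the outcome in its first column.

b = coef(lm1)[-1]                 # slopes, dropping the intercept
sx = sapply(lm1$model[-1], sd)    # standard deviation of each predictor
sy = sd(lm1$model[,1])            # standard deviation of the outcome
b * sx / sy                       # matches the lm.beta() output

Equivalently, fitting lm(scale(pcs) ~ scale(mcs) + scale(homeless), data=female) returns the same slopes directly.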

A one standard deviation change in MCS has more than twice the impact on PCS of a one standard deviation change in the HOMELESS variable. This example points up another potential weakness of standardized regression coefficients, however: the homeless variable can take on only the values 0 and 1, so a one standard deviation change in it is hard to interpret.

6 comments:

Mark M. Fredrickson said...

Regarding the interpretation problem at the end, Andrew Gelman makes a compelling argument for standardizing variables by 2 standard deviations so that the variance is similar to that of a binary variable (provided p is not too far from 0.5):

http://onlinelibrary.wiley.com/doi/10.1002/sim.3107/abstract

The arm package implements a standardize() function that appears to work similarly to lm.beta.
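
A minimal sketch of that approach, assuming the lm1 fit from the post and arm's default handling of binary inputs:

library(arm)
display(standardize(lm1))   # refits lm1 with continuous inputs rescaled by 2 standard deviations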

Ken Kleinman said...

I think it would make more sense to only standardize the continuous ones-- 2sd makes sense for them. I would leave the categorical covars as is, and also would not touch the outcome.

Anonymous said...

Just curious, what is the rationale/support for the statement: "such an assessment ignores the confidence limits associated with each pairwise association"? Cheers!

Saz said...

May I ask how to get a 95% confidence interval for standardized coefficients obtained from a linear regression?

Nick Horton said...

You can run "Make.Z()" in the QuantPsyc package to convert your data (then lm() would do this for you automatically).
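
A minimal sketch of that suggestion, using the female subset from the post; confint() on the refit model then gives intervals on the standardized scale:

library(QuantPsyc)
zdat = as.data.frame(Make.Z(female[, c("pcs", "mcs", "homeless")]))  # z-score each column
confint(lm(pcs ~ mcs + homeless, data=zdat))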

J.V. said...

I have a question that is both an R question and a statistical question:
I am analysing the sales of a retailer. These sales are related to some variables: var1, var2, var3, ..., varN.
Most of the variables are continuous.
I want to analyze the relationship between sales and the variables. I have made a linear regression in R:

rg<-lm(sales ~ var1 + var2 + var3 + var4, data=sales_2017)
summary(rg)

Now I want to know which is the most important variable for sales, and the percent of importance of each variable. I am doing this (caret package):

varImp(rg, scale = FALSE)
rsimp <- varImp(rg, scale = FALSE)
plot(rsimp)

Is this a good method to obtain variable importance? Is it a good way to do it in R?
Thanks in advance. Any advice will be greatly appreciated.

Juan