V = sqrt(X^2 / [nobs * (min(ncols, nrows) - 1)])

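where X^2 is the Pearson chi-square statistic, nobs is the number of observations, and ncols and nrows are the numbers of columns and rows in the table.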

As an example, we'll revisit the table of homelessness vs. gender presented in Section 2.6.3.

**SAS**

In SAS, Cramer's V is provided when the `chisq` option to the `tables` statement is used in `proc freq`.

proc freq data = "c:\book\help.sas7bdat";

tables female*homeless / chisq;

run;

resulting in

Statistics for Table of FEMALE by HOMELESS

| Statistic | DF | Value | Prob |
|---|---|---|---|
| Chi-Square | 1 | 4.3196 | 0.0377 |
| Likelihood Ratio Chi-Square | 1 | 4.3654 | 0.0367 |
| Continuity Adj. Chi-Square | 1 | 3.8708 | 0.0491 |
| Mantel-Haenszel Chi-Square | 1 | 4.3101 | 0.0379 |
| Phi Coefficient | | -0.0977 | |
| Contingency Coefficient | | 0.0972 | |
| Cramer's V | | -0.0977 | |

where (as usual) several additional values are also included. The negative value shown for Cramer's V is odd; it's unclear what rationale there is for taking the negative root. According to the documentation, a negative value is only possible for 2 by 2 tables, where the reported V matches the phi coefficient shown just above it.
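As a quick check on the formula, the magnitude of V can be recomputed from the Pearson chi-square in the output. This is only a sketch: it assumes nobs = 453, a figure that does not appear in the output above but is consistent with the reported statistic.

```latex
|V| = \sqrt{\frac{X^2}{n_{\mathrm{obs}}\,\bigl(\min(n_{\mathrm{cols}}, n_{\mathrm{rows}}) - 1\bigr)}}
    = \sqrt{\frac{4.3196}{453 \times (2 - 1)}}
    \approx 0.0977
```

The sign reported by SAS is not recovered by this calculation, which always takes the positive root.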

**R**

As far as we know, Cramer's V is not included in base R. Of course, it is easy to assemble directly. We found one version online. However, it requires a table as input, so we've rewritten it here to accept vector input instead.

Here's the function, which uses `unique()` (Section 1.4.16) to extract the values of the rows and columns and `length()` (Section 1.4.15) to find their number and the number of observations. A more bullet-proof version of the function would check to ensure the two vectors are of equal length (or allow the input in a variety of formats).

cv.test = function(x,y) {

CV = sqrt(chisq.test(x, y, correct=FALSE)$statistic /

(length(x) * (min(length(unique(x)),length(unique(y))) - 1)))

print.noquote("Cramér V / Phi:")

return(as.numeric(CV))

}

So we can get Cramer's V as

helpdata = read.csv("http://www.math.smith.edu/r/data/help.csv")

with(helpdata, cv.test(female, homeless))

[1] Cramér V / Phi:

[1] 0.09765063
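The more bullet-proof version mentioned earlier might look something like the sketch below. The name `cv.test.checked` and the particular checks are our own assumptions for illustration, not part of any package: it verifies equal lengths, drops pairs with missing values, and counts categories from the cross-tabulation rather than from `unique()`.

```r
# Hypothetical, more defensive variant of cv.test() (name and checks are assumptions)
cv.test.checked = function(x, y) {
  if (length(x) != length(y)) {
    stop("x and y must have the same length")
  }
  # Keep only pairs where both values are observed
  keep = !(is.na(x) | is.na(y))
  tab = table(x[keep], y[keep])
  # Drop empty rows/columns (e.g. unused factor levels)
  tab = tab[rowSums(tab) > 0, colSums(tab) > 0, drop = FALSE]
  k = min(dim(tab))
  if (k < 2) {
    stop("each variable needs at least two observed categories")
  }
  # Pearson chi-square without continuity correction, as in cv.test()
  chisq = suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  as.numeric(sqrt(chisq / (sum(tab) * (k - 1))))
}

# Perfectly associated toy vectors
cv.test.checked(c(0, 0, 1, 1), c(0, 0, 1, 1))
```

Because the denominator uses `sum(tab)`, missing values no longer inflate the observation count the way `length(x)` would.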

## 5 comments:

It would be good to have CIs for Cramer's V as well.

This could certainly be done easily using a bootstrapping procedure:

require(mosaic)

cv.test = function(x,y) {

CV = sqrt(chisq.test(x, y, correct=FALSE)$statistic /

(length(x) * (min(length(unique(x)),length(unique(y))) - 1)))

print.noquote("Cramér V / Phi:")

return(as.numeric(CV))

}

helpdata = read.csv("http://www.math.smith.edu/r/data/help.csv")

with(helpdata, cv.test(female, homeless))

res = do(5000)* with(resample(helpdata), cv.test(female, homeless))

qdata(c(.025, .975), res$result)

> with(helpdata, cv.test(female, homeless))

[1] Cramér V / Phi:

[1] 0.09765063

> qdata(c(.025, .975), res$result)

2.5% 97.5%

0.01315987 0.18717591
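For readers without the mosaic package, the same percentile bootstrap can be sketched in base R. Everything here is illustrative: the data are simulated stand-ins (not the HELP data), and `cramers_v` and `sim` are hypothetical names introduced for the example.

```r
set.seed(1)

# Simulated stand-in data (assumption; substitute helpdata to reproduce the post)
n <- 453
sim <- data.frame(female   = rbinom(n, 1, 0.25),
                  homeless = rbinom(n, 1, 0.45))

# Cramer's V from two vectors; NA if a resample has only one category
cramers_v <- function(x, y) {
  tab <- table(x, y)
  if (min(dim(tab)) < 2) return(NA_real_)
  chisq <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
  as.numeric(sqrt(chisq / (sum(tab) * (min(dim(tab)) - 1))))
}

# Resample rows with replacement and recompute V each time
boot_v <- replicate(2000, {
  idx <- sample(n, replace = TRUE)
  cramers_v(sim$female[idx], sim$homeless[idx])
})

# Percentile 95% interval
ci <- quantile(boot_v, c(0.025, 0.975), na.rm = TRUE)
ci
```

Since V is bounded below by zero, the percentile interval can never include zero, so it should not be read as a test of no association.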

Also note that the "vcd" package has an "assocstats()" function which calculates Cramer's V (and other statistics).
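A minimal sketch of that route, assuming the vcd package is installed and using the built-in `mtcars` data as a stand-in for the HELP data (an assumption made so the example runs without a download). The hand calculation is included for comparison.

```r
# Stand-in contingency table from built-in data (assumption; not the HELP data)
tab <- table(cyl = mtcars$cyl, am = mtcars$am)

# Cramer's V by hand, using the formula from the post
chisq <- suppressWarnings(chisq.test(tab, correct = FALSE)$statistic)
v_manual <- as.numeric(sqrt(chisq / (sum(tab) * (min(dim(tab)) - 1))))
v_manual

# The same quantity from vcd, only if the package is available
if (requireNamespace("vcd", quietly = TRUE)) {
  v_vcd <- vcd::assocstats(tab)$cramer
  print(all.equal(v_manual, v_vcd))
}
```

Note that `assocstats()` takes a table rather than two vectors, so vector input has to go through `table()` first.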

you call that easy? and where is the interpretation?

It's easier than if you had to roll your own code!

My interpretation is that we observed a Cramer's V of 0.098 (very weak association). We're 95% confident that the true V is captured by the interval 0.013 to 0.187.

You could get similar results using a conversion to Fisher Z and then back again, i.e.:

# "mat" is an r x c matrix/table of counts

# Pearson chi-square without continuity correction, to match cv.test() above

chicalc <- chisq.test(mat, correct = FALSE)$statistic

# calculate Cramer's V; K is the smaller of the row and column counts

K <- min(nrow(mat), ncol(mat))

crv <- as.numeric(sqrt(chicalc / (sum(mat) * (K - 1))))

# convert the Cramer's V to a Fisher's Z

fz <- 0.5 * log((1 + crv)/(1 - crv))

# half-width of the 95% conf.int around Fisher Z

alpha <- 0.05

halfwidth <- qnorm(1 - alpha/2) / sqrt(sum(mat) - 3)

cifz <- fz + c(-halfwidth, halfwidth)

# convert it back to conf.int around Cramer's V

cicrv <- (exp(2 * cifz) - 1)/(1 + exp(2 * cifz))
