SAS
data test;
  p1 = .1; p2 = .2; p3 = .3;
  do i = 1 to 10000;
    x = uniform(0);
    mycat = (x ge 0) + (x gt p1) + (x gt p1 + p2)
          + (x gt p1 + p2 + p3);
    output;
  end;
run;
Here the parenthetical logical tests in the mycat = line resolve to 1 if the test is true and 0 otherwise, as discussed in section 1.4.9.
The (x ge 0) test is always true, so it adds 1 to every observation and makes the categories range from 1 to 4, rather than 0 to 3.
The results can be assessed using proc freq:
proc freq data=test; tables mycat; run;
                              Cumulative   Cumulative
mycat   Frequency   Percent    Frequency      Percent
-----------------------------------------------------
    1         947      9.47          947         9.47
    2        2061     20.61         3008        30.08
    3        3039     30.39         6047        60.47
    4        3953     39.53        10000       100.00
R
In contrast, the R syntax to get the results is rather dense.
p <- c(.1,.2,.3)
x <- runif(10000)
mycat <- numeric(10000)
for (i in 0:length(p)) {
  mycat <- mycat + (x >= sum(p[0:i]))
}
We can display the results using the summary() function.
summary(factor(mycat))
1 2 3 4
990 2047 2978 3985
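The loop counts, for each x, how many cumulative boundaries (0, 0.1, 0.3, 0.6) it has reached. As a minimal sketch, and not the approach used in the post itself, the same count can be obtained in one call with findInterval(). The seed and the names mycat_loop, mycat_vec, breaks, agree and props are assumptions introduced here for illustration.

```r
set.seed(42)                       # arbitrary seed, only for reproducibility
p <- c(.1, .2, .3)                 # first three probabilities; the fourth is 1 - sum(p)
x <- runif(10000)

# Loop version, as in the post
mycat_loop <- numeric(length(x))
for (i in 0:length(p)) {
  mycat_loop <- mycat_loop + (x >= sum(p[0:i]))
}

# Vectorized version: number of lower boundaries that each x has reached
breaks <- c(0, cumsum(p))          # 0, 0.1, 0.3, 0.6
mycat_vec <- findInterval(x, breaks)

agree <- all(mycat_loop == mycat_vec)               # do the two versions match?
props <- as.numeric(table(mycat_vec)) / length(x)   # observed category proportions
```

If agree is TRUE, the two versions assign every observation to the same category, and props should be close to 0.1, 0.2, 0.3 and 0.4.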
11 comments:
Or, you could just use
mycat <- cut(runif(10000), c(0, 0.1, 0.3, 0.6, 1), labels=FALSE)
Thanks, Douglas! Much better.
It looks like if I omit the labels=FALSE, the factor labels are very useful, too.
> mycat <- cut(runif(10000), c(0, 0.1, 0.3, 0.6, 1))
> summary(mycat)
(0,0.1] (0.1,0.3] (0.3,0.6] (0.6,1]
987 1993 3047 3973
sample() may be a better function for simulating categorical data:
> mycat <- sample(1:4,10000,rep=TRUE,prob=c(.1,.2,.3,.4))
> table(mycat)
1 2 3 4
1012 2074 2924 3990
Hello,
How could I simulate data from a multinomial logit model depending on a metric variable?
I'm not sure what you're asking. You can simulate data from a multinomial logistic model using a process similar to what we show for logistic regression in this entry: http://sas-and-r.blogspot.com/2009/06/example-72-simulate-data-from-logistic.html. What do you mean by a "metric" variable, though?
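A minimal sketch of that kind of process, assuming "metric" means a single continuous predictor and a three-category outcome: compute a linear predictor per category, convert these to probabilities, and then assign categories with the same cumulative-threshold trick used in the post. The coefficient values beta0 and beta1, the seed and the variable names are hypothetical, chosen only for illustration.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
n <- 10000
x <- rnorm(n)                      # hypothetical continuous predictor

# Hypothetical coefficients; category 1 is the reference (linear predictor 0)
beta0 <- c(0, 0.5, -0.5)
beta1 <- c(0, 1.0, -1.0)

# n x 3 matrix of linear predictors, one column per category
eta  <- outer(x, beta1) + matrix(beta0, n, 3, byrow = TRUE)
# Category probabilities for each observation (each row sums to 1)
prob <- exp(eta) / rowSums(exp(eta))

# One uniform draw per observation, compared with that row's cumulative probabilities
u <- runif(n)
cumprob <- t(apply(prob, 1, cumsum))
y <- 1 + rowSums(u > cumprob[, 1:2, drop = FALSE])
```

Here y takes the values 1 to 3, and table(y) gives the simulated category counts.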
Hello,
Can I simulate variables with a known Pearson covariance matrix?
I need to simulate categorical, continuous and binary variables based on the Pearson covariance matrix. Thanks.
In example 6.3 in our book, we show correlated binary variables, based on Lipsitz et al, Stats in Med 1990, 9:1517-1525. You'll find many cites if you search with "simulate correlated" as your base.
Thanks for the response.
There is an R package called "bindata". It works almost perfectly for creating correlated binary variables with known marginal probabilities and correlations.
What I need is the simulation of correlated continuous and categorical variables using a single multivariate distribution.
Good to know about that one, thanks. I don't know of a technique to do what you need, offhand. A brief search turned up this thread: http://stats.stackexchange.com/questions/22856/how-to-generate-correlated-test-data-that-has-bernoulli-categorical-and-contin where copulas are suggested. And also this paper: http://www.springerlink.com/content/011x633m554u843g/. Let me know what you end up doing.
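As a rough sketch of the copula idea, and not something taken from the linked thread or paper: draw correlated standard normals, map them to uniforms with pnorm(), and then push each uniform through the quantile function or cut points of the margin you want. The matrix latent_cor, the margins and the seed below are all hypothetical. Note that the Pearson correlations of the resulting variables will generally differ from the latent values, so hitting a target matrix would need further calibration.

```r
set.seed(2)                        # arbitrary seed, only for reproducibility
n <- 10000

# Hypothetical correlation matrix for the latent normal variables
latent_cor <- matrix(c(1.0, 0.5, 0.3,
                       0.5, 1.0, 0.4,
                       0.3, 0.4, 1.0), 3, 3)

# Correlated standard normals: independent draws times the Cholesky factor
z <- matrix(rnorm(n * 3), n, 3) %*% chol(latent_cor)
u <- pnorm(z)                      # each column is now Uniform(0, 1)

cont <- qnorm(u[, 1], mean = 10, sd = 2)                      # continuous margin
bin  <- qbinom(u[, 2], size = 1, prob = 0.3)                  # binary margin, P(1) = 0.3
cat4 <- cut(u[, 3], c(0, 0.1, 0.3, 0.6, 1), labels = FALSE)   # four categories, as in the post

obs_cor <- cor(cbind(cont, bin, cat4))   # compare with latent_cor
```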
There is a literature that might be relevant. A starting point might be Cox, D. R. and Wermuth, N. (1992). Response models for mixed binary and quantitative variables. Biometrika, 79, 441-461. They propose a flexible multivariate distribution which might be useful.
What is the variance of the error term when a multinomial logit is simulated in this way?