Comments on SAS and R: Example 7.2: Simulate data from a logistic regression

I'd suggest turning our code into a function t...

2019-08-04T09:26:30.584-04:00

I'd suggest turning our code into a function then iterating over each of the values in your dataset.

Can someone help me? How do I repeat the simulatio...

2019-07-30T12:41:25.147-04:00

Can someone help me? How do I repeat the simulation like 30 times having different datasets in one file for example
simulation id y x1 x2
1 1 1 0 18
1 2 0 1 20
1 3 0 0 24
1 4 1 1 28
1 5 0 1 40
2 1 1 1 44
2 2 0 0 25
2 3 1 1 38
2 4 1 1 39
2 5 1 0 41
3 1 1 1 43
3 2 1 0 45
3 3 1 0 43
3 4 0 0 41
3 5 0 1 40

How to simulate a binary response (obese or not ob...

2019-07-09T14:40:55.407-04:00

How to simulate a binary response (obese or not obese) variable given a distribution of body weight?

This is very helpful and informative. I do I intro...

2019-04-13T12:27:59.770-04:00

This is very helpful and informative. I do I introduce error terms to the ytest?

This is very helpful. Thank you for the post! Is i...

2018-06-09T21:44:32.702-04:00

This is very helpful. Thank you for the post!
Is it possible to modify your program to simulate a case-control data with P(Y=1)=0.5? Many thanks!!

You can use whatever probabilities you like: the s...

2017-11-30T06:05:08.042-05:00

You can use whatever probabilities you like: the simulation can be structured to track the scenario of interest.

Thanks, that was what I meant. What I am intereste...

2017-11-30T04:23:20.579-05:00

Thanks, that was what I meant. What I am interested in is the probabilities of treatment assignment. Is it that I would have one set of probabilities representative of all the datasets or each dataset would have it's own set computed separately?

Safiya, you could certainly use your dataset as th...

2017-11-27T07:45:40.552-05:00

Safiya, you could certainly use your dataset as the basis of your simulations and create new Y's using the approach we've described. Is that what you mean by "simulate several logistic regression results"?

Supposing I already have a dataset, can I use same...

2017-11-27T05:08:11.853-05:00

Supposing I already have a dataset, can I use same to simulate several logistic regression results?

Mr Ken Kleinman I used your post how to generate ...

2016-10-13T19:36:33.851-04:00

Mr Ken Kleinman

I used your post how to generate data from a logistic regression in SAS. Its very helpful posts for new users. I use your post of generating data from logistic regression I generate 1000 random numbers, Now I want to replicate this results 100 times, how i can do this. Any suggestions will be pretty helpful. Thanks

Possibly. Please contact me privately via e-mail....

2014-06-25T11:46:47.777-04:00

Possibly. Please contact me privately via e-mail.

My primary reason for wanting to do post hoc power...

2014-06-24T11:12:32.836-04:00

My primary reason for wanting to do post hoc power was to retain the relationships between the variables we have. I'm having trouble simulating variables and accounting for the relationships between them. I have one categorical variable and one continuous one, and I'm interested in looking at the interaction between them. Any suggestions?

Post-hoc power assessment is fairly controversial ...

2014-06-19T11:44:44.235-04:00

Post-hoc power assessment is fairly controversial and is frowned upon by many statisticians. But it's not hard to adapt our code to simulate your setting and then to assess power to detect an interaction. I'll write up a new blog post to address this shortly, and thanks for the question

I'm interested in doing a post-hoc power test ...

2014-06-18T10:18:19.146-04:00

I'm interested in doing a post-hoc power test for a logistic regression, but I also have an interaction between continuous and categorical variables. Any suggestions on how to incorporate this into your code? Sorry, I realize this is several years after the original post, just hoping to get some advice as I've only been able to find information on assessing power for an interaction OR for a logistic regression, but not both. Thanks!

2012-11-07T13:41:33.607-05:00

This comment has been removed by the author.

Hi, Your program is very helpful, but if I would...

2012-11-07T13:40:57.011-05:00

Hi,

Your program is very helpful, but if I would like to make a restriction of proportion difference like P(Y=1|X=1)-P(Y=1|X=0) = a, is there any way to generate such data controlled by this kind of proportion difference? Any suggestion will be pretty helpful. Thanks!

With 30 observations, your power is very poor. Th...

2011-05-25T08:19:49.873-04:00

With 30 observations, your power is very poor. That's the most likely issue. Try simulating and fitting your sample of 30 one thousand times. You should find each parameter with p < .05 more (just slightly more) than 5% of the time. As you increase the sample from 30 to 300, the proportion of rejections should increase-- that's your power increasing.

Mr Ken Kleinman i used your post to generate data ...

2011-05-24T22:06:44.365-04:00

Mr Ken Kleinman
i used your post to generate data logistic. I want to generate x1,x2, and x3. x1 has 2 category (binary), x2 has 3 category, and x3 has normal distribution.
When i generate 300, it gave a good model. But when i generate 30, the parameter did't significant. Can you tell me the reason?

Right you are. Very reasonable for it to work tha...

2011-03-15T20:47:24.577-04:00

Right you are. Very reasonable for it to work that way. (I didn't try to make it work that way, obviously, and the documentation doesn't clarify that p can be a vector.)

Using your code would be both easier to understand and more "the R way."

If I am not mistaken, I think if p is a vector and...

2011-03-15T18:16:07.691-04:00

If I am not mistaken, I think if p is a vector and rbinom has its first argument is equal to the length of p, this will return a vector with each entry equal to the value in p at that index.

rbinom(length(p),1,p)

Absolutely. I thought of that last night and wond...

2011-03-15T09:23:21.351-04:00

Absolutely. I thought of that last night and wondered why didn't write it that way to begin with. I think in R you'd have to use an apply function to allow a different p for each binomial. It would look something like:

y = sapply(prob, function(p) {rbinom(1,1,p)})

So the binomial part wouls be easier to understand, but the code might be less accessible.

I see, that is excellent! The light bulb went on :...

2011-03-14T21:02:41.290-04:00

I see, that is excellent! The light bulb went on :)

I was really thrown by the comparison to the random uniform value. Could you also take a draw from a binomial distribution with p=prob instead?

Hi Brian-- ifelse() is a vectorwise logic test. ...

2011-03-14T20:39:42.782-04:00

Hi Brian--

ifelse() is a vectorwise logic test. It tests whether the i element of the runis vector is less than the i element of the prob vector. If it is, then the i element of ytest will be 1, otherwise it will be 0.

The prob vector contains the probability of the outcome being 1, given the covariate value and intercept. The ifelse() is like flipping a coin with probability of heads specified by the prob vector.

For a sample size calculation via Monte Carlo methods, you would solve prob = exp(intercept)/(1 + exp(intercept)) with prob = known proportion in the population when the covariate=0. You also need to choose the covariate distribution that best mimics your anticipation. Then simulate many times; fit a logistic regression model to each, and see what proportion of times the null is rejected. That's your power. Vary the sample size and beta to find the sample size and beta that achieve the power you need.

Great post, I am confused by ytest = ifelse(runis ...

2011-03-14T18:20:14.518-04:00

Great post, I am confused by ytest = ifelse(runis < prob,1,0). What is the logic of this step?

If you are using this for simulation to determine sample size, would one experiment with the proper intercept (versus zero) to get the probability to average around the expected probability of success in the population?