## Friday, April 29, 2011

### Example 8.36: Quadratic equation with real roots

We often simulate data in SAS or R to confirm analytical results. For example, consider the following problem from the excellent text by Rice:

Let U1, U2, and U3 be independent random variables uniform on [0, 1]. What is the probability that the roots of the quadratic U1*x^2 + U2*x + U3 are real?

Recall that for a quadratic equation A*x^2 + B*x + C to have real roots we need the discriminant (B^2-4AC) to be non-negative.

The answer given in the second and third editions of Rice is 1/9. Here's how you might get there:

Since, B^2 > 4*A*C <=> B > 2*sqrt(A*C), we need to integrate B over the range 2*sqrt(a*c) to 1, then integrate over all possible values for A and C (each from 0 to 1).

Another answer can be found by taking y = b^2 and w = 4ac and integrating over their joint distribution (they're independent, of course). That leads to an answer of approximately 0.254. Here's how to calculate this in R:
`f = function(x) {  A = x; B = x; C = x;  return(as.numeric(B^2 > 4*A*C))}library(cubature)adaptIntegrate(f, c(0,0,0), c(1,1,1), tol=0.0001, max=1000000)`

which generates the following output:
`\$integral 0.2543692\$error 0.005612558\$functionEvaluations 999999\$returnCode -1`

We leave the details of the calculations aside for now, but both seem equally plausible, at first glance. A quick simulation can suggest which is correct.

For those who want more details, here's a more complete summary of this problem and solution.

SAS

Neither the SAS nor the R code is especially challenging.

`data test;  do trial = 1 to 10000;    u1 = uniform(0); u2 = uniform(0); u3 = uniform(0);    res = u2**2 - 4*u1*u3;    realroot = (res ge 0);    output;  end;run;proc print data=test (obs=10); run;proc means data=test;  var realroot;run;`

Leading to the result:
`                        The MEANS Procedure                   Analysis Variable : realroot     N           Mean        Std Dev        Minimum        Maximum ----------------------------------------------------------------- 10000      0.2556000      0.4362197              0      1.0000000 -----------------------------------------------------------------`

R

`numsim = 10000u1 = runif(numsim); u2 = runif(numsim); u3 = runif(numsim)res = u2^2 - 4*u1*u3realroots = res>=0table(realroots)/numsim`

With the result
`realrootsFALSE  TRUE 0.747 0.253 `

The simulation demonstrates both that the first solution is incorrect. Here the simulation serves as a valuable check for complicated analysis.

Insights into where the 1/9 solution fails would be welcomed in the comments.

#### 3 comments:

Anonymous said...

FYI - Prof. Rice has posted an erratum for this problem (http://www.stat.berkeley.edu/~rice/Book3ed/index.html) which lists the corrected answer as

5/36 + log(2)/6

instead of the 1/9 printed on p. A34 of the book. This corrected answer agrees with the analytical result you derived in your detailed writeup.

Great example of why you should always double-check your results. Thanks for sharing!

Sarah Anoke said...

I really enjoyed the use of simulations to complement the analytical solution. Definitely something I will keep in my statistical toolbox.

Anonymous said...

I know I'm late to the game, but when I originally did the problem, I came up with the answer of 1/9. This is wrong and it's easy to see if you use the trick used in Casella and Berger exercise 4.53.

A,B,C are uniform[0,1] so -log(A), -log(B), -log(C) are exponential(1). And so, B^2 >= 4*A*C is the same as -2log(B) <= -log(4) -log(B) - log(C).

Let X = -2log(B), so X is exponential(2) and let Y = -log(B) - log(C), so Y is gamma(2,1). So now we are solving X <= -log(4) + Y. This is a simple double integral (easy b/c X and Y are independent).

You get 1/9 if you incorrectly specify the limits of integration of Y to be [0,infinity). Y has to be greater than log(4) since X <= -log(4) + Y. So I am assuming Rice did the same thing I did to get 1/9.