45-733 PROBABILITY AND STATISTICS I Topic #7C


22 February 2000, 24 February 2000



Confidence Intervals

  1. Sampling From a Normal Distribution With unknown $\mu$ and known $\sigma^2$.
         The basic form of the confidence interval is:
    $P[|\bar{X}_n - \mu| < c] = 1 - \alpha$, or
    $P[\bar{X}_n - c < \mu < \bar{X}_n + c] = 1 - \alpha$, where
    $\alpha$ = .01 or .05 or .10 typically.
    Note that the random quantity here is the interval itself, a line segment of length $2c$ centered at $\bar{X}_n$; $\mu$ is a fixed constant. The proper
    interpretation is that, in the long run, $100(1 - \alpha)$
    percent of the time the interval around $\bar{X}_n$ will contain $\mu$.
    Also note that, once we insert a value for $\bar{X}_n$ we no
    longer have a random variable; we have a defined interval.
    Hence, we say that we are "$100(1 - \alpha)$ percent confident
    that the true mean is in the interval."
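
    The long-run interpretation can be checked by simulation. The sketch below is illustrative only: the function name `coverage`, the seed, and the values $\mu = 10$, $\sigma = 1$, n = 25 are assumptions, not part of the notes. It uses the standard half-width $c = z_{\alpha/2}\sigma/\sqrt{n}$ for known $\sigma$ and counts how often the interval around the sample mean contains $\mu$.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, alpha, reps, seed=0):
    """Fraction of simulated samples whose interval contains mu."""
    rng = random.Random(seed)                 # fixed seed (assumed) for repeatability
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    c = z * sigma / n ** 0.5                  # half-width of the interval
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - c < mu < xbar + c:
            hits += 1
    return hits / reps

# Hypothetical settings: N(10, 1), samples of size 25, 95% intervals.
rate = coverage(mu=10.0, sigma=1.0, n=25, alpha=0.05, reps=4000)
print(rate)
```

    The printed fraction should be close to .95, although it varies slightly from seed to seed.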
    
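  2. To find the value of c:
    $P[|\bar{X}_n - \mu| < c] = P[-c < \bar{X}_n - \mu < c]$
    $= P[-c\sqrt{n}/\sigma < (\bar{X}_n - \mu)/(\sigma/\sqrt{n}) < c\sqrt{n}/\sigma]$
    $= P[-c\sqrt{n}/\sigma < Z < c\sqrt{n}/\sigma] = \Phi(c\sqrt{n}/\sigma) - \Phi(-c\sqrt{n}/\sigma)$
    $= \Phi(z_{\alpha/2}) - \Phi(-z_{\alpha/2}) = 1 - \alpha$
    Hence:  $z_{\alpha/2} = c/(\sigma/\sqrt{n})$  and  $c = z_{\alpha/2}\,\sigma/\sqrt{n}$

    Which gives us our confidence interval:
    $P[\bar{X}_n - z_{\alpha/2}\,\sigma/\sqrt{n} < \mu < \bar{X}_n + z_{\alpha/2}\,\sigma/\sqrt{n}] = 1 - \alpha$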
    
  3. Example: Suppose we take a random sample of 25 from $N(\mu, 1)$. Construct a 95% confidence interval for $\mu$.
         We are given that $\alpha = .05$, so that $\alpha/2 = .025$ and $z_{.025} = 1.96$. Hence
         $z_{\alpha/2}\,\sigma/\sqrt{n} = (1.96 \cdot 1)/5 = .392$, so the interval is:
         $\bar{X}_n \pm .392$
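
    As a quick numerical check of this example, the following sketch uses Python's standard-library `statistics.NormalDist` for the standard normal quantile and CDF. The variable names are illustrative. It recomputes the half-width and confirms that the standardized interval carries probability 1 - alpha.

```python
from statistics import NormalDist

std_normal = NormalDist()            # Z ~ N(0, 1)
alpha, sigma, n = 0.05, 1.0, 25      # values from the example

z = std_normal.inv_cdf(1 - alpha / 2)     # z_{alpha/2}
c = z * sigma / n ** 0.5                  # half-width of the interval
# P[-c*sqrt(n)/sigma < Z < c*sqrt(n)/sigma] should equal 1 - alpha
prob = std_normal.cdf(c * n ** 0.5 / sigma) - std_normal.cdf(-c * n ** 0.5 / sigma)

print(round(z, 2), round(c, 3), round(prob, 4))  # 1.96 0.392 0.95
```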
  4. Sampling From a Normal Distribution With unknown $\mu$ and unknown $\sigma^2$ with large sample size (n > 30).
         In this case we simply substitute the sample variance $s^2$ for $\sigma^2$ by appeal to the Central Limit Theorem and
         obtain the confidence interval:
    $P[\bar{X}_n - z_{\alpha/2}\,s/\sqrt{n} < \mu < \bar{X}_n + z_{\alpha/2}\,s/\sqrt{n}] = 1 - \alpha$
    
  5. Suppose we have a large order of bolts delivered to our factory. We are concerned about the precision with which these bolts have been machined. In particular, we want to construct 95% confidence limits for the true mean length of the bolts. Assume that the length of the bolts is normally distributed.
    We are given:  n = 500, $\bar{X}_n$ = 6.1 cm, s = .1 cm,
    $1 - \alpha = .95$, $\alpha = .05$, $\alpha/2 = .025$.
    Hence, $z_{.025} = 1.96$.
    So the confidence limits are: $6.1 \pm (1.96 \cdot .1)/\sqrt{500} = 6.1 \pm .009$, or

    (6.091, 6.109)

    We are 95 percent confident that the true mean length of the bolts is in the interval.
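
    The bolt computation can be reproduced with a small helper. This is a sketch: `large_sample_mean_ci` is a hypothetical name, and the function simply applies the large-sample formula with the sample standard deviation in place of the population value.

```python
from statistics import NormalDist

def large_sample_mean_ci(xbar, s, n, conf):
    """Large-sample z interval for a mean, using the sample std dev s."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    half = z * s / n ** 0.5
    return xbar - half, xbar + half

lo, hi = large_sample_mean_ci(xbar=6.1, s=0.1, n=500, conf=0.95)
print(round(lo, 3), round(hi, 3))  # 6.091 6.109
```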

  6. Large Sample (n > 50) Confidence Interval for proportions.
         With large sample sizes, we can appeal to the Central Limit Theorem and assume:
    $\hat{p} \sim N[p, \hat{p}(1 - \hat{p})/n]$ so that the confidence interval is:
    $P\{\hat{p} - z_{\alpha/2}[\hat{p}(1 - \hat{p})/n]^{1/2} < p < \hat{p} + z_{\alpha/2}[\hat{p}(1 - \hat{p})/n]^{1/2}\} = 1 - \alpha$
  7. Problem 8.47 p.346
    We are given:  n = 1506, $\hat{p} = .73$,
    $1 - \alpha = .95$, $\alpha = .05$, $\alpha/2 = .025$.
    Hence, $z_{.025} = 1.96$.

    So the confidence limits are: $.73 \pm 1.96[(.73 \cdot .27)/1506]^{1/2} = .73 \pm .0224$, or

    (.7076, .7524)

    We are 95 percent confident that the true proportion is in the interval.
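
    A minimal sketch of the same calculation, assuming only the large-sample normal approximation used in these notes. The function name `proportion_ci` is illustrative.

```python
from statistics import NormalDist

def proportion_ci(p_hat, n, conf):
    """Large-sample interval for a proportion: p_hat +/- z * sqrt(p_hat(1-p_hat)/n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
    half = z * (p_hat * (1 - p_hat) / n) ** 0.5
    return p_hat - half, p_hat + half

lo, hi = proportion_ci(p_hat=0.73, n=1506, conf=0.95)
print(round(lo, 4), round(hi, 4))  # 0.7076 0.7524
```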

  8. Confidence Interval for $\sigma^2$ when the random sample is drawn from a Normal Distribution.
    Ideally, the confidence interval would be built around the probability distribution for $s^2$, our unbiased estimator for $\sigma^2$. Unfortunately, this distribution is not so easily used. However, the distribution of $(n-1)s^2/\sigma^2$ is known to be Chi-Square with n-1 degrees of freedom.

    Since the Chi-Square is an asymmetric distribution we define $c_1$ to be the point below which $\alpha/2$ of the probability lies, and define $c_2$ to be the point above which $\alpha/2$ of the probability lies. Hence:
    $P[c_1 < (n-1)s^2/\sigma^2 < c_2] = P[(n-1)s^2/c_2 < \sigma^2 < (n-1)s^2/c_1] = 1 - \alpha$

  9. Problem 8.79 p.362
    We are given: n = 6, df = n - 1 = 5,
    $1 - \alpha = .90$, $\alpha = .1$, $\alpha/2 = .05$. Hence, $c_1 = 1.145476$ and $c_2 = 11.0705$
    $s^2 = .502667$

    $(n-1)s^2/c_2 = (5 \cdot .502667)/11.0705 = .227$, and
    $(n-1)s^2/c_1 = (5 \cdot .502667)/1.145476 = 2.194$

    Hence: (.227, 2.194)

    We are 90 percent confident that $\sigma^2$ is in the interval.
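
    The interval for the variance can be sketched as follows. Python's standard library has no Chi-Square quantile function, so this example takes the two table values as inputs. `variance_ci` is a hypothetical helper name.

```python
def variance_ci(n, s2, c1, c2):
    """Interval for the variance: ((n-1)s^2/c2, (n-1)s^2/c1).

    c1 and c2 are the lower and upper Chi-Square points with n-1 df,
    supplied by the caller (e.g. from a table).
    """
    sum_sq = (n - 1) * s2
    return sum_sq / c2, sum_sq / c1

# Values from Problem 8.79; c1 and c2 are the tabled points for 5 df.
lo, hi = variance_ci(n=6, s2=0.502667, c1=1.145476, c2=11.0705)
print(round(lo, 3), round(hi, 3))  # 0.227 2.194
```

    Note that the larger Chi-Square point produces the lower limit, because it appears in the denominator.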

  10. Sampling From a Normal Distribution With unknown $\mu$ and unknown $\sigma^2$ with small sample size (n < 30).
    In this case we build our confidence interval from the t distribution. In particular,
    $(\bar{X}_n - \mu)/(s/\sqrt{n}) \sim t_{n-1}$
    So that we can write the confidence interval as:
    $P[\bar{X}_n - t_{\alpha/2}\,s/\sqrt{n} < \mu < \bar{X}_n + t_{\alpha/2}\,s/\sqrt{n}] = 1 - \alpha$
    
  11. Problem 8.68 p.358
         We are given: n = 20, s = 57, $1 - \alpha = .9$, $\alpha = .1$, $\alpha/2 = .05$. Hence, $t_{.05,\,19\,df} = 1.729$
    1. Confidence Limits: $419 \pm (1.729 \cdot 57)/\sqrt{20} = 419 \pm 22.04$
               or (396.96, 441.04)
    2. Yes, the population mean is in the interval. All values in the interval have 90% confidence.
    3. Confidence Limits: $455 \pm (1.729 \cdot 69)/\sqrt{20} = 455 \pm 26.68$
               or (428.32, 481.68)
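
    A short sketch of the small-sample computation in part 1. The standard library does not provide t quantiles, so the tabled critical value is passed in. `t_interval` is an illustrative name.

```python
def t_interval(xbar, s, n, t_crit):
    """Small-sample interval: xbar +/- t_crit * s / sqrt(n).

    t_crit is the tabled t value with n-1 degrees of freedom,
    supplied by the caller.
    """
    half = t_crit * s / n ** 0.5
    return xbar - half, xbar + half

# Part 1 of Problem 8.68: mean 419, s = 57, n = 20, t_{.05, 19 df} = 1.729
lo, hi = t_interval(xbar=419, s=57, n=20, t_crit=1.729)
print(round(lo, 2), round(hi, 2))  # 396.96 441.04
```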
  12. Confidence Interval for the Difference Between Two Means when sampling from two separate, independent, Normal Distributions with known variances.
    Here we use the same technique as the testing problem discussed in notes #10 (1). Namely, let the distributions of the sample means be:
    $\bar{X}_n \sim N[\mu_x, \sigma_x^2/n]$ and $\bar{Y}_m \sim N[\mu_y, \sigma_y^2/m]$
    Then:  $\bar{X}_n - \bar{Y}_m \sim N[\mu_x - \mu_y, \sigma_x^2/n + \sigma_y^2/m]$

    And the confidence interval is:
    $P\{\bar{X}_n - \bar{Y}_m - z_{\alpha/2}[\sigma_x^2/n + \sigma_y^2/m]^{1/2} < \mu_x - \mu_y < \bar{X}_n - \bar{Y}_m + z_{\alpha/2}[\sigma_x^2/n + \sigma_y^2/m]^{1/2}\}$
    $= 1 - \alpha$
    
  13. Confidence Interval for the Difference Between Two Means when sampling from two separate, independent, Normal Distributions with unknown variances but large (n > 30 and m > 30) sample sizes.
         Here the confidence interval is the same as in (12) but the sample variances $s_x^2$ and $s_y^2$ are used in the formula. Namely:
    $P\{\bar{X}_n - \bar{Y}_m - z_{\alpha/2}[s_x^2/n + s_y^2/m]^{1/2} < \mu_x - \mu_y < \bar{X}_n - \bar{Y}_m + z_{\alpha/2}[s_x^2/n + s_y^2/m]^{1/2}\}$
    $= 1 - \alpha$
    


  14. Problem 8.52 p.347
    1.
      We are given:  n = 252, m = 307, $\bar{X}_n = 11.48$, $\bar{Y}_m = 13.21$,
      $s_x = 5.69$, $s_y = 5.31$,
      $1 - \alpha = .95$, $\alpha = .05$, $\alpha/2 = .025$.
      Hence, $z_{.025} = 1.96$.

      Our confidence limits are:
      $11.48 - 13.21 \pm 1.96[5.69^2/252 + 5.31^2/307]^{1/2} =$
      $-1.73 \pm .92$

      Which produces the interval, (-2.65, -.81)

    2.
      We are given:  n = 252, m = 307, $\bar{X}_n = 22.05$, $\bar{Y}_m = 25.96$,
      $s_x = 5.12$, $s_y = 5.07$,
      $1 - \alpha = .90$, $\alpha = .10$, $\alpha/2 = .05$.
      Hence, $z_{.05} = 1.645$.

      Our confidence limits are:
      $22.05 - 25.96 \pm 1.645[5.12^2/252 + 5.07^2/307]^{1/2} =$
      $-3.91 \pm .71$

      Which produces the interval, (-4.62, -3.20)

    3. Note that neither interval includes 0. Hence, we are 95 and 90 percent confident, respectively, that there is a significant difference between men and women on these two scales.
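
    Both intervals in this problem follow from one formula, which the sketch below wraps in a hypothetical helper `diff_means_ci`. It assumes large samples so that the sample standard deviations can stand in for the population values.

```python
from statistics import NormalDist

def diff_means_ci(xbar, ybar, sx, sy, n, m, conf):
    """Large-sample interval for mu_x - mu_y from two independent samples."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)     # z_{alpha/2}
    half = z * (sx ** 2 / n + sy ** 2 / m) ** 0.5
    diff = xbar - ybar
    return diff - half, diff + half

# Part 1 (95%) and part 2 (90%) of Problem 8.52
lo1, hi1 = diff_means_ci(11.48, 13.21, 5.69, 5.31, 252, 307, 0.95)
lo2, hi2 = diff_means_ci(22.05, 25.96, 5.12, 5.07, 252, 307, 0.90)
print(round(lo1, 2), round(hi1, 2))  # -2.65 -0.81
print(round(lo2, 2), round(hi2, 2))  # -4.62 -3.2
```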

  15. Confidence Interval for the Difference Between Two Proportions with large (n > 50 and m > 50) sample sizes.
         Here we appeal to the Central Limit Theorem to write:
    $\hat{p}_1 - \hat{p}_2 \sim N[p_1 - p_2, \hat{p}_1(1 - \hat{p}_1)/n + \hat{p}_2(1 - \hat{p}_2)/m]$
    And the confidence limits can be computed from:
    $\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2}[\hat{p}_1(1 - \hat{p}_1)/n + \hat{p}_2(1 - \hat{p}_2)/m]^{1/2}$
    
  16. Problem 8.49 p.347
    We are given:  $\hat{p}_1 = .19$, $\hat{p}_2 = .70$, n = 1250, m = 1251,
    $1 - \alpha = .9$, $\alpha = .1$, $\alpha/2 = .05$.
    Hence, $z_{.05} = 1.645$.

    The confidence limits are:

    $.19 - .70 \pm 1.645[(.19 \cdot .81)/1250 + (.7 \cdot .3)/1251]^{1/2} = -.51 \pm .028$

    The interval, (-.538, -.482), is well below 0 so we are 90% confident, based upon this evidence, that there was a change of opinion between the two periods.

  17. Problem 8.50 p.347
    We are given:  $\hat{p}_1 = .67$, $\hat{p}_2 = .90$, n = 1250, m = 1251,
    $1 - \alpha = .98$, $\alpha = .02$, $\alpha/2 = .01$.
    Hence, $z_{.01} = 2.33$.

    The confidence limits are:

    $.67 - .90 \pm 2.33[(.67 \cdot .33)/1250 + (.90 \cdot .1)/1251]^{1/2} = -.23 \pm .0368$

    The interval, (-.2668, -.1932), is well below 0 so we are 98% confident, based upon this evidence, that there was a change of opinion regarding smoke detectors between the two periods.
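
    The two proportion problems above can be checked with the sketch below. `diff_props_ci` is an illustrative name. Because the code uses the exact normal quantile (about 2.326 for the 98% case) rather than the rounded table value 2.33, the last digit of the half-width can differ slightly from the hand computation.

```python
from statistics import NormalDist

def diff_props_ci(p1_hat, p2_hat, n, m, conf):
    """Large-sample interval for p1 - p2 from two independent samples."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)     # z_{alpha/2}
    var = p1_hat * (1 - p1_hat) / n + p2_hat * (1 - p2_hat) / m
    half = z * var ** 0.5                            # note the square root
    diff = p1_hat - p2_hat
    return diff - half, diff + half

lo1, hi1 = diff_props_ci(0.19, 0.70, 1250, 1251, 0.90)   # Problem 8.49
lo2, hi2 = diff_props_ci(0.67, 0.90, 1250, 1251, 0.98)   # Problem 8.50
print(round(lo1, 3), round(hi1, 3))  # -0.538 -0.482
print(round(lo2, 3), round(hi2, 3))  # -0.267 -0.193
```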

  18. Confidence Interval for the Difference Between Two Means when sampling from two separate, independent, Normal Distributions with unknown variances and small (n < 30 and m < 30) sample sizes.

    Here we must assume that $\sigma_x^2 = \sigma_y^2$ and use this assumption to combine the two sample sums of squares to obtain the pooled estimate $s^2$. Namely:

    $s^2 = [(n - 1)s_x^2 + (m - 1)s_y^2]/(n + m - 2)$

    The Confidence Interval, with $t_{\alpha/2}$ taken from the t distribution with n + m - 2 degrees of freedom, is:
    $P\{\bar{X}_n - \bar{Y}_m - t_{\alpha/2}\,s(1/n + 1/m)^{1/2} < \mu_x - \mu_y < \bar{X}_n - \bar{Y}_m + t_{\alpha/2}\,s(1/n + 1/m)^{1/2}\} =$
    $1 - \alpha$
    
  19. Problem 8.71 p.359
    We are given:  $\bar{X}_n = 11$, $\bar{Y}_m = 12$,
    n = 16, m = 20, $s_x = 6$, $s_y = 8$,
    $1 - \alpha = .95$, $\alpha = .05$, $\alpha/2 = .025$.
    Hence, $t_{.025,\,34\,df} = 2.032$

    Pooling the sample sums of squares:
         $s^2 = (15 \cdot 36 + 19 \cdot 64)/34 = 51.647$
         The confidence limits are:
         $11 - 12 \pm 2.032[51.647(1/16 + 1/20)]^{1/2} = -1 \pm 4.90$
         For an interval of: (-5.90, 3.90)
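
    The pooling step and the resulting half-width can be sketched as follows. The helper names are hypothetical. The critical value is passed in by the caller; the 2.032 used here is an assumed approximate table value of $t_{.025}$ with 34 degrees of freedom.

```python
def pooled_variance(sx, sy, n, m):
    """Pooled estimate s^2 = [(n-1)sx^2 + (m-1)sy^2] / (n + m - 2)."""
    return ((n - 1) * sx ** 2 + (m - 1) * sy ** 2) / (n + m - 2)

def pooled_half_width(sx, sy, n, m, t_crit):
    """Half-width t_crit * s * sqrt(1/n + 1/m), with s^2 pooled."""
    s2 = pooled_variance(sx, sy, n, m)
    return t_crit * (s2 * (1 / n + 1 / m)) ** 0.5

s2 = pooled_variance(sx=6, sy=8, n=16, m=20)
# t_crit = 2.032 is an assumed table value for 34 df, not computed here.
half = pooled_half_width(sx=6, sy=8, n=16, m=20, t_crit=2.032)
print(round(s2, 3), round(half, 2))  # 51.647 4.9
```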