45-733 PROBABILITY AND STATISTICS I Topic #8B

29 February 2000 and 2 March 2000

- The "intuitive" decision rule used in Topic #8A can be given a solid mathematical basis by appealing to the Neyman-Pearson Theorem. In particular, consider the following ratio of likelihood functions corresponding to the two hypotheses in (1):

$$\frac{f(x_1, x_2, \ldots, x_n \mid \mu = \mu_0)}{f(x_1, x_2, \ldots, x_n \mid \mu = \mu_1)} = K$$

Clearly, as the sample mean, $\bar{X}_n$, gets close to $\mu_0$, $K > 1$. Conversely, as $\bar{X}_n$ gets close to $\mu_1$, $K < 1$ and $K$ may get quite small in magnitude. Hence, we could set up a hypothesis test using the Likelihood Ratio in the same spirit as our "intuitive" test. Namely:

If $\frac{f(x_1, x_2, \ldots, x_n \mid \mu = \mu_0)}{f(x_1, x_2, \ldots, x_n \mid \mu = \mu_1)} < K^*$ then Reject $H_0$

If $\frac{f(x_1, x_2, \ldots, x_n \mid \mu = \mu_0)}{f(x_1, x_2, \ldots, x_n \mid \mu = \mu_1)} > K^*$ then Do Not Reject $H_0$

This likelihood-ratio test turns out to be the *best possible test*. Namely:

- Neyman-Pearson Theorem

Given the following hypothesis test:

If $c_0 f(x_1, x_2, \ldots, x_n \mid \mu = \mu_0) > c_1 f(x_1, x_2, \ldots, x_n \mid \mu = \mu_1)$ then Do Not Reject $H_0$

If $c_0 f(x_1, x_2, \ldots, x_n \mid \mu = \mu_0) < c_1 f(x_1, x_2, \ldots, x_n \mid \mu = \mu_1)$ then Reject $H_0$

Then this test minimizes $c_0\alpha + c_1\beta$, where $c_0 > 0$ and $c_1 > 0$.

- All of the hypothesis tests that we will use are *the best possible* in the sense that they minimize the linear combination of the $\alpha$ and $\beta$ errors. Note that this test can be written as:

If $\frac{f(x_1, x_2, \ldots, x_n \mid \mu = \mu_0)}{f(x_1, x_2, \ldots, x_n \mid \mu = \mu_1)} < \frac{c_1}{c_0}$ then Reject $H_0$

If $\frac{f(x_1, x_2, \ldots, x_n \mid \mu = \mu_0)}{f(x_1, x_2, \ldots, x_n \mid \mu = \mu_1)} > \frac{c_1}{c_0}$ then Do Not Reject $H_0$

If $c_0$ is the cost of making the $\alpha$ or **Type I** error, and $c_1$ is the cost of making the $\beta$ or **Type II** error, then this decision rule reflects the relative costs of the errors. For example, in disease testing, the cost of the $\alpha$ error is huge (telling a patient with tuberculosis that he/she is not sick) compared to the $\beta$ error. With respect to our "intuitive" decision rule:

If $\bar{X}_n > \mu_0 + c$ then Reject $H_0$

this asymmetry of cost has the effect of moving the decision point $\mu_0 + c$ to the right in order to make $\alpha$ very small.

- The Hypothesis Test for a Mean of a Normal Distribution against all possible alternative values ($\sigma^2$ is known) is:

$H_0: \mu = \mu_0$

$H_1: \mu \neq \mu_0$

The decision rule for this problem is:

If $\bar{X}_n > \mu_0 + c$ or $\bar{X}_n < \mu_0 - c$ then Reject $H_0$

If $\mu_0 - c < \bar{X}_n < \mu_0 + c$ then Do Not Reject $H_0$

Which is equivalent to:

If $\frac{\bar{X}_n - \mu_0}{\sigma/\sqrt{n}} > z_{\alpha/2}$ or $\frac{\bar{X}_n - \mu_0}{\sigma/\sqrt{n}} < -z_{\alpha/2}$ then Reject $H_0$

If $-z_{\alpha/2} < \frac{\bar{X}_n - \mu_0}{\sigma/\sqrt{n}} < z_{\alpha/2}$ then Do Not Reject $H_0$

- Example:

$H_0: \mu = 100$

$H_1: \mu \neq 100$

Where $\sigma^2 = 400$, $n = 25$, $\alpha = .05$, $\alpha/2 = .025$, and $z_{.025} = 1.96$

Hence:

$$\frac{\bar{X}_n - 100}{20/5} = \frac{\bar{X}_n - 100}{4}$$

Therefore, if $(\bar{X}_n - 100)/4 > 1.96$ or $(\bar{X}_n - 100)/4 < -1.96$ then Reject $H_0$

- The Hypothesis Test for a Mean of a Normal Distribution against all possible alternative values where $\sigma^2$ is unknown and a large sample (n > 30) is:

$H_0: \mu = \mu_0$

$H_1: \mu \neq \mu_0$

The decision rule for this problem is:

If $\frac{\bar{X}_n - \mu_0}{s/\sqrt{n}} > z_{\alpha/2}$ or $\frac{\bar{X}_n - \mu_0}{s/\sqrt{n}} < -z_{\alpha/2}$ then Reject $H_0$

If $-z_{\alpha/2} < \frac{\bar{X}_n - \mu_0}{s/\sqrt{n}} < z_{\alpha/2}$ then Do Not Reject $H_0$

Here $s$ is the sample standard deviation, which replaces the unknown $\sigma$.

- The Hypothesis Test for a Mean of a Normal Distribution against all possible alternative values where $\sigma^2$ is unknown and a small sample (n < 30) is:

$H_0: \mu = \mu_0$

$H_1: \mu \neq \mu_0$

The decision rule for this problem is:

If $\frac{\bar{X}_n - \mu_0}{s/\sqrt{n}} > t_{\alpha/2}$ or $\frac{\bar{X}_n - \mu_0}{s/\sqrt{n}} < -t_{\alpha/2}$ then Reject $H_0$

If $-t_{\alpha/2} < \frac{\bar{X}_n - \mu_0}{s/\sqrt{n}} < t_{\alpha/2}$ then Do Not Reject $H_0$

Here $t_{\alpha/2}$ is taken from the t distribution with $n - 1$ degrees of freedom.

- Problem 10.6 p.423. Perform the following hypothesis test:

$H_0$: "The average hourly wage of the 40 workers is equal to the population mean of \$13.20"

$H_1$: "The average hourly wage of the 40 workers is less than the population mean of \$13.20"

Or, stated more formally:

$H_0: \mu = 13.20$

$H_1: \mu < 13.20$

Where we are given that $n = 40$, $\bar{X}_n = 12.20$, $s = 2.50$, and $\alpha = .01$

This is known as a *one-tail test* in that we are only testing the simple null hypothesis against all other possible values less than the hypothesized value. In this case the generic test is:

If $\frac{\bar{X}_n - \mu_0}{s/\sqrt{n}} < -z_{\alpha}$ then Reject $H_0$

$-z_{.01} = -2.33$

$$\frac{\bar{X}_n - \mu_0}{s/\sqrt{n}} = \frac{12.20 - 13.20}{2.50/\sqrt{40}} = -2.53$$

Hence, since -2.53 < -2.33 we Reject $H_0$

- An alternative perspective on Hypothesis Testing is to compute the P-Value corresponding to the test statistic. For the example shown above (9):

**P-Value** = $\Phi(-2.53)$ = .0057

Which we can literally interpret as follows: in 57 of 10,000 random samples of size 40 from this distribution, we would obtain a sample mean of 12.20 or lower. Note that this is an $\alpha$ probability of .0057!
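The one-tail calculation above can also be sketched in Python. This is only an illustration and not part of the original notes: it assumes the standard-library `statistics.NormalDist` class, and the function name `lower_tail_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def lower_tail_z_test(xbar, mu0, s, n, alpha):
    """Test H0: mu = mu0 against H1: mu < mu0 with a large-sample z statistic.

    Returns the test statistic, its lower-tail P-Value, and the decision.
    """
    z = (xbar - mu0) / (s / sqrt(n))
    p_value = NormalDist().cdf(z)          # area to the left of z
    z_crit = NormalDist().inv_cdf(alpha)   # this is -z_alpha
    return z, p_value, z < z_crit

# Values from Problem 10.6: n = 40, sample mean 12.20, s = 2.50, alpha = .01
z, p_value, reject = lower_tail_z_test(xbar=12.20, mu0=13.20, s=2.50, n=40, alpha=0.01)
print(round(z, 2), round(p_value, 4), reject)
```

The critical value is computed rather than read from a table, so it is about -2.326 instead of the rounded -2.33; the decision is unaffected here.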

**In EViews you can calculate this probability by using the command:**

Scalar pval=@CNORM(-2.53)

"pval" then appears in the variables window of EViews. If you then double-click on pval the value appears at the bottom of the window.

- The Hypothesis Test for a proportion against all
possible alternative values (large sample size only, n > 50).

$H_0: p = p_0$

$H_1: p \neq p_0$

The decision rule for this problem is:

If $\frac{\hat{p} - p_0}{[p_0(1 - p_0)/n]^{1/2}} > z_{\alpha/2}$ or $\frac{\hat{p} - p_0}{[p_0(1 - p_0)/n]^{1/2}} < -z_{\alpha/2}$ then Reject $H_0$

If $-z_{\alpha/2} < \frac{\hat{p} - p_0}{[p_0(1 - p_0)/n]^{1/2}} < z_{\alpha/2}$ then Do Not Reject $H_0$

- Problem 10.14 p.424. We are given

$H_0: p = .45$

$H_1: p \neq .45$

Where $\alpha = .01$, $\alpha/2 = .005$, $z_{.005} = 2.58$, $n = 80$, and $\hat{p} = 32/80 = .40$

Hence

Test Statistic = $(.40 - .45)/[(.45 \times .55)/80]^{1/2} = -.90$

Because -2.58 < -.90 < 2.58, Do Not Reject $H_0$.

The P-Value corresponding to the test statistic of -.90 is computed as:

**P-Value (Two Tail)** = $\Phi(-.90) + 1 - \Phi(.90) = .3682$
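As a rough check on the arithmetic, the proportion test can be sketched in Python using only the standard library. The function name `two_tail_proportion_test` is hypothetical and the code is an illustration, not the course's own procedure.

```python
from math import sqrt
from statistics import NormalDist

def two_tail_proportion_test(successes, n, p0, alpha):
    """Test H0: p = p0 against H1: p != p0 using the large-sample z statistic."""
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # variance evaluated under H0
    p_value = 2 * NormalDist().cdf(-abs(z))      # both tails, by symmetry
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, p_value, abs(z) > z_crit

# Values from Problem 10.14: 32 successes out of 80, p0 = .45, alpha = .01
z, p_value, reject = two_tail_proportion_test(successes=32, n=80, p0=0.45, alpha=0.01)
print(round(z, 2), round(p_value, 4), reject)
```

Because the code carries the unrounded statistic (about -.899) instead of -.90, its P-Value differs from the table-based .3682 in the fourth decimal place.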

- The Hypothesis Test for the Variance of a Normal Distribution

$H_0: \sigma^2 = \sigma_0^2$

$H_1: \sigma^2 = \sigma_1^2 > \sigma_0^2$

The decision rule for this problem is:

If $(n-1)s^2/\sigma_0^2 > C_2$ then Reject $H_0$

Where $P(\chi^2_{n-1} > C_2) = \alpha$

- Example:

$H_0: \sigma^2 = 400$

$H_1: \sigma^2 = 900$

Where $n = 25$, $s^2 = 462.5$, $\alpha = .05$. Hence, with $\chi^2_{24,.05}$, $C_2 = 36.4151$.

Test Statistic = $(24 \times 462.5)/400 = 27.75 < 36.4151$, so we **Do Not Reject** $H_0$

**To get the P-Value for this test use the EViews command:**

Scalar Pval=@Chisq(27.75,24)

"Pval" will appear in the variables window. Double-click on Pval and the probability will appear at the bottom of the window.

- The Hypothesis Test for a Variance of a Normal Distribution against all
possible alternative values

$H_0: \sigma^2 = \sigma_0^2$

$H_1: \sigma^2 \neq \sigma_0^2$

The decision rule for this problem is:

If $(n-1)s^2/\sigma_0^2 > C_2$ or $(n-1)s^2/\sigma_0^2 < C_1$ then Reject $H_0$

Where $P(\chi^2_{n-1} > C_2) = \alpha/2$ and $P(\chi^2_{n-1} < C_1) = \alpha/2$

- Example:

$H_0: \sigma^2 = 400$

$H_1: \sigma^2 \neq 400$

Where $n = 101$, $s^2 = 631.266$, $\alpha = .05$, $\alpha/2 = .025$. Hence, with $\chi^2_{100,.025}$,

$C_1 = 74.2219$ and $C_2 = 129.561$.

Test Statistic = $(100 \times 631.266)/400 = 157.816 > 129.561$, so we **Reject** $H_0$

The **P-Value** for this test is: $2 \times P(\chi^2_{100} > 157.816) = .0004$

Because the Chi-Square distribution is not symmetric, the "solution" for the two-tail P-Value is to find the relevant tail above/below the test statistic and multiply it by 2.
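The two variance examples above can be checked with a short Python sketch. It is illustrative only: Python's standard library has no Chi-Square distribution, so the hypothetical helper `chi2_upper_tail_even_df` uses the closed-form identity $P(\chi^2_{2k} > x) = e^{-x/2}\sum_{j=0}^{k-1} (x/2)^j/j!$, which holds only for even degrees of freedom (24 and 100 here). An odd number of degrees of freedom would need a statistics package instead.

```python
from math import exp

def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term = exp(-half)          # j = 0 term of the sum
    total = term
    for j in range(1, df // 2):
        term *= half / j       # builds (x/2)^j / j! incrementally
        total += term
    return total

# One-tail example: H0: sigma^2 = 400, n = 25, s^2 = 462.5
stat_one = 24 * 462.5 / 400
p_one = chi2_upper_tail_even_df(stat_one, 24)

# Two-tail example: H0: sigma^2 = 400, n = 101, s^2 = 631.266
stat_two = 100 * 631.266 / 400
upper = chi2_upper_tail_even_df(stat_two, 100)
p_two = 2 * min(upper, 1 - upper)   # double the smaller tail

print(round(stat_one, 2), round(p_one, 3))
print(round(stat_two, 3), round(p_two, 4))
```

Doubling the smaller of the two tail areas is the same "relevant tail times 2" convention described above for the asymmetric Chi-Square distribution.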

- Problem 10.72 p.455. Perform the following hypothesis test:

$H_0$: "The variance of the aptitude test scores is 100"

$H_1$: "The variance of the aptitude test scores is greater than 100"

Or, stated more formally:

$H_0: \sigma^2 = 100$

$H_1: \sigma^2 > 100$

Where we are given that $n = 20$, $s^2 = 144$, and $\alpha = .01$

This is known as a *one-tail test* in that we are only testing the simple null hypothesis against all other possible values greater than the hypothesized value. In this case the generic test is:

If $(n-1)s^2/\sigma_0^2 > C_2$ then Reject $H_0$

Hence, with $\chi^2_{19,.01}$, $C_2 = 36.1908$. Hence:

Test Statistic = $(19 \times 144)/100 = 27.36 < 36.1908$, so we **Do Not Reject** $H_0$

The **P-Value** for this test is: $P(\chi^2_{19} > 27.36) = .09654$

- Hypothesis Test for Difference in the Means of two Separate
Normally Distributed Populations with Large Sample Size (
**n**> 30 and**m**> 30)

$H_0: \mu_1 = \mu_2$

$H_1: \mu_1 \neq \mu_2$

or

$H_0: \mu_1 - \mu_2 = 0$

$H_1: \mu_1 - \mu_2 \neq 0$

Where

$$\bar{X}_n - \bar{Y}_m \sim N(\mu_1 - \mu_2,\ \sigma_1^2/n + \sigma_2^2/m)$$

or, using the Central Limit Theorem,

$$\bar{X}_n - \bar{Y}_m \sim N(\mu_1 - \mu_2,\ s_1^2/n + s_2^2/m)$$

The decision rule for this problem is:

If $\frac{\bar{X}_n - \bar{Y}_m}{(s_1^2/n + s_2^2/m)^{1/2}} > z_{\alpha/2}$ or $\frac{\bar{X}_n - \bar{Y}_m}{(s_1^2/n + s_2^2/m)^{1/2}} < -z_{\alpha/2}$ then Reject $H_0$

If $-z_{\alpha/2} < \frac{\bar{X}_n - \bar{Y}_m}{(s_1^2/n + s_2^2/m)^{1/2}} < z_{\alpha/2}$ then Do Not Reject $H_0$

- Example: Given $n_1 = 50$, $n_2 = 50$, $s_1 = 5.2$, $s_2 = 4.3$, $\bar{X}_n = 11.5$, $\bar{Y}_m = 13.0$

Where $\alpha = .05$, $\alpha/2 = .025$, and $z_{.025} = 1.96$

Test Statistic: $(11.5 - 13.0)/(5.2^2/50 + 4.3^2/50)^{1/2} = -1.572$.

Since -1.96 < -1.572 < 1.96, Do Not Reject $H_0$

**P-Value (Two Tail)** = $\Phi(-1.572) + 1 - \Phi(1.572) = .1160$
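The large-sample difference-in-means example can be reproduced with a small Python sketch. It is an illustration under the same Central Limit Theorem approximation, using only the standard library; the function name `two_sample_z_test` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_test(xbar, ybar, s1, s2, n, m, alpha):
    """Test H0: mu1 - mu2 = 0 against H1: mu1 - mu2 != 0 for large n and m."""
    std_err = sqrt(s1 ** 2 / n + s2 ** 2 / m)    # sample variances replace sigma^2
    z = (xbar - ybar) / std_err
    p_value = 2 * NormalDist().cdf(-abs(z))      # two-tail area
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z, p_value, abs(z) > z_crit

# Values from the example: n = m = 50, s1 = 5.2, s2 = 4.3, means 11.5 and 13.0
z, p_value, reject = two_sample_z_test(11.5, 13.0, 5.2, 4.3, 50, 50, alpha=0.05)
print(round(z, 3), round(p_value, 4), reject)
```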

- Hypothesis Test for Difference in the Means of two Separate
Normally Distributed Populations with Small Sample Size (
**n**< 30 and**m**< 30)

$H_0: \mu_1 - \mu_2 = 0$

$H_1: \mu_1 - \mu_2 \neq 0$

Here we *must* assume that the variances of the two populations are the same. Namely:

$\sigma_1^2 = \sigma_2^2$

Our sample variance is computed by pooling the sum of squares of the two random samples. That is:

$$s^2 = \frac{(n - 1)s_1^2 + (m - 1)s_2^2}{n + m - 2}$$

So that:

$$\frac{\bar{X}_n - \bar{Y}_m}{s(1/n + 1/m)^{1/2}} \sim t_{n+m-2}$$

The decision rule for this problem is:

If $\frac{\bar{X}_n - \bar{Y}_m}{s(1/n + 1/m)^{1/2}} > t_{\alpha/2}$ or $\frac{\bar{X}_n - \bar{Y}_m}{s(1/n + 1/m)^{1/2}} < -t_{\alpha/2}$ then Reject $H_0$

If $-t_{\alpha/2} < \frac{\bar{X}_n - \bar{Y}_m}{s(1/n + 1/m)^{1/2}} < t_{\alpha/2}$ then Do Not Reject $H_0$

- Problem 10.62 p.446

$H_0: \mu_1 - \mu_2 = 0$, There is no effect.

$H_1: \mu_1 - \mu_2 > 0$, There is an effect.

Given $n_1 = 7$, $n_2 = 7$, $s_1 = .32$, $s_2 = .32$, $\bar{X}_n = 1.26$, $\bar{Y}_m = .78$

Where $\alpha = .05$, and $t_{.05,12} = 1.782$

$s^2 = (6 \times .32^2 + 6 \times .32^2)/12 = .1024$

Test Statistic: $(1.26 - .78)/[.1024(1/7 + 1/7)]^{1/2} = 2.806 > 1.782$, so we Reject $H_0$.

The **P-Value** for this test is: $P(T_{12} > 2.806) = .0079$

**To get the P-Value using EViews use the command:**

Scalar Pval=@Tdist(2.806,12)

Double-click on "Pval" and the*two-tail*P-Value will appear at the bottom of the window. In this instance it is .015867. Divide this by 2 to get the correct P-Value for a one-tail test.

- Problem 10.105 p.473

$H_0: \mu_1 - \mu_2 = 0$, There is no difference between the furnaces.

$H_1: \mu_1 - \mu_2 \neq 0$, There is a difference.

We are given $n = 8$, $m = 6$, $\bar{X}_n = 73.125$, $\bar{Y}_m = 77.667$, $s_1^2 = 9.554$, $s_2^2 = 10.667$

So that $s^2 = (7 \times 9.554 + 5 \times 10.667)/12 = 10.018$

Test Statistic: $(73.125 - 77.667)/[10.018(1/8 + 1/6)]^{1/2} = -2.657$

**P-Value (Two Tail)** = $P(T_{12} < -2.657) + P(T_{12} > 2.657) = .0209$
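The pooled-variance statistics in the two small-sample problems above can be recomputed with the following Python sketch. It is illustrative only: the function name `pooled_t_statistic` is hypothetical, and since the standard library has no t distribution, the critical value 1.782 is taken from the notes and no P-Value is computed.

```python
from math import sqrt

def pooled_t_statistic(xbar, ybar, var1, var2, n, m):
    """Return (t, pooled variance, degrees of freedom) assuming equal variances."""
    df = n + m - 2
    pooled_var = ((n - 1) * var1 + (m - 1) * var2) / df
    t = (xbar - ybar) / sqrt(pooled_var * (1 / n + 1 / m))
    return t, pooled_var, df

# Problem 10.62: s1 = s2 = .32, so each sample variance is .32 squared
t_62, s2_62, df_62 = pooled_t_statistic(1.26, 0.78, 0.32 ** 2, 0.32 ** 2, 7, 7)
reject_62 = t_62 > 1.782        # one-tail critical value t_{.05,12} from the notes

# Problem 10.105: sample variances are given directly
t_105, s2_105, df_105 = pooled_t_statistic(73.125, 77.667, 9.554, 10.667, 8, 6)

print(round(t_62, 3), round(s2_62, 4), df_62, reject_62)
print(round(t_105, 3), round(s2_105, 3), df_105)
```

Passing variances instead of standard deviations keeps one function usable for both problems, since the first reports $s_1$ and $s_2$ while the second reports $s_1^2$ and $s_2^2$.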

- The Hypothesis Test for the Equivalence of proportions from two
populations (large sample size only, n > 50, m > 50).

$H_0: p_1 - p_2 = 0$

$H_1: p_1 - p_2 \neq 0$

The decision rule for this problem is:

If $\frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\{[\hat{p}_1(1 - \hat{p}_1)/n] + [\hat{p}_2(1 - \hat{p}_2)/m]\}^{1/2}} > z_{\alpha/2}$ or $\frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\{[\hat{p}_1(1 - \hat{p}_1)/n] + [\hat{p}_2(1 - \hat{p}_2)/m]\}^{1/2}} < -z_{\alpha/2}$ then Reject $H_0$

If $-z_{\alpha/2} < \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\{[\hat{p}_1(1 - \hat{p}_1)/n] + [\hat{p}_2(1 - \hat{p}_2)/m]\}^{1/2}} < z_{\alpha/2}$ then Do Not Reject $H_0$

Under $H_0$: $p_1 = p_2$, we can pool the two samples to get a better estimate of the variance:

$$\hat{p} = \frac{n\hat{p}_1 + m\hat{p}_2}{n + m}$$

Then this produces the decision rule:

If $\frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{[\hat{p}(1 - \hat{p})(1/n + 1/m)]^{1/2}} > z_{\alpha/2}$ or $\frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{[\hat{p}(1 - \hat{p})(1/n + 1/m)]^{1/2}} < -z_{\alpha/2}$ then Reject $H_0$

If $-z_{\alpha/2} < \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{[\hat{p}(1 - \hat{p})(1/n + 1/m)]^{1/2}} < z_{\alpha/2}$ then Do Not Reject $H_0$

- Problem 10.16 p.424.

- a) We are given

$H_0: p_1 - p_2 = 0$, Aspirin Use the Same in Both Years

$H_1: p_1 - p_2 \neq 0$, Aspirin Use Differs

Where $\alpha = .05$, $\alpha/2 = .025$, $z_{.025} = 1.96$, $n = m = 1000$, and $\hat{p}_1 = .45$, $\hat{p}_2 = .34$

Hence

1st Test Statistic = $(.45 - .34)/\{[(.45 \times .55)/1000] + [(.34 \times .66)/1000]\}^{1/2} = 5.064$

Combining the Samples:

$$\hat{p} = \frac{n\hat{p}_1 + m\hat{p}_2}{n + m} = \frac{1000 \times .45 + 1000 \times .34}{1000 + 1000} = .395$$

2nd Test Statistic = $(.45 - .34)/[(.395 \times .605)(1/1000 + 1/1000)]^{1/2} = 5.032$

In both tests we Reject $H_0$.

The P-Values corresponding to the two test statistics are:

**P-Value (Two Tail 1st test)** = $\Phi(-5.064) + 1 - \Phi(5.064) = .000000411$

**P-Value (Two Tail 2nd test)** = $\Phi(-5.032) + 1 - \Phi(5.032) = .000000486$

- b) We are given

$H_0: p_1 - p_2 = 0$, Ibuprofen Use Has Not Increased

$H_1: p_1 - p_2 < 0$, Ibuprofen Use Has Increased

Where $\alpha = .05$, $z_{.05} = 1.645$, $n = m = 1000$, and $\hat{p}_1 = .14$, $\hat{p}_2 = .26$

Hence

1st Test Statistic = $(.14 - .26)/\{[(.14 \times .86)/1000] + [(.26 \times .74)/1000]\}^{1/2} = -6.785$

Combining the Samples:

$$\hat{p} = \frac{n\hat{p}_1 + m\hat{p}_2}{n + m} = \frac{1000 \times .14 + 1000 \times .26}{1000 + 1000} = .2$$

2nd Test Statistic = $(.14 - .26)/[(.2 \times .8)(1/1000 + 1/1000)]^{1/2} = -6.708$

In both tests we Reject $H_0$.

The P-Values corresponding to the two test statistics are:

**P-Value (One Tail 1st test)** = $\Phi(-6.785) = .00000000000584$

**P-Value (One Tail 2nd test)** = $\Phi(-6.708) = .00000000000992$

- c) The tests in a) and b) are related because as aspirin use goes down, Ibuprofen use must certainly increase (not all the increase is due to Acetaminophen).
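Both versions of the two-proportion statistic in Problem 10.16 can be recomputed with the Python sketch below. It is illustrative only; the function name `two_proportion_z` is hypothetical, and the code uses nothing beyond the standard library.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(p1_hat, p2_hat, n, m):
    """Return (unpooled z, pooled z) for H0: p1 - p2 = 0."""
    diff = p1_hat - p2_hat
    se_unpooled = sqrt(p1_hat * (1 - p1_hat) / n + p2_hat * (1 - p2_hat) / m)
    p_pool = (n * p1_hat + m * p2_hat) / (n + m)          # pooled estimate under H0
    se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n + 1 / m))
    return diff / se_unpooled, diff / se_pooled

# Aspirin, two-tail test
z1_asp, z2_asp = two_proportion_z(0.45, 0.34, 1000, 1000)
p_asp = 2 * NormalDist().cdf(-abs(z1_asp))

# Ibuprofen, lower one-tail test
z1_ibu, z2_ibu = two_proportion_z(0.14, 0.26, 1000, 1000)
p_ibu = NormalDist().cdf(z1_ibu)

print(round(z1_asp, 3), round(z2_asp, 3), p_asp)
print(round(z1_ibu, 3), round(z2_ibu, 3), p_ibu)
```

Note that the two variance terms are added in the unpooled standard error, and that the pooled variance is multiplied by $(1/n + 1/m)$.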

- The Hypothesis Test for the Equivalence of the Variances from Two
Normally Distributed Populations

$H_0: \sigma_1^2 = \sigma_2^2$

$H_1: \sigma_1^2 \neq \sigma_2^2$

Here our test statistic is the ratio of the two sample variances which is known to have an **F-Distribution**. Specifically:

$$s_1^2/s_2^2 \sim F_{n-1,\,m-1}$$

Where the F-Distribution has n-1 degrees of freedom associated with the numerator, and m-1 degrees of freedom associated with the denominator.

The decision rule for this problem is:

If $s_1^2/s_2^2 > K_2$ or $s_1^2/s_2^2 < K_1$ then Reject $H_0$

Where $P(F_{n-1,m-1} > K_2) = \alpha/2$ and $P(F_{n-1,m-1} < K_1) = \alpha/2$

- The $K_2$ values are given in the F Table on pages 734-743 of MWS. To get the $K_1$ values, reverse the order of the degrees of freedom and take the reciprocal of the value in the table. That is:

$K_2 = F_{n-1,\,m-1,\,\alpha/2}$ and $K_1 = 1/F_{m-1,\,n-1,\,\alpha/2}$

- Problem 10.73 p.455. Perform the following hypothesis test:

$H_0$: "The Variance of the DDT samples for the Juvenile Brown Pelicans is the same as that for the Nestlings"

$H_1$: "The Variance of the DDT sample for the Juveniles is greater than the Variance for the Nestlings"

Or, stated more formally:

$H_0: \sigma_1^2 = \sigma_2^2$

$H_1: \sigma_1^2 > \sigma_2^2$

Where we are given that $n_1 = 10$, $n_2 = 13$, $s_1 = .017$, $s_2 = .006$, and $\alpha = .01$

This is a *one-tail test* in that we are only testing the simple null hypothesis against all other possible values greater than the hypothesized value. In this case the generic test is:

If $s_1^2/s_2^2 > K_2$ then Reject $H_0$

Here $F_{9,12,.05} = 2.80$

The test statistic is: $(.017)^2/(.006)^2 = 8.03 > 2.80$ so we Reject $H_0$

The **P-Value** = $P(F_{9,12} > 8.03) = .00072$

- However, note that there is a fundamental ambiguity here. Namely, which population do we treat as population 1?! It is clearly an *arbitrary* choice. Consequently, many practitioners have adopted the quite reasonable position that *all F-Tests Should Be One-Tail* with the larger of the two sample variances used in the numerator of the test statistic so that the test statistic is *always greater than one*. Note that, for a fixed $\alpha$, this makes it more likely that the null hypothesis will be *rejected*. This is a judgement call on the part of the practitioner. A neutral approach is to compute the **P-Value** associated with the one-tail of the test statistic and interpret it as either $\alpha$ or $\alpha/2$ depending upon the substantive circumstances of the test.

- Problem 10.103 p.473. Perform the following hypothesis test:

$H_0: \sigma_1^2 = \sigma_2^2$

$H_1: \sigma_1^2 \neq \sigma_2^2$

Where we are given that $n_1 = 10$, $n_2 = 10$, $s_1^2 = .273$, $s_2^2 = .094$, $\alpha = .1$, $\alpha/2 = .05$,

$K_2 = F_{9,9,.05} = 3.18$ and $K_1 = 1/F_{9,9,.05} = 1/3.18 = .3145$

The test statistic is: .273/.094 = 2.904. Hence, since

.3145 < 2.904 < 3.18 Do Not Reject $H_0$

Note that if we had performed a one-tail test for this problem with $\alpha = .1$, we would *reject* $H_0$! In this case:

$K_2 = F_{9,9,.10} = 2.44$

Since 2.904 > 2.44 we would **reject** $H_0$

The **P-Value** = $P(F_{9,9} > 2.904) = .06403$
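The contrast between the two-tail and one-tail decisions in Problem 10.103 can be sketched in Python. This is only an illustration: the function name `f_test_decision` is hypothetical, and the critical values 3.18 and 2.44 are the table values quoted above, supplied by hand because the standard library has no F-Distribution.

```python
def f_test_decision(var_num, var_den, k_upper, k_lower=None):
    """Compare the variance ratio with table critical values.

    k_lower is used only for a two-tail test; leave it as None for one tail.
    """
    ratio = var_num / var_den
    reject = ratio > k_upper or (k_lower is not None and ratio < k_lower)
    return ratio, reject

# Two-tail test with alpha = .1: K2 = F_{9,9,.05} = 3.18 and K1 = 1/3.18
ratio, reject_two_tail = f_test_decision(0.273, 0.094, k_upper=3.18, k_lower=1 / 3.18)

# One-tail test with alpha = .1: K2 = F_{9,9,.10} = 2.44
_, reject_one_tail = f_test_decision(0.273, 0.094, k_upper=2.44)

print(round(ratio, 3), reject_two_tail, reject_one_tail)
```

The same ratio leads to opposite decisions, which is the judgement call about one-tail F-Tests discussed above.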