45-734 Probability and Statistics II (4th Mini AY 1997-98 Flex-Mode and Flex-Time)

Assignment #5: Due 23 April 1998

  1. In this first problem we are going to look at voting for President in the 1992 and 1996 elections. The data are in:


    There are 407 observations in the dataset corresponding to the 407 Congressional Districts that were not redistricted between 1993 and 1997 (recall that there are 435 members of the U.S. House of Representatives). These are the districts created after the 1990 Census and first implemented in the 1992 elections. Subsequently, various Federal Court rulings invalidated the district boundaries of 28 districts. This leaves us with 407 districts that are the same for the 1992 and 1996 Presidential elections. By law, the population in a congressional district must be as close as possible to being equal to: (total population of United States)/435. Hence, in the analyses below -- for purposes of interpreting coefficients -- you can assume that population is uniformly distributed over the 407 congressional districts.

    The variables in the dataset are:

      AFRAM: percent African-American population in the Congressional district
      BUSH88: percent voting for George Bush for President in 1988
      BUSH92: percent voting for George Bush for President in 1992
      CLINT92: percent voting for Bill Clinton for President in 1992
      CLINT96: percent voting for Bill Clinton in 1996
      DOLE96: percent voting for Bob Dole in 1996
      DUK88: percent voting for Michael Dukakis for President in 1988
      HISP: percent Hispanic
      INCOME: median income in the district in thousands of 1996 dollars
      LCECON103: for members of the 103rd House (1993-94), a measure of liberalism/conservatism on economic issues that ranges from -1 (liberal) to +1 (conservative)
      LCECON104: liberalism/conservatism on economic issues for members of the 104th House (1995-96)
      LCSOC103: for members of the 103rd House, a measure of liberalism/conservatism on social issues (e.g., abortion, gay marriage, etc.) that ranges from -1 (liberal) to +1 (conservative)
      LCSOC104: liberalism/conservatism on social issues for members of the 104th House
      PEROT92: percent voting for Ross Perot for President in 1992
      PEROT96: percent voting for Ross Perot in 1996
      REP103, REP104, REP105: indicator variables that equal 1 if the representative of the district is a Republican, and 0 if the representative is a Democrat
      SOUTH: an indicator variable that is equal to 1 if the Congressional District is in a Southern state (the 11 States of the Confederacy plus Kentucky and Oklahoma)

    Note that there are three types of variables: 1) the percentage vote for the various Presidential candidates (BUSH88, BUSH92, CLINT92, CLINT96, DOLE96, DUK88, PEROT92, PEROT96), which will be our dependent variables; 2) demographic variables for the Congressional District (AFRAM, HISP, INCOME, SOUTH); and 3) variables measuring personal characteristics (ideology and party) of the district's representative in the House of Representatives (LCECON103, LCECON104, LCSOC103, LCSOC104, REP103, REP104, REP105).

    The basic theory we are going to test is that the presidential vote is a function of demographics, ideology, and political party. It is well known that in American politics, ceteris paribus, African-Americans, (non-Cuban) Hispanics, economic liberals, and social liberals tend to favor candidates of the Democratic party; people with higher incomes, Southerners, economic conservatives, and social conservatives tend to favor the Republican party.
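    As a sketch, the theory can be written as a linear regression, shown here for Clinton's 1992 vote. The coefficient symbols and the error term are notation introduced for illustration and are not part of the assignment; the ideology variables are the 103rd House measures.

```latex
\begin{align*}
\text{CLINT92}_i &= \beta_0 + \beta_1\,\text{AFRAM}_i + \beta_2\,\text{HISP}_i
                  + \beta_3\,\text{INCOME}_i + \beta_4\,\text{SOUTH}_i \\
                 &\quad + \beta_5\,\text{LCECON103}_i + \beta_6\,\text{LCSOC103}_i
                  + \varepsilon_i, \qquad i = 1, \dots, 407
\end{align*}
```

    If the theory holds for a Democratic candidate, one would expect \beta_1 and \beta_2 to be positive and \beta_3 through \beta_6 to be negative, since the ideology scores run from -1 (liberal) to +1 (conservative). The expected signs reverse when the dependent variable is the Republican candidate's vote.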

    1. Test the basic theory on the 1992 Presidential election. Run separate regressions for Clinton, Bush, and Perot (use LCECON103 and LCSOC103 for the ideology variables, do not use party indicator variables) and interpret the coefficients. Compare the Perot coefficients with those for Clinton and Bush. What sort of voters did Perot draw his support from and who did he hurt more -- Clinton or Bush? What do the relative magnitudes of the ideological coefficients tell you about American politics?

    2. Test the basic theory on the 1996 Presidential election. Run separate regressions for Clinton, Dole, and Perot (use LCECON104 and LCSOC104 for the ideology variables, do not use party indicator variables) and interpret the coefficients. Compare the coefficients with those for 1992 (Clinton vs. Clinton, Bush vs. Dole, Perot vs. Perot). Compare the Perot coefficients with those for Clinton and Dole. What sort of voters did Perot draw his support from and who did he hurt more -- Clinton or Dole?

    3. The confrontation between the Congressional Republicans and President Clinton that shut down the government off and on from November of 1995 to January 1996 is alleged to have hurt the Republicans politically. In the 1996 elections the Republican majority in the House was cut from 236 seats to 228 seats (218 are needed for control). For this sample of 407 House districts, 20 districts switched from Republican to Democrat in the 1996 elections and 12 districts switched from Democrat to Republican. Create indicator variables for congressional districts that switched parties due to the 1996 elections (one for seats that switched Republican to Democrat; and one for seats that switched Democrat to Republican). You can use the indicator variables REP104 and REP105 to create the party switch indicator variables. Add these party switch indicator variables to the regressions you ran for part 2 and interpret the coefficients. Do a Wald test on the party switch indicator variables constraining one coefficient to equal the negative of the other coefficient. What does this test tell you?
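    The party switch indicators and the Wald restriction in part 3 can be sketched in Python as follows. This is only an illustration, not the EViews procedure: the function names and argument names are hypothetical, and the coefficient estimates, variances, and covariance would have to come from your own regression output. The restriction "one coefficient equals the negative of the other" is the same as saying the two coefficients sum to zero.

```python
def switch_indicators(rep104, rep105):
    """Return (R-to-D, D-to-R) switch indicators for one district.

    rep104, rep105: 1 if the district's representative is a Republican
    in the 104th / 105th House, 0 if a Democrat.
    """
    r_to_d = rep104 * (1 - rep105)   # Republican in the 104th, Democrat in the 105th
    d_to_r = (1 - rep104) * rep105   # Democrat in the 104th, Republican in the 105th
    return r_to_d, d_to_r


def wald_sum_zero(b_rd, b_dr, var_rd, var_dr, cov):
    """Wald statistic for H0: b_rd = -b_dr, i.e. b_rd + b_dr = 0.

    b_rd, b_dr: estimated coefficients on the two switch indicators.
    var_rd, var_dr, cov: their estimated variances and covariance.
    Under H0 the statistic is asymptotically chi-square with 1 degree
    of freedom.
    """
    # Var(b_rd + b_dr) = Var(b_rd) + Var(b_dr) + 2*Cov(b_rd, b_dr)
    return (b_rd + b_dr) ** 2 / (var_rd + var_dr + 2.0 * cov)
```

    A district that stays with the same party gets 0 on both indicators, so the two indicators are never 1 at the same time.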

  2. The data set (courtesy of Dennis Epple):


    contains information about condominium prices and characteristics for an area of central Boston that is much sought after. (The data were compiled and analyzed by Denise DiPasquale and William Wheaton in their book Urban Economics and Real Estate Markets.) The data include sale price (PRICE), floor area (SQFT), number of bedrooms (BED), number of bathrooms (BATH), number of stories in the building (STORY), distance in feet from the Boston Common (CDIST), an indicator variable denoting the availability of parking in the building (PARK), and indicator variables denoting the street on which the unit is located: MDUM if on Marlborough Street, BDUM if on Beacon Street, CDUM if on Commonwealth Ave. All units not on one of these streets are on Beacon Hill.

    1. If price is proportional to floor area, explain why the following two regression equations are equivalent.

    2. Estimate the above regressions. Notice that the coefficients of the variables are approximately the same for both regressions. Which model is better and why? Why is the R^2 statistic so different?

    3. Suppose that you own a parcel of land on Marlborough Street exactly one-half mile (2640 feet) from the Boston Common and you wish to construct a condominium building. Suppose that you have decided to build a condominium with all units having three bedrooms, two bathrooms, and parking in the building. Suppose that construction cost per square foot increases with the number of floors according to the following equation: Cost = 40*STORY + 2*STORY^2. Revenue per square foot per floor is given by PRICE/SQFT. Hence, revenue per square foot with more than one floor is given by STORY*(PRICE/SQFT). Using the appropriate regression, how many stories would you build to maximize the profit per square foot of developed property? That is, find the number of stories to maximize:

      profit = STORY*(PRICE/SQFT) - (40*STORY + 2*STORY^2)

      Hint: Substitute the values for the variables given above into your estimated equation for price per square foot. Note that the only unknown in the resulting expression will then be the number of stories. You can use the GENR command of EVIEWS to compute profit for each possible value of STORY to find the value that maximizes profit. An easy way to start this process is to define a new variable called FLOORS. (The data set already has a variable named STORY and you don't want to confuse your new variable with STORY!) Let the first observation take the value 1, the second two, and so forth. Then plug that into the equation.
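      The search described in the hint can also be sketched outside EViews. The Python below is a minimal illustration under stated assumptions: best_floors, price_per_sqft, and max_floors are hypothetical names, and the placeholder equation uses made-up coefficients that must be replaced by your own estimated equation for PRICE/SQFT with the Marlborough Street values substituted in.

```python
def best_floors(price_per_sqft, max_floors):
    """Return (floors, profit) maximizing
        profit = floors * price_per_sqft(floors) - (40*floors + 2*floors**2)
    over floors = 1, 2, ..., max_floors.

    price_per_sqft: function giving predicted PRICE/SQFT for a building
    with the given number of stories (all other regressors held fixed).
    """
    best = None
    for floors in range(1, max_floors + 1):
        cost = 40 * floors + 2 * floors ** 2           # construction cost per square foot
        profit = floors * price_per_sqft(floors) - cost
        if best is None or profit > best[1]:
            best = (floors, profit)
    return best


def placeholder_price_per_sqft(floors):
    # PLACEHOLDER ONLY: 200.0 and -1.0 are made-up numbers, not estimates
    # from the condominium data. Substitute your fitted equation here.
    return 200.0 - 1.0 * floors


floors, profit = best_floors(placeholder_price_per_sqft, 60)
print(floors, profit)
```

      Because profit is evaluated only at whole numbers of stories, the grid search returns an integer answer directly, which is what the FLOORS variable accomplishes in EViews.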

      Here is an easy way to make FLOORS.