ECON3360/7360: Homework I

2016 Semester 2

Due: 5pm, 16th Sep 2016

Part A. True/False/Uncertain Questions

Please provide a short explanation (at most three lines) to justify your answer to each question in Part A.

1. If a linear regression has omitted variables, robust variance estimation is misleading.

2. FE models automatically control for endogeneity.

3. FE estimator is consistent if RE assumptions apply.

4. There is no case for using a cluster-robust variance estimator for an FE regression.

5. RCT provides the gold standard for estimating causal effects.

6. In the exactly identified case, the IV and 2SLS estimators are the same.

7. In the over-identified case under the homoskedasticity assumption, the 2SLS and GMM estimators are the same.

8. We can always statistically test whether instruments are strong and valid for IV

regression.

9. Suppose that we only care about consistency or inconsistency. For panel data analysis with a valid IV, using both fixed effects and IV could produce greater inconsistency (in absolute terms) than using FE alone.

10. As the variance of measurement error increases to infinity, the bias of OLS also

increases.

11. 3SLS estimator is better than 2SLS because 3SLS uses more information and

estimates parameters more precisely.

12. If there is an endogeneity problem, FE is preferred to pooled OLS.

13. Increasing the sample size helps mitigate the multicollinearity problem.

14. Including more covariates helps mitigate the multicollinearity problem.

15. It is more costly to correct problems of internal validity than those of external validity in an experimental setting.

16. In a cross-sectional data setting, we can only condition on observed variables.

17. In a repeated cross-section data setting, we can only condition on observed variables.

18. IV method can be combined with FE but cannot be combined with RE.

19. Cluster-robust SEs allow more flexibility (i.e. they are valid under fewer assumptions) than heteroskedasticity-robust SEs in both cross-section and panel data.

20. IV helps to capture both the direct and indirect effects of the endogenous variable on the outcome variable.

21. RE can identify coefficient on time-invariant variable.

22. IV > OLS in absolute value could imply that the variable of interest is endogenous due to measurement error.

23. Over-identification test is a test for whether the variable of interest is endogenous.

24. We can get IV estimates from OLS on the first-stage equation and OLS on the reduced-form equation.

25. The Hausman-Taylor estimator is an RE IV estimator for a panel data model.

26. The Arellano-Bond estimator is an FE IV estimator for a dynamic panel data model.

27. In the RD design, we only use observations around cut-off points.

28. We don’t want a kink in density around cut-off points for a forcing variable.

29. We don’t want a kink in outcome values around cut-off points for a forcing variable.

30. We don’t want a kink in control values around cut-off points for a forcing variable.

Short-Answer Questions (2 marks for each subquestion)

I. IV Estimator

We consider the following regression model:

$$\log(\text{wage}) = \beta_0 + \beta_1 \cdot \text{educ} + \beta_2 \cdot \text{exper} + \beta_3 \cdot \text{ability} + u$$

where we are interested in the return to education in terms of wages. Suppose that ability is unobserved. Thus, we consider the following equation instead.

$$\log(\text{wage}) = \beta_0 + \beta_1 \cdot \text{educ} + \beta_2 \cdot \text{exper} + v$$

For the endogenous variable educ, a dummy variable, z, is constructed using information on the quarter of birth, where z is 0 if born in the 1st quarter and 1 otherwise. You are trying to use this dummy variable as an instrument to estimate $\beta_1$.
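
To make this setup concrete, the following Python sketch simulates data from a simplified version of the model. It is only an illustration under stated assumptions and is not part of the assignment: exper is dropped, every parameter value is invented (including a true return to education of 0.10 and an unrealistically strong effect of z on educ), and the helper names `cov` and `simulate` are hypothetical. It compares the OLS slope from the short regression with the simple IV (Wald) estimate cov(z, y) / cov(z, educ).

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def simulate(n=20000, beta1=0.10, seed=0):
    """Simulated wage data; all parameter values are assumptions."""
    rng = random.Random(seed)
    z, educ, lwage = [], [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)              # unobserved in practice
        zi = 1 if rng.random() < 0.75 else 0   # 0 if born in the 1st quarter
        e = 12 + 1.0 * zi + 1.0 * ability + rng.gauss(0, 1)
        w = 1.0 + beta1 * e + 0.2 * ability + rng.gauss(0, 0.3)
        z.append(zi)
        educ.append(e)
        lwage.append(w)
    return z, educ, lwage

z, educ, lwage = simulate()
ols = cov(educ, lwage) / cov(educ, educ)  # slope of the regression that omits ability
iv = cov(z, lwage) / cov(z, educ)         # IV (Wald) estimator with a binary instrument
print(f"OLS: {ols:.3f}  IV: {iv:.3f}  true beta1: 0.100")
```

In this simulated design ability raises both educ and log(wage), so the OLS slope lies above the true value while the IV estimate stays close to it. Real data need not follow this pattern.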

1. Derive the omitted variable bias for the OLS estimator of $\beta_1$.

2. Card (1995) instead uses college4 (distance from the student's home to the nearest 4-year college) as an IV for educ. Can we test the validity/relevance of college4 as an IV for educ?

3. If you can perform the test in 2, provide a Stata procedure to test the relevance of college4 as an IV for educ.

4. Numerous studies have reported that the IV estimate of $\beta_1$ is greater than the OLS estimate of $\beta_1$. Infer the sign and the magnitude of $\beta_3$ by comparing the OLS and IV estimates of $\beta_1$.

5. Evaluate and compare the OLS and IV standard errors for $\beta_1$.

6. Provide a Stata procedure to perform a test for the validity of IV.

7. Suppose you have panel data. How would you change the model to avoid the endogeneity problem?

8. Continuing from 7, what new assumption is required for the IV method? Provide a procedure to implement IV with panel data.

II. Panel Data Estimator

Consider the following unobserved effects model:

$$\text{murders}_{it} = \beta_0 + \beta_1 \cdot \text{exec}_{it} + \beta_2 \cdot \text{unem}_{it} + \text{state}_i + \text{year}_t + u_{it} \quad (1)$$

where $\text{murders}_{it}$ is the number of murders per 100,000 people, $\text{exec}_{it}$ is the number of executions, and $\text{unem}_{it}$ is the unemployment rate for state $i$ in year $t$. The data set is state-level (50 US states) data for two years (1990 and 1991).
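
As a purely illustrative aside, the Python sketch below simulates a two-year, 50-state panel in the spirit of equation (1) and contrasts a pooled OLS slope with a first-difference slope. Everything in it is an assumption made for illustration: unem is dropped, the parameter values are invented (true coefficient on executions set to -0.5), executions are treated as a continuous variable, and the helper names `slope` and `simulate` are hypothetical.

```python
import random

def slope(x, y):
    """OLS slope from a regression of y on x with an intercept."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(n_states=50, beta1=-0.5, seed=1):
    """Two-year state panel; all parameter values are invented."""
    rng = random.Random(seed)
    panel = []  # one (exec_1990, y_1990, exec_1991, y_1991) tuple per state
    for _ in range(n_states):
        a_i = rng.gauss(0, 3)  # unobserved, time-invariant state effect
        row = []
        for year_effect in (0.0, 0.3):  # 1990, then 1991
            ex = 2 + 0.8 * a_i + rng.gauss(0, 1)  # executions correlated with a_i
            y = 8 + beta1 * ex + a_i + year_effect + rng.gauss(0, 0.5)
            row += [ex, y]
        panel.append(tuple(row))
    return panel

panel = simulate()
# Pooled OLS stacks both years and ignores the state effect.
pooled = slope([r[0] for r in panel] + [r[2] for r in panel],
               [r[1] for r in panel] + [r[3] for r in panel])
# First differences remove anything that is constant within a state.
fd = slope([r[2] - r[0] for r in panel], [r[3] - r[1] for r in panel])
print(f"pooled OLS: {pooled:.2f}  first difference: {fd:.2f}  true: -0.50")
```

Because the simulated state effect is correlated with executions, the pooled slope is pulled away from the true value, while differencing within states removes that effect.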

1. How many variables should be included for $\text{state}_i$? Interpret $\text{state}_i$ in the equation.

2. Provide fixed effects (FE) transformation for the equation. [Hint: No derivation is

required. All you need to provide is a transformed equation.]

3. Using the fixed effects transformed equation, state the condition(s) for the FE estimator of $\beta_1$ to be consistent.

4. Explain how the source of variation used to identify $\beta_1$ changes if we instead estimate the following equation:

$$\text{murders}_{it} = \beta_{02} + \beta_{12} \cdot \text{exec}_{it} + \beta_{22} \cdot \text{unem}_{it} + \text{year}_t + u_{it} \quad (2)$$

5. Suppose the estimates for $\beta_1$ and $\beta_{12}$ differ substantially. Explain the source of the difference in the estimates.

6. Write down the estimating equation for the first-differenced estimator of $\beta_1$ in (1).

7. Write down the estimating equation in 6 with state fixed effects. How can we interpret the state fixed effects here?

8. Compare the FE and FD estimates for $\beta_1$ in equation (1).

9. Construct a Hausman test statistic that compares $\hat{\beta}_{1,FE}$ and $\hat{\beta}_{1,RE}$.

10. Suppose the FE and RE estimates are substantially different (the null hypothesis is rejected in 9). What does this imply for the endogeneity of $\text{exec}_{it}$ in (2)?

III. Simultaneous equations models

A model for estimating the effect of smoking on annual income is given by equation (1):

$$\log(\text{income}) = \beta_0 + \beta_1 \cdot \text{cigs} + \beta_2 \cdot \text{educ} + \beta_3 \cdot \text{?????} + u_1 \quad (1)$$

where cigs is the number of cigarettes smoked per day on average and educ is years of education.

To reflect the fact that cigarette consumption might be jointly determined with income, a

demand for cigarettes equation (2) is also considered:

$$\text{cigs} = \gamma_0 + \gamma_1 \cdot \log(\text{income}) + \gamma_2 \cdot \text{educ} + \gamma_3 \cdot \log(\text{cigpric}) + \gamma_4 \cdot \text{restaurn} + u_2 \quad (2)$$

where cigpric is the price of a pack of cigarettes (in cents), and restaurn is a binary variable

equal to unity if the person lives in a state with restaurant smoking restrictions.
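
The simultaneity described above can be illustrated with a small Python simulation. This is a stylized sketch under assumptions that are not in the assignment: educ, restaurn and the remaining control are dropped, all parameter values are invented (true effect of cigs on log income set to -0.05), and the helper names `cov` and `simulate` are hypothetical. It uses the log cigarette price, which is excluded from the income equation, as the instrument for cigs.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def simulate(n=20000, beta1=-0.05, gamma1=2.0, gamma_p=-3.0, seed=2):
    """Two-equation system; all parameter values are invented."""
    rng = random.Random(seed)
    b0, g0 = 10.0, 10.0         # intercepts of the income and demand equations
    d = 1 - beta1 * gamma1      # must be nonzero for the system to have a solution
    lcigpric, cigs, lincome = [], [], []
    for _ in range(n):
        p = rng.gauss(4.0, 0.2)  # log cigarette price, exogenous by construction
        u1 = rng.gauss(0, 0.5)   # income-equation error
        u2 = rng.gauss(0, 2.0)   # demand-equation error
        # Reduced form for cigs: substitute the income equation into demand.
        c = (g0 + gamma1 * b0 + gamma1 * u1 + gamma_p * p + u2) / d
        y = b0 + beta1 * c + u1  # income equation
        lcigpric.append(p)
        cigs.append(c)
        lincome.append(y)
    return lcigpric, cigs, lincome

lcigpric, cigs, lincome = simulate()
ols = cov(cigs, lincome) / cov(cigs, cigs)        # contaminated by feedback from income
iv = cov(lcigpric, lincome) / cov(lcigpric, cigs) # uses only price-driven variation in cigs
print(f"OLS: {ols:.3f}  IV: {iv:.3f}  true beta1: -0.050")
```

Because income feeds back into cigarette demand in this design, cigs is correlated with the income-equation error and the OLS slope is biased. The IV estimate relies on the excluded price variable being unrelated to that error.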

1. How do you interpret the OLS estimator of $\beta_1$ in equation (1)?

2. Under what assumption/assumptions is/are the equation for the demand of cigarettes

(2) identified?

3. Provide a procedure to test the relevance of the exclusion restrictions in estimating (1).

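Suppose we collect panel data and instead estimate the following equations, where (4) is the demand for cigarettes.

$$\log(\text{income})_{it} = \beta_0 + \beta_1 \cdot \text{cigs}_{it} + \beta_2 \cdot \text{educ}_{it} + \beta_3 \cdot \text{?????}_{it} + a_{i1} + u_{it1} \quad (3)$$

$$\text{cigs}_{it} = \gamma_0 + \gamma_1 \cdot \log(\text{income})_{it} + \gamma_2 \cdot \text{educ}_{it} + \gamma_3 \cdot \log(\text{cigpric})_{it} + a_{i2} + u_{it2} \quad (4)$$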

4. Under what assumption/assumptions is/are the equation for the demand of

cigarettes (4) identified?

5. Provide a Stata procedure to test the relevance of the exclusion restrictions using the reduced form regression for (4).

6. Provide the equation for the demand of cigarettes (4) using first-differenced

estimation.

7. Provide the dependent and explanatory variables for the reduced form equation for $\Delta \text{cigs}$.

8. Provide a Stata procedure for first differenced IV estimation for (3).

IV. Regression Discontinuity Design (RD)

The 1988 Education Act allowed English state schools to opt out of local authority

control and become "Grant Maintained" (GM) schools. GM schools are directly

funded by the central government and are governed by a governing body and the

head teacher rather than the local authority. To become a GM school, parents of

current students had to hold a secret ballot. If more than 50% of the parents cast

their vote in favour of converting to GM status the school would essentially be

automatically converted into a GM school. You have been asked to evaluate the

impact of becoming a GM school on student achievement. For this purpose, you

have collected data on student test results at age 16 for the year 1997. You use

this data to estimate a regression of the form:

$$\text{score} = \beta_0 + \beta_1 \cdot GM + u \quad (7)$$

where $\text{score}$ is the result of a student in a standardized math test and GM is a dummy variable that takes the value 1 if the student is enrolled in a GM school and 0 otherwise.

1. Is $\beta_1$ in equation (7) likely to capture the causal effect of becoming a GM school on student achievement?

After 1988, a large number of schools held ballots on conversion to GM status. While many ballots were successful, there were also a large number of ballots in which the majority of parents opposed converting the school into a GM school. Suppose your dataset also includes the percentage of votes that were cast in favour of conversion to a GM school.

2. How would you exploit that additional information for an alternative estimation

strategy to measure the causal effect of GM status on student achievement?

3. Provide the appropriate regression specification. What is the key coefficient of

interest?

4. What are the assumptions that are required to hold for the approach in question

3 to work?

5. How do you test the validity of the method in question 3? Describe a Stata

procedure for a test.

6. Suppose that you include additional control variables in the regression equation in question 3. How does this affect the key coefficient estimate and its standard error?

7. Describe the potential threats to the identification strategy in question 3.

8. Describe how you can address those concerns in question 7.
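
For readers who want to experiment with the ballot setting described in this section, the Python sketch below simulates a sharp discontinuity at the 50% vote threshold and estimates the jump in scores with separate linear fits on each side of the cutoff. It is a rough illustration under invented assumptions, not a model answer: each observation is treated as one school-level average score, the true jump is set to 3.0, the bandwidth of 10 percentage points is arbitrary, and the helper names `fit_line`, `rd_jump` and `simulate` are hypothetical.

```python
import random

def fit_line(x, y):
    """Return (intercept, slope) of an OLS line of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = (sum((a - mx) * (c - my) for a, c in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
    return my - b * mx, b

def rd_jump(vote, score, cutoff=50.0, h=10.0):
    """Difference between the two fitted lines at the cutoff, within bandwidth h."""
    left = [(v - cutoff, s) for v, s in zip(vote, score) if cutoff - h <= v <= cutoff]
    right = [(v - cutoff, s) for v, s in zip(vote, score) if cutoff < v <= cutoff + h]
    a_left, _ = fit_line(*zip(*left))    # intercept = fitted score just below the cutoff
    a_right, _ = fit_line(*zip(*right))  # intercept = fitted score just above the cutoff
    return a_right - a_left

def simulate(n=10000, tau=3.0, seed=3):
    """Simulated ballots; all parameter values are invented."""
    rng = random.Random(seed)
    vote, score = [], []
    for _ in range(n):
        v = rng.uniform(0, 100)   # percentage of votes in favour
        gm = 1 if v > 50 else 0   # conversion requires more than 50%
        s = 50 + 0.1 * (v - 50) + tau * gm + rng.gauss(0, 3)
        vote.append(v)
        score.append(s)
    return vote, score

vote, score = simulate()
jump = rd_jump(vote, score)
print(f"estimated jump at 50%: {jump:.2f}  true: 3.00")
```

Changing the bandwidth `h` trades off bias from the linear approximation against noise from using fewer observations near the cutoff.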
