ECON3360/7360: Homework I
2016 Semester 2
Due: 5pm, 16th Sep 2016
Part A. True/False/Uncertain Questions
Please provide a short explanation (at most three lines) to justify your answers to
the questions in Part A.
1. If a linear regression has omitted variables, robust variance estimation is misleading.
2. FE models automatically control for endogeneity.
3. FE estimator is consistent if RE assumptions apply.
4. There is no case for using cluster robust variance estimator for a FE regression.
5. RCT provides the gold standard for estimating causal effects.
6. In the exactly identified case, the IV and 2SLS estimators are the same.
7. In the over-identified case under the homoskedasticity assumption, the 2SLS and GMM estimators
are the same.
8. We can always statistically test whether instruments are strong and valid for IV
regression.
9. Suppose that we only care about consistency or inconsistency. For panel data analysis
with a valid IV, using both fixed effects and IV could produce more inconsistency (in
absolute terms) than using FE alone.
10. As the variance of measurement error increases to infinity, the bias of OLS also
increases.
11. 3SLS estimator is better than 2SLS because 3SLS uses more information and
estimates parameters more precisely.
12. If there is an endogeneity problem, FE is preferred to pooled OLS.
13. Increasing the sample size helps to mitigate the multicollinearity problem.
14. Including more covariates helps to mitigate the multicollinearity problem.
15. It is more costly to correct problems from internal validity than those from external
validity in experimental setting.
16. In a cross-sectional data setting, we can only condition on observed variables.
17. In a repeated cross-section data setting, we can only condition on observed variables.
18. IV method can be combined with FE but cannot be combined with RE.
19. Cluster-robust SEs allow more flexibility (i.e. are valid under fewer assumptions) than
heteroskedasticity-robust SEs in both cross-section and panel data.
20. IV helps to capture both the direct and indirect effects of the endogenous variable on the outcome
variable.
21. RE can identify coefficient on time-invariant variable.
22. An IV estimate larger than the OLS estimate in absolute value could imply endogeneity of the variable of interest due
to measurement error.
23. Over-identification test is a test for whether the variable of interest is endogenous.
24. We can get IV estimates from OLS on the first-stage and OLS on the reduced-form equations.
25. The Hausman-Taylor estimator is an RE IV estimator for a panel data model.
26. The Arellano-Bond estimator is an FE IV estimator for a dynamic panel data model.
27. In the RD design, we only use observations around cut-off points.
28. We don’t want a kink in density around cut-off points for a forcing variable.
29. We don’t want a kink in outcome values around cut-off points for a forcing variable.
30. We don’t want a kink in control values around cut-off points for a forcing variable.
Short-Answer Questions (2 marks for each subquestion)
I. IV Estimator
We consider the following regression model:
$\log(\text{wage}) = \beta_0 + \beta_1 \cdot \text{educ} + \beta_2 \cdot \text{exper} + \beta_3 \cdot \text{ability} + u$
where we are interested in the return to education in terms of wages. Suppose that ability is
unobserved. Thus, we consider the following equation instead.
$\log(\text{wage}) = \beta_0 + \beta_1 \cdot \text{educ} + \beta_2 \cdot \text{exper} + e$
For the endogenous variable educ, a dummy variable, z, is constructed using information on the quarter
of birth, where z is 0 if born in the 1st quarter and 1 otherwise. You are trying to use this
dummy variable as an instrument to estimate $\beta_1$.
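As a rough illustration of the setup above, the following Python sketch builds the dummy z from quarter of birth and uses it as a binary instrument on simulated data. Everything numerical here is an assumption made for illustration only: the data-generating process, the coefficient values, the deliberately strong first-stage effect of z on educ, and the omission of exper are not taken from the homework or from any real data set. The function names are hypothetical.

```python
import random


def instrument_from_quarter(quarter_of_birth):
    """Return z = 0 for a first-quarter birth and z = 1 otherwise."""
    return 0 if quarter_of_birth == 1 else 1


def mean(values):
    return sum(values) / len(values)


def wald_iv(y, x, z):
    """Binary-instrument IV (Wald) estimate: the difference in mean y
    between z = 1 and z = 0, divided by the same difference in mean x."""
    y1 = [yi for yi, zi in zip(y, z) if zi == 1]
    y0 = [yi for yi, zi in zip(y, z) if zi == 0]
    x1 = [xi for xi, zi in zip(x, z) if zi == 1]
    x0 = [xi for xi, zi in zip(x, z) if zi == 0]
    return (mean(y1) - mean(y0)) / (mean(x1) - mean(x0))


def ols_slope(y, x):
    """Slope from a simple OLS regression of y on x."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def simulate(n=20000, beta1=0.1, seed=0):
    """Simulated data (assumed DGP): ability raises both educ and log wage."""
    rng = random.Random(seed)
    log_wage, educ, z = [], [], []
    for _ in range(n):
        zi = instrument_from_quarter(rng.randint(1, 4))
        ability = rng.gauss(0, 1)  # unobserved by the econometrician
        educ_i = 12 + 1.0 * zi + ability + rng.gauss(0, 1)
        wage_i = 1 + beta1 * educ_i + 0.5 * ability + rng.gauss(0, 0.5)
        log_wage.append(wage_i)
        educ.append(educ_i)
        z.append(zi)
    return log_wage, educ, z


log_wage, educ, z = simulate()
iv_estimate = wald_iv(log_wage, educ, z)
ols_estimate = ols_slope(log_wage, educ)
print("OLS slope:", round(ols_estimate, 3))
print("IV (Wald) estimate:", round(iv_estimate, 3))
```

In this simulated design ability is positively related to both educ and log wage, so the OLS slope drifts away from the assumed true value of 0.1 while the Wald estimate stays close to it. Real quarter-of-birth instruments are typically much weaker than the one assumed here.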
1. Derive the omitted variable bias of the OLS estimator of $\beta_1$.
2. Card (1995) instead uses college4 4 (distance from student's home to nearest 4-year
college) as an IV for educ. Can we test the validity/relevance of college4 4 as an IV for
educ?
3. If you can perform the test in 2, provide a Stata procedure to test the relevance of
college4 4 as an IV for educ.
4. Numerous studies have reported that the IV estimate of $\beta_1$ is greater than the OLS
estimate of $\beta_1$. Infer the sign and the magnitude of $\beta_3$ by comparing the OLS and IV
estimates of $\beta_1$.
5. Evaluate and compare the OLS and IV standard errors for $\beta_1$.
6. Provide a Stata procedure to perform a test for the validity of IV.
7. Suppose you have panel data. How do you change the model to avoid endogeneity
problem?
8. Continuing from 7, what is the new assumption for the IV method? Provide a procedure to
implement IV with panel data.
QUESTIONS CONTINUE OVER PAGE
II. Panel Data Estimator
Consider the following unobserved effects model:
$\text{murders}_{it} = \beta_0 + \beta_1 \cdot \text{exec}_{it} + \beta_2 \cdot \text{unem}_{it} + \text{state}_i + \text{year}_t + u_{it}$ (1)
where $\text{murders}_{it}$ is the number of murders per 100,000 people, $\text{exec}_{it}$ is the number of executions, and
$\text{unem}_{it}$ is the unemployment rate for state $i$ in year $t$. The data set is state-level data (50 US states)
for two years (1990 and 1991).
1. How many variables should be included for $\text{state}_i$? Interpret $\text{state}_i$ in the
equation.
2. Provide fixed effects (FE) transformation for the equation. [Hint: No derivation is
required. All you need to provide is a transformed equation.]
3. Using the fixed effects transformed equation, state the condition(s) for the FE
estimator of $\beta_1$ to be consistent.
4. Explain how the source of variation for identifying $\beta_1$ changes when we estimate the
following equation instead:
$\text{murders}_{it} = \beta_{10} + \beta_{11} \cdot \text{exec}_{it} + \beta_{12} \cdot \text{unem}_{it} + \text{year}_t + u_{it}$ (2)
5. Suppose the estimates of $\beta_1$ and $\beta_{11}$ differ substantially. Explain the source of
the difference in the estimates.
6. Write down the estimating equation for the first-differenced estimator of $\beta_1$ in (1).
7. Write down the estimating equation in 6 with state fixed effects. How can we interpret
the state fixed effects here?
8. Compare the FE and FD estimates of $\beta_1$ in equation (1).
9. Construct a Hausman test statistic that compares $\hat{\beta}_{1,FE}$ and $\hat{\beta}_{1,RE}$.
10. Suppose the FE and RE estimates are substantially different (the null hypothesis is
rejected in 9). What does this imply for the endogeneity of $\text{exec}_{it}$ in (2)?
III. Simultaneous equations models
A model to estimate the effect of smoking on annual income, equation (1), is:
$\log(\text{income}) = \beta_0 + \beta_1 \cdot \text{cigs} + \beta_2 \cdot \text{educ} + \beta_3 \cdot \text{?????} + u_1$ (1)
where cigs is the number of cigarettes smoked per day on average and educ is years of
education.
To reflect the fact that cigarette consumption might be jointly determined with income, a
demand for cigarettes equation (2) is also considered:
$\text{cigs} = \gamma_0 + \gamma_1 \cdot \log(\text{income}) + \gamma_2 \cdot \text{educ} + \gamma_3 \cdot \log(\text{cigpric}) + \gamma_4 \cdot \text{restaurn} + u_2$ (2)
where cigpric is the price of a pack of cigarettes (in cents), and restaurn is a binary variable
equal to unity if the person lives in a state with restaurant smoking restrictions.
1. How do you interpret the OLS estimator of $\beta_1$ in equation (1)?
2. Under what assumption/assumptions is/are the equation for the demand of cigarettes
(2) identified?
3. Provide a procedure of the test for the relevance of exclusion restrictions in
estimating (1).
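Suppose we collect panel data and instead estimate the equations below for income and the
demand for cigarettes.
$\log(\text{income})_{it} = \beta_0 + \beta_1 \cdot \text{cigs}_{it} + \beta_2 \cdot \text{educ}_{it} + \beta_3 \cdot \text{?????}_{it} + a_{1i} + u_{1it}$ (3)
$\text{cigs}_{it} = \gamma_0 + \gamma_1 \cdot \log(\text{income})_{it} + \gamma_2 \cdot \text{educ}_{it} + \gamma_3 \cdot \log(\text{cigpric})_{it} + a_{2i} + u_{2it}$ (4)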
4. Under what assumption/assumptions is/are the equation for the demand of
cigarettes (4) identified?
5. Provide a Stata procedure of the test for the relevance of exclusion restrictions
using the reduced form regression for (4).
6. Provide the equation for the demand of cigarettes (4) using first-differenced
estimation.
7. Provide the dependent and explanatory variables for the reduced form equation for
$\Delta \text{cigs}$.
8. Provide a Stata procedure for first differenced IV estimation for (3).
IV. Regression Discontinuity Design (RD)
The 1988 Education Act allowed English state schools to opt out of local authority
control and become "Grant Maintained" (GM) schools. GM schools are directly
funded by the central government and are governed by a governing body and the
head teacher rather than the local authority. To become a GM school, parents of
current students had to hold a secret ballot. If more than 50% of the parents cast
their vote in favour of converting to GM status the school would essentially be
automatically converted into a GM school. You have been asked to evaluate the
impact of becoming a GM school on student achievement. For this purpose, you
have collected data on student test results at age 16 for the year 1997. You use
this data to estimate a regression of the form:
$\text{score} = \beta_0 + \beta_1 \cdot \text{GM} + u$ (7)
where score is the result of a student in a standardized math test and GM is a dummy
variable which takes on a value of 1 if the student is enrolled in a GM school and zero
otherwise.
1. Is $\beta_1$ in equation (7) likely to capture the causal effect of becoming a GM
school on student achievement?
After 1988, a large number of schools held ballots about conversion to GM status.
While many ballots were successful, there were also a large number of ballots where
the majority of parents were opposed to converting the school into a GM school. Suppose
your dataset also includes the percentage of votes that were cast in favour of
conversion to a GM school.
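The ballot rule described above (conversion when more than 50% of parents vote in favour) can be written down directly in terms of the vote share. The short Python sketch below only encodes that rule and a vote share measured relative to the 50% threshold; the function names and the example vote shares are hypothetical and are not part of the data set described here.

```python
def centered_vote_share(vote_share_pct, cutoff=50.0):
    """Vote share in favour, measured in percentage points from the cutoff."""
    return vote_share_pct - cutoff


def ballot_won(vote_share_pct, cutoff=50.0):
    """True when strictly more than `cutoff` percent voted in favour."""
    return vote_share_pct > cutoff


# Hypothetical vote shares (percent in favour), for illustration only.
example_ballots = [38.5, 49.9, 50.0, 50.1, 72.0]
for share in example_ballots:
    print(share, centered_vote_share(share), ballot_won(share))
```

A ballot with exactly 50% in favour does not pass under this reading of "more than 50%".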
2. How would you exploit that additional information for an alternative estimation
strategy to measure the causal effect of GM status on student achievement?
3. Provide the appropriate regression specification. What is the key coefficient of
interest?
4. What are the assumptions that are required to hold for the approach in question
3 to work?
5. How do you test the validity of the method in question 3? Describe a Stata
procedure for a test.
6. Suppose that you add control variables to the regression equation
in question 3. How does this affect the key coefficient estimates and their
standard errors?
7. Describe the potential threats to the identification strategy in question 3.
8. Describe how you can address those concerns in question 7.