# The effect of smoking on income.

Download the SMOKE dataset using the command “bcuse smoke”. If you haven’t installed “bcuse” already, first type “ssc install bcuse”. The variable definitions can be found here: http://fmwww.bc.edu/ec-p/data/wooldridge/smoke.des
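
The setup step can be sketched as follows. This is a minimal sketch, assuming that the squared-age term is stored in the dataset as `agesq` and that the restaurant smoking restriction dummy is named `restaurn`; check both against the variable definitions linked above.

```stata
* Install bcuse once (skip this line if it is already installed)
ssc install bcuse

* Load the SMOKE dataset, replacing any data currently in memory
bcuse smoke, clear

* Inspect the variables that appear in equations (1) and (2)
* (agesq and restaurn are assumed names; see the .des file)
describe income cigs educ age agesq white cigpric restaurn
summarize income cigs educ age agesq white cigpric restaurn
```
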
We want to study the effect of smoking on income. The rationale is that people who smoke may have poor health, which affects their work performance. The equation that we want to estimate is:
$$\text{income} = \beta_0 + \beta_1\,\text{cigs} + \beta_2\,\text{educ} + \beta_3\,\text{age} + \beta_4\,\text{age}^2 + \beta_5\,\text{white} + u_1 \tag{1}$$
While cigarette consumption may affect income, income almost certainly affects cigarette consumption as well, because the demand for cigarettes depends on income (its income elasticity is unlikely to be zero). The demand equation for cigarettes is:
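$$\text{cigs} = \gamma_0 + \gamma_1\,\text{income} + \gamma_2\,\text{educ} + \gamma_3\,\text{age} + \gamma_4\,\text{age}^2 + \gamma_5\,\text{cigpric} + \gamma_6\,\text{restaurn} + \gamma_7\,\text{white} + u_2 \tag{2}$$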
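
To see why the two equations form a simultaneous system, it may help to substitute (1) into (2) and solve for cigs. The sketch below assumes that β_1γ_1 ≠ 1, that u_1 and u_2 are uncorrelated with each other and with the exogenous variables, and it uses σ_1² for Var(u_1); this notation and these assumptions are introduced here for illustration and are not part of the assignment.

$$\text{cigs} = \frac{(\text{terms in the exogenous variables}) + \gamma_1 u_1 + u_2}{1 - \beta_1\gamma_1}$$

$$\operatorname{Cov}(\text{cigs}, u_1) = \frac{\gamma_1}{1 - \beta_1\gamma_1}\,\sigma_1^2$$

Under these assumptions, cigs is correlated with u_1 whenever γ_1 ≠ 0, so OLS applied to equation (1) is inconsistent, and the sign of the covariance depends on the signs of γ_1 and 1 − β_1γ_1.
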
Paste all relevant regression results from Stata.

1. Using the formula for simultaneity bias, discuss the likely direction of the bias for β_1 if the first equation is estimated using OLS. Consider two scenarios: 1) cigarettes are a normal good; 2) cigarettes are an inferior good.
2. In the above equations, which variables are the endogenous variables, and which are the exogenous variables? Why?
3. Explain why only equation (1) can be identified. What are the instrumental variables for it?
4. Estimate the reduced-form regression for equation (1) using OLS. Based on this regression, which of the available instrumental variables look promising, and why?
5. “Manually” (not using IV commands) estimate the first-stage regression for equation (1) using all available instruments. Test for instrument relevance and discuss the results. Do the instruments have the expected signs (first explain what signs you expect)? Which instrument is the most useful in predicting the endogenous variable, and why?
6. “Manually” estimate the second-stage regression and interpret the results. Suppose that cigarettes are a normal good. Did you get the expected bias correction compared to OLS?
7. Estimate equation (1) using 2SLS via the ivreg2 command in Stata. Request the first-stage regression results as well as the endogeneity test (see homework 4 for instructions). Compare the results to questions 5 and 6 to confirm that you get the same first-stage results and second-stage coefficients, but different second-stage test statistics from the manual estimation of 2SLS.
8. Discuss the endogeneity and over-identification test results.
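
The estimation steps in questions 4 to 7 could be organized roughly as in the do-file sketch below. It is one possible layout, not the required solution. It assumes the variable names `agesq` and `restaurn` from the dataset description, uses `cigs_hat` as a hypothetical name for the first-stage fitted values, and assumes the user-written `ivreg2` command has been installed with `ssc install ivreg2`. The options shown may differ from those given in homework 4.

```stata
* OLS benchmark for equation (1)
regress income cigs educ age agesq white

* Reduced form for income: regress on all exogenous variables
regress income educ age agesq white cigpric restaurn

* Manual first stage: endogenous regressor on all exogenous variables
regress cigs educ age agesq white cigpric restaurn

* Joint F-test of the excluded instruments (instrument relevance)
test cigpric restaurn

* Save first-stage fitted values (cigs_hat is a hypothetical name)
predict cigs_hat, xb

* Manual second stage: the coefficients match 2SLS, but the reported
* standard errors and test statistics are not valid 2SLS ones
regress income cigs_hat educ age agesq white

* One-step 2SLS with first-stage output and an endogeneity test for cigs
ivreg2 income educ age agesq white (cigs = cigpric restaurn), first endog(cigs)
```

With two excluded instruments and one endogenous regressor, the equation is over-identified, so `ivreg2` also reports an over-identification test statistic in its default output.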

