Further Exercises for MW24.1 - 09

  1. Question

    Install R and a frontend for R, e.g. RStudio.

    Save the file DYQ.csv in your working directory. (Don’t take the detour of opening the file in a spreadsheet program (e.g. Microsoft Excel) and saving it from there. Spreadsheet programs often damage your data, for example by converting numbers into what the program thinks is a date. If you find it difficult to save a link from your browser, try clicking on the link with the right mouse button. In many browsers this opens a menu where you can choose “Save link as…” or something similar.)

    If you are not sure which directory R currently uses as a working directory, you can use the command

    getwd()

    In RStudio you can also use the menu Session / Set Working Directory / Choose Directory to choose your working directory.

    Read the file DYQ.csv with the command

    DYQ <- read.csv("DYQ.csv")

    Now the variable DYQ contains your data. In this exercise we have a brief look at your data:


    1. Use the command nrow(DYQ) to determine the number of rows in your data. How many rows do you have?
    2. Use the command names(DYQ) to determine the names of the variables in your data. How many variables do you have?
    3. Use the command mean(DYQ$X3) to determine the mean of the variable X3 in your data.
    4. Use the command median(DYQ$X3) to determine the median of the variable X3 in your data.
    5. Use the command sd(DYQ$X3) to determine the standard deviation of the variable X3 in your data. (If Moodle complains about an “incomplete answer”, please check whether the format of your answer is in line with Moodle’s expectations. Make sure that Moodle and you interpret decimal separators in the same way. Depending on the settings of your computer it is possible that Moodle expects decimal numbers like 3.14 and not like 3,14).
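
    The steps of this question can be tried out without the exercise file. The following is a minimal sketch that writes a small, made-up data frame to a temporary CSV file and then applies the same commands. The column names and all values are hypothetical and are not those of DYQ.csv.

    ```r
    # Hypothetical stand-in for DYQ.csv: three made-up columns, five rows.
    demo <- data.frame(X1 = c(1, 2, 3, 4, 5),
                       X2 = c(2.5, 3.5, 1.0, 4.0, 2.0),
                       X3 = c(10, 20, 30, 40, 100))
    path <- tempfile(fileext = ".csv")
    write.csv(demo, path, row.names = FALSE)

    # Read the file back, in the same way as read.csv("DYQ.csv").
    d <- read.csv(path)

    n_rows    <- nrow(d)        # number of rows
    var_names <- names(d)       # names of the variables
    x3_mean   <- mean(d$X3)     # arithmetic mean
    x3_median <- median(d$X3)   # median
    x3_sd     <- sd(d$X3)       # sample standard deviation (divides by n - 1)
    ```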

  2. Question

    Your sample of the random variable $X$ contains $n=9$ independent and normally distributed observations: $X_1,\ldots,X_{9}$. You are looking for an estimator for $E[X]$. Which of the following statements are correct?

    1. The estimator $-\frac{1}{3}X_{2}-\frac{1}{3}X_{3}+\frac{1}{3}X_{5}+\frac{4}{3}X_{7}$ is an unbiased estimator for $E[X]$.

      Yes / No

    2. The estimator $-2X_{1}+2X_{3}-\frac{1}{2}X_{5}+\frac{3}{2}X_{6}$ is an unbiased estimator for $E[X]$.

      Yes / No

    3. The estimator $\frac{1}{2}X_{5}-X_{6}-2X_{8}-2X_{9}$ is an unbiased estimator for $E[X]$.

      Yes / No

    4. The estimator $2X_{3}-X_{4}-X_{7}$ is an unbiased estimator for $E[X]$.

      Yes / No

    5. The estimator $X_{4}$ dominates $\frac{1}{6}X_{1}+\frac{2}{3}X_{2}+\frac{2}{3}X_{4}-\frac{1}{2}X_{6}$.

      Yes / No

    6. The estimator $\frac{1}{5}X_{1}+\frac{3}{10}X_{2}+\frac{1}{10}X_{6}+\frac{2}{5}X_{9}$ dominates $-\frac{1}{2}X_{2}+X_{3}+\frac{1}{2}X_{5}$.

      Yes / No
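
    For a linear estimator $\sum_i a_i X_i$ of independent observations with common mean and variance, the expected value is $(\sum_i a_i) E[X]$ and the variance is $(\sum_i a_i^2) Var(X)$. The following sketch checks both quantities for two hypothetical weight vectors that are not taken from the question. It assumes that "dominates" compares the variance of two unbiased estimators; the definition used in the lecture may differ.

    ```r
    # Weights a_1, ..., a_9 of a linear estimator sum(a_i * X_i).
    # Both weight vectors are hypothetical examples, not taken from the question.
    w_a <- c(1/2, 1/2, 0, 0, 0, 0, 0, 0, 0)   # (X_1 + X_2) / 2
    w_b <- c(2, -1, 0, 0, 0, 0, 0, 0, 0)      # 2 X_1 - X_2

    # Unbiased for E[X] (whatever its value) exactly when the weights sum to 1.
    is_unbiased <- function(w) isTRUE(all.equal(sum(w), 1))

    # Variance of the estimator in units of Var(X), for independent observations.
    relative_variance <- function(w) sum(w^2)

    is_unbiased(w_a)         # weights sum to 1
    is_unbiased(w_b)         # weights sum to 1
    relative_variance(w_a)   # 1/4 + 1/4
    relative_variance(w_b)   # 4 + 1
    ```

    Under the stated assumption, the estimator with the smaller relative variance would be preferred among unbiased candidates.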



  3. Question

    You use a level of significance of 0.01.


    1. You assume that your test statistic follows a standard normal distribution. How large (in absolute terms) can your test statistic (for a two-sided test) be so that you still don’t reject your Null-hypothesis? (You can calculate this value with R.)
    2. You assume that the random variable $X$ follows a normal distribution with unknown mean and standard deviation 6. Your sample contains 32 observations. The sample mean is -5. Your Null-hypothesis is that $X$ has a mean of 20. How large is the absolute value of your test statistic?
    3. You still assume that the random variable $X$ follows a normal distribution with unknown mean and standard deviation 6. Now you consider a sample with 32 observations and sample mean 0.7. Your Null-hypothesis is still that $X$ has a mean of 20. How large is the $p$-value (for a two-sided test, rounded to 4 decimal places)?
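
    The three parts rest on the same ingredients: the critical value `qnorm(1 - alpha / 2)`, the test statistic $(\bar{x}-\mu_0)/(\sigma/\sqrt{n})$, and the two-sided $p$-value. A minimal sketch, with a helper function `z_test` introduced here for illustration and hypothetical numbers that differ from those of the question:

    ```r
    # Two-sided z-test for the mean of a normal variable with known standard deviation.
    z_test <- function(xbar, mu0, sigma, n, alpha) {
      z        <- (xbar - mu0) / (sigma / sqrt(n))  # test statistic
      critical <- qnorm(1 - alpha / 2)              # reject if |z| exceeds this value
      p_value  <- 2 * pnorm(-abs(z))                # two-sided p-value
      list(z = z, critical = critical, p_value = p_value,
           reject = abs(z) > critical)
    }

    # Hypothetical numbers (not those of the question): z = (1 - 0) / (2 / 4) = 2.
    res <- z_test(xbar = 1, mu0 = 0, sigma = 2, n = 16, alpha = 0.05)
    ```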

  4. Question

    Your data in the file DHC.csv contains two variables: X and f. The variable f tells you which group (C or D) the observation X belongs to.

    Compare the mean of X for the two groups C and D with the help of a (two-sided) $t$-test.

    1. Your Null-hypothesis is that the mean of the normally distributed X is the same in both groups. How large is the $p$-value for this $t$-test?
    2. You use a level of significance of 5%. Do you reject your Null-hypothesis? Yes / No
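
    A two-sample test of this kind can be sketched with `t.test` and a formula. The data below are simulated and only mimic the layout described for DHC.csv. Note that `t.test` defaults to the Welch test, which does not assume equal variances; whether the lecture expects that default or `var.equal = TRUE` is not stated here.

    ```r
    set.seed(1)
    # Simulated stand-in for DHC.csv: X is the measurement, f the group label.
    d <- data.frame(X = c(rnorm(30, mean = 0), rnorm(30, mean = 0.5)),
                    f = rep(c("C", "D"), each = 30))

    # Two-sided two-sample t-test. By default t.test() does not assume equal
    # variances (Welch test); var.equal = TRUE would give the pooled-variance test.
    tt <- t.test(X ~ f, data = d)

    p_value <- tt$p.value
    reject  <- p_value < 0.05   # decision at the 5% level
    ```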


  5. Question

    The random variable $X \sim N(\mu,\sigma^2)$ follows a normal distribution with unknown variance $\sigma^2$. You draw a sample with 24 observations. You find a sample mean of 7 and a sample standard deviation of 9.

    • Determine a 95%-confidence interval for your estimate of the expected value of $X$: $\hat{\mu}$.

    1. What is the lower boundary of the interval?
    2. What is the upper boundary of the interval?
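
    With unknown variance the interval is based on the $t$ distribution with $n-1$ degrees of freedom. The sketch below wraps the formula in a helper `t_interval` (a name introduced here) and checks it against `t.test` on a simulated sample; the simulated values are made up.

    ```r
    # Confidence interval for the mean when the variance is unknown:
    # xbar +/- t-quantile (n - 1 degrees of freedom) * s / sqrt(n).
    t_interval <- function(xbar, s, n, level = 0.95) {
      half_width <- qt(1 - (1 - level) / 2, df = n - 1) * s / sqrt(n)
      c(lower = xbar - half_width, upper = xbar + half_width)
    }

    # Check against t.test() on a simulated sample (values are made up).
    set.seed(2)
    x      <- rnorm(15, mean = 3, sd = 2)
    ci     <- t_interval(mean(x), sd(x), length(x))
    ci_ref <- t.test(x)$conf.int
    ```

    The helper only needs the sample mean, the sample standard deviation and the sample size, so it can be called directly with summary statistics.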

  6. Question

    A random variable $X$ is distributed as follows:

    • $P(X=A)=\theta$
    • $P(X=B)=2\theta$
    • $P(X=C)=1-3\theta$

    We have $\theta\in[0,1/3]$.

    In your sample you have the following observations:

    $\{ B, C, B, A, B \}$.

    What is the maximum-likelihood estimator for $\theta$?
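
    One way to approach this is to write down the log-likelihood and maximise it numerically over the admissible interval. The sketch below does this for a hypothetical sample (not the one above) and compares the result with the closed form that follows from the first-order condition, $(n_A+n_B)/(3n)$, where $n_A$, $n_B$ are the counts of A and B and $n$ is the sample size.

    ```r
    # Log-likelihood for P(A) = theta, P(B) = 2 * theta, P(C) = 1 - 3 * theta.
    log_lik <- function(theta, obs) {
      n_a <- sum(obs == "A"); n_b <- sum(obs == "B"); n_c <- sum(obs == "C")
      n_a * log(theta) + n_b * log(2 * theta) + n_c * log(1 - 3 * theta)
    }

    # Hypothetical sample (not the one from the question).
    obs <- c("A", "C", "C", "B")

    # Numerical maximisation over the admissible interval [0, 1/3].
    theta_hat <- optimize(log_lik, interval = c(0, 1/3), obs = obs,
                          maximum = TRUE, tol = 1e-10)$maximum

    # Closed form from the first-order condition: (n_A + n_B) / (3 n).
    theta_closed <- sum(obs %in% c("A", "B")) / (3 * length(obs))
    ```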


  7. Question

    The file DCF.csv contains two variables: X and Y.

    To explain Y as a linear function of X, you estimate the model

    $Y = \beta_0 + \beta_1 X + u$.

    1. Which value do you estimate for $\beta_1$?
    2. Your (two-sided) Null-hypothesis is $H_0: \beta_1=0$. Determine the $p$-value for this test (report at least 4 decimal places).
    3. You use a level of significance of 10%. Can you reject your Null-hypothesis? Yes / No
    4. What is the lower boundary of the 95% confidence interval for $\beta_1$?
    5. What is the upper boundary of the 95% confidence interval for $\beta_1$?
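
    The quantities asked for can be read off an `lm` fit. A minimal sketch on simulated data (a stand-in for DCF.csv, with a made-up true slope of 0.5):

    ```r
    set.seed(3)
    # Simulated stand-in for DCF.csv with a true slope of 0.5.
    d <- data.frame(X = rnorm(100))
    d$Y <- 1 + 0.5 * d$X + rnorm(100)

    fit <- lm(Y ~ X, data = d)

    coefs   <- summary(fit)$coefficients        # estimate, std. error, t value, p-value
    beta1   <- coefs["X", "Estimate"]
    p_value <- coefs["X", "Pr(>|t|)"]           # two-sided test of H0: beta_1 = 0
    ci      <- confint(fit, "X", level = 0.95)  # lower and upper boundary
    reject  <- p_value < 0.10                   # decision at the 10% level
    ```

    The same calls work for a model with several regressors, for example `lm(Y ~ X1 + X2, data = d)`.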


  8. Question

    Use data from the file D03b.csv.

    You want to measure the effect X1 has on Y1, the effect X2 has on Y2, the effect X3 has on Y3, and the effect X4 has on Y4. For each case below, select the most suitable specification and provide the point estimate of the effect.


    1. By how many percentage points does Y1 change approximately when X1 changes by one unit?
    2. What is the elasticity of Y2 with respect to X2?
    3. What is the marginal effect of X3 on Y3?
    4. By which amount does Y4 change when X4 changes by 1 percentage point?

  9. Question

    Use the data from the file DUX.csv. Based on this data you estimate the following relationship:

    $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u$.

    1. Which value do you estimate for $\beta_2$?
    2. Your (two-sided) Null-hypothesis is $H_0: \beta_2 = 0$. Determine the $p$-value for this test (report at least 4 decimal places).
    3. You use a level of significance of 1%. Can you reject your Null-hypothesis? Yes / No
    4. What is the lower boundary of the 95%-confidence-interval for $\beta_2$?
    5. What is the upper boundary of the 95%-confidence-interval for $\beta_2$?


  10. Question

    Consider the following model:

    $tv = \beta_0 + \beta_1 \cdot ba + \beta_2 \cdot kh + \beta_3 \cdot ba \cdot kh + u$

    The variable $ba$ indicates whether you are in situation JMF or VSG: In the case of JMF you have $ba=0$. In the case of VSG you have $ba=1$.

    The variable $kh$ indicates whether you are in situation WEH or YCA: In the case of WEH you have $kh=0$. In the case of YCA you have $kh=1$.

    The mean values of $tv$ for the four different combinations of JMF and VSG and WEH and YCA are shown in the following table:


    1. What is $\beta_0$?
    2. What is $\beta_1$?
    3. What is $\beta_2$?
    4. What is $\beta_3$?
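
    In a model with two dummies and their interaction, the four coefficients reproduce the four cell means exactly. The sketch below shows the mapping for hypothetical cell means; the values and the names of the vector elements are made up for illustration and do not come from the table of the question.

    ```r
    # Hypothetical cell means of tv (the values are made up for illustration).
    m <- c("ba0_kh0" = 10, "ba1_kh0" = 12, "ba0_kh1" = 15, "ba1_kh1" = 20)

    beta0 <- m[["ba0_kh0"]]                     # baseline: ba = 0, kh = 0
    beta1 <- m[["ba1_kh0"]] - m[["ba0_kh0"]]    # effect of ba when kh = 0
    beta2 <- m[["ba0_kh1"]] - m[["ba0_kh0"]]    # effect of kh when ba = 0
    # Interaction: difference in differences of the four cell means.
    beta3 <- m[["ba1_kh1"]] - m[["ba1_kh0"]] - m[["ba0_kh1"]] + m[["ba0_kh0"]]
    ```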

  11. Question

    Save the file DLE.csv in your working directory. You want to estimate the mean of the absolute value of X1.


    1. What is the plug-in estimate of the mean of the absolute value of X1?
    2. Use a bootstrap (with 10000 replications) to determine the standard deviation of this estimate.
    3. You assume this estimate follows a normal distribution. Use a parametric bootstrap to determine the lower boundary of a 95%-confidence interval for the mean of the absolute value of X1.
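
    The three steps can be sketched as follows on simulated data (a stand-in for the variable X1 of DLE.csv). How the lecture implements the parametric bootstrap is an assumption here: the sketch draws from a normal distribution centred at the plug-in estimate with the bootstrap standard deviation and takes the 2.5% quantile.

    ```r
    set.seed(4)
    x <- rnorm(50, mean = 1, sd = 2)   # simulated stand-in for DLE$X1

    # Plug-in estimate: apply the statistic to the sample itself.
    est <- mean(abs(x))

    # Nonparametric bootstrap: resample with replacement, recompute the statistic.
    boot    <- replicate(10000, mean(abs(sample(x, replace = TRUE))))
    sd_boot <- sd(boot)

    # Parametric bootstrap under the normality assumption: draw from
    # N(est, sd_boot^2) and take the 2.5% quantile as the lower boundary.
    lower_95 <- unname(quantile(rnorm(10000, mean = est, sd = sd_boot), 0.025))
    ```

    The same resampling loop applies to other statistics, for example `IQR(x)` in place of `mean(abs(x))`.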

  12. Question

    The random variable $T$ follows a normal distribution with unknown mean $\mu_{T}$ and known standard deviation $\sigma_{T}=7$.

    According to your prior the following holds:

    • $\mu_{T}=6$ with probability 2/7,
    • $\mu_{T}=14$ with probability 5/7.

    The probability for all other values of $\mu_{T}$ is zero.

    You have one observation, $T=4$.

    In the following you can use dnorm to calculate the density function of the normal distribution.

    E.g. dnorm(4,6,7) yields the density of the normal distribution for $T=4$ when $\mu_{T}=6$ and $\sigma_{T}=7$.


    1. What is the posterior probability $P(\mu_{T}=6| T=4 )$?
    2. What is the posterior probability $P(\mu_{T}=10| T=4 )$?
    3. Now you have two observations: $T=\{4,9\}$. What is the posterior probability $P(\mu_{T}=6| T=\{ 4,9 \} )$?
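
    With a prior on finitely many values of the mean, Bayes' rule reduces to weighting each prior probability by the likelihood and normalising. A minimal sketch, using a helper `posterior` introduced here and hypothetical numbers that differ from those of the question:

    ```r
    # Posterior over a finite set of candidate means, given independent normal
    # observations with known standard deviation:
    # posterior is proportional to prior * likelihood.
    posterior <- function(obs, mu, prior, sigma) {
      lik <- sapply(mu, function(m) prod(dnorm(obs, mean = m, sd = sigma)))
      prior * lik / sum(prior * lik)
    }

    # Hypothetical example (not the numbers of the question): two candidate means
    # 0 and 2 with equal prior weight and one observation exactly in between.
    post <- posterior(obs = 1, mu = c(0, 2), prior = c(0.5, 0.5), sigma = 3)
    ```

    For several independent observations the likelihood is the product of the individual densities, which is what `prod(dnorm(...))` computes. Values of the mean that receive zero prior probability also receive zero posterior probability.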

  13. Question

    The file DWW.csv contains a variable X. This X is a sample of the random variable $X$. Here we write the normal distribution as $N(\mu,\tau)$ where $\mu$ is the mean and $\tau=1/\sigma^2$ is the precision. You assume that $X$ follows a normal distribution: $X \sim N(\mu,1/\sigma^2)$ where $\sigma^2$ is the variance of $X$. You have the following priors: $\mu \sim N(0,.0001)$, $\tau=1/\sigma^2\sim \Gamma(.01,.01)$ ($\Gamma$ denotes the Gamma distribution).

    To obtain the necessary precision, please use run.jags defaults. Please don’t change options or modules.


    1. What is the lower boundary of the 95%-credible-interval for $\mu$?
    2. What is the upper boundary of the 95%-credible-interval for $\mu$?
    3. What is the lower boundary of the 95%-credible-interval for $\sigma$?
    4. What is the upper boundary of the 95%-credible-interval for $\sigma$?
    5. What is the posterior probability (a number between 0 and 1) of $\mu>7.43$?
    6. What is the posterior probability (a number between 0 and 1) of $3.69<\mu<5.64$?
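
    A JAGS model for this setup could look like the following sketch. The model string is one possible way to write it down, not necessarily the one used in the lecture; the name `model_string` is introduced here. The call to `run.jags` is shown only as a comment because it requires JAGS and the runjags package to be installed.

    ```r
    # JAGS model as a character string. dnorm() in JAGS takes mean and precision.
    model_string <- "
    model {
      for (i in 1:length(X)) {
        X[i] ~ dnorm(mu, tau)
      }
      mu  ~ dnorm(0, 0.0001)     # prior for the mean (precision 0.0001)
      tau ~ dgamma(0.01, 0.01)   # prior for the precision
      sigma <- 1 / sqrt(tau)     # standard deviation implied by tau
    }"

    # With JAGS and runjags installed, and DWW obtained via read.csv("DWW.csv"),
    # the model could be run with default settings along these lines:
    # fit <- runjags::run.jags(model = model_string, data = list(X = DWW$X),
    #                          monitor = c("mu", "sigma"))
    # summary(fit)
    ```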

  14. Question

    The file DTN.csv contains an independent variable F and a dependent binary variable L.

    You estimate the following model:

    $P(L=1|F=f) = F(\beta_0 + \beta_F f)$

    where $F$ is the logistic distribution.


    1. What is your estimate for $\beta_F$?
    2. What is the marginal effect of $f$ for the average value of $F$ in your data?
    3. What is the average marginal effect of $f$?
    4. What is the marginal effect of $f$ if $f=1.738$?
    5. What are the odds for $L=1$ if $f=1.738$?
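
    In a logit model the marginal effect at a point is the logistic density at the linear index times the coefficient, and the odds are $p/(1-p)$. The following sketch illustrates these quantities on simulated data (a stand-in for DTN.csv); the evaluation point `f0` and the helper `marginal_effect` are hypothetical.

    ```r
    set.seed(5)
    # Simulated stand-in for DTN.csv: regressor F and binary outcome L.
    d <- data.frame(F = rnorm(500))
    d$L <- rbinom(500, size = 1, prob = plogis(-0.5 + 1.2 * d$F))

    fit <- glm(L ~ F, family = binomial(link = "logit"), data = d)
    b0 <- coef(fit)[["(Intercept)"]]
    bF <- coef(fit)[["F"]]

    # Marginal effect at a point f: logistic density at the linear index times beta_F.
    marginal_effect <- function(f) dlogis(b0 + bF * f) * bF

    me_at_mean <- marginal_effect(mean(d$F))   # effect at the average F
    ame        <- mean(marginal_effect(d$F))   # average of the individual effects

    f0   <- 0.3                                # hypothetical evaluation point
    p0   <- plogis(b0 + bF * f0)               # P(L = 1 | F = f0)
    odds <- p0 / (1 - p0)                      # equals exp(b0 + bF * f0)
    ```

    For a probit link the same steps apply with `binomial(link = "probit")`, `dnorm` and `pnorm` in place of their logistic counterparts.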

  15. Question

    The file DVU.csv contains two independent variables X1, and X2, and a dependent count variable Y. (Hint: In the following you may find the library MASS useful.)


    1. Use a Poisson model where you explain Y as a function of X1 and X2. What is the coefficient of X2?
    2. Now you use a negative binomial model to explain Y as a function of X1 and X2. What is now the coefficient of X2?
    3. In the negative binomial model, what is your estimate for the parameter $\theta$?
    4. Your Null-hypothesis is that $\theta=\infty$, i.e. that the negative binomial model does not significantly improve the goodness of fit of the Poisson model. Use a Likelihood-Ratio test to test this hypothesis. Which $p$-value do you get (rounded to 4 decimal places)?
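
    Both models and the likelihood-ratio statistic can be sketched as follows on simulated, overdispersed counts (a stand-in for DVU.csv). The reference distribution is an assumption: the sketch uses a chi-squared distribution with one degree of freedom, whereas some treatments halve this $p$-value because $\theta=\infty$ lies on the boundary of the parameter space.

    ```r
    library(MASS)   # provides glm.nb()

    set.seed(6)
    # Simulated, overdispersed count data as a stand-in for DVU.csv.
    d <- data.frame(X1 = rnorm(300), X2 = rnorm(300))
    d$Y <- rnbinom(300, size = 2, mu = exp(0.5 + 0.3 * d$X1 - 0.4 * d$X2))

    pois <- glm(Y ~ X1 + X2, family = poisson, data = d)
    nb   <- glm.nb(Y ~ X1 + X2, data = d)

    b2_pois <- coef(pois)[["X2"]]
    b2_nb   <- coef(nb)[["X2"]]
    theta   <- nb$theta                  # dispersion parameter of the NB model

    # Likelihood-ratio statistic; the Poisson model is the limit theta -> infinity.
    lr <- as.numeric(2 * (logLik(nb) - logLik(pois)))
    # Assumption: chi-squared reference distribution with 1 degree of freedom.
    p_value <- pchisq(lr, df = 1, lower.tail = FALSE)
    ```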

  16. Question

    The file DAR.csv contains three variables, a, H and X. The variable a denotes to which group observations belong. X is our dependent variable.

    We write the normal distribution as $N(\mu,\tau)$ where $\mu$ is the mean and $\tau=1/\sigma^2$ is the precision. $\Gamma$ denotes the Gamma distribution. You use JAGS to estimate the following model with random effects:

    $X_{at} = \beta_0 + \nu_{a} + \epsilon_{at}$

    where the group specific random effect $\nu_a \sim N(0,\tau_\nu)$ and the residual $\epsilon_{at} \sim N(0,\tau_\epsilon)$. Here $\tau_\nu=1/\sigma^2_\nu$ and $\tau_\epsilon=1/\sigma^2_\epsilon$ are the precision of $\nu_a$ and $\epsilon_{at}$, respectively.

    Your priors are: $\beta_0 \sim N(0,0.0001)$, $\tau_\nu \sim \Gamma(.01,.01)$, $\tau_\epsilon \sim \Gamma(.01,.01)$.

    To obtain the necessary precision, please use run.jags defaults. Please don’t change the options or modules.


    1. What is the 50%-quantile of your posterior for $\sigma_\epsilon$?
    2. What is the 50%-quantile of your posterior for $\sigma_\nu$?
    3. What is the posterior probability (a number between 0 and 1) of $\sigma_\nu>3.3$?

  17. Question

    The file DBN.csv contains the variables X, Y and Z. You estimate the model $Y = \beta_0 + \beta_1 X + u$.


    1. Use a standard OLS model to estimate the model. What is the coefficient of $X$?
    2. Provide a $p$-value for the test of the Null-hypothesis that the coefficient of $X$ is zero (round to 4 decimal places).
    3. Now use the variable $Z$ as an instrument for $X$. Use the command ivreg from the AER library to estimate the coefficient of $X$ for this model.
    4. For the instrumental variables model provide a $p$-value for the test of the Null-hypothesis that the coefficient of $X$ is zero (round to 4 decimal places).
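
    To see what the instrumental variables estimate does, it can help to compute it by hand. The sketch below uses base R on simulated data with a made-up true coefficient of 2; it is a swapped-in illustration, not the `ivreg` call the question asks for. With one instrument and one regressor the IV estimate equals $cov(Z,Y)/cov(Z,X)$, which coincides with the coefficient from two stages of least squares.

    ```r
    set.seed(7)
    n <- 2000
    Z <- rnorm(n)                      # instrument
    e <- rnorm(n)                      # unobserved factor causing endogeneity
    X <- 0.8 * Z + e + rnorm(n)        # regressor, correlated with e
    Y <- 1 + 2 * X - 1.5 * e + rnorm(n)

    b_ols <- coef(lm(Y ~ X))[["X"]]    # biased because X and the error are correlated

    # Just-identified IV estimate: cov(Z, Y) / cov(Z, X).
    b_iv <- cov(Z, Y) / cov(Z, X)

    # The same point estimate via two stages of least squares.
    x_hat  <- fitted(lm(X ~ Z))
    b_2sls <- coef(lm(Y ~ x_hat))[["x_hat"]]

    # With the AER package installed, AER::ivreg(Y ~ X | Z) gives this coefficient
    # together with appropriate standard errors and p-values.
    ```

    The standard errors of the manual second stage are not valid for inference, which is one reason to use `ivreg` for the $p$-value.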

  18. Question

    The file DJX.csv contains eight independent variables, X1, X2, X3, X4, X5, X6, X7, X8, and a dependent variable Y. You estimate the following (full) model:

    $Y = \beta_0 + \beta_{1} X_{1} + \beta_{2} X_{2} + \beta_{3} X_{3} + \beta_{4} X_{4} + \beta_{5} X_{5} + \beta_{6} X_{6} + \beta_{7} X_{7} + \beta_{8} X_{8} + u$


    1. What is the coefficient of X2 in the full model?
    2. You simplify the model and include only terms which are significant on a 5% level in the above estimation. You drop insignificant terms only once. If you find insignificant terms in your simplified model, you keep them. You also keep X2. What is now the coefficient of X2?
    3. Use the function extractAIC to obtain the AIC of this (simplified) model. (Note: the function extractAIC returns two numbers. Only one of them is the AIC).
    4. Now you use the step function to simplify the (full) model based on the AIC. If the step function has removed X2 from the model, add X2 back to your model. What is the coefficient of X2 in this model?
    5. Use the function extractAIC to obtain the AIC of this model.
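
    The two simplification strategies can be sketched as follows on simulated data (a stand-in for DJX.csv in which, by construction, only X1 and X2 matter). `extractAIC` returns the equivalent degrees of freedom first and the AIC second.

    ```r
    set.seed(8)
    # Simulated stand-in for DJX.csv: only X1 and X2 truly matter.
    d <- as.data.frame(matrix(rnorm(200 * 8), ncol = 8))
    names(d) <- paste0("X", 1:8)
    d$Y <- 1 + 0.8 * d$X1 + 0.5 * d$X2 + rnorm(200)

    full <- lm(Y ~ ., data = d)

    # One-off simplification: keep terms with p < 0.05 in the full model, plus X2.
    p      <- summary(full)$coefficients[-1, "Pr(>|t|)"]
    keep   <- union(names(p)[p < 0.05], "X2")
    simple <- lm(reformulate(keep, response = "Y"), data = d)
    aic_simple <- extractAIC(simple)[2]   # first element: equivalent degrees of freedom

    # AIC-based simplification; add X2 back if step() removed it.
    stepped <- step(full, trace = 0)
    if (!"X2" %in% names(coef(stepped))) stepped <- update(stepped, . ~ . + X2)
    aic_step <- extractAIC(stepped)[2]
    ```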

  19. Question

    The file DYW.csv contains an independent variable H and a dependent binary variable V.

    You estimate the following model:

    $P(V=1|H=h) = \Phi(\beta_0 + \beta_{h} h)$

    where $\Phi$ is the standard normal distribution.


    1. What is your estimate for $\beta_{h}$?
    2. What is the marginal effect of $h$ for the average value of $h$ in your data?
    3. What is the average marginal effect of $h$?
    4. What is the marginal effect of $h$ if $h=-0.348$?

  20. Question

    A random variable $X$ follows a distribution with density function $f(x|\theta)=\left(\frac {x}{23}\right)^\theta \cdot \frac{\theta}{x}$ if $x\in[0,23]$ and $f(x)=0$ otherwise.

    Your sample contains the observations $\{ 3, 14, 14, 18 \}$.

    What is the Maximum-Likelihood estimator for $\theta$?
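
    For a density of this form with a general upper bound $c$ (here $c=23$) and observations $x_1,\ldots,x_n$, the estimator follows from the first-order condition of the log-likelihood. The following is a sketch of that derivation; the symbols $c$, $n$ and $\ell$ are notation introduced here.

    ```latex
    \begin{align*}
    \ell(\theta) &= \sum_{i=1}^{n} \log f(x_i|\theta)
      = \theta \sum_{i=1}^{n} \log\frac{x_i}{c} + n \log\theta - \sum_{i=1}^{n} \log x_i, \\
    \ell'(\theta) &= \sum_{i=1}^{n} \log\frac{x_i}{c} + \frac{n}{\theta} = 0
      \quad\Longrightarrow\quad
      \hat\theta = \frac{n}{\sum_{i=1}^{n} \log(c/x_i)}, \\
    \ell''(\theta) &= -\frac{n}{\theta^2} < 0 .
    \end{align*}
    ```

    The negative second derivative confirms that the stationary point is a maximum.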


  21. Question

    The data in the file DGX.csv contains 8 variables, Ybz, Yjr, Ykv, Ysf, Xbz, Xjr, Xkv, Xsf. You investigate the effect Xbz has on Ybz, the effect Xjr has on Yjr, the effect Xkv has on Ykv, the effect Xsf has on Ysf. For each case below, select the most suitable specification and provide the point estimate of the effect.

    1. Use a specification where Ybz changes by a fixed number of percentage points when Xbz changes by one unit. By how many percentage points does Ybz change approximately when Xbz changes by one unit?
    2. Use a specification where Ysf changes by a fixed amount when Xsf changes by a given percentage. By which amount does Ysf change when Xsf changes by 1 percentage point?
    3. Use a specification where the marginal effect of Xkv on Ykv is constant. What is the marginal effect of Xkv on Ykv?
    4. Use a specification where the elasticity of Yjr with respect to Xjr is constant. What is the elasticity of Yjr with respect to Xjr?
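
    The four specifications differ only in whether the dependent and the independent variable enter in logs. A minimal sketch on simulated positive data; the names `x`, `y` and `slope` are placeholders introduced here. It assumes that the "percentage points" in the questions are meant as percent changes, so that the log-level slope is multiplied by 100 and the level-log slope is divided by 100.

    ```r
    set.seed(9)
    # Simulated positive data; the variable names x and y are placeholders.
    x <- runif(200, min = 1, max = 10)
    y <- exp(0.2 + 0.05 * x + rnorm(200, sd = 0.1))

    # Helper: slope coefficient of a simple regression.
    slope <- function(formula) unname(coef(lm(formula))[2])

    level_level <- slope(y ~ x)            # constant marginal effect dy/dx
    log_level   <- slope(log(y) ~ x)       # 100 * slope: approx. % change in y per unit of x
    level_log   <- slope(y ~ log(x))       # slope / 100: approx. change in y per 1% change in x
    log_log     <- slope(log(y) ~ log(x))  # constant elasticity of y with respect to x
    ```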


  22. Question

    The file DTR.csv contains a variable A. This A is a sample of the random variable $A$. You assume that $A$ follows a normal distribution: $A \sim N(\mu,1/\sigma^2)$ where $\sigma^2$ is the variance of $A$. Your priors are $\mu \sim N(-10.6,8)$, $\tau=1/\sigma^2\sim \Gamma(.01,.01)$. We write the normal distribution as $N(\mu,\tau)$ where $\mu$ is the mean and $\tau=1/\sigma^2$ is the precision. $\Gamma$ denotes the Gamma distribution.

    To obtain the necessary precision, please use run.jags defaults. Please don’t change options or modules.

    The last two questions belong to chapter 12 of the lecture! Remember that if $p$ is the probability of an event, then the odds are $o=\frac{p}{1-p}$.


    1. What is the lower boundary of the 95%-credible-interval for $\mu$?
    2. What is the upper boundary of the 95%-credible-interval for $\mu$?
    3. What is the lower boundary of the 95%-credible-interval for $\sigma$?
    4. What is the upper boundary of the 95%-credible-interval for $\sigma$?
    5. What are the posterior odds of $\mu>-9.48$?
    6. What are the posterior odds of $-10.4<\mu<-10.2$?
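
    Once posterior draws are available, intervals, probabilities and odds are simple summaries of those draws. The sketch below uses a normal sample as a hypothetical stand-in for the draws of $\mu$; the thresholds are made up, and the equal-tailed interval is an assumption, since the summary of a run.jags fit may report a highest-density interval instead.

    ```r
    set.seed(10)
    # Hypothetical posterior draws of mu. With run.jags the draws would come from
    # the fitted object instead; a normal sample is used here only as a stand-in.
    mu_draws <- rnorm(20000, mean = 1, sd = 0.5)

    # Equal-tailed 95% interval from the draws (an assumption, see the lead-in).
    ci <- quantile(mu_draws, c(0.025, 0.975))

    # Posterior probability of an event = share of draws in which it occurs.
    p_gt  <- mean(mu_draws > 1.2)
    p_rng <- mean(mu_draws > 0.8 & mu_draws < 1.1)

    # Odds o = p / (1 - p).
    odds <- function(p) p / (1 - p)
    odds_gt  <- odds(p_gt)
    odds_rng <- odds(p_rng)
    ```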

  23. Question

    Use the data from the file DSQ.csv. You are interested in the interquartile range $\xi$ of the variable X2. The interquartile range is the distance between the 25% and 75% quantiles. In R you can use the function IQR(x) to determine the interquartile range of x.


    1. What is the plug-in estimate of the interquartile range of X2?
    2. Use a bootstrap (with 10000 replications) to determine the standard deviation of this estimate.

  24. Question

    The file DGP.csv contains four variables, B, Q, X and p.

    The variable p denotes the group to which an observation belongs.

    You compare the following two models: A standard OLS model and a model with a random effect.

    • Here is the OLS model:

    $X_{pt} = \beta_0 + \beta_B B_{pt} + \beta_Q Q_{pt} + \epsilon_{pt}$

    • Now you extend this model with a random effect $\nu_{p}$:

    $X_{pt} = \beta_0 + \beta_B B_{pt} + \beta_Q Q_{pt} + \nu_{p} + \epsilon_{pt}$

    You use lmer from the lme4 library to estimate the model with random effects.


    1. What is your estimate for $\beta_{B}$ in the OLS model?
    2. What is your estimate for $\beta_{Q}$ in the OLS model?
    3. What is your estimate for $\beta_{B}$ in the model with a random effect?
    4. What is your estimate for $\beta_{Q}$ in the model with a random effect?
    5. What is your estimate for the standard deviation of the random effect $\nu_{p}$?