Further Exercises for MW24.1 - 19
- Question: Install R and a frontend for R, e.g. RStudio. Save the file DCU.csv in your working directory. (Don’t take the detour of opening the file in your spreadsheet program (e.g. Microsoft Excel) and saving it from that program. Spreadsheet programs often do considerable damage to your data. By taking the detour through your spreadsheet program you will often change numbers into what the spreadsheet program thinks is a date. If you find it difficult to save a link from your browser, try clicking on the link with the right mouse button. In many browsers this right-click opens a menu where you can choose “Save link as…” or something similar.) If you are not sure which directory R currently uses as its working directory, you can use the command getwd(). In RStudio you can also use the menu Session / Set Working Directory / Choose Directory to choose your working directory. Read the file DCU.csv with the command DCU <- read.csv("DCU.csv"); the variable DCU now contains your data. In this exercise we take a brief look at your data:
  - Use the command nrow(DCU) to determine the number of rows in your data. How many rows do you have?
  - Use the command names(DCU) to determine the names of the variables in your data. How many variables do you have?
  - Use the command mean(DCU$X3) to determine the mean of the variable X3 in your data.
  - Use the command median(DCU$X3) to determine the median of the variable X3 in your data.
  - Use the command sd(DCU$X3) to determine the standard deviation of the variable X3 in your data. (If Moodle complains about an “incomplete answer”, please check whether the format of your answer is in line with Moodle’s expectations. Make sure that Moodle and you interpret decimal separators in the same way. Depending on the settings of your computer, Moodle may expect decimal numbers like 3.14 and not like 3,14.)
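The commands above can be collected into one small function. This is only a sketch. The helper name summarise_x3 is hypothetical, and the function assumes that the CSV file has a numeric column X3.

```r
# Hypothetical helper: read a CSV file and report the quantities asked for above
summarise_x3 <- function(path) {
  dat <- read.csv(path)            # comma-separated, "." as decimal separator
  list(
    rows      = nrow(dat),         # number of observations
    variables = length(names(dat)),
    mean      = mean(dat$X3),
    median    = median(dat$X3),
    sd        = sd(dat$X3)         # sample standard deviation (denominator n - 1)
  )
}

# Usage (assuming DCU.csv is in the working directory):
# getwd()
# summarise_x3("DCU.csv")
```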
 
 
- Question: Your sample of the random variable contains independent and normally distributed observations: . You are looking for an estimator for . Which of the following statements are correct?
  - The estimator is an unbiased estimator for . Yes / No
  - The estimator is an unbiased estimator for . Yes / No
  - The estimator is an unbiased estimator for . Yes / No
  - The estimator is an unbiased estimator for . Yes / No
  - The estimator dominates . Yes / No
  - The estimator dominates . Yes / No
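The estimators in this question did not survive in this copy. The sketch below therefore uses two hypothetical estimators of the mean: the sample mean, and the first observation on its own. It shows how a Monte Carlo simulation can check bias and compare mean squared errors. The function name check_estimator and all parameter values are assumptions made for illustration. A simulation only suggests an answer; unbiasedness, and dominance in the sense of a lower MSE for every parameter value, still have to be shown analytically.

```r
# Monte Carlo estimate of the bias and mean squared error of an estimator
check_estimator <- function(estimator, mu, sigma, n, reps = 10000) {
  estimates <- replicate(reps, estimator(rnorm(n, mean = mu, sd = sigma)))
  c(bias = mean(estimates) - mu,
    mse  = mean((estimates - mu)^2))
}

# Hypothetical estimators of the mean mu
sample_mean <- function(x) mean(x)
first_obs   <- function(x) x[1]

set.seed(1)
check_estimator(sample_mean, mu = 2, sigma = 3, n = 10)
check_estimator(first_obs,   mu = 2, sigma = 3, n = 10)
```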
 
 
 
- Question: You use a level of significance of 0.05.
  - You assume that your test statistic follows a standard normal distribution. How large (in absolute terms) can your test statistic (for a two-sided test) be so that you still don’t reject your Null-hypothesis? (You can calculate this value with R.)
  - You assume that the random variable follows a normal distribution with unknown mean and standard deviation 3. Your sample contains 30 observations. The sample mean is 4. Your Null-hypothesis is that has a mean of 10. How large is the absolute value of your test statistic?
  - You still assume that the random variable follows a normal distribution with unknown mean and standard deviation 3. Now you consider a sample with 30 observations and sample mean 3. Your Null-hypothesis is still that has a mean of 10. How large is the p-value (for a two-sided test, rounded to 4 decimal places)?
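Here is a sketch of the three computations in base R. Because the standard deviation 3 is treated as known, it assumes a z-test. The names z_crit, z_stat and p_two_sided are hypothetical.

```r
alpha <- 0.05
# Critical value of a two-sided test with a standard normal test statistic
z_crit <- qnorm(1 - alpha / 2)

# z statistic for H0: mean = mu0 when sigma is known
z_stat <- function(xbar, mu0, sigma, n) (xbar - mu0) / (sigma / sqrt(n))

# Two-sided p-value for a standard normal test statistic
p_two_sided <- function(z) 2 * pnorm(-abs(z))

abs(z_stat(xbar = 4, mu0 = 10, sigma = 3, n = 30))
round(p_two_sided(z_stat(xbar = 3, mu0 = 10, sigma = 3, n = 30)), 4)
```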
 
 
- Question: Your data in the file DVV.csv contains two variables: X and t. The variable t tells you which group (L or M) the observation X belongs to. Compare the mean of X for the two groups L and M with the help of a (two-sided) t-test.
  - Your Null-hypothesis is that the mean of the normally distributed X is the same in both groups. How large is the p-value for this t-test?
  - You use a level of significance of 10%. Do you reject your Null-hypothesis? Yes / No
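Here is a minimal sketch that uses the column names X and t from the question. By default t.test performs Welch's test, which does not assume equal variances in the two groups. If your course expects the pooled-variance version, pass var.equal = TRUE. The helper name is hypothetical.

```r
# Two-sided two-sample t-test of equal means of X in groups L and M
group_t_test <- function(dat, var.equal = FALSE) {
  t.test(X ~ t, data = dat, var.equal = var.equal)
}

# Usage (assuming DVV.csv is in the working directory):
# DVV <- read.csv("DVV.csv")
# res <- group_t_test(DVV)
# res$p.value
# res$p.value < 0.10   # TRUE means: reject H0 at the 10% level
```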
 
 
 
- Question: The random variable follows a normal distribution with unknown variance . You draw a sample with 23 observations. You find a sample mean of 5 and a sample standard deviation of 3. Determine a 95%-confidence interval for your estimation of the expected value of : .
  - What is the lower boundary of the interval?
  - What is the upper boundary of the interval?
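The variance is unknown and has to be estimated from the sample. The interval therefore uses the t distribution with n - 1 degrees of freedom: x̄ ± t(0.975, n - 1) · s / √n. Here is a sketch in base R; the function name is hypothetical.

```r
# Confidence interval for the mean with unknown variance (t distribution)
t_interval <- function(xbar, s, n, level = 0.95) {
  half_width <- qt(1 - (1 - level) / 2, df = n - 1) * s / sqrt(n)
  c(lower = xbar - half_width, upper = xbar + half_width)
}

t_interval(xbar = 5, s = 3, n = 23)
```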
 
 
- Question: A random variable is distributed as follows: We have . In your sample you have the following observations: . What is the maximum-likelihood estimator for ?
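The distribution and the observations in this question did not survive in this copy. As an illustration only, the sketch below assumes a hypothetical exponential distribution with rate λ and uses made-up observations. It maximises the log-likelihood numerically with optimize. For the exponential distribution the closed-form MLE 1/x̄ is available as a check.

```r
# Hypothetical observations, for illustration only
x <- c(0.5, 1.2, 2.0, 0.3, 1.0)

# Log-likelihood of an exponential distribution with rate lambda
loglik <- function(lambda, x) sum(dexp(x, rate = lambda, log = TRUE))

# Maximise the log-likelihood over a bounded interval
fit <- optimize(loglik, interval = c(1e-6, 100), x = x, maximum = TRUE)
fit$maximum   # numerical MLE
1 / mean(x)   # closed-form MLE for comparison
```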
 
- Question: The file DAA.csv contains two variables: X and Y. To explain Y as a linear function of X, you estimate the model .
  - Which value do you estimate for ?
  - Your (two-sided) Null-hypothesis is . Determine the p-value for this test (report at least 4 decimal places).
  - You use a level of significance of 0.1%. Can you reject your Null-hypothesis? Yes / No
  - What is the lower boundary of the 95% confidence interval for ?
  - What is the upper boundary of the 95% confidence interval for ?
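The null hypothesis in this question did not survive in this copy. The sketch below therefore tests a generic H0: coefficient = b0 with t = (estimate - b0) / se and n - 2 degrees of freedom. For b0 = 0 this reproduces the p-value that summary() prints. The function name and the arguments term and b0 are assumptions.

```r
# OLS fit of Y on X, t-test of H0: coefficient = b0, and a confidence interval
coef_test <- function(dat, term = "X", b0 = 0, level = 0.95) {
  fit  <- lm(Y ~ X, data = dat)
  tab  <- coef(summary(fit))
  est  <- tab[term, "Estimate"]
  se   <- tab[term, "Std. Error"]
  tval <- (est - b0) / se                           # t statistic
  p    <- 2 * pt(-abs(tval), df = df.residual(fit)) # two-sided p-value
  ci   <- confint(fit, term, level = level)         # confidence interval
  list(estimate = est, p.value = p, ci = ci)
}

# Usage (assuming DAA.csv is in the working directory):
# DAA <- read.csv("DAA.csv")
# res <- coef_test(DAA, term = "X", b0 = 0)
# res$p.value < 0.001   # reject at the 0.1% level?
```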
 
 
 
- Question: Use data from the file D03b.csv. You want to measure the effect X1 has on Y1, the effect X2 has on Y2, the effect X3 has on Y3, and the effect X4 has on Y4. For each case below, select the most suitable specification and provide the point estimate of the effect.
  - By how many percentage points does Y1 change approximately when X1 changes by one unit?
  - What is the elasticity of Y2 with respect to X2?
  - What is the marginal effect of X3 on Y3?
  - By which amount does Y4 change when X4 changes by 1 percentage point?
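One common way to match the four questions to specifications:

- log-level for an approximately constant percentage change in Y per unit of X (100·β),
- log-log for a constant elasticity (β),
- level-level for a constant marginal effect (β),
- level-log for a fixed change in Y when X changes by 1% (roughly β/100).

The sketch assumes that every variable entering a log is strictly positive. The helper name is hypothetical.

```r
# Fit the four specifications and convert each slope into the requested effect
four_effects <- function(dat) {
  b <- function(formula) unname(coef(lm(formula, data = dat))[2])
  c(
    pct_change_Y1_per_unit_X1 = 100 * b(log(Y1) ~ X1),      # log-level
    elasticity_Y2_X2          = b(log(Y2) ~ log(X2)),       # log-log
    marginal_effect_X3_on_Y3  = b(Y3 ~ X3),                 # level-level
    change_Y4_per_1pct_X4     = b(Y4 ~ log(X4)) / 100       # level-log
  )
}

# Usage (assuming D03b.csv is in the working directory):
# four_effects(read.csv("D03b.csv"))
```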
 
 
- Question: Use the data from the file DUX.csv. Based on this data you estimate the following relationship: .
  - Which value do you estimate for ?
  - Your (two-sided) Null-hypothesis is . Determine the p-value for this test (report at least 4 decimal places).
  - You use a level of significance of 5%. Can you reject your Null-hypothesis? Yes / No
  - What is the lower boundary of the 95%-confidence-interval for ?
  - What is the upper boundary of the 95%-confidence-interval for ?
 
 
 
- Question: Consider the following model: The variable indicates whether you are in situation KQP or MJN: In the case of KQP you have . In the case of MJN you have . The variable indicates whether you are in situation RVZ or YTB: In the case of RVZ you have . In the case of YTB you have . The mean values of for the four different combinations of KQP and MJN and RVZ and YTB are shown in the following table:
  - What is ?
  - What is ?
  - What is ?
  - What is ?
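Neither the model nor the table survived in this copy. Assume the usual saturated model with two dummies and their interaction, Y = b0 + b1·D1 + b2·D2 + b3·D1·D2 + e, where the symbols D1, D2 and b0 to b3 are assumptions made here. Each coefficient is then a combination of the four cell means. Here is a sketch with hypothetical cell means:

```r
# Coefficients of Y = b0 + b1*D1 + b2*D2 + b3*D1*D2 from the four cell means.
# m10 is the mean of Y when D1 = 1 and D2 = 0, and so on.
coefs_from_means <- function(m00, m10, m01, m11) {
  c(b0 = m00,
    b1 = m10 - m00,
    b2 = m01 - m00,
    b3 = m11 - m10 - m01 + m00)
}

coefs_from_means(m00 = 1, m10 = 3, m01 = 4, m11 = 10)  # hypothetical means
```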
 
 
- Question: Save the file DLE.csv in your working directory. You want to estimate the mean of the absolute value of X3.
  - What is the plug-in estimate of the mean of the absolute value of X3?
  - Use a bootstrap (with 10000 replications) to determine the standard deviation of this estimate.
  - You assume this estimate follows a normal distribution. Use a parametric bootstrap to determine the lower boundary of a 99%-confidence interval for the mean of the absolute value of X3.
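The sketch below first runs a nonparametric bootstrap for the plug-in estimate. It then applies one common reading of the "parametric bootstrap" step: draw from a normal distribution whose mean is the plug-in estimate and whose standard deviation is the bootstrap standard deviation, and take the 0.5% quantile. The helper name is hypothetical. Results vary slightly from run to run unless you fix the seed.

```r
boot_abs_mean <- function(x, R = 10000, level = 0.99) {
  estimate <- mean(abs(x))                                  # plug-in estimate
  boot_est <- replicate(R, mean(abs(sample(x, replace = TRUE))))
  boot_sd  <- sd(boot_est)                                  # bootstrap standard deviation
  # Parametric bootstrap: assume the estimate is normal with these parameters
  draws <- rnorm(R, mean = estimate, sd = boot_sd)
  lower <- unname(quantile(draws, (1 - level) / 2))
  list(estimate = estimate, sd = boot_sd, lower = lower)
}

# Usage (assuming DLE.csv is in the working directory):
# DLE <- read.csv("DLE.csv")
# set.seed(123)
# boot_abs_mean(DLE$X3)
```

The analytic counterpart of the last step is estimate + qnorm(0.005) * boot_sd.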
 
 
- Question: The random variable follows a normal distribution with unknown mean and known standard deviation . According to your prior the following holds:
  - with probability 3/5,
  - with probability 2/5.

  The probability for all other values of is zero. You have one observation, . In the following you can use dnorm to calculate the density function of the normal distribution. E.g. dnorm(8, 6, 8) yields the density of the normal distribution at 8 when the mean is 6 and the standard deviation is 8.
  - What is the posterior probability ?
  - What is the posterior probability ?
  - Now you have two observations: . What is the posterior probability ?
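With a discrete prior, the posterior is proportional to prior × likelihood, normalised over the possible values of the mean: P(μ_k | x) = π_k ∏ φ(x_i; μ_k, σ) / Σ_j π_j ∏ φ(x_i; μ_j, σ). The candidate means, σ and the observations of this question did not survive in this copy, so the values below are hypothetical. Only the prior weights 3/5 and 2/5 come from the question.

```r
# Posterior probabilities for a discrete prior over candidate means,
# with normally distributed observations and known standard deviation
discrete_posterior <- function(mus, prior, obs, sigma) {
  lik  <- sapply(mus, function(m) prod(dnorm(obs, mean = m, sd = sigma)))
  post <- prior * lik
  post / sum(post)   # normalise so the probabilities sum to 1
}

# Hypothetical candidate means, standard deviation and observations
discrete_posterior(mus = c(0, 2), prior = c(3/5, 2/5), obs = 1.5, sigma = 1)
discrete_posterior(mus = c(0, 2), prior = c(3/5, 2/5), obs = c(1.5, 2.5), sigma = 1)
```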
 
 
- Question: The file DMM.csv contains a variable X. This X is a sample of the random variable . Here we write the normal distribution as where is the mean and is the precision. You assume that follows a normal distribution: where is the variance of . You have the following priors: , ( denotes the Gamma distribution). To obtain the necessary precision, please use the run.jags defaults. Please don’t change options or modules.
  - What is the lower boundary of the 95%-credible-interval for ?
  - What is the upper boundary of the 95%-credible-interval for ?
  - What is the lower boundary of the 95%-credible-interval for ?
  - What is the upper boundary of the 95%-credible-interval for ?
  - What is the posterior probability (a number between 0 and 1) of ?
  - What is the posterior probability (a number between 0 and 1) of ?
 
 
- Question: The file DTN.csv contains an independent variable F and a dependent binary variable V. You estimate the following model: where is the logistic distribution.
  - What is your estimate for ?
  - What is the marginal effect of for the average value of in your data?
  - What is the average marginal effect of ?
  - What is the marginal effect of if ?
  - What are the odds for if ?
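Consider the logit model P(V = 1 | F) = Λ(β0 + β1·F). The marginal effect at F is λ(β0 + β1·F)·β1, where λ = dlogis is the logistic density. The average marginal effect averages this over the observations, and the odds at F are exp(β0 + β1·F). The model itself did not survive in this copy, so the sketch assumes an intercept and the single regressor F. The helper name and the argument f0 (the value asked about) are hypothetical.

```r
logit_effects <- function(dat, f0) {
  fit <- glm(V ~ F, family = binomial(link = "logit"), data = dat)
  b   <- unname(coef(fit))
  eta <- function(f) b[1] + b[2] * f            # linear index
  c(beta1      = b[2],
    me_at_mean = dlogis(eta(mean(dat$F))) * b[2],
    ame        = mean(dlogis(eta(dat$F))) * b[2],
    me_at_f0   = dlogis(eta(f0)) * b[2],
    odds_at_f0 = exp(eta(f0)))                  # P / (1 - P) at F = f0
}

# Usage (assuming DTN.csv is in the working directory; f0 is a placeholder):
# logit_effects(read.csv("DTN.csv"), f0 = 1)
```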
 
 
- Question: The file DST.csv contains two independent variables, X1 and X2, and a dependent count variable Y. (Hint: In the following you may find the library MASS useful.)
  - Use a Poisson model where you explain Y as a function of X1 and X2. What is the coefficient of X2?
  - Now you use a negative binomial model to explain Y as a function of X1 and X2. What is now the coefficient of X2?
  - In the negative binomial model, what is your estimate for the parameter ?
  - Your Null-hypothesis is that , i.e. that the negative binomial model does not significantly improve the goodness of fit of the Poisson model. Use a likelihood-ratio test to test this hypothesis. Which p-value do you get (rounded to 4 decimal places)?
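The sketch below fits the Poisson model with glm and the negative binomial model with glm.nb from MASS.

- glm.nb reports the dispersion parameter as theta, with Var(Y) = μ + μ²/θ. If your course parameterises overdispersion as α = 1/θ, convert accordingly; which parameter the question means did not survive in this copy.
- The likelihood-ratio statistic is 2·(ℓ_NB - ℓ_Poisson), compared here with a χ² distribution with one degree of freedom.
- Because the Poisson model lies on the boundary of the parameter space, the p-value is often halved. Check which convention your course uses.

The helper name is hypothetical.

```r
library(MASS)   # provides glm.nb

count_models <- function(dat) {
  pois <- glm(Y ~ X1 + X2, family = poisson, data = dat)
  nb   <- glm.nb(Y ~ X1 + X2, data = dat)
  lr   <- 2 * (as.numeric(logLik(nb)) - as.numeric(logLik(pois)))
  p    <- pchisq(lr, df = 1, lower.tail = FALSE)
  list(coef_X2_pois = unname(coef(pois)["X2"]),
       coef_X2_nb   = unname(coef(nb)["X2"]),
       theta        = nb$theta,        # glm.nb's dispersion parameter
       alpha        = 1 / nb$theta,    # alternative parameterisation
       lr           = lr,
       p_value      = p,
       p_value_half = p / 2)           # boundary-corrected variant
}

# Usage (assuming DST.csv is in the working directory):
# count_models(read.csv("DST.csv"))
```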
 
 
- Question: The file DBJ.csv contains three variables, b, M and R. The variable b denotes the group to which an observation belongs. R is our dependent variable. We write the normal distribution as where is the mean and is the precision. denotes the Gamma distribution. You use JAGS to estimate the following model with random effects: where the group-specific random effect and the residual . Here and are the precision of and , respectively. Your priors are: , , . To obtain the necessary precision, please use the run.jags defaults. Please don’t change the options or modules.
  - What is the 50%-quantile of your posterior for ?
  - What is the 50%-quantile of your posterior for ?
  - What is the posterior probability (a number between 0 and 1) of ?
 
 
- Question: The file DTZ.csv contains the variables X, Y and Z. You estimate the model .
  - Use a standard OLS model to estimate the model. What is the coefficient of ?
  - Provide a p-value for the test of the Null-hypothesis that the coefficient of is zero (round to 4 decimal places).
  - Now use the variable as an instrument for . Use the command ivreg from the AER library to estimate the coefficient of for this model.
  - For the instrumental variables model, provide a p-value for the test of the Null-hypothesis that the coefficient of is zero (round to 4 decimal places).
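The question asks for ivreg from AER. In its standard formula syntax the call would be ivreg(Y ~ X | Z, data = DTZ), assuming that Z is the instrument for X; the model itself did not survive in this copy. To show what that estimator does, the base-R sketch below computes the two-stage least squares point estimate by hand. Caution: the standard errors from the manual second-stage lm are not valid, so use ivreg for the p-value. The helper name is hypothetical.

```r
# Two-stage least squares point estimate, computed by hand (illustration only)
tsls_by_hand <- function(dat) {
  first     <- lm(X ~ Z, data = dat)        # stage 1: regress X on the instrument
  dat$X_hat <- fitted(first)                # part of X explained by Z
  second    <- lm(Y ~ X_hat, data = dat)    # stage 2: regress Y on fitted X
  unname(coef(second)["X_hat"])             # same point estimate as ivreg
}

# Usage (assuming DTZ.csv is in the working directory):
# tsls_by_hand(read.csv("DTZ.csv"))
```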
 
 
- Question: The file DJX.csv contains nine independent variables, X1, X2, X3, X4, X5, X6, X7, X8, X9, and a dependent variable Y. You estimate the following (full) model:
  - What is the coefficient of X7 in the full model?
  - You simplify the model and include only terms which are significant on a 5% level in the above estimation. You drop insignificant terms only once: if you find insignificant terms in your simplified model, you keep them. You also keep X7. What is now the coefficient of X7?
  - Use the function extractAIC to obtain the AIC of this (simplified) model. (Note: the function extractAIC returns two numbers. Only one of them is the AIC.)
  - Now you use the step function to simplify the (full) model based on the AIC. If the step function has removed X7 from the model, add X7 back to your model. What is the coefficient of X7 in this model?
  - Use the function extractAIC to obtain the AIC of this model.
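The sketch below shows both simplification routes. Because the formula did not survive in this copy, it assumes that the full model is the additive model Y ~ X1 + ... + X9. For an lm fit, extractAIC returns c(edf, AIC), with the AIC computed as n·log(RSS/n) + 2·edf. This differs from AIC() by an additive constant, and the second element is the value asked for. The helper names are hypothetical.

```r
# Full model (assumed to be additive in X1, ..., X9)
fit_full <- function(dat) lm(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9, data = dat)

# Route 1: keep the terms significant at 5% in the full model, plus X7
simplify_by_p <- function(full, dat, keep = "X7", alpha = 0.05) {
  p <- coef(summary(full))[-1, "Pr(>|t|)"]       # drop the intercept row
  terms_kept <- union(names(p)[p < alpha], keep)
  lm(reformulate(terms_kept, response = "Y"), data = dat)
}

# Route 2: AIC-based selection with step(), adding X7 back if it was dropped
simplify_by_aic <- function(full, dat, keep = "X7") {
  reduced <- step(full, trace = 0)
  kept <- attr(terms(reduced), "term.labels")
  if (!(keep %in% kept)) {
    reduced <- lm(reformulate(c(kept, keep), response = "Y"), data = dat)
  }
  reduced
}

# Usage (assuming DJX.csv is in the working directory):
# dat  <- read.csv("DJX.csv")
# full <- fit_full(dat)
# coef(simplify_by_p(full, dat))["X7"]
# extractAIC(simplify_by_aic(full, dat))[2]
```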
 
 
- Question: The file DYW.csv contains an independent variable E and a dependent binary variable G. You estimate the following model: where is the standard normal distribution.
  - What is your estimate for ?
  - What is the marginal effect of for the average value of in your data?
  - What is the average marginal effect of ?
  - What is the marginal effect of if ?
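The probit version replaces the logistic density with the standard normal density. The marginal effect at E is φ(β0 + β1·E)·β1, with φ = dnorm. The sketch assumes an intercept and the single regressor E. The helper name and the argument e0 are hypothetical. The test below checks the formula against a numerical derivative of the fitted probability.

```r
probit_effects <- function(dat, e0) {
  fit <- glm(G ~ E, family = binomial(link = "probit"), data = dat)
  b   <- unname(coef(fit))
  me  <- function(e) dnorm(b[1] + b[2] * e) * b[2]   # marginal effect at E = e
  c(beta1      = b[2],
    me_at_mean = me(mean(dat$E)),
    ame        = mean(me(dat$E)),
    me_at_e0   = me(e0))
}

# Usage (assuming DYW.csv is in the working directory; e0 is a placeholder):
# probit_effects(read.csv("DYW.csv"), e0 = 1)
```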
 
 
- Question: A random variable follows a distribution with density function if and otherwise. Your sample contains the observations . What is the maximum-likelihood estimator for ?
 
- Question: The data in the file DHB.csv contains 8 variables: Yef, Yhm, Yun, Yzd, Xef, Xhm, Xun, Xzd. You investigate the effect Xef has on Yef, the effect Xhm has on Yhm, the effect Xun has on Yun, and the effect Xzd has on Yzd. For each case below, select the most suitable specification and provide the point estimate of the effect.
  - Use a specification where Yzd changes by a fixed amount when Xzd changes by a given percentage. By which amount does Yzd change when Xzd changes by 1 percentage point?
  - Use a specification where the elasticity of Yhm with respect to Xhm is constant. What is the elasticity of Yhm with respect to Xhm?
  - Use a specification where Yef changes by a fixed number of percentage points when Xef changes by one unit. By how many percentage points does Yef change approximately when Xef changes by one unit?
  - Use a specification where the marginal effect of Xun on Yun is constant. What is the marginal effect of Xun on Yun?
 
 
 
- Question: The file DTR.csv contains a variable M. This M is a sample of the random variable . You assume that follows a normal distribution: where is the variance of . Your priors are , . We write the normal distribution as where is the mean and is the precision. denotes the Gamma distribution. To obtain the necessary precision, please use the run.jags defaults. Please don’t change options or modules. The last two questions belong to chapter 12 of the lecture! Remember that if p is the probability of an event, then the odds are p/(1 - p).
  - What is the lower boundary of the 95%-credible-interval for ?
  - What is the upper boundary of the 95%-credible-interval for ?
  - What is the lower boundary of the 95%-credible-interval for ?
  - What is the upper boundary of the 95%-credible-interval for ?
  - What are the posterior odds of ?
  - What are the posterior odds of ?
 
 
- Question: Use the data from the file DSQ.csv. You are interested in the interquartile range of the variable X3. The interquartile range is the distance between the 25% and 75% quantiles. In R you can use the function IQR(x) to determine the interquartile range of x.
  - What is the plug-in estimate of the interquartile range of X3?
  - Use a bootstrap (with 10000 replications) to determine the standard deviation of this estimate.
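Here is a sketch that uses the boot package, which ships with R as a recommended package. The statistic function must accept the data and a vector of resampled indices. The standard deviation of the bootstrap replicates is the bootstrap standard error. The helper name is hypothetical; fix a seed if you want reproducible numbers.

```r
library(boot)

iqr_boot <- function(x, R = 10000) {
  b <- boot(data = x, statistic = function(d, i) IQR(d[i]), R = R)
  c(estimate = IQR(x),          # plug-in estimate
    sd       = sd(b$t[, 1]))    # bootstrap standard deviation
}

# Usage (assuming DSQ.csv is in the working directory):
# set.seed(123)
# iqr_boot(read.csv("DSQ.csv")$X3)
```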
 
 
- Question: The file DVT.csv contains four variables, K, W, X and v. The variable v denotes the group to which an observation belongs. You compare the following two models, a standard OLS model and a model with a random effect:
  - Here is the OLS model:
  - Now you extend this model with a random effect :

  You use lmer from the lme4 library to estimate the model with random effects.
  - What is your estimate for in the OLS model?
  - What is your estimate for in the OLS model?
  - What is your estimate for in the model with a random effect?
  - What is your estimate for in the model with a random effect?
  - What is your estimate for the standard deviation of the random effect ?