 #### Introduction to R (course offered within the context of the IMPRS BeSmart Summerschool)

Asynchronous teaching
Synchronous teaching
Daily exercises (16.8.-20.8.), 11:00-12:00.

During synchronous teaching we will use RStudio and the software mentioned below.

Objectives
R is a powerful statistical programming language. The course should enable participants to understand the basic structure of this language.
Handout
Topics:
1. Basics (Exercise on Mon., 16.8., 11:00)
• Installing R, RStudio, Packages
• Data Types, Numbers, Vectors, Matrices, Arrays.
2. More on Data Types (Exercise on Tue., 17.8., 11:00)
• Missings, Characters, Factors.
• Lists, Data frames
• Randomness
3. Data and Functions (Exercise on Wed., 18.8., 11:00)
• Example datasets.
• Functions.
• Closures.
4. Graphs and Files (Exercise on Thu., 19.8., 11:00)
• Introduction to Graphs.
• Graphs for Univariate and Bivariate Data.
• Files, Reading and Writing Data.
5. Control Structures, Structuring Data (Exercise on Fri., 20.8., 11:00)
• Pipes
• Conditions, Loops, Repetition.
• Structuring Data, Grouping, Summarising, Mutating.
• Selecting Variables, Sorting, Joining, Reshaping Data.
• Tables, Regression.
Software
For our practical examples (during the entire course) we will use the software environment R. I think that it is helpful to coordinate on one environment and R has the advantage of being free and rather powerful.
• Documentation for R is provided via the built-in help system and also through the R Homepage. Useful references are:
• The R Guide, Jason Owen (Easy to read, explains R with the help of examples from basic statistics)
• Simple R, John Verzani (Explains R with the help of examples from basic statistics)
• Einführung in R, Günther Sawitzki (In German. Rather compact introduction.)
• Econometrics in R, Grant V. Farnsworth (The introduction to R is rather compact and pragmatic.)
• An Introduction to R, W. N. Venables and D. M. Smith (The focus is more on R as a programming language)
• The R language definition (Concentrates only on R as a programming language.)
• We will use the following packages: `car, Ecdat, foreign, Hmisc, tidyverse, lattice`. If, e.g., the command `library(Ecdat)` generates an error message (`Error in library(Ecdat) : there is no package called 'Ecdat'`), you have to install the package.
Installing packages with Microsoft Windows:
With RStudio: Use the tab “Install”. Otherwise: Start `Rgui.exe` and install packages from the menu `Packages / Install Packages`.
Installing packages from modern operating systems:
From within R, use the command `install.packages("Ecdat")`, e.g., to install the package `Ecdat`.
• In the lecture we will use RStudio as a front end.
RStudio
RStudio provides a front end to R, LaTeX, git and svn.
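
The package check described in the software notes above can also be done in one step, instead of waiting for `library()` to fail. The following is only a minimal sketch: the helper `missing_packages` is a hypothetical name and not part of the course material, and the installation step assumes a working internet connection.

```r
# Packages named in the course description
course_packages <- c("car", "Ecdat", "foreign", "Hmisc", "tidyverse", "lattice")

# Hypothetical helper (not part of the course material): returns those
# names in `pkgs` that are not installed in any library on this machine.
missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

to_install <- missing_packages(course_packages)
print(to_install)

# Install only in an interactive session (e.g. the RStudio console),
# because install.packages() needs network access and may ask for a mirror.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

If `to_install` is empty, all packages are already available and nothing is installed.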

#### Exercises

Please send your answers to the following questions as an email to `oliver@kirchkamp.de`. Don’t attach any files to your email.

#### Exercise 1. Submit before Mon., 16.8., 10:30.

Install R and RStudio. Also install the package `Ecdat` from within RStudio. The command

``help(package="Ecdat")``

gives you a list of the datasets that are provided by the package `Ecdat`.
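
If you prefer the list of datasets as an R object rather than as a help page, `data(package = ...)` may be useful. The sketch below uses the `datasets` package, which ships with every R installation, only as a stand-in so that it runs before `Ecdat` is installed; the variable names `info` and `overview` are chosen for illustration.

```r
# `datasets` is used here as a stand-in; replace it with "Ecdat"
# once that package is installed.
info <- data(package = "datasets")

# `info$results` is a character matrix; its columns "Item" and "Title"
# hold the name and a one-line description of each dataset.
overview <- as.data.frame(info$results[, c("Item", "Title")])
print(head(overview))
```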

Can you find a dataset whose name starts with the same letter as your last name and which contains at least one variable that is a number? If there is no matching dataset, find one with the next letter in the alphabet. After the letter `Z`, continue with the letter `A`.

• In your answer, include your name and the name of the dataset.

• How many rows and how many columns does the dataset have?

• Choose one variable in the dataset which is a number. With which `R` command can you calculate the mean of this variable?

#### Exercise 2. Submit before Tue., 17.8., 10:30.

Following the same strategy as in the previous exercise: Find a dataset whose name starts with the same letter as your last name and that contains either a character variable or a variable that is a factor. If there is no matching dataset, proceed alphabetically, until you have found one that contains either a character variable or a variable that is a factor. Once you have reached the letter `Z`, continue with the letter `A`.

• How many variables in the dataset are characters? How many are factors?

• How can you find out whether any variable contains any missing values?

#### Exercise 3. Submit before Wed., 18.8., 10:30.

Now find a dataset that matches your first name and that includes at least one variable that is a number.

• In your answer, include your name, the name of the dataset, and the name of the variable.

• Find the median of this variable.

• Read the help page for the function `quantile`. How can you use the `quantile` function to find the median of the above variable?

• Write a function `Quantile` that behaves similarly to `quantile`, except that it has different defaults. If the parameter `probs` is not specified, the function `Quantile` should return only the minimum, the maximum and the median.

#### Exercise 4. Submit before Thu., 19.8., 10:30.

Again, find a dataset that matches your first name and that includes at least two variables that are numbers.

• In your answer, include your name, the name of the dataset, and the names of the two variables.

• With which command can you produce a graph that shows the joint distribution of these two variables?

• With which command can you produce a graph that shows only the distribution of the first variable?