Oliver Kirchkamp

Introduction to R

This course is offered within the context of the IMPRS BeSmart Summerschool.
Asynchronous teaching
Synchronous teaching
Daily exercises (16.8.-20.8.), 11:00-12:00.

During synchronous teaching we will use RStudio and the software mentioned below.

Objectives
R is a powerful statistical programming language. The course should enable participants to understand the basic structure of this language.
Handout
Topics:
  1. Basics
    • Installing R, RStudio, Packages
    • Data Types, Numbers, Vectors, Matrices, Arrays.
  2. More on Data Types
    • Missings, Characters, Factors.
    • Lists, Data frames
    • Randomness
  3. Data and Functions
    • Example datasets.
    • Functions.
    • Closures.
  4. Graphs and Files
    • Introduction to Graphs.
    • Graphs for Univariate and Bivariate Data.
    • Files, Reading and Writing Data.
  5. Control Structures, Structuring Data
    • Pipes
    • Conditions, Loops, Repetition.
    • Structuring Data, Grouping, Summarising, Mutating.
    • Selecting Variables, Sorting, Joining, Reshaping Data.
    • Tables, Regression.
Software
For our practical examples (during the entire course) we will use the software environment R. I think that it is helpful to coordinate on one environment. R is free, it is very powerful, and it is popular in the field.
  • Documentation for R is provided through the built-in help. You can also find support on the R Homepage. You might find the following useful:
    • The R Guide, Jason Owen (Easy to read, explains R with the help of examples from basic statistics)
    • Simple R, John Verzani (Explains R with the help of examples from basic statistics)
    • Einführung in R, Günther Sawitzki (In German. Rather compact introduction.)
    • Econometrics in R, Grant V. Farnsworth (The introduction to R is rather compact and pragmatic.)
    • An Introduction to R, W. N. Venables and D. M. Smith (The focus is more on R as a programming language)
    • The R language definition (Concentrates only on R as a programming language.)
  • You can download R from the homepage of the R-project.
    Installing R with Microsoft Windows:
    Download and start the installer. Install R on your local drive. Installing on a network drive or in the cloud (Dropbox, OneDrive, ...) is possible but not recommended.
    Installing R with GNU-Linux:
    Follow the advice to install R for your distribution.
    Installing R with MacOS X:
    Here is a guide to install R with MacOS X.
  • In the lecture we use RStudio as a front end.
  • We will use the following packages: car, Ecdat, foreign, Hmisc, tidyverse, lattice.

    If, e.g., the command library(Ecdat) generates an error message (Error in library(Ecdat): There is no package called 'Ecdat'), you have to install the package.

    Installing packages with Microsoft Windows:
    With RStudio: Use the “Install” button in the “Packages” tab. Otherwise: Start Rgui.exe and install packages from the menu (Packages / Install Packages).
    Installing packages from GNU-Linux or MacOS X:
    From within R, use the command install.packages("Ecdat"), e.g., to install the package Ecdat.
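The following is a minimal sketch of how one might check which of the course packages are still missing before installing them. The variable names pkgs, installed and missing_pkgs are chosen here for illustration only; the installation line is commented out so that the snippet runs without network access.

```r
# Packages used in the course (taken from the list above).
pkgs <- c("car", "Ecdat", "foreign", "Hmisc", "tidyverse", "lattice")

# Names of all packages currently installed in your library paths.
installed <- rownames(installed.packages())

# Packages from the list that are not installed yet.
missing_pkgs <- setdiff(pkgs, installed)
print(missing_pkgs)

# Install only what is missing; uncomment to actually run the installation.
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```

If missing_pkgs is empty, library(Ecdat) and the other library() calls should work without the error message quoted above.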
RStudio
RStudio provides a front end to R, LaTeX, git and svn.

Exercises

Please send your answers to the following questions as an email to oliver@kirchkamp.de. Don’t attach any files to your email.

Exercise 1. Submit before Mon., 16.8., 10:30.

Install R and RStudio. Also install the package Ecdat from within RStudio. The command

help(package="Ecdat")

gives you a list of the datasets that are provided by the package Ecdat.

Can you find a dataset whose name starts with the same letter as your last name and which contains at least one variable that is a number? If there is no matching dataset, find one with the next letter in the alphabet. After the letter Z, continue with the letter A.
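One possible way to search a package's datasets by first letter is sketched below. It is only an illustration under stated assumptions: it uses the package datasets, which ships with R, as a stand-in (replace it with "Ecdat" after calling library(Ecdat)), the letter "C" is a made-up example, and has_numeric is a hypothetical helper, not part of any package.

```r
# Assumption: "datasets" is used as a stand-in because it is always attached.
# For the exercise, call library(Ecdat) first and set pkg <- "Ecdat".
pkg <- "datasets"
letter <- "C"  # hypothetical first letter of a last name

# data(package = ...) returns a list; $results is a character matrix
# with one row per dataset and a column called "Item".
items <- data(package = pkg)$results[, "Item"]

# Some entries look like "beaver1 (beavers)"; keep only the object name.
items <- sub(" .*$", "", items)

# Dataset names that start with the chosen letter (ignoring case).
candidates <- items[startsWith(toupper(items), letter)]

# Hypothetical helper: TRUE if the object is a data frame
# with at least one numeric variable.
has_numeric <- function(name, pkg) {
  d <- get(name, envir = as.environment(paste0("package:", pkg)))
  is.data.frame(d) && any(vapply(d, is.numeric, logical(1)))
}

# Keep the candidates that contain at least one numeric variable.
matches <- candidates[vapply(candidates, has_numeric, logical(1), pkg = pkg)]
print(matches)
```

If matches is empty, change letter to the next letter of the alphabet and run the last lines again.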

Exercise 2. Submit before Tue., 17.8., 10:30.

Following the same strategy as in the previous exercise: Find a dataset whose name starts with the same letter as your last name and that contains either a character variable or a variable that is a factor. If there is no matching dataset, proceed alphabetically, until you have found one that contains either a character variable or a variable that is a factor. Once you have reached the letter Z, continue with the letter A.
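To see which variables of a dataset are characters or factors, something along the following lines might help. The dataset chickwts (shipped with R) is only a stand-in for whichever Ecdat dataset you are inspecting, and is_cat is a name chosen here for illustration.

```r
# Assumption: chickwts is a stand-in; replace it with your own dataset.
d <- chickwts

# For each variable, check whether it is a factor or a character vector.
is_cat <- vapply(d, function(x) is.factor(x) || is.character(x), logical(1))

# Names of the variables that are factors or characters.
print(names(d)[is_cat])

# str() gives a compact overview of the types of all variables.
str(d)
```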

Exercise 3. Submit before Wed., 18.8., 10:30.

Now find a dataset that matches your first name and that includes at least one variable that is a number.

Exercise 4. Submit before Thu., 19.8., 10:30.

Again, find a dataset that matches your first name and that includes at least two variables that are numbers.

In your answer, only include the commands, not the graph!

Exercise 5. Submit before Fri., 20.8., 10:30.

Find a dataset that matches your last name and that includes at least one variable that is a number, and a second variable that has fewer than 12 different values. I will call these (fewer than 12) values “cases”.
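A rough sketch of how one could check both conditions for a given dataset follows. The dataset iris (shipped with R) serves only as a stand-in, and the names is_num and n_values are assumptions made for this example.

```r
# Assumption: iris is a stand-in; replace it with your own dataset.
d <- iris

# Which variables are numbers?
is_num <- vapply(d, is.numeric, logical(1))

# How many different values does each variable take?
n_values <- vapply(d, function(x) length(unique(x)), integer(1))

# Variables that are numbers.
print(names(d)[is_num])

# Variables with fewer than 12 different values (candidates for the "cases").
print(names(d)[n_values < 12])
```

A dataset qualifies if the first list is not empty and the second list contains a variable different from the numeric one you picked.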