[A picture of Oliver Kirchkamp]

Introduction to R (course offered within the context of the IMPRS BeSmart Summerschool)

Asynchronous teaching
Synchronous teaching
Daily exercises (16.8.-20.8.), 11:00-12:00.

During synchronous teaching we will use RStudio and the software mentioned below.

Objectives
R is a powerful statistical programming language. The course should enable participants to understand the basic structure of this language.
Handout
Topics:
  1. Basics (Exercise on Mon., 16.8., 11:00)
    • Installing R, RStudio, Packages
    • Data Types, Numbers, Vectors, Matrices, Arrays.
  2. More on Data Types (Exercise on Tue., 17.8., 11:00)
    • Missing Values, Characters, Factors.
    • Lists, Data frames
    • Randomness
  3. Data and Functions (Exercise on Wed., 18.8., 11:00)
    • Example datasets.
    • Functions.
    • Closures.
  4. Graphs and Files (Exercise on Thu., 19.8., 11:00)
    • Introduction to Graphs.
    • Graphs for Univariate and Bivariate Data.
    • Files, Reading and Writing Data.
  5. Control Structures, Structuring Data (Exercise on Fri., 20.8., 11:00)
    • Pipes
    • Conditions, Loops, Repetition.
    • Structuring Data, Grouping, Summarising, Mutating.
    • Selecting Variables, Sorting, Joining, Reshaping Data.
    • Tables, Regression.
Software
Throughout the course, we will use the software environment R for our practical examples. I think it is helpful to coordinate on one environment, and R has the advantage of being free and rather powerful.
  • Documentation for R is available through the built-in help system and on the R homepage. Useful introductions are:
    • The R Guide, Jason Owen (Easy to read, explains R with the help of examples from basic statistics)
    • Simple R, John Verzani (Explains R with the help of examples from basic statistics)
    • Einführung in R, Günther Sawitzki (In German. Rather compact introduction.)
    • Econometrics in R, Grant V. Farnsworth (The introduction to R is rather compact and pragmatic.)
    • An Introduction to R, W. N. Venables and D. M. Smith (The focus is more on R as a programming language.)
    • The R language definition (Concentrates only on R as a programming language.)
  • We will use the following packages: car, Ecdat, foreign, Hmisc, tidyverse, lattice. If, e.g., the command library(Ecdat) generates an error message (Error in library(Ecdat): there is no package called 'Ecdat'), you have to install the package first.
    Installing packages with Microsoft Windows:
    With RStudio: use the “Install” button in the Packages tab. Otherwise, start Rgui.exe and install packages from the menu Packages / Install Packages.
    Installing packages from modern operating systems:
    From within R, use the command install.packages(); e.g., install.packages("Ecdat") installs the package Ecdat.
  • In the lecture we will use RStudio as a front end.
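If you prefer to install all course packages at once, the following is a minimal sketch. The helper name install_if_missing is hypothetical (it is not part of any package), and the CRAN mirror https://cloud.r-project.org is an assumption; any CRAN mirror should work. It uses only base R functions (requireNamespace(), install.packages()).

```r
# Hypothetical helper: install only those packages that are not installed yet.
install_if_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns FALSE (instead of raising an error) if a package is missing
  have <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!have]
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos)
  }
  invisible(to_install)  # names of the packages that had to be installed
}

# Usage for the packages of this course:
# install_if_missing(c("car", "Ecdat", "foreign", "Hmisc", "tidyverse", "lattice"))
```

Packages that are already installed are skipped, so running the call a second time does nothing.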
RStudio
RStudio provides a front end to R, LaTeX, git and svn.

Exercises

Please send your answers to the following questions as an email to oliver@kirchkamp.de. Don’t attach any files to your email.

Exercise 1. Submit before Mon., 16.8., 10:30.

Install R and RStudio. Also install the package Ecdat from within RStudio. The command

help(package="Ecdat")

gives you a list of the datasets that are provided by the package Ecdat.
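As an alternative to browsing the help page, you can also list the dataset names programmatically. This is a sketch that relies on base R's data(package = ...), whose result contains a matrix results with a column "Item". The helper name datasets_starting_with is hypothetical.

```r
# Hypothetical helper: names of the datasets in `package` whose name starts with `letter`.
datasets_starting_with <- function(letter, package = "Ecdat") {
  # data(package = ...) returns an index; its "results" matrix has one row per dataset
  items <- data(package = package)$results[, "Item"]
  # compare first letters case-insensitively
  items[toupper(substr(items, 1, 1)) == toupper(letter)]
}

# Example (requires Ecdat to be installed):
# datasets_starting_with("K")
```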

Can you find a dataset whose name starts with the same letter as your last name and which contains at least one variable that is a number? If there is no matching dataset, find one with the next letter in the alphabet. After the letter Z, continue with the letter A.
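One way to check the "at least one variable that is a number" condition is to apply is.numeric() to every variable. The sketch below assumes the dataset can be converted with as.data.frame() (true for data frames, matrices, and time series). The helper has_numeric is hypothetical. The example uses the base R dataset mtcars. After library(Ecdat), you would pass an Ecdat dataset instead.

```r
# Hypothetical helper: does a dataset contain at least one numeric variable?
has_numeric <- function(x) {
  df <- as.data.frame(x)  # also handles matrices and time series
  any(vapply(df, is.numeric, logical(1)))
}

has_numeric(mtcars)  # TRUE: all variables of mtcars are numeric
```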

Exercise 2. Submit before Tue., 17.8., 10:30.

Follow the same strategy as in the previous exercise: find a dataset whose name starts with the same letter as your last name and that contains either a character variable or a variable that is a factor. If there is no matching dataset, proceed alphabetically until you find one that does. Once you have reached the letter Z, continue with the letter A.
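The same idea works for character variables and factors, using is.character() and is.factor(). This is a sketch with a hypothetical helper name, demonstrated on the base R datasets iris and mtcars rather than on Ecdat.

```r
# Hypothetical helper: does a dataset contain a character variable or a factor?
has_char_or_factor <- function(x) {
  df <- as.data.frame(x)
  any(vapply(df, function(v) is.character(v) || is.factor(v), logical(1)))
}

has_char_or_factor(iris)    # TRUE: Species is a factor
has_char_or_factor(mtcars)  # FALSE: all variables are numeric
```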

Exercise 3. Submit before Wed., 18.8., 10:30.

Now find a dataset whose name starts with the same letter as your first name and that includes at least one variable that is a number.

Exercise 4. Submit before Thu., 19.8., 10:30.

Again, find a dataset whose name starts with the same letter as your first name and that includes at least two variables that are numbers.

In your answer, only include the commands, not the graph!
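To see which variables qualify, it can help to list the names of all numeric variables. The sketch below covers only this check, not the graph. The helper numeric_vars is hypothetical, and the example uses the base R dataset iris.

```r
# Hypothetical helper: names of all numeric variables of a dataset.
numeric_vars <- function(x) {
  df <- as.data.frame(x)
  names(df)[vapply(df, is.numeric, logical(1))]
}

numeric_vars(iris)               # "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
length(numeric_vars(iris)) >= 2  # TRUE: at least two numeric variables
```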

Exercise 5. Submit before Fri., 20.8., 10:30.

Find a dataset whose name starts with the same letter as your last name and that includes at least one variable that is a number and a second variable that has fewer than 12 different values. I will call these (fewer than 12) values “cases”.
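To find candidates for the second variable, you can count the distinct values of every variable with length(unique(...)). This is a sketch with a hypothetical helper name, demonstrated on the base R dataset mtcars. Base R's table() then shows how often each case occurs.

```r
# Hypothetical helper: variables with fewer than `max_values` distinct values
# (the "cases" of the exercise).
few_valued_vars <- function(x, max_values = 12) {
  df <- as.data.frame(x)
  n_distinct <- vapply(df, function(v) length(unique(v)), integer(1))
  names(df)[n_distinct < max_values]
}

few_valued_vars(mtcars)  # "cyl" "vs" "am" "gear" "carb"
table(mtcars$cyl)        # how often each case occurs
```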