Oliver Kirchkamp

Data Wrangling, workflow and replicability

This course is offered as part of the IMPRS BeSmart Summerschool.

Material

Videos can be found here. Since participants come from different backgrounds, I suggest that everyone watch the videos before the course starts. This should make for a more productive discussion during the course.
Here are the slides I use in the videos. Here are the R commands.
Not part of the course, but related: here you can find more information on Workflow of statistical data analysis.
Also not part of the course: here you can find a brief introduction to R.

Synchronous teaching

10 and 11 August

Motivation

A significant part of statistical analysis relies on the preparation of data. The researcher must understand the raw data, structure it, and clean it. Causal inference is often only a small part of the work. In this course we study which data preparation steps are needed for a paper such as Christoph Engel (2022), “Lucky you: Your case is heard by a seasoned panel—Panel effects in the German Constitutional Court,” Journal of Empirical Legal Studies, pp. 1179–1221.

We will first discuss a number of tools and then show, with an example, how to apply them:

Finding and replacing text; regular expressions.
Reading times and dates.
Working with HTML.
Working with oddly structured datasets; repetition.
Applying these tools to read and clean data from the Federal Constitutional Court.
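As a rough illustration of the first two tools (finding and replacing text with regular expressions, and reading dates), here is a minimal sketch in Python, one of the two languages used in the course. It uses only the standard library. The input string and the patterns are made up for illustration and are not taken from the course material or the court data.

```python
import re
from datetime import datetime

# A made-up line of raw text (hypothetical), with irregular spacing.
raw = "Decision of 27.01.2022  -  case no. 123/20"

# Find and replace: collapse every run of whitespace into a single space.
clean = re.sub(r"\s+", " ", raw)

# Find: locate a date written as DD.MM.YYYY (day.month.year).
match = re.search(r"\d{2}\.\d{2}\.\d{4}", clean)

# Read the date: convert the matched text into a datetime.date object.
decided = datetime.strptime(match.group(0), "%d.%m.%Y").date()

print(clean)                # cleaned text
print(decided.isoformat())  # date in ISO 8601 format
```

Once the date is a proper date object rather than a string, it can be sorted, compared, or used to compute durations, which is usually the point of this step.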

Tools

We will first work through the example in R. Participants should have installed R, an IDE for R (e.g. RStudio), and the packages lubridate, stringr, dplyr, tidyr, parallel, ggplot2, mgcv, tidymv, httr, xml2, rvest, and xtable (parallel ships with R itself).

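If time permits, we will also discuss how to solve the problem with Python. For this part, participants should have installed Python, an IDE for Python, and the libraries pandas, numpy, and dfply. The modules time and locale are also used; they are part of Python's standard library and need no separate installation.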