Oliver Kirchkamp

Data Wrangling, workflow and replicability

This course is offered within the context of the IMPRS BeSmart Summerschool.
Material
Synchronous teaching
10 August + 11 August
Motivation
A significant part of statistical analysis consists of preparing data. The researcher must understand the raw data, structure it, and clean it. Causal inference is often only a small part of the work. In this course we study which steps of data preparation are necessary for a paper like Christoph Engel. “Lucky you: Your case is heard by a seasoned panel—Panel effects in the German Constitutional Court.” Journal of Empirical Legal Studies. 2022. 1179-1221.

We will first discuss a number of tools. We will then work through an example of how to apply them.

  • Finding and replacing text, regular expressions.
  • Reading times and dates.
  • Working with HTML.
  • Working with funny datasets, repetition.
  • Applying these tools to read and clean data from the Federal Constitutional Court.
Tools
We will first work through the example in R. Participants should have installed R, an IDE for R (e.g. RStudio), and the packages lubridate, stringr, dplyr, tidyr, parallel, ggplot2, mgcv, tidymv, httr, xml2, rvest, xtable.

If time permits, we will also discuss how to solve the problem with Python. For this part, participants should have installed Python, an IDE for Python, and the libraries pandas, numpy, and dfply. We will also use the modules time and locale, which ship with Python.
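
As a small taste of the first two tools in the list above (regular expressions and reading dates), here is a minimal sketch in Python. It uses only the standard library. The sample line, the pattern, and the function names extract_date and tidy are invented for illustration; they are not taken from the course material or from the court data.

```python
import re
from datetime import date

# Hypothetical raw line, invented for illustration only.
raw = "Decision of 10. 08. 2021 ; panel:  first senate "

# Day, month and year separated by dots, with optional spaces after each dot.
DATE_RE = re.compile(r"(\d{1,2})\.\s*(\d{1,2})\.\s*(\d{4})")


def extract_date(text):
    """Return the first dd.mm.yyyy date found in text, or None if there is none."""
    m = DATE_RE.search(text)
    if m is None:
        return None
    day, month, year = (int(g) for g in m.groups())
    return date(year, month, day)


def tidy(text):
    """Collapse runs of whitespace into single spaces and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


print(extract_date(raw))
print(tidy(raw))
```

Real data will usually need more patterns than this one, for example month names written out or two-digit years.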