Data analysis – WI Methods Lab

Workshop Recap: Introduction to Programming and Data Analysis with R

April 3, 2025 (updated April 4, 2025), Diana Ignatovich

The third edition of the Introduction to Programming and Data Analysis with R workshop took place on March 12 and 13, 2025. Roland Toth of the Methods Lab at the Weizenbaum Institute introduced almost 20 participants to essential methods of data analysis, covering fundamental R programming concepts and techniques.

Roland asks participants about their prior experience with programming

On the first day, Roland guided participants through the basics of R syntax and its integration with Markdown/Quarto in an interactive environment. This included the very basics of programming, such as functions, objects, and indexing, as well as data-related practices like data wrangling, sanity checks, and simple statistical analyses. Participants also learned how to handle warnings and errors that can stall coding work during a project.

On day two, after an introduction to data visualization techniques, participants put their learning into practice: they explored the provided survey data, developed a research question, and then prepared and statistically analyzed the data in R. The result was a reproducible HTML report covering the reasoning behind the research question, all data wrangling steps, an exploration of the data set, the analysis, and the results including an interpretation. Attendees also supported each other’s progress whenever possible, while Roland offered personalized guidance.

The workshop alternated between lecture-like and interactive formats

The workshop concluded with a thorough review of useful functions and packages in R. Throughout the event, participants were encouraged to ask questions freely and frequently, and they took the opportunity. The Methods Lab would like to warmly thank all guests for their attendance and lively participation!

Workshop Recap: Introduction to Programming and Data Analysis with R

April 22, 2024 (updated July 24, 2024), Anna Hohwü-Christensen

On April 10 and 11, the Methods Lab organized the second edition of the workshop Introduction to Programming and Data Analysis with R. Led by Roland Toth from the Methods Lab, the workshop was designed to equip participants with fundamental R programming skills essential for data wrangling and analysis.

Roland Toth introduces participants to data wrangling with R

Across two days, attendees engaged in a comprehensive exploration of R fundamentals, covering topics such as RStudio, Markdown, data wrangling, and practical data analysis. Day one laid the groundwork, covering the main concepts in programming, including functions, classes, objects, and vectors. Participants were also introduced to Markdown and Quarto, which let them embed analysis results directly in the text they write, and to the key steps and techniques of data wrangling.

Participants work on their own research questions during the practical exercise

The first half of the second day was dedicated to showcasing and exploring basic data analysis and various visualization methods. Afterwards, participants had the opportunity to put the knowledge gained on the previous day into practice by working with a dataset to formulate and address their own research questions. Roland was on hand to offer assistance and guidance, addressing any challenges or concerns that arose along the way.

Christian Strippel presents first results

The workshop fostered a collaborative learning environment, with lively discussions and ample questions from all. We thank all participants for their active involvement!

Workshop: Introduction to Programming and Data Analysis with R (April 10-11, 2024)

February 27, 2024 (updated February 28, 2024), Methods Lab

Level: Beginner/Intermediate
Category: Data Analysis

After being well received last year, we’re happy to announce the return of our workshop Programming and Data Analysis with R for its second edition. This two-day intensive workshop led by Roland Toth (WI) will take place on Wednesday, April 10, and Thursday, April 11, at the Weizenbaum Institute.

During the first day, attendees will receive comprehensive training in programming fundamentals, essential data wrangling techniques, and Markdown integration. The second day will center around data analysis, providing participants with the chance to engage directly with a dataset and address a research topic independently. A blend of concepts, coding techniques, and smaller practical tasks will be interspersed throughout both days to reinforce hands-on learning.

For more information, check out the program page!

Recap: Digital Methods Colloquium (December 7, 2023)

December 24, 2023 (updated March 13, 2024), Roland Toth

Digital and computational data collection and analysis methods such as mobile/internet tracking, experience sampling, web scraping, text mining, machine learning, and image recognition have become more relevant than ever in the social sciences. While these methods enable new avenues of inquiry, they also present many challenges. It is important to share and discuss research, experiences, and challenges surrounding these methods with other researchers to exchange ideas and to learn from experiences.

For this reason, Roland Toth from the Methods Lab and research fellow Douglas Parry organized the Digital Methods Colloquium that took place on December 7 at the Weizenbaum Institute. They invited researchers from all over Germany who had used such methods before. The focus lay on sharing not only successes but, even more so, the challenges that they had experienced in the research process.

Fenne Große Deters (U of Potsdam) talking about the effects of smartphone use on sleep quality

In the first part of the colloquium, participants presented recent or past research projects for which they had used digital methods. The presentations covered various methods, including experience sampling, mobile logging/tracking, multimodal content classification, network analysis, and large language models. All presentations were received very well and led to high engagement with many questions and exchanges from the participants.

The second part of the colloquium was designed to facilitate interactive discussion and knowledge sharing among the participants. They were assigned to one of two discussion groups that focused on either data collection or data analysis in the context of digital methods. In each group, participants followed prompts and discussed urgent issues and possible solutions, which they then visualized using posters. Finally, both groups sat together and presented the posters to each other, leading to a final discussion. After a short wrap-up, some participants joined the hosts at the Christmas Market for a well-deserved hot beverage.

Patrick Zerrer (U of Bremen) talking about mobile usage patterns of young political activists

The hosts would like to thank all participants for attending and engaging in the Digital Methods Colloquium. Bringing together researchers from different fields demonstrated that there are more commonalities than differences when it comes to the challenging and exciting field of digital methods. We are looking forward to more exchange and, possibly, Part 2 of the Digital Methods Colloquium sometime in the future.

Workshop Recap: Introduction to Programming and Data Analysis with R

April 23, 2023 (updated March 13, 2024), Roland Toth

Day 1, section *Introduction to Markdown* (photo: Katharina Stefes).

On March 29 and 30, the Methods Lab organized a workshop on the use of the programming language R for working with data, led by Roland Toth. The focus was on the main principles of programming in order to understand what is happening under the hood when working with data.

Day 1 focused on the advantages of using a programming language to work with data over dedicated software such as SPSS or Stata. In the course of this, the most important principles of programming in a research context, such as functions, classes, objects, vectors, and data frames were covered. Before going into the specific tasks in the context of data analysis, the markup language Markdown in combination with R was first introduced. This allows data analyses to be not only performed, but also reported in a directly reproducible and seamlessly interrelated manner, so that entire research papers can be written using R and Markdown. The day concluded by covering the key steps and techniques in data wrangling and performing calculations of typical descriptive and inferential statistical measures, tests, and models. At the end of each section of the day, there were small tasks to be solved by the participants to apply what they had learned.

On Day 2, the data analysis section was wrapped up with a demonstration of numerous visualization methods. This was followed by a longer section in which participants were allowed to think about their own research question based on a freely available data set from the European Social Survey (ESS) and answer it in R using all the techniques they had learned. They were supported by the workshop leader, since at the beginning of working with a programming language there are often many small, unforeseen problems that can quickly lead to frustration without prior experience. Lastly, an outlook was given on what techniques and packages to familiarize oneself with once beginning to dive deeper into data analysis in R and programming in general (for example, custom functions, loops, and pipes). The workshop was concluded with a Q&A where remaining questions could be asked.

For the purpose of optimizing the training offered by the Methods Lab, a short, anonymous evaluation was conducted at the very end of the workshop. Thankfully, the participants were very satisfied with the workshop throughout and only commented that more frequent and smaller tasks might have been even better. Although this is in parts difficult to reconcile with the concept of the workshop, this feedback is appreciated and will be used to improve future offerings in this regard.

The Methods Lab would like to thank all participants for their participation and commitment and hopes that the skills learned will be of benefit to them in future research projects and other application scenarios.

Workshop: Introduction to Programming and Data Analysis with R (March 29-30, 2023)

March 6, 2023 (updated April 24, 2023), Methods Lab

Our second workshop, Programming and Data Analysis with R, will be held on March 29 and 30 at the Institute.

During the first day of the workshop, Roland Toth (WI) will introduce and establish the fundamentals of programming in R/RStudio, combining it with Markdown. Building on the first, the second day will be dedicated to applying this knowledge to data analysis and working on a custom research question. No previous experience is necessary.

You can find more information about the workshop on its program page.

ECPR Winter School: Machine Learning with Big Data for Social Scientists

February 22, 2023 (updated February 26, 2024), Roland Toth

From February 6–10, Methods Lab member Roland Toth attended the online course Machine Learning with Big Data for Social Scientists at ECPR Winter School.

The goal was to gain a deeper insight into certain machine learning methods and to be able to apply them to social science questions in particular. It was also about efficiency in handling large data sets so that they can still be processed with high performance.

Numerous materials were made available for the workshop in advance. There were videos for each session in which presentation slides on the respective topics of the session were presented in the style of a lecture. These were accompanied by appropriate literature and studies. On each of the workshop days, there were two-hour live sessions in which the content of the videos was repeated and the application of the principles was practiced live.

The first step was to set up RStudio Server on the Amazon Web Services (AWS) cloud service. This moves the entire RStudio environment off one’s own machine, so that data handling and calculations do not burden local resources.

Furthermore, work with the package collection tidyverse was deepened. Here, among other things, it turned out that the function vroom from the package of the same name imports larger data sets faster than similar functions. In addition, it was discussed how to access external data sets directly from RStudio via SQL syntax, so that it is not necessary to import the full data sets at all.
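The idea of querying a data source via SQL instead of importing it in full can be illustrated with a minimal sketch. It is written in Python with the standard-library `sqlite3` module rather than in R, and it is not the setup used in the course: the table `vaccinations`, its columns, the toy rows, and the helper `summarize_by_region` are all hypothetical. The point is only that filtering and aggregation happen inside the database, so just the small summary reaches the analysis session.

```python
import sqlite3

# Hypothetical toy rows standing in for a large external table.
rows = [
    ("Berlin", "A", 120),
    ("Berlin", "B", 80),
    ("Hamburg", "A", 60),
    ("Hamburg", "B", 40),
    ("Bremen", "A", 10),
]


def summarize_by_region(connection, min_total):
    """Return (region, total) pairs that were aggregated inside the database."""
    query = """
        SELECT region, SUM(doses) AS total
        FROM vaccinations
        GROUP BY region
        HAVING SUM(doses) >= ?
        ORDER BY region
    """
    return connection.execute(query, (min_total,)).fetchall()


# An in-memory database plays the role of the external data source.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE vaccinations (region TEXT, site TEXT, doses INTEGER)"
)
connection.executemany("INSERT INTO vaccinations VALUES (?, ?, ?)", rows)

# Only the aggregated rows are transferred, not the full table.
summary = summarize_by_region(connection, 100)
print(summary)
```

With a real external database, the same pattern would let the server do the heavy lifting while the local session only receives the rows it actually needs.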

For illustrative purposes, data sets on COVID vaccination status and election outcomes in the United States were used during the workshop. Their observations were recorded at different levels (state, county, …), which made merging the data sets difficult. Besides typical functions of data wrangling (filtering, grouping, aggregating, mapping, merging), some special machine learning methods were discussed. Here, the logic of the procedure was first demonstrated using simple linear regression models: A model is trained with a (smaller) training data set and then applied to a (larger) test data set. The model is supposed to predict the outcome accurately, but it should not fit the training data so closely that it overfits and performs badly on the test data. In the end, it was a question of balancing variance and bias. During the workshop, this principle was also applied to LASSO and Ridge regression, logistic regression, and classification methods such as Support Vector Machines, Decision Trees, and Random Forests.
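The train/test logic described above can be sketched in a few lines. This is an illustration in plain Python, not the course material: the simulated data (y = 1 + 2x plus noise), the sample sizes, and the helper names `fit_line`, `mean_squared_error`, and `simulate` are assumptions made for this example. A simple linear regression is fitted on a smaller training set and then evaluated on a larger, unseen test set.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def mean_squared_error(intercept, slope, xs, ys):
    """Average squared difference between predictions and observed values."""
    errors = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


def simulate(n, rng):
    """Hypothetical data: y = 1 + 2x plus Gaussian noise with sd 1."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [1 + 2 * x + rng.gauss(0, 1) for x in xs]
    return xs, ys


rng = random.Random(42)
train_x, train_y = simulate(50, rng)   # smaller training set
test_x, test_y = simulate(200, rng)    # larger test set

# Train on the training data only, then evaluate on both sets.
intercept, slope = fit_line(train_x, train_y)
train_mse = mean_squared_error(intercept, slope, train_x, train_y)
test_mse = mean_squared_error(intercept, slope, test_x, test_y)

print(f"intercept={intercept:.2f}, slope={slope:.2f}")
print(f"train MSE={train_mse:.2f}, test MSE={test_mse:.2f}")
```

A test error far above the training error would suggest overfitting. Regularized methods such as LASSO and Ridge regression add a penalty that trades a little bias for lower variance to narrow that gap.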

All in all, it was a good introduction to working with machine learning methods. However, there was limited focus on the decision criteria for choosing certain methods over others, and a strong focus on the technical implementation of the methods in R. Nevertheless, the workshop was able to clarify some open questions and provide some new techniques that will help when working with larger datasets and in data analysis.