Research stay at Universidad de Navarra (Pamplona, Spain)

From April 17–23, Methods Lab Data Scientist Roland Toth spent a week at the Institute for Culture and Society (ICS) at Universidad de Navarra in Pamplona, Spain. This flash visiting researcher stay was financed through, and took place in the context of, the ICS project Youth in Transition, in which data have been collected annually for four years from a representative sample of the Spanish population. The data include information on smartphone use, smartphone pervasiveness, and psychological traits.

Together with the researchers Aurelio Fernández, Javier García-Manglano, and Pedro de la Rosa, Roland wrote a first draft of a research article based on these data. Because mobile media use is typically measured with indicators of use quantity (duration and frequency) alone, the paper addresses whether qualitative dimensions of mobile media use should be included in its measurement as well. Specifically, the researchers are investigating the role of gratification variety (e.g., using the phone for information, social contact, or escapism) and situation variety (e.g., using it while in a meeting, while watching a movie, or while eating). Both are defining characteristics of mobile media devices such as the smartphone, which we typically use for various purposes, anytime and anywhere. For conceptual validation, the researchers examine whether these two qualitative dimensions contribute substantially to predicting mobile vigilance, that is, the constant salience of mobile media devices and the urge to monitor them and remain reactive to them. Since such vigilance is tied to mobile media use by definition and emerged alongside its development, it should be associated with smartphone use. In other words, if gratification variety and situation variety explain a share of mobile vigilance that the quantity of smartphone use leaves unexplained, this indicates that both dimensions are essential to measuring mobile media use. The researchers are currently finalizing the article.
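The "share left unexplained by quantity" reasoning corresponds to the logic of a nested (hierarchical) regression: compare the variance in mobile vigilance explained (R²) by use quantity alone with the variance explained once gratification variety and situation variety are added. The following is a minimal Python/NumPy sketch of that comparison. It assumes simulated data and hypothetical variable names and effect sizes; it is not the authors' data or model, and their actual analysis may differ.

```python
import numpy as np

def r_squared(X: np.ndarray, y: np.ndarray) -> float:
    """Fit ordinary least squares with an intercept and return R^2."""
    X1 = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(42)
n = 1000
# Hypothetical, simulated predictors (NOT the Youth in Transition data)
quantity = rng.normal(size=n)               # e.g., standardized use duration
gratification_variety = rng.normal(size=n)  # number of different uses
situation_variety = rng.normal(size=n)      # number of different situations
# Hypothetical outcome: vigilance depends on all three predictors plus noise
vigilance = (0.5 * quantity + 0.3 * gratification_variety
             + 0.3 * situation_variety + rng.normal(size=n))

# Step 1: quantity of use only
r2_quantity = r_squared(quantity.reshape(-1, 1), vigilance)
# Step 2: add the two qualitative dimensions
X_full = np.column_stack([quantity, gratification_variety, situation_variety])
r2_full = r_squared(X_full, vigilance)
delta_r2 = r2_full - r2_quantity

# F-change statistic for the k added predictors (p = predictors in the full model)
k, p = 2, 3
f_change = (delta_r2 / k) / ((1 - r2_full) / (n - p - 1))

print(f"R^2, quantity only:            {r2_quantity:.3f}")
print(f"R^2, with qualitative dims:    {r2_full:.3f}")
print(f"Incremental R^2 (delta R^2):   {delta_r2:.3f}")
print(f"F change (df = {k}, {n - p - 1}): {f_change:.1f}")
```

A positive ΔR² is not decisive on its own, because adding predictors never lowers R² in ordinary least squares. The F-change statistic is one common way to judge whether the increment is larger than chance would produce, with its p-value taken from an F distribution with k and n − p − 1 degrees of freedom.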

Inviting Roland for this stay was a generous gesture by ICS, and the researchers and the institute were very welcoming and engaged in the project throughout his stay. Aside from the productive cooperation, our colleague was delighted by the beautiful campus and the equally charming cities of Pamplona and Donostia-San Sebastián, where spring had already arrived. We hope that the article will be published successfully and that the cooperation between ICS at Universidad de Navarra and the Methods Lab of the Weizenbaum Institute will continue in future projects!

Workshop Recap: Introduction to Programming and Data Analysis with R

On March 29–30, the Methods Lab held a workshop, led by Roland Toth, on using the programming language R to work with data. The focus was on the core principles of programming, so that participants understand what happens under the hood when working with data.

Day 1 began with the advantages of working with data in a programming language rather than in dedicated software such as SPSS or Stata. Along the way, the most important programming principles in a research context were covered, such as functions, classes, objects, vectors, and data frames. Before turning to specific data analysis tasks, the markup language Markdown was introduced in combination with R. This makes it possible not only to perform analyses but also to report them reproducibly, with code and text tightly interlinked, so that entire research papers can be written in R and Markdown. The day concluded with the key steps and techniques of data wrangling and with calculating typical descriptive and inferential statistics, tests, and models. Each section of the day ended with small exercises in which participants applied what they had learned.

On Day 2, the data analysis section was wrapped up with a demonstration of numerous visualization methods. This was followed by a longer section in which participants formulated their own research question based on a freely available data set from the European Social Survey (ESS) and answered it in R using the techniques they had learned. The workshop leader supported them throughout, since beginners in a programming language often run into many small, unforeseen problems that can quickly become frustrating. Finally, an outlook covered techniques and packages worth learning when diving deeper into data analysis in R and programming in general (for example, custom functions, loops, and pipes). The workshop closed with a Q&A session for any remaining questions.

To improve the training offered by the Methods Lab, a short anonymous evaluation was conducted at the very end of the workshop. Fortunately, participants were very satisfied overall; their only suggestion was that smaller, more frequent exercises might have been even better. Although this is partly difficult to reconcile with the workshop's concept, the feedback is appreciated and will be used to improve future offerings.

The Methods Lab would like to thank all participants for their commitment and hopes that the skills they learned will benefit them in future research projects and other applications.

ECPR Winter School: Machine Learning with Big Data for Social Scientists

From February 6–10, Methods Lab member Roland Toth attended the online course Machine Learning with Big Data for Social Scientists at the ECPR Winter School.

The goal was to gain deeper insight into selected machine learning methods and to learn how to apply them to social science questions in particular. A second focus was handling large data sets efficiently so that they can still be processed quickly.

Numerous materials were provided in advance. For each session, a lecture-style video presented the slides on that session's topics, accompanied by relevant literature and studies. On each workshop day, a two-hour live session reviewed the content of the videos and practiced applying the principles hands-on.

The first step was to set up RStudio Server on the Amazon Web Services (AWS) cloud platform. This moves the entire RStudio environment off one's own machine, so that data handling and computation do not burden local resources.

The course also went deeper into working with the tidyverse collection of packages. Among other things, it turned out that the function vroom from the package of the same name imports large data sets faster than comparable functions. In addition, the course covered how to query external data sources directly from RStudio using SQL, so that the full data sets never need to be imported.
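The idea behind querying external data with SQL is that filtering and aggregation happen inside the database, and only the (much smaller) result is transferred to the analysis session. In R this is commonly done with the DBI package; the course's exact setup is not described here. The following minimal sketch uses Python's built-in sqlite3 module with an in-memory database standing in for the external source. The table, column names, and numbers are hypothetical.

```python
import sqlite3

# In-memory database standing in for an external data source
# (hypothetical table, columns, and values)
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE county_vaccination "
    "(state TEXT, county TEXT, population INTEGER, vaccinated INTEGER)"
)
con.executemany(
    "INSERT INTO county_vaccination VALUES (?, ?, ?, ?)",
    [
        ("State A", "County 1", 100_000, 60_000),
        ("State A", "County 2", 50_000, 20_000),
        ("State B", "County 3", 80_000, 56_000),
        ("State B", "County 4", 5_000, 1_000),  # excluded by the filter below
    ],
)

# Filtering and aggregation run inside the database; only one row per
# state is returned, so the full county-level table is never imported
query = """
    SELECT state,
           SUM(vaccinated) * 1.0 / SUM(population) AS vaccination_rate
    FROM county_vaccination
    WHERE population >= ?
    GROUP BY state
    ORDER BY state
"""
rows = con.execute(query, (10_000,)).fetchall()
for state, rate in rows:
    print(f"{state}: {rate:.3f}")
con.close()
```

The `?` placeholder passes the threshold as a query parameter rather than pasting it into the SQL string, which avoids quoting errors and SQL injection.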

For illustrative purposes, data sets on COVID vaccination status and election outcomes in the United States were used during the workshop. Because the observations in these data sets were recorded at different levels (state, county, …), merging them was difficult. Besides typical data wrangling functions (filtering, grouping, aggregating, mapping, merging), several machine learning methods were discussed. The logic of the procedure was first demonstrated with simple linear regression models: a model is trained on one part of the data (the training set) and then applied to a separate, held-out part (the test set). The model should predict the outcome accurately, but it must not fit the training data so closely that it overfits and performs poorly on the test data; ultimately, this is a matter of balancing bias and variance. During the workshop, this principle was also applied to LASSO and Ridge regression, logistic regression, and classification methods such as Support Vector Machines, Decision Trees, and Random Forests.
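The train/test logic and the role of a penalty such as Ridge can be sketched in Python with NumPy. This is an illustration under assumptions, not the course's R code: the data are simulated, the deliberately flexible degree-12 polynomial model, the 70/30 split, and the λ values are arbitrary choices. Ridge adds a penalty λ·Σβ² on the coefficients. Here it is fitted by augmented least squares, which minimizes exactly that penalized criterion. LASSO's absolute-value penalty has no closed-form solution and is usually fitted iteratively (e.g., by coordinate descent), so it is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (hypothetical): a smooth nonlinear signal plus noise
n = 60
x = rng.uniform(-1, 1, size=n)
y = np.sin(np.pi * x) + rng.normal(scale=0.3, size=n)

def design_matrix(x: np.ndarray, degree: int = 12) -> np.ndarray:
    """Polynomial features 1, x, x^2, ..., x^degree (a very flexible model)."""
    return np.vander(x, N=degree + 1, increasing=True)

def fit_ridge(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Ridge regression via augmented least squares.

    Minimizes ||y - X b||^2 + lam * ||b[1:]||^2; the intercept (column 0)
    is not penalized. lam = 0 gives ordinary least squares.
    """
    p = X.shape[1]
    penalty_rows = np.sqrt(lam) * np.eye(p)[1:]  # one row per penalized coefficient
    X_aug = np.vstack([X, penalty_rows])
    y_aug = np.concatenate([y, np.zeros(p - 1)])
    beta, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return beta

def mse(X: np.ndarray, y: np.ndarray, beta: np.ndarray) -> float:
    return float(np.mean((y - X @ beta) ** 2))

# Random 70/30 split: fit on the training part, evaluate on the held-out part
idx = rng.permutation(n)
train, test = idx[:42], idx[42:]
X = design_matrix(x)

results = {}
for lam in [0.0, 1e-3, 1e-1, 1.0]:
    beta = fit_ridge(X[train], y[train], lam)
    results[lam] = (mse(X[train], y[train], beta), mse(X[test], y[test], beta))
    print(f"lambda={lam:g}: train MSE={results[lam][0]:.3f}, "
          f"test MSE={results[lam][1]:.3f}")
```

The unpenalized fit (λ = 0) always has the lowest training error, but that says nothing about how it performs on new data; comparing the test column shows whether the added bias from a larger λ pays off through lower variance. In practice, λ would be chosen by cross-validation within the training data, keeping the test set untouched for the final evaluation.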

All in all, the course was a good introduction to working with machine learning methods. However, it focused strongly on the technical implementation of the methods in R and less on the criteria for choosing one method over another. Nevertheless, it clarified some open questions and provided new techniques that will help when working with larger data sets and in data analysis generally.