Workshop Recap: Open Research – Principles, Practices, and Implementation

On September 3, 2024, Tobias Dienlin from the University of Vienna held the workshop Open Research – Principles, Practices, and Implementation at the Weizenbaum Institute. In this workshop, he gave an overview of Open Research, its motivations and relevance, and its formal and technical implementation.

In the first part of the workshop, Tobias argued that certain problems and values in science are the main reasons why researchers should practice Open Research. The problems included the replication crisis (a lack of or low quality of replication studies, especially in the social sciences), questionable research practices (p-hacking, HARKing, errors), and publication bias (journals prefer exciting, expected, and significant results). The values in question included openness as a foundation of science itself and a dedication to scientific advancement rather than to the individuals who achieve it.

In the second part, the formal practices of Open Research were discussed. Tobias first clarified the differences between the terms Open Science, Open Research, and Open Scholarship. To achieve a culture of Open Research, he suggested aiming for open access, pre-/post-printing, open reviews, author contribution statements, open teaching, and citizen science. While these practices usually require additional work, the burden can be lowered by considering and preparing them in the initial stages of a research project, for instance by implementing two of the most important Open Research practices: preregistrations and registered reports.

  • In a preregistration, any details of a study that are already fixed (e.g., theoretical foundation, research questions, hypotheses, analysis methods, …) are published before conducting the study itself. After conducting the study, the preregistration is referred to in the manuscript, and possible deviations from it are explained. This procedure reduces the possibility and risk of p-hacking and HARKing, and under specific circumstances a preregistration can even take place after the data have already been collected.
  • A registered report is a more elaborate version of a preregistration. It consists of all parts of a submission that do not involve the analysis and the results. The submission can therefore be reviewed before the data and results even exist. This way, reviewers are not influenced by results and publication bias can be avoided. While a preregistration can be published anywhere, the registered report format needs to be offered by the journal itself.

In the last part of the workshop, the focus was on tools and software that help implement Open Research practices. For example, the free-to-use repository OSF can be used for pre-/post-prints, preregistrations, and online supplementary materials such as data, analysis code, or questionnaires. As an exercise, Tobias gave participants the opportunity to implement a basic preregistration or registered report on OSF for a research project they were already working on and to try different features, such as linking it to a repository on GitHub. After summarizing the insights of the workshop, Tobias concluded with a fitting statement:

Open Science: Just Science Done Right.

During the workshop, participants had plenty of space to ask questions, discuss in the full group or in separate breakout rooms, and interact in various ways. We would like to thank Tobias for this insightful workshop and strongly encourage the implementation of Open Research.

Recap: Digital Methods Colloquium (December 7, 2023)

Digital and computational data collection and analysis methods such as mobile/internet tracking, experience sampling, web scraping, text mining, machine learning, and image recognition have become more relevant than ever in the social sciences. While these methods enable new avenues of inquiry, they also present many challenges. It is important to share and discuss research, experiences, and challenges surrounding these methods with other researchers to exchange ideas and to learn from experiences.

For this reason, Roland Toth from the Methods Lab and research fellow Douglas Parry organized the Digital Methods Colloquium that took place on December 7 at the Weizenbaum Institute. They invited researchers from all over Germany who had used such methods before. The focus lay on sharing not only successes, but – even more so – the challenges that they had experienced in the research process.

In the first part of the colloquium, participants presented recent or past research projects for which they had used digital methods. The presentations covered various methods, including experience sampling, mobile logging/tracking, multimodal content classification, network analysis, and large language models. All presentations were received very well and led to high engagement with many questions and exchanges from the participants.

The second part of the colloquium was designed to facilitate interactive discussion and knowledge sharing among the participants. They were assigned to one of two discussion groups that focused on either data collection or data analysis in the context of digital methods. In each group, participants followed prompts and discussed urgent issues and possible solutions, which they then visualized using posters. Finally, both groups sat together and presented the posters to each other, leading to a final discussion. After a short wrap-up, some participants joined the hosts at the Christmas Market for a well-deserved hot beverage.

The hosts would like to thank all participants for attending and engaging in the Digital Methods Colloquium. Bringing together researchers from different fields demonstrated that there are more commonalities than differences when it comes to the challenging and exciting field of digital methods. We are looking forward to more exchange and, possibly, Part 2 of the Digital Methods Colloquium sometime in the future.

Workshop Recap: A Practical Introduction to Text Analysis (November 30, 2023)

On November 30th, 2023, the Methods Lab organized a workshop on quantitative text analysis. The workshop was conducted by Douglas Parry (Stellenbosch University) and covered the whole process of text analysis from data preparation to the visualization of sentiments or topics identified.

In the first half of the workshop, Douglas covered the first steps involved in text analysis, such as tokenization (the transformation of texts into smaller parts like single words or consecutive words), the removal of “stop words” (words that do not contain meaningful information), and the aggregation of content by meta-information (authors, books, chapters, etc.). Apart from the investigation of the frequency with which terms occur, sentiment analysis using existing dictionaries was also addressed. This technique involves assigning values to each word representing certain targeted characteristics (e.g., emotionality/polarity), which in turn allows for comparing overall sentiments between different corpora. Finally, the visualization of word occurrences and sentiments was covered. After this introduction, participants had the chance to apply their knowledge using the programming language R by solving tasks with texts Douglas provided.
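
The preprocessing pipeline described above can be sketched in a few lines. The workshop itself used R; the following is an analogous, self-contained sketch in Python, with a deliberately tiny, hypothetical stop-word list and sentiment dictionary for illustration only:

```python
import re

# Hypothetical mini-dictionaries for illustration; real analyses use
# established resources with thousands of entries.
SENTIMENT = {"great": 1, "love": 1, "boring": -1, "bad": -1}
STOP_WORDS = {"the", "a", "is", "and", "this", "it", "i"}

def tokenize(text):
    """Split a text into lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def clean_tokens(text):
    """Tokenize and remove stop words (words without meaningful information)."""
    return [t for t in tokenize(text) if t not in STOP_WORDS]

def sentiment_score(text):
    """Sum dictionary values over all tokens; unknown words count as 0."""
    return sum(SENTIMENT.get(t, 0) for t in clean_tokens(text))

corpus = ["This is a great book and I love it", "The plot is boring and bad"]
print([clean_tokens(d) for d in corpus])
print([sentiment_score(d) for d in corpus])  # → [2, -2]
```

Comparing such scores across documents or corpora is exactly the kind of dictionary-based sentiment comparison covered in the session.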

In the second half of the workshop, Douglas focused on different methods of topic modeling, which ultimately attempt to assign texts to latent topics based on the words they contain. In comparison to simpler procedures covered in the first half of the workshop, topic models can also consider the context of words within the texts. Specifically, Douglas introduced participants to Latent Dirichlet Allocation (LDA), Correlated Topic Modeling (CTM), and Structural Topic Modeling (STM). One of the most important decisions to be made for any such model is the number of topics to emerge: too few may dilute nuances within topics and too many may lead to redundancies. The visualization and – most importantly – limitations of topic modeling were also discussed before participants performed topic modeling themselves with the data provided earlier. Finally, Douglas concluded with a summary of everything covered and an overview of advanced subjects in text analysis.
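
All of the topic-modeling approaches mentioned above (LDA, CTM, STM) start from the same input: a document-term matrix with one row per document and one column per vocabulary term. The workshop used R packages for this; as a language-agnostic illustration, a minimal sketch in Python:

```python
from collections import Counter

def document_term_matrix(docs):
    """Build a document-term matrix: rows are documents, columns are terms."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))          # shared, sorted vocabulary
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix

docs = ["media use and media effects", "use of topic models"]
vocab, dtm = document_term_matrix(docs)
print(vocab)   # → ['and', 'effects', 'media', 'models', 'of', 'topic', 'use']
print(dtm)     # → [[1, 1, 2, 0, 0, 0, 1], [0, 0, 0, 1, 1, 1, 1]]
```

A topic model then decomposes this matrix into document-topic and topic-term distributions, which is where the choice of the number of topics comes in.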

The workshop was very well-received and prepared all participants for text analysis in the future. Douglas balanced lecture-style sections and well-prepared, hands-on application very well and provided all materials in a way that participants could focus on the tasks at hand, while following a logical structure throughout. We would like to thank him for this great introduction to text analysis!

Workshop Recap: Theory Construction – Building and Advancing Theories for Empirical Social Science (September 14, 2023)

On September 14th, 2023, the Methods Lab organized a workshop on the rationale and methodology of theory building in empirical research. The workshop was conducted by Adrian Meier (University of Erlangen-Nürnberg) and aimed to give participants an orientation for working with theories in a meaningful way that lays a foundation for empirical research.

In the first section of the workshop, Adrian outlined what theories are and how they relate to the overarching mission of science. The introduction focused on the differentiation between theories, concepts, constructs, and models and addressed the interplay between theories and empirical research.

After this introduction, the focus shifted to challenges and problems of social scientific theorizing. Participants were given the opportunity to add issues and questions they had identified in the past when working with theories. Most prominently, they mentioned confusion due to the different terminology used for specific concepts (i.e., synonymy and ambiguity), the “moving target” problem (phenomena change while they are being studied), and the lack of incentives to focus on theory in the formalized infrastructure of empirical research. Adrian elaborated on some of the underlying issues uniting many of these challenges: theories are underdetermined by evidence, concepts and measurement instruments are rarely validated, and manipulations in experimental research are not precise enough.

In the last section of the workshop, participants learned about a recently proposed Theory Construction Methodology (Borsboom et al., 2021) and took part in an accompanying exercise. They were asked to read a one-pager summarizing crucial elements of Mood Management Theory, a popular theory in the field of media psychology. Within this text, they were to identify statements about the phenomena the theory is supposed to explain, the data that supported it (or not), and the theoretical statements themselves (e.g., premises, propositions), in order to sharpen their sensitivity in differentiating between these elements in their own work. Lastly, Adrian gave an outlook on how theories can be formalized and how theory construction can be fostered by non-confirmatory research practices.

The workshop was a great and unconventional addition to this year’s series of workshops organized by the Methods Lab. Adrian structured and executed it brilliantly and gave participants – who were associated with various fields of research and very engaged – lots of room for discussions.

We would like to thank Adrian for his thorough and inspiring workshop and hope he will contribute to the Methods Lab program again in the future. In the meantime, we recommend following him on X for updates on his research!

Launch of the Weizenbaum Panel Data Explorer

We are excited to announce the launch of the Weizenbaum Panel Data Explorer, an interactive website developed by Methods Lab member Roland Toth. The Data Explorer allows you to browse and analyze survey results from the annual survey conducted by the Weizenbaum Panel on media use, political participation, civic norms, and more. In the spirit of open science, it not only makes the research data available, but also presents them in an easy-to-use manner.

The Weizenbaum Panel aims to shed light on the complex relationship between the digital realm and political engagement. By examining phenomena such as hate speech and fake news, as well as the active commitment to a democratic culture of debate, the telephone survey offers invaluable insights into the ever-evolving dynamics of citizen participation in Germany.

With the launch of the Data Explorer, you can explore this comprehensive dataset and gain a deeper understanding of Germany’s social and political landscape. The platform covers several categories: social media platform use, political attitudes, civic norms, political participation, and online civic intervention. Each category presents a unique perspective, allowing you to examine specific aspects of Germany’s social and political fabric.

To begin your exploration, simply select a category that piques your interest. Within each category, you will find a selection of questions to delve into. Whether you want to gauge the political news media consumption of the German public, analyze trends in the use of video platforms such as TikTok and Instagram, or find out how often people discuss political issues at work or with friends and family, the Data Explorer will assist you in this endeavor.

For a nuanced understanding of how different groups within the population engage in social and political activities, you can group the data output by the demographic factors gender, age, level of education, or residence. Moreover, to enhance your experience and facilitate data sharing, you can download any graph in .png format. Each graph includes the question, answer options, and grouping, providing a comprehensive visual representation of the desired data.
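
Under the hood, this kind of grouped output boils down to a split-apply-combine step. A minimal sketch in Python, using hypothetical survey records (the real Panel data and variable names differ):

```python
from collections import defaultdict

# Hypothetical records: (age group, days of political discussion per week)
records = [("18-29", 4), ("18-29", 2), ("30-49", 1), ("30-49", 3), ("50+", 2)]

def group_means(rows):
    """Average the answer values within each demographic group."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)           # split
    return {g: sum(v) / len(v) for g, v in groups.items()}  # apply + combine

print(group_means(records))  # → {'18-29': 3.0, '30-49': 2.0, '50+': 2.0}
```

The Data Explorer performs this aggregation for the selected question and grouping factor and renders the result as a downloadable graph.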

The Weizenbaum Data Explorer was developed in Python/JupyterHub and deployed using Voilà, all of which are open source. It is hosted on Weizenbaum Institute servers, which ensures adequate data protection; typical solutions such as R Shiny apps deployed on shinyapps.io do not offer the same guarantee. The Data Explorer will be expanded continuously – for example, the fourth wave of the Weizenbaum Panel will be integrated soon.

Whether you’re a researcher, journalist, student, or simply someone curious about Germany’s social and political landscape, the Weizenbaum Panel Data Explorer equips you with the tools to visualize data effortlessly. Happy exploring!

Research stay at Universidad de Navarra (Pamplona, Spain)

From April 17 to 23, Methods Lab Data Scientist Roland Toth spent a week at the Institute for Culture and Society (ICS) at Universidad de Navarra in Pamplona, Spain. This short visiting researcher stay was financed by ICS and took place in the context of their project Youth in Transition, in which data have been collected every year for four years from a representative sample of the Spanish population. These data include various information on smartphone use, smartphone pervasiveness, and psychological traits.

Together with the researchers Aurelio Fernández, Javier García-Manglano, and Pedro de la Rosa, Roland wrote a first draft of a research article using these data. As mobile media use is typically measured with indicators of use quantity (duration and frequency) alone, the paper deals with the question of whether qualitative dimensions of mobile media use should be part of its measurement as well. Specifically, the researchers are investigating the role of gratification variety (e.g., information, social contact, or escapism) and situation variety (e.g., while in a meeting, while watching a movie, or while eating). Both represent defining characteristics of mobile media devices like the smartphone, as we typically use them for various purposes, anytime, and anywhere. For conceptual validation, the researchers examine whether these two qualitative dimensions contribute substantially to predicting mobile vigilance, that is, the constant salience of mobile media devices and the urge to monitor and remain reactive to them. Since such vigilance is tied to mobile media use by definition and emerged in close alignment with its development, it is bound to be associated with smartphone use. In other words: if gratification and situation variety can explain a share of mobile vigilance that remains unexplained by the quantity of smartphone use, this indicates that both dimensions are substantial to the measurement of mobile media use. The researchers are currently finalizing the article.

Inviting Roland for this stay was a generous gesture by ICS, and both the researchers and the institute were very welcoming and engaged in the project during his stay. Aside from the productive cooperation, our colleague was delighted with the beautiful campus and the equally charming city of Pamplona (and Donostia-San Sebastián), where spring had already begun. We hope that the article can be published successfully and that the cooperation between ICS at Universidad de Navarra and the Methods Lab of the Weizenbaum Institute will continue in future projects!

Workshop Recap: Introduction to Programming and Data Analysis with R

On March 29–30, the Methods Lab organized a workshop on using the programming language R for working with data, led by Roland Toth. The focus was on the main principles of programming, in order to understand what happens under the hood when working with data.

Day 1 focused on the advantages of using a programming language for working with data over dedicated software such as SPSS or Stata. The most important principles of programming in a research context, such as functions, classes, objects, vectors, and data frames, were covered. Before turning to specific data-analysis tasks, the markup language Markdown in combination with R was introduced. This allows data analyses not only to be performed, but also reported in a directly reproducible and seamlessly interrelated manner, so that entire research papers can be written using R and Markdown. The day concluded with the key steps and techniques of data wrangling and with the calculation of typical descriptive and inferential statistical measures, tests, and models. At the end of each section, there were small tasks for participants to apply what they had learned.

On Day 2, the data analysis section was wrapped up with a demonstration of numerous visualization methods. This was followed by a longer section in which participants developed their own research question based on a freely available data set from the European Social Survey (ESS) and answered it in R using all the techniques they had learned. They were supported by the workshop leader, since beginners working with a programming language often run into many small, unforeseen problems that can quickly lead to frustration. Lastly, an outlook was given on techniques and packages worth exploring when diving deeper into data analysis in R and programming in general (for example, custom functions, loops, and pipes). The workshop concluded with a Q&A where remaining questions could be asked.

In order to optimize the training offered by the Methods Lab, a short, anonymous evaluation was conducted at the very end of the workshop. Thankfully, the participants were very satisfied with the workshop throughout and only commented that more frequent, smaller tasks might have been even better. Although this is partly difficult to reconcile with the concept of the workshop, the feedback is appreciated and will be used to improve future offerings.

The Methods Lab would like to thank all participants for their participation and commitment and hopes that the skills learned will be of benefit to them in future research projects and other application scenarios.

ECPR Winter School: Machine Learning with Big Data for Social Scientists

From February 6–10, Methods Lab member Roland Toth attended the online course Machine Learning with Big Data for Social Scientists at ECPR Winter School.

The goal was to gain a deeper insight into selected machine learning methods and to learn to apply them to social science questions in particular. The course also covered the efficient handling of large data sets, so that they can still be processed with high performance.

Numerous materials were made available in advance. For each session, there were videos in which presentation slides on the session’s topics were presented in the style of a lecture, accompanied by appropriate literature and studies. On each workshop day, a two-hour live session repeated the content of the videos and practiced the application of the principles.

The first step was to set up RStudio Server on the Amazon Web Services (AWS) cloud service. This offloads the entire RStudio environment from one’s own machine, allowing one to handle data and computations without burdening local resources.

Furthermore, work with the package collection tidyverse was deepened. Among other things, it turned out that the function vroom from the package of the same name imports larger data sets faster than comparable functions. In addition, it was discussed how to access external data sets directly from RStudio via SQL syntax, so that it is not necessary to import the full data sets at all.
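
The point of the SQL approach is that filtering happens on the database side, so only the matching rows ever reach the analysis session. The course demonstrated this from RStudio; the same idea can be sketched with Python's built-in sqlite3 module and a hypothetical table standing in for a large external dataset:

```python
import sqlite3

# In-memory database standing in for a large external data source.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE vaccinations (state TEXT, county TEXT, rate REAL)")
con.executemany(
    "INSERT INTO vaccinations VALUES (?, ?, ?)",
    [("CA", "Alameda", 0.81), ("CA", "Kern", 0.55), ("TX", "Travis", 0.70)],
)

# Only rows matching the WHERE clause are transferred, not the full table.
rows = con.execute(
    "SELECT county, rate FROM vaccinations WHERE state = ? ORDER BY rate DESC",
    ("CA",),
).fetchall()
print(rows)  # → [('Alameda', 0.81), ('Kern', 0.55)]
```

With a remote database backend, the same query pattern avoids importing the full data set into memory at all.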

For illustrative purposes, data sets on COVID vaccination status and election outcomes in the United States were used during the workshop. In each case, the observations were nested at different levels (state, county, …), which made merging the data sets difficult. Besides typical data-wrangling functions (filtering, grouping, aggregating, mapping, merging), several machine learning methods were discussed. The logic of the procedure was first demonstrated using simple linear regression models: a model is trained on a training data set and then evaluated on a held-out test data set. The model should predict the outcome accurately, but not fit the training data so closely that it overfits and performs badly on the test data; in the end, it is a question of balancing bias and variance. During the workshop, this principle was also applied to LASSO and Ridge regression, logistic regression, and classification methods such as Support Vector Machines, Decision Trees, and Random Forests.
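
The train/test logic described above can be illustrated with a small simulation. This is a hedged sketch with simulated data and plain OLS rather than the regularized models from the course, and it uses only Python's standard library:

```python
import random

def fit_simple_ols(xs, ys):
    """Closed-form OLS fit for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return my - b * mx, b

def mse(xs, ys, a, b):
    """Mean squared prediction error of the fitted line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Simulated data with true intercept 2 and slope 0.5 plus noise.
random.seed(1)
data = [(x, 2 + 0.5 * x + random.gauss(0, 1)) for x in range(40)]
random.shuffle(data)
train, test = data[:30], data[30:]  # train the model, hold out a test set

a, b = fit_simple_ols(*zip(*train))
print("estimated slope:", round(b, 2))
print("train MSE:", round(mse(*zip(*train), a, b), 2))
print("test MSE:", round(mse(*zip(*test), a, b), 2))
```

Comparing train and test error in this way is exactly how overfitting is diagnosed: a flexible model that drives the training error far below the test error has traded too much bias away for variance.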

All in all, it was a good introduction to working with machine learning methods. However, the course focused strongly on the technical implementation of the methods in R and less on the criteria for choosing one method over another. Nevertheless, the workshop clarified some open questions and provided new techniques that will help when working with larger data sets and in data analysis.