Workshop: Interdisciplinarity in Action: Methods for Fruitful Teamwork (October 4, 2023)

We are excited to announce our upcoming workshop, “Interdisciplinarity in Action: Methods for Fruitful Teamwork,” scheduled for Wednesday, October 4, at the Weizenbaum Institute. Led by Silvio Suckow and Sara Saba (both WI), this intensive one-day workshop provides practical tools and knowledge for enhancing teamwork and interdisciplinary collaboration. The workshop offers diverse perspectives and actionable advice for structuring interdisciplinary teams and their work, hands-on practice of various team-building methods, and an input presentation by an external speaker. It is open to anyone interested in interdisciplinary research, whether leading or collaborating on such projects. Please note that spots are limited and allocated on a first-come, first-served basis. A slightly modified online version of the course will be offered separately.

For more details about the workshop, visit our program page. We look forward to seeing you there!

Workshop Recap: Introduction to Topic Modeling (June 15, 2023)

On June 15, the Methods Lab organized the workshop Introduction to Topic Modeling in collaboration with the research group Platform Algorithms and Digital Propaganda. The workshop aimed to provide participants with a comprehensive understanding of topic modeling – a machine-learning technique used to determine clusters of similar words (i.e., topics) within bodies of text. The event took place at the Weizenbaum Institute in a hybrid format, bringing together researchers from various institutions.

The workshop was conducted by Daniel Matter (TU Munich), who guided the participants through the basic concepts and applications of this method. Through theory, demonstrations, and practical examples, participants gained insight into commonly used algorithms such as Latent Dirichlet Allocation (LDA) and BERT-based topic models. The workshop enabled participants to assess the advantages and drawbacks of each approach, equipping newcomers with a foundation in topic modeling while also offering new insights to those with prior expertise.

During the workshop, Daniel explained the distinction between LDA and BERTopic, two popular topic modeling strategies. LDA operates as a generative model that treats each document as a mixture of topics and each topic as a distribution over words. It aims to determine the topic and word distributions that maximize the probability of generating the documents in the corpus. With LDA, as opposed to BERTopic, the number of topics must be specified beforehand.
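
The generative story behind LDA can be made concrete with a small sketch. The Python snippet below is an illustration only, not material from the workshop: the two toy topics, their word probabilities, and the names `TOPICS`, `sample_dirichlet`, and `generate_document` are all hypothetical. It generates a single document by first drawing a topic mixture and then drawing each word from one of the topics. Fitting LDA means going the other way, that is, inferring such mixtures and word distributions from observed documents.

```python
import random

# Hypothetical toy topics: each topic is a probability distribution over words.
# The number of topics (here 2) is fixed in advance, as LDA requires.
TOPICS = [
    {"election": 0.5, "vote": 0.3, "party": 0.2},
    {"vaccine": 0.5, "virus": 0.3, "health": 0.2},
]


def sample_dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topics, alpha, length, rng):
    """Generate one document following the LDA generative story."""
    # Step 1: draw the document's topic mixture.
    theta = sample_dirichlet(alpha, rng)
    words = []
    for _ in range(length):
        # Step 2: pick a topic for this word position according to the mixture.
        topic = rng.choices(range(len(topics)), weights=theta)[0]
        # Step 3: pick a word from the chosen topic's word distribution.
        dist = topics[topic]
        word = rng.choices(list(dist.keys()), weights=list(dist.values()))[0]
        words.append(word)
    return theta, words


rng = random.Random(42)
theta, words = generate_document(TOPICS, alpha=[0.5, 0.5], length=8, rng=rng)
print("topic mixture:", [round(t, 2) for t in theta])
print("document:", " ".join(words))
```

A small `alpha` (below 1) tends to produce documents dominated by a single topic, while larger values yield more even mixtures.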

BERTopic, on the other hand, belongs to the category of Embeddings-Based Topic Models (EBTM), which take a different approach. Unlike LDA, which treats words as distinct features, BERTopic incorporates semantic relationships between words. BERTopic follows a bottom-up approach, embedding documents in a semantic space and extracting topics from this transformed representation. Unlike LDA, which can be applied to short and long text corpora, BERTopic generally works better on shorter text, such as social media posts or news headlines.

When deciding between BERTopic and LDA, it is essential to consider the specific requirements of the text analysis. BERTopic’s strength lies in its flexibility and ability to handle short texts effectively, while LDA is preferred when strong interpretability is needed.

With this workshop, we at the Methods Lab hope to have provided our attendees with a solid understanding of topic modeling as a method. Having explored the concepts, applications, and advantages of each approach, researchers can use these tools to uncover hidden semantic structures in textual data across various domains, supporting tasks such as document clustering, information retrieval, and recommender systems.

A big thank you to Daniel for introducing us to the world of topic modeling, and to all our participants!

Our next workshop, Whose Data is it Anyway? Ethical, Practical, and Methodological Challenges of Data Donation in Messenger Groups Research, will take place on August 30, 2023. See you there!

Workshop: Theory Construction: Building and Advancing Theories for Empirical Social Science (September 14, 2023)

We are excited to announce our upcoming workshop, Theory Construction: Building and Advancing Theories for Empirical Social Science, which will take place on Thursday, September 14, in the Kassenhalle (main hall), WI. Led by Adrian Meier (FAU Erlangen-Nürnberg) and created in collaboration with Dr. Daniel Possler (JMU Würzburg), this intensive “crash course” will equip participants with practical strategies for constructing and advancing social scientific theories. Beginning with an exploration of the fundamental concepts, structure, and quality criteria of social scientific theories, Adrian will delve into hands-on techniques for building and advancing theory. The workshop will focus on the theory-building process as well as the micro-level of social analysis, offering examples from media psychology and communication science.

For more information, visit our program page. See you there!

Workshop Recap: From Civic Tech to Science – Reimagining Science-Society Relations (July 6, 2023)

On July 6, Nicolas Zehner gave the workshop From Civic Tech to Science: Reimagining Science-Society Relations at the Weizenbaum Institute. Civic tech encompasses a diverse array of empowering technologies that enable democratic participation by allowing citizens to engage with societal issues and contribute to positive change. What insights can science gain from civic tech initiatives? How can they contribute to inclusive knowledge creation? And how can the design of these initiatives help rethink science-society relations? Those were some of the key questions that guided this workshop.

The workshop involved three introductory position statements, each shedding light on different aspects of civic tech’s impact. The position statement on “The Journalism of Things,” exemplified by projects like “Radmesser” and “Bienenlive,” demonstrated how civic tech can impact citizen behavior, raise topic visibility, and foster transdisciplinary knowledge. Dr. Beatrice Jetto’s position statement, “Blockchain-based Civic Tech Ecosystem: Bridging the Gap Between Research and Practice Objectives,” highlighted the potential of blockchain-based civic tech in making citizen participation in urban development more inclusive and transparent. Furthermore, Nicolas Zehner’s position statement, “AI, Environmental Protection, and the Promise of Participation,” discussed how Artificial Intelligence (AI) can serve as a platform for reimagining science-society relations and a gateway to thinking about more global issues by reintroducing the concept of “awareness of uncertainty” as a form of knowledge.

Following the position statements, the workshop engaged participants in group work sessions, facilitating discussions on knowledge transfer beyond conventional science communication. Collaboratively, they explored ways to create infrastructures that foster collaboration and include data subjects, avoiding the reproduction of existing power structures and ensuring equitable civic tech initiatives.

Workshop Recap: DSA – Data Access for Research (June 21, 2023)

Data is an invaluable asset for scientific research. However, accessing platform data for academic purposes has become increasingly challenging, particularly with the closure of free access to APIs like Twitter’s. Recognizing the significance of data accessibility for research, the Weizenbaum Institute organized the workshop Datenzugang für die Forschung – Der Digital Services Act (DSA) in collaboration with the European New School of Digital Studies (ENS) to explore the potential of the upcoming Digital Services Act (DSA) in facilitating data access for academic research.

Article 40 of the DSA is set to improve data access for researchers. To achieve this goal fully, however, the DSA’s regulations must be thoughtfully implemented at the national level. With free access to Twitter’s API closed, there is an urgent need for robust solutions that enable researchers to access platform data for scientific inquiry. The DSA, expected to apply in full from February 2024, promises to provide avenues for researchers to obtain the data they need. Still, it also brings its own set of challenges.

The workshop aimed to foster an open forum where researchers from diverse disciplines, particularly those who work or plan to work with platform data, could come together to provide recommendations for the effective implementation of the DSA. Organized by Ulrike Klinger (ENS) and Jakob Ohme (WI) and supported by the Stiftung Mercator, the workshop addressed crucial questions surrounding data access requests, eligible data, and the verification process by authorities and platforms.

The workshop started with a welcoming address from Ulrike Klinger. Jakob Ohme then provided an overview of the DSA’s Article 40, shedding light on its potential implications for researchers. This was followed by presentations on the DSA’s implementation in Germany by Gökhan Cetintas from the Bundesministerium für Digitales und Verkehr and Andrea Sanders-Winter from the Bundesnetzagentur, who offered insights into the data access rules under the DSA.

After a coffee break, Jessica Gabriele Walter from Aarhus University presented on DSA40 and scholarly networks in other EU countries, providing a broader perspective on data access challenges and solutions. Richard Kuchta from Democracy Reporting International later delved into “The Data Access Problem” and emphasized the necessity of a vetting process to ensure data security and accuracy.

The latter part of the workshop involved group work in which participants engaged in the discussion and expansion of a policy paper draft prepared by the Weizenbaum Institute and ENS, based on inputs from an early expert round. The goal was to develop actionable recommendations that would benefit the research community in Germany and the EU. Breakout sessions centered on topics like “Vetting Access,” “Access Modes,” and “Infrastructure,” allowing participants to delve deeper into specific aspects of data access.

The workshop brought together an interdisciplinary group of researchers with a shared vision: enabling access to platform data for academic purposes. By combining their expertise and perspectives, participants crafted recommendations for the effective implementation of the DSA, ensuring that data access for research remains equitable and secure. As the DSA comes into force and takes shape, the outcomes of this workshop are expected to serve as a significant step forward in fostering inclusive dialogue on the future of data accessibility.

Further Information
\ Thursday Lunch Talk Series: Article 40 of the DSA (April 20, 2023)
\ Response to the Call for Evidence DG CNECT-CNECT F2 by the European Commission
\ Interview with Jakob Ohme “Researchers Fight for Data Access under the DSA”

Workshop Recap: Introduction to Programming and Data Analysis with R

On March 29 and 30, the Methods Lab organized a workshop on using the programming language R to work with data, led by Roland Toth. The focus was on the main principles of programming, so that participants understand what is happening under the hood when working with data.

Day 1 began with the advantages of using a programming language to work with data over dedicated software such as SPSS or Stata. Along the way, the most important principles of programming in a research context, such as functions, classes, objects, vectors, and data frames, were covered. Before turning to specific data analysis tasks, the markup language Markdown was introduced in combination with R. This allows data analyses to be not only performed but also reported in a directly reproducible and seamlessly integrated manner, so that entire research papers can be written using R and Markdown. The day concluded with the key steps and techniques in data wrangling and the calculation of typical descriptive and inferential statistical measures, tests, and models. At the end of each section, participants solved small tasks to apply what they had learned.

On Day 2, the data analysis section was wrapped up with a demonstration of numerous visualization methods. This was followed by a longer section in which participants came up with their own research question based on a freely available data set from the European Social Survey (ESS) and answered it in R using the techniques they had learned. They were supported by the workshop leader, since beginners working with a programming language often run into many small, unforeseen problems that can quickly lead to frustration. Lastly, an outlook was given on techniques and packages to explore when diving deeper into data analysis in R and programming in general (for example, custom functions, loops, and pipes). The workshop concluded with a Q&A session for remaining questions.

To help improve the training offered by the Methods Lab, a short, anonymous evaluation was conducted at the very end of the workshop. Thankfully, the participants were very satisfied with the workshop overall and only commented that more frequent, smaller tasks might have been even better. Although this is in part difficult to reconcile with the concept of the workshop, the feedback is appreciated and will be used to improve future offerings.

The Methods Lab would like to thank all participants for their attendance and commitment and hopes that the skills learned will benefit them in future research projects and other applications.

Workshop: Introduction to Programming and Data Analysis with R (March 29-30, 2023)

Our second workshop, Programming and Data Analysis with R, will be held on March 29 and 30 at the Institute.

During the first day of the workshop, Roland Toth (WI) will introduce and establish the fundamentals of programming in R/RStudio, combining it with Markdown. Building on the first day, the second day will be dedicated to applying this knowledge to data analysis and working on a custom research question. No previous experience is necessary.

You can find more information about the workshop on its program page.

Workshop Recap: Web Scraping and API-based Data Collection

On March 2, the Methods Lab hosted its first-ever workshop, Web Scraping and API-based Data Collection. The workshop explored various techniques for accessing and gathering data from platforms using APIs and web scraping. Speakers included Florian Primig (FU Berlin), Steffen Lepa (TU Berlin), Felix Gaisbauer (WI), and Lion Wedel (WI). The workshop received an overwhelmingly positive response, with many people attending both in person and remotely. It generated plenty of discussion and concluded with a Q&A session.

Lion Wedel gives an introduction to web scraping (photo: Roland Toth).

Thanks to all our presenters and participants for helping us create such a successful first event. We look forward to organizing more workshops in the future on emerging methodologies in the realm of digital research!

ECPR Winter School: Machine Learning with Big Data for Social Scientists

From February 6–10, Methods Lab member Roland Toth attended the online course Machine Learning with Big Data for Social Scientists at ECPR Winter School.

The goal was to gain deeper insight into selected machine learning methods and to be able to apply them to social science questions in particular. Another aim was to handle large data sets efficiently so that they can still be processed with high performance.

Numerous materials were made available for the workshop in advance. Each session had lecture-style videos presenting slides on its topics, accompanied by relevant literature and studies. On each of the workshop days, there were two-hour live sessions in which the content of the videos was recapped and the application of the principles was practiced live.

The first step was to set up RStudio Server on the Amazon Web Services (AWS) cloud service. This offloads the entire RStudio environment from one’s own machine, so that data handling and calculations do not burden local resources.

Furthermore, participants deepened their work with the tidyverse package collection. Among other things, it turned out that the function vroom from the package of the same name imports larger data sets faster than similar functions. In addition, it was discussed how to access external data sets directly from RStudio via SQL syntax, so that the full data sets need not be imported at all.
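
The idea of querying a data source with SQL instead of importing it in full can be sketched as follows. This is a Python illustration rather than the R workflow used in the course, and everything in it is an assumption made for demonstration: an in-memory SQLite database stands in for the external data source, and the table `vaccinations`, its columns, and its values are hypothetical toy data. The aggregation runs inside the database, so only the small result set reaches the analysis environment.

```python
import sqlite3

# An in-memory SQLite database stands in for an external data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vaccinations (state TEXT, county TEXT, doses INTEGER)")

# Hypothetical toy rows, one per county.
conn.executemany(
    "INSERT INTO vaccinations VALUES (?, ?, ?)",
    [
        ("CA", "Alameda", 120),
        ("CA", "Fresno", 80),
        ("TX", "Travis", 200),
        ("TX", "Harris", 50),
    ],
)

# The aggregation is executed by the database engine; only one row per state
# is transferred, not the full county-level table.
rows = conn.execute(
    "SELECT state, SUM(doses) AS total_doses "
    "FROM vaccinations GROUP BY state ORDER BY state"
).fetchall()
conn.close()

print(rows)  # [('CA', 200), ('TX', 250)]
```

Aggregating to a common level in this way (here, from county to state) is also one way to prepare data sets with different levels of clustering for merging.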

For illustrative purposes, data sets on COVID vaccination status and election outcomes in the United States were used during the workshop. The observations in these data sets were clustered at different levels (state, county, …), which made merging them difficult. Besides typical data wrangling functions (filtering, grouping, aggregating, mapping, merging), some specific machine learning methods were discussed. The logic of the procedure was first demonstrated using simple linear regression models: A model is trained with a (smaller) training data set and then applied to a (larger) test data set. The model is supposed to predict the outcome accurately, but not fit the training data so closely that it overfits and performs badly on the test data. In the end, it is a question of balancing variance and bias. During the workshop, this principle was also applied to LASSO and Ridge regression, logistic regression, and classification methods such as Support Vector Machines, Decision Trees, and Random Forests.
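
The train/test logic described above can be illustrated with a minimal sketch. It is written in Python rather than R and is not taken from the course materials: the synthetic data, the split sizes, and the names `fit_line`, `mse`, and `memorizer` are assumptions for illustration. A simple linear regression is compared with a deliberately overfitting model that just memorizes the training data (a one-nearest-neighbour lookup).

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a data set."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data (assumption): true relationship y = 1 + 2x plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [1.0 + 2.0 * x + rng.gauss(0, 1) for x in xs]

# A smaller training set and a larger test set, as in the description above.
train_x, train_y = xs[:60], ys[:60]
test_x, test_y = xs[60:], ys[60:]

intercept, slope = fit_line(train_x, train_y)


def linear(x):
    """Prediction of the fitted linear regression."""
    return intercept + slope * x


def memorizer(x):
    """Overfitting model: return the outcome of the closest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


for name, model in [("linear", linear), ("memorizer", memorizer)]:
    train_error = mse(model, train_x, train_y)
    test_error = mse(model, test_x, test_y)
    print(f"{name}: train MSE = {train_error:.3f}, test MSE = {test_error:.3f}")
```

The memorizing model reproduces the training outcomes exactly, so its training error is zero, but it also reproduces their noise, which typically hurts it on the test data. The linear model accepts some training error and tends to generalize better. Regularized methods such as LASSO and Ridge regression address the same trade-off by penalizing model complexity.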

All in all, it was a good introduction to working with machine learning methods. However, there was limited focus on the decision criteria for choosing certain methods over others, and a strong focus on the technical implementation of the methods in R. Nevertheless, the workshop was able to clarify some open questions and provide some new techniques that will help when working with larger datasets and in data analysis.

Workshop: Web Scraping and API-based Data Collection (March 2, 2023)

We hereby present the first workshop at the Institute to emerge from the methodological needs that were indicated in our institute-wide survey in December. It is titled Web Scraping and API-based Data Collection and takes place on March 2.

After an introduction to the topic by the Methods Lab team, Florian Primig (FU), Steffen Lepa (TU), Felix Gaisbauer (WI), and Lion Wedel (WI) will each present various use cases of these two data collection methods. You can find more information about the workshop on its program page.