Blog

Launch of the Weizenbaum Panel Data Explorer

We are excited to announce the launch of the Weizenbaum Panel Data Explorer, an interactive website developed by Methods Lab member Roland Toth. The Data Explorer allows you to browse and analyze results from the annual survey of the Weizenbaum Panel on media use, political participation, civic norms, and more. In the spirit of open science, it not only makes research data available but also presents them in an easy-to-use manner.

The Weizenbaum Panel aims to shed light on the complex relationship between the digital realm and political engagement. By examining phenomena such as hate speech and fake news, as well as the active commitment to a democratic culture of debate, the telephone survey offers invaluable insights into the ever-evolving dynamics of citizen participation in Germany.

With the launch of the Data Explorer, you can explore this comprehensive dataset and gain a deeper understanding of Germany’s social and political landscape. The platform offers six categories, including social media platform use, political attitudes, civic norms, political participation, and online civic intervention. Each category presents a unique perspective, allowing you to examine specific aspects of Germany’s social and political fabric.

To begin your exploration, simply select a category that piques your interest. Within each category, you will find a selection of questions to delve into. Whether you want to gauge the political news media consumption of the German public, analyze trends in the use of video platforms such as TikTok and Instagram, or find out how often people discuss political issues at work, or with friends and family, the Data Explorer will assist you in this endeavor.

For a nuanced understanding of how different groups within the population engage in social and political activities, you can group the data output by the demographic factors gender, age, level of education, or residence. Moreover, to enhance your experience and facilitate data sharing, you can download any graph in .png format. Each graph includes the question, answer options, and grouping, providing a comprehensive visual representation of the desired data.
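Grouping the output essentially means computing the distribution of answers separately for each demographic group. The Data Explorer's own code is not part of this post, so the following is only a minimal sketch of the idea in plain Python, using made-up responses and hypothetical field names (`gender`, `answer`); it ignores survey weights, which a real analysis would likely need.

```python
from collections import Counter, defaultdict

# Hypothetical survey responses; field names and values are illustrative only.
responses = [
    {"gender": "female", "answer": "daily"},
    {"gender": "female", "answer": "weekly"},
    {"gender": "male", "answer": "daily"},
    {"gender": "male", "answer": "daily"},
    {"gender": "male", "answer": "never"},
]

def grouped_shares(rows, group_by, question="answer"):
    """Return, for each group, the share of respondents giving each answer."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[group_by]][row[question]] += 1
    return {
        group: {answer: n / sum(c.values()) for answer, n in c.items()}
        for group, c in counts.items()
    }

shares = grouped_shares(responses, "gender")
print(shares)
```

Switching the grouping to another factor (for example, a hypothetical `age_group` field) only changes the `group_by` argument.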

The Weizenbaum Panel Data Explorer was developed in Python using JupyterHub and deployed with Voilà, all of which are open source. It is hosted on Weizenbaum Institute servers, which ensures adequate data protection. This is not the case with typical solutions such as R Shiny apps deployed on the third-party platform shinyapps.io. The Data Explorer will be expanded continuously; for example, the fourth wave of the Weizenbaum Panel will be integrated soon.

Whether you’re a researcher, journalist, student, or simply someone curious about Germany’s social and political landscape, the Weizenbaum Panel Data Explorer equips you with the tools to visualize data effortlessly. Happy exploring!

Thursday Lunch Talk Series: Article 40 of the DSA (April 20, 2023)

Researchers in the EU are about to have a new legislative framework to access and study data held by platforms and search engines in the form of Article 40 of the Digital Services Act (DSA) – a major milestone in platform regulation history expected to have spillover effects worldwide. As part of the Thursday Lunch Talk Series, Jakob Ohme (WI) and the Methods Lab jointly organized a talk to gain more insight into what Article 40 means in the context of German law, and the consequences it might have on researchers’ access to platform data. Tupperware and brown paper bags in hand, hungry participants gathered in the Flexraum to listen to Jakob give the ABCs of the EU’s new data access regime and discuss some of its opportunities, limitations, and grey areas.

Here is a quick summary of Article 40:

  1. Providers of very large online platforms (VLOPs) or search engines (VLOSEs) shall provide the Digital Services Coordinator (DSC) of establishment or the Commission, at their reasoned request and within a reasonable period specified in that request, access to data necessary to monitor and assess compliance with the DSA.
  2. Data accessed can only be used for monitoring and assessing compliance, while taking into account the rights and interests of platform providers and service recipients, including the protection of personal data and the security of their services.
  3. Platforms must explain the design, logic, functioning, and testing of their algorithmic systems, including recommender systems, upon request.
  4. Vetted researchers can request access to data to conduct research on “systemic risks” in the EU and assess risk mitigation measures.
  5. Within 15 days, platforms can request to amend a data access request as referred to in §4 if:
    (a) they do not have access to the data
    (b) giving access to the data will lead to significant vulnerabilities in the security of their service or the protection of confidential information, particularly trade secrets.
  6. Requests for amendment pursuant to §5 should propose alternative means for providing access to appropriate and sufficient data.
  7. Providers of very large online platforms or search engines shall facilitate and provide access to data pursuant to §1 and §4 through appropriate interfaces specified in the request, including online databases or application programming interfaces.
  8. Researchers can be granted the status of “vetted researchers” if they meet specific conditions, including affiliation with a research organization, independence from commercial interests, disclosure of research funding, capability to fulfill data security requirements, and commitment to making research results publicly available.
  9. Researchers can submit applications to the DSC of the Member State of the research organization they are affiliated with, which conducts an initial assessment before forwarding the application to the DSC of Establishment for a final decision.
  10. The DSC can terminate data access for vetted researchers if they no longer meet the conditions. The coordinator must inform the platform provider and allow the researcher to respond before terminating access.
  11. DSCs must inform the Board about vetted researchers and their research purposes. If access to data is terminated, they must also inform the Board.
  12. Platforms must provide timely access to publicly accessible data, including real-time data, to researchers who meet the conditions and use it for research on systemic risks.
  13. With input from the Board, the Commission will adopt delegated acts to specify technical conditions for data sharing, including with researchers, while considering the rights and interests of platforms and service recipients, protection of confidential information, and maintaining service security.

Both the presenter and the audience highlighted several aspects regarding the infrastructure and implications of the article, which made for a vibrant, fruitful discussion. One question focused on how much effort platforms might make to prevent researchers from acquiring data (§5). Though making projections at this point is challenging due to the remaining unknowns, lawyers predict that platforms will resist researchers’ access to data more strongly in some areas than in others. One such area could be questions pertaining to algorithms, which would fall under the so-called “trade-secret exemption.” Another topic of discussion was the “systemic risk research” requirement (§4). More specifically, what do we mean when we speak of systemic risks? Because the term can be interpreted very broadly, researchers could, hypothetically speaking, file a request as long as they can argue for a broad understanding of it.

Some details regarding the data vetting process and its implementation remain unclear, such as the establishment of an independent advisory mechanism and the technical conditions under which it would operate. Since most of the largest platforms and search engines have their EU establishment in Ireland, the DSC of Establishment tasked with vetting researchers will likely be the Irish DSC in many cases. Researchers can also send their applications to their own country’s national Digital Services Coordinator. In Germany, the Bundesnetzagentur is expected to play a significant role by serving as the DSC. The future German DSC will be able to give an opinion on whether to grant a data access request, but for platforms established in Ireland, the final decision will remain in the hands of the Irish DSC.

EU member states have yet to appoint their DSCs, and the complex vetting process may require an independent advisory body responsible for this task. However, establishing an independent advisory mechanism comes with its own set of challenges. How much power will this body have? And how will it make its decisions? During the talk, another potential issue was identified: the difficulty of handling and assessing raw data when one does not know what to look for. An alternative model could involve access to publicly accessible data without vetting. This approach would be similar to what the Twitter API has provided in the past, and it may prove to be an exciting option for fueling research, particularly if implemented in real time and through application programming interfaces.

This edition of the Thursday Lunch Talk Series shed light on several key aspects of Article 40, emphasizing the opportunities and challenges it could create for researchers’ access to platform data in the future. While some details, such as the data vetting process, remain uncertain, the presentation sparked valuable discussions, highlighting the complexities and considerations involved in what lies ahead for platform providers, researchers, and lawmakers in navigating our digital landscape.

Food for thought!

Further Information
  • Response to the Call for Evidence DG CNECT-CNECT F2 by the European Commission
  • Interview with Jakob Ohme “Researchers Fight for Data Access under the DSA”

Research stay at Universidad de Navarra (Pamplona, Spain)

From April 17-23, Methods Lab Data Scientist Roland Toth spent a week at the Institute for Culture and Society (ICS) at Universidad de Navarra in Pamplona, Spain. This short visiting researcher stay was financed through, and took place in the context of, ICS’s project Youth in Transition, in which data have been collected every year for four years from a representative sample of the Spanish population. These data include information on smartphone use, smartphone pervasiveness, and psychological traits.

Together with the researchers Aurelio Fernández, Javier García-Manglano, and Pedro de la Rosa, Roland wrote a first draft of a research article using these data. Since mobile media use is typically measured using indicators of use quantity (duration and frequency) alone, the paper addresses the question of whether qualitative dimensions of mobile media use should be included in its measurement as well. Specifically, the researchers are investigating the role of gratification variety (e.g., for information, social contact, or escapism) and situation variety (e.g., while in a meeting, while watching a movie, or while eating). Both are defining characteristics of mobile media devices like the smartphone, as we typically use them for various purposes, anytime and anywhere. For conceptual validation, the researchers examine whether these two qualitative dimensions contribute substantially to predicting mobile vigilance: the constant salience of mobile media devices and an urge to monitor and remain reactive to them. Since such vigilance is by definition tied to mobile media use and emerged in close alignment with its development, it is bound to be associated with smartphone use. In other words, if gratification variety and situation variety can explain a share of mobile vigilance that remains unexplained by the quantity of smartphone use, this indicates that both dimensions are substantively relevant to the measurement of mobile media use. The researchers are currently finalizing the article.
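One common way to test whether predictors explain variance beyond a baseline is hierarchical regression: fit a model with use quantity only, then add the qualitative dimensions and compare the explained variance (R^2). The post does not describe the paper's actual analysis, so this is only a sketch of that general technique, using simulated data and hypothetical variable names; the numbers it prints are not results of the study.

```python
import numpy as np

def r_squared(X, y):
    """Share of variance in y explained by an OLS fit on X (intercept added)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residuals = y - X1 @ beta
    return 1 - residuals.var() / y.var()

rng = np.random.default_rng(42)
n = 1000

# Simulated (not real) predictors: quantity of use plus two qualitative dimensions.
quantity = rng.normal(size=n)
gratification_variety = 0.5 * quantity + rng.normal(size=n)
situation_variety = 0.5 * quantity + rng.normal(size=n)

# Simulated outcome in which both qualitative dimensions carry unique information.
vigilance = (0.4 * quantity + 0.3 * gratification_variety
             + 0.3 * situation_variety + rng.normal(size=n))

# Step 1: quantity only. Step 2: quantity plus both qualitative dimensions.
r2_base = r_squared(quantity[:, None], vigilance)
r2_full = r_squared(
    np.column_stack([quantity, gratification_variety, situation_variety]), vigilance
)
delta_r2 = r2_full - r2_base
print(f"R^2 quantity only: {r2_base:.3f}, full model: {r2_full:.3f}, delta R^2: {delta_r2:.3f}")
```

A positive increase in R^2 (in practice tested for significance, e.g., with an F-test) would be the kind of evidence the argument above relies on.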

Inviting Roland for this stay was a generous gesture by ICS, and the researchers and the institute were very welcoming and engaged in the project throughout his stay. Aside from the productive cooperation, our colleague was delighted with the beautiful campus and the equally charming city of Pamplona (and Donostia-San Sebastián), where spring had already begun. We hope that the article can be published successfully and that the cooperation between ICS at Universidad de Navarra and the Methods Lab of the Weizenbaum Institute will continue in future projects!

Book Launch: Challenges and Perspectives of Hate Speech Research

We are thrilled to announce the release of “Challenges and Perspectives of Hate Speech Research,” a collection of 26 texts on contemporary forms of hate speech by scholars from various disciplines and countries. The anthology is co-edited by Methods Lab members Christian Strippel and Martin Emmer, together with research colleagues Sünje Paasch-Colberg and Joachim Trebbe. Divided into three sections, it covers present-day political issues and developments, provides an overview of key concepts, terms, and definitions, and offers numerous methodological perspectives on the topic. Whether you are a fellow academic researcher or a concerned netizen, this book is a must-read for anyone interested in the dynamic field of interdisciplinary hate speech research and the future of our evolving digital landscape.

Challenges and Perspectives of Hate Speech Research is open access!

This book is the result of a conference that could not take place. It is a collection of 26 texts that address and discuss the latest developments in international hate speech research from a wide range of disciplinary perspectives. This includes case studies from Brazil, Lebanon, Poland, Nigeria, and India, theoretical introductions to the concepts of hate speech, dangerous speech, incivility, toxicity, extreme speech, and dark participation, as well as reflections on methodological challenges such as scraping, annotation, datafication, implicitness, explainability, and machine learning. As such, it provides a much-needed forum for cross-national and cross-disciplinary conversations in what is currently a very vibrant field of research.

Workshop Recap: Introduction to Programming and Data Analysis with R

From March 29-30, the Methods Lab organized a workshop on the use of the programming language R for working with data, led by Roland Toth. The focus was on the main principles of programming in order to understand what is happening under the hood when working with data.

Day 1 focused on the advantages of using a programming language to work with data rather than dedicated software such as SPSS or Stata. Along the way, the most important programming principles in a research context, such as functions, classes, objects, vectors, and data frames, were covered. Before turning to specific data analysis tasks, the markup language Markdown was introduced in combination with R. This allows data analyses to be not only performed but also reported in a directly reproducible and seamlessly integrated manner, so that entire research papers can be written using R and Markdown. The day concluded with the key steps and techniques of data wrangling and the calculation of typical descriptive and inferential statistical measures, tests, and models. Each section of the day ended with small tasks that let participants apply what they had learned.

On Day 2, the data analysis section was wrapped up with a demonstration of numerous visualization methods. This was followed by a longer section in which participants developed their own research question based on a freely available data set from the European Social Survey (ESS) and answered it in R using all the techniques they had learned. They were supported by the workshop leader, since at the beginning of working with a programming language there are often many small, unforeseen problems that can quickly lead to frustration without prior experience. Lastly, the workshop offered an outlook on which techniques and packages to become familiar with when diving deeper into data analysis in R and programming in general (for example, custom functions, loops, and pipes). The workshop concluded with a Q&A where remaining questions could be asked.

To optimize the training offered by the Methods Lab, a short, anonymous evaluation was conducted at the very end of the workshop. Fortunately, the participants were very satisfied with the workshop throughout and only commented that more frequent and smaller tasks might have been even better. Although this is in part difficult to reconcile with the concept of the workshop, this feedback is appreciated and will be used to improve future offerings.

The Methods Lab would like to thank all participants for their participation and commitment and hopes that the skills learned will be of benefit to them in future research projects and other application scenarios.

Workshop: Introduction to Programming and Data Analysis with R (March 29-30, 2023)

Our second workshop, Introduction to Programming and Data Analysis with R, will be held on March 29 and 30 at the Institute.

During the first day of the workshop, Roland Toth (WI) will introduce the fundamentals of programming in R/RStudio and combine them with Markdown. Building on the first day, the second day will be dedicated to applying this knowledge to data analysis and working on a custom research question. No previous experience is necessary.

You can find more information about the workshop on its program page.

Workshop Recap: Web Scraping and API-based Data Collection

On March 2nd, the Methods Lab hosted its first-ever workshop, Web Scraping and API-based Data Collection. The workshop explored various techniques for accessing and gathering data from platforms using APIs and web scraping. Speakers included Florian Primig (FU Berlin), Steffen Lepa (TU Berlin), Felix Gaisbauer (WI), and Lion Wedel (WI). The workshop received an overwhelmingly positive response, with many people attending both in person and remotely. It generated plenty of discussion and concluded with a Q&A session.

Lion Wedel gives an introduction to web scraping (photo: Roland Toth).

Thanks to all our presenters and participants for helping us create such a successful first event. We look forward to organizing more workshops in the future on emerging methodologies in the realm of digital research!

ECPR Winter School: Machine Learning with Big Data for Social Scientists

From February 6–10, Methods Lab member Roland Toth attended the online course Machine Learning with Big Data for Social Scientists at the ECPR Winter School.

The goal was to gain deeper insight into certain machine learning methods and to be able to apply them to social science questions in particular. The course also addressed how to handle large data sets efficiently so that they can still be processed with high performance.

Numerous materials were made available for the workshop in advance. For each session, there was a video in which slides on the session’s topic were presented in the style of a lecture, accompanied by relevant literature and studies. On each workshop day, there were two-hour live sessions in which the content of the videos was reviewed and the principles were applied in live practice.

The first step was to set up RStudio Server on the Amazon Web Services (AWS) cloud service. This offloads the entire RStudio environment from one’s own machine, allowing data handling and calculations without burdening local resources.

Furthermore, participants deepened their work with the tidyverse package collection. Among other things, it turned out that the function vroom() from the package of the same name imports larger data sets faster than similar functions. In addition, the course covered how to access external data sets directly from RStudio via SQL syntax, so that the full data sets need not be imported at all.
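The course did this in R; as a language-neutral illustration of the same idea, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database, the table `vaccinations`, and its columns are hypothetical stand-ins for an external data source. The point is that filtering and aggregation run inside the database, so only the small result set is loaded into memory.

```python
import sqlite3

# Hypothetical stand-in for an external database; in practice this would be
# a connection to a database file or server rather than an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vaccinations "
    "(state TEXT, county TEXT, vaccinated INTEGER, population INTEGER)"
)
conn.executemany(
    "INSERT INTO vaccinations VALUES (?, ?, ?, ?)",
    [
        ("A", "A1", 600, 1000),
        ("A", "A2", 300, 1000),
        ("B", "B1", 800, 1000),
    ],
)

# Aggregation from county to state level happens in the database;
# only the per-state summary rows are transferred to Python.
query = """
    SELECT state, SUM(vaccinated) * 1.0 / SUM(population) AS share_vaccinated
    FROM vaccinations
    GROUP BY state
    ORDER BY state
"""
rows = conn.execute(query).fetchall()
for state, share in rows:
    print(state, round(share, 2))
conn.close()
```

Aggregating to a common level inside the query like this is also one way to prepare data sets clustered at different levels (such as county and state) for merging.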

For illustrative purposes, data sets on COVID vaccination status and election outcomes in the United States were used during the workshop. The observations in these data sets were clustered at different levels (state, county, …), which made merging them difficult. Besides typical data wrangling operations (filtering, grouping, aggregating, mapping, merging), several machine learning methods were discussed. The logic of the procedure was first demonstrated using simple linear regression models: a model is trained on a training data set and then applied to a separate test data set that was held out during training. The model should predict the outcome accurately, but not fit the training data so closely that it overfits and performs poorly on the test data; in the end, it is a question of balancing bias and variance. During the workshop, this principle was also applied to LASSO and Ridge regression, logistic regression, and classification methods such as Support Vector Machines, Decision Trees, and Random Forests.
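To make the train/test logic concrete, here is a minimal sketch in Python with NumPy (the course itself used R). It uses simulated data with a linear relationship and, as an illustrative choice not taken from the course, compares a straight-line fit with a flexible degree-10 polynomial. Because the models are nested least-squares fits, the flexible one always has a training error at least as low; whether it also does worse on the held-out test set depends on the random draw, but that is the typical overfitting pattern.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: a linear relationship plus noise (not the workshop's data).
x = rng.uniform(-1, 1, size=60)
y = 2.0 * x + rng.normal(scale=0.5, size=60)

# Random train/test split: 40 observations for training, 20 held out for testing.
idx = rng.permutation(len(x))
train, test = idx[:40], idx[40:]

def mse(coefs, xs, ys):
    """Mean squared error of a fitted polynomial on the given points."""
    return float(np.mean((np.polyval(coefs, xs) - ys) ** 2))

results = {}
for degree in (1, 10):
    # Fit only on the training data, then evaluate on both subsets.
    coefs = np.polyfit(x[train], y[train], deg=degree)
    results[degree] = (mse(coefs, x[train], y[train]), mse(coefs, x[test], y[test]))
    train_err, test_err = results[degree]
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

Regularized methods such as LASSO and Ridge regression address the same trade-off differently: instead of restricting model complexity by hand, they add a penalty on coefficient size to the fitting criterion.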

All in all, it was a good introduction to working with machine learning methods. However, the course focused strongly on the technical implementation of the methods in R and gave limited attention to the criteria for choosing certain methods over others. Nevertheless, the workshop clarified some open questions and provided new techniques that will help when working with larger data sets and in data analysis.

Workshop: Web Scraping and API-based Data Collection (March 2, 2023)

We are pleased to present the first workshop at the Institute to emerge from the methodological needs identified in our institute-wide survey in December. It is titled Web Scraping and API-based Data Collection and takes place on March 2.

After an introduction to the topic by the Methods Lab team, Florian Primig (FU), Steffen Lepa (TU), Felix Gaisbauer (WI), and Lion Wedel (WI) will each present various use cases of these two data collection methods. You can find more information about the workshop on its program page.

Research Methods at the Weizenbaum Institute: Survey Results

In December 2022, the Methods Lab conducted an internal survey to map out the methodological experiences and needs at the Weizenbaum Institute. Thanks to everybody who participated! We have identified specific demands and requests at the institute. Even though there is already extensive expertise in a large variety of methods and tools, many Weizenbaum scholars also expressed a wish for additional support and knowledge-building in, for instance, the following areas:

  • Data collection: automated observation (e.g., logging, tracking), automated content analysis, web scraping, API-based data collection, and eye tracking
  • Data analysis: network analysis, deep/transfer learning, natural language processing, and classification methods
  • Software/tools: R, Python, and network analysis software

With these results as our Polaris, we in the Methods Lab have embarked on the expedition of developing a future methods training and consulting program suited to your needs, which we will announce shortly. In the meantime, we hope the survey results will serve as a launch pad for networking among the scholars at the Weizenbaum Institute.