In a joint effort, the Career Development team and the Methods Lab are excited to announce the hybrid “Career Tutorial on LLMs for all Expertise Levels”. Beginning with the fundamental concepts of LLMs and in-context learning, the tutorial addresses the “needle in a haystack” problem and compares ultra-long-context models with retrieval-augmented generation (RAG) approaches. Through practical demonstrations, participants will gain hands-on experience with RAG’s core functionality and an understanding of its objectives. The session then turns to scaling solutions using vector databases and advanced implementations, including chunking strategies, hybrid RAG, and graph-based RAG architectures. We conclude with an overview of emerging trends, examining agentic RAG and the integration of reasoning models in deep-research applications. This comprehensive exploration equips attendees with both theoretical knowledge and practical insight into the latest developments in AI language models.
For more information, visit our program page. We are looking forward to your participation!
On February 6th, 2025, LK Seiling facilitated a workshop introducing Git, with support from Sascha Kostadinoski and Quentin Bukold. The event was co-organized by the Methods Lab and took place at the Weizenbaum Institute. The hybrid session provided about 30 participants with a thorough overview of the foundations of Git and related platforms.
Seiling first introduced Git and its general relevance, exploring the qualities of its version control system and the advantages of efficiently managing changes to files. The software’s free and open-source nature was highlighted as a driver of its widespread use and accessibility. At its core, Git enables collaborative work by allowing multiple participants to adjust files concurrently, and it offers a system to track the changes made without requiring alterations to the original files.
LK Seiling describes the features of Git
Next, participants were invited to open the Terminal and guided through some basic commands. To this end, commands for traversing directories, creating, moving, organizing, and deleting files were explained and demonstrated in detail.
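The command-line portion can be reproduced with a handful of standard shell commands. The sketch below mirrors the kind of walkthrough given in the session; it uses a throwaway directory under /tmp so nothing important is touched (the exact examples from the workshop are not preserved here):

```shell
# Scratch directory so the examples touch nothing important
mkdir -p /tmp/terminal-demo
cd /tmp/terminal-demo

pwd                                 # show the current working directory
mkdir notes                         # create a directory
touch notes/todo.txt                # create an empty file
mv notes/todo.txt notes/tasks.txt   # move (here: rename) the file
ls notes                            # list the directory contents
rm notes/tasks.txt                  # delete the file
cd /tmp
rm -r terminal-demo                 # remove the scratch directory again
```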
This was followed by instruction on Git’s key functionality, such as the repository, core commands, branches, and conflict resolution. Branches, for instance, allow simultaneous work to proceed separately from the main code base, which is especially beneficial for feature development and helps streamline the review of changes before merging. Commands for switching branches and merging were demonstrated in the terminal with a quickly constructed example. Seiling also covered managing repositories, including visuals of the basic workflow and the linkage between local and remote repositories, for both individual and collaborative work.
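A minimal branch-and-merge sequence of the kind demonstrated might look as follows. This is a sketch, not the session’s exact example, and it assumes Git 2.28 or newer for `git init -b`:

```shell
# Create a fresh repository with "main" as the initial branch (git >= 2.28)
cd /tmp && rm -rf merge-demo
git init -b main merge-demo
cd merge-demo
git config user.email "demo@example.com"   # local identity for the demo commits
git config user.name "Demo User"

echo "print('hello')" > script.py
git add script.py
git commit -m "Add initial script"

git switch -c feature-greeting             # create and switch to a feature branch
echo "print('hello, workshop')" > script.py
git commit -am "Refine greeting"           # -a stages the tracked file's changes

git switch main                            # return to the main branch
git merge feature-greeting                 # merge the feature work back in
git log --oneline                          # inspect the resulting history
```

Because `main` did not advance in the meantime, the merge is a fast-forward and no merge commit is created.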
For those wondering when to use which platform, Bukold stepped in to explain the major differences between GitHub, GitLab, and plain Git.
In the second hour of the workshop, Seiling had participants put these basics into practice in the imagined context of a classic Python project requiring collaborative engagement. Python scripts were saved, renamed, and staged with appropriate commit messages and configurations. The principal Git practices were emphasized to remind the audience when and how to commit changes to the previously specified local repository. Furthermore, Seiling walked guests through merge requests, added description templates for joint projects, and generally taught the features most useful for group collaboration.
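The stage-and-commit cycle described above can be sketched like this (file names are illustrative, not those used in the workshop; `git init -b` again assumes Git 2.28+):

```shell
# Staging and committing in the context of a small Python project
cd /tmp && rm -rf staging-demo
git init -b main staging-demo
cd staging-demo
git config user.email "demo@example.com"
git config user.name "Demo User"

echo "import sys" > analysis.py
git status --short                      # '?? analysis.py': the file is untracked
git add analysis.py                     # stage the new file
git commit -m "Add analysis script"     # record the staged snapshot

git mv analysis.py main_analysis.py     # rename and stage the rename in one step
echo "print(sys.argv)" >> main_analysis.py
git status --short                      # shows the staged rename and the unstaged edit
git add main_analysis.py                # stage the remaining edit as well
git commit -m "Rename script and print arguments"
git log --oneline                       # two commits in the local history
```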
LK Seiling explains how to stage and commit changes
Later, Seiling explored some advantageous features of the GitLab platform, which is accessible free of charge to Weizenbaum researchers: the repository graph, issue tracking, and project management tools. The repository graph arranges branches to show merges and commits, giving insight into who contributed which changes; this is particularly relevant for collaborative code projects. In case of software malfunctions, the issue tracking feature shows who is working on which branch and how far along a fix is. Finally, GitLab’s management tools were outlined for assigning work, applying tags to signal when tasks are finished, and opening or closing issues.
To close, Kostadinoski briefly summarized the basic elements of Git and its role in data work, such as software development and research. He clarified key terms and took questions in a Q&A. Seiling joined in, encouraging participants to “learn by doing” and to stay connected via Weizenbaum-associated GitHub accounts for future internal coordination.
Throughout the workshop, participants worked through various tasks and benefited from frequent recaps highlighting key points, ensuring a solid understanding of the material. Attendees both online and in person freely asked questions and received support from the instructors. The Methods Lab would therefore like to give a huge thank you to LK Seiling, Sascha Kostadinoski, and Quentin Bukold for their clear instruction on the foundations of Git and for facilitating such an engaging environment for all participants.
The Methods Lab, with contributors Zeerak Talat and Flor Miriam Plaza del Arco, is excited to introduce the workshop “Social Science and Language Models – Methods and theory to responsible research on and with Language Technologies”, taking place on April 3–4, 2025 at the Weizenbaum Institute. This hybrid event encourages interdisciplinary collaboration to promote ethically responsible research in the application of natural language technology. As methodology utilizing language models is increasingly applied to a variety of contexts, from the social sciences and healthcare settings to software development, research suggests a growing need to monitor potentially biased outcomes of its use. However, the absence of a shared understanding between researchers in the social sciences and those in Natural Language Processing (NLP) perpetuates discrimination, as biases in the conception and measurement of socio-technical systems often go unrecognized.
We therefore hope to engage a diverse group of researchers working on methodology in the social and economic sciences to address bias in language technologies. Submitted abstracts are encouraged to address the measurement and mitigation of bias in NLP, as well as its implications for the social sciences.
For more information, visit our program page. We are looking forward to your participation!
In the first part of the event, participants discussed their experiences with networking strategies in a speed-dating format: participants rotated every few minutes to create different pairings, and each conversation was documented by a member of the organizing team. Participants highlighted the importance of networking within their own institutions, attending regularly organized events to formalize informal connections, pooling resources, and implementing cross-institutional research projects.
Melanie Althage (IZ D2MCM) guides participants through the new calendar system
In the second part of the event, colleagues from IZ D2MCM presented participants with a calendar system they developed. Its purpose is to consolidate events occurring at the network institutions into a single platform, making them accessible to all members. The system was then discussed in two groups. In one group, participants exchanged ideas on the design and admission criteria for events, considering aspects such as content, format, and location. In the other group, participants focused on facilitating the technical implementation, which operates through Git and enables network members to submit event metadata in a structured format.
The Methods Lab would like to thank the IZ D2MCM and all participants for their contributions to this successful event. Stay tuned for the next one!
After another successful run, the Methods Lab is excited to bring back the annual Programming and Data Analysis with R workshop for its third edition, led by Roland Toth (WI). This two-day event will be held at the Weizenbaum Institute on Wednesday, March 12th, and Thursday, March 13th.
On the first day, participants can expect a comprehensive introduction to the fundamentals of programming, essential data wrangling techniques, and Markdown integration. The second day emphasizes data analysis and hands-on work with datasets, enabling attendees to independently explore a relevant research topic. Throughout both days, participants will be presented with conceptual knowledge, coding techniques, and small practical subtasks for an immersive learning experience.
Join us in our first workshop of 2025 for an Introduction to Git, held on Thursday, February 6th. This event will be taking place at the Weizenbaum Institute and welcomes Weizenbaum Institute members to participate.
LK Seiling, an associate researcher, IT administrator Sascha Kostadinoski, and student assistant Quentin Bukold will be the primary instructors leading this event. Together they will guide participants through short theoretical segments, introducing fundamental Git commands and version control concepts. In addition to covering the operation of key GitLab features, the workshop incorporates quizzes and interactive exercises.
For further details, visit our program page. We hope to see you there!
On November 18, 2024, Karsten Wolf and Florian Hohmann from the University of Bremen presented the software OpenQDA at WI. In this Show and Tell, they gave an overview of OpenQDA and its motivations, functions, and limitations.
In the first part of the Show and Tell, Karsten Wolf presented the development and purpose of the software. It is an open-source alternative to the commercial software MaxQDA, a popular tool for text annotation (i.e., coding) in qualitative research. The team at the University of Bremen has been working on OpenQDA for quite some time, not only to deliver a free and customizable alternative to MaxQDA, but also to allow for (simultaneous) collaboration on projects. In addition, OpenQDA has a plug-in framework that will be expanded over time. For example, atrain is already supported and can be used to transcribe audio files to text, and a plug-in that allows for implementing Python scripts is currently in the works. While OpenQDA is still under development and currently in early access, the first official release is planned for the near future. It runs on servers at the University of Bremen and can be used by anyone for free.
Florian Hohmann presents features and the GitHub repository of OpenQDA
In the second part of the Show and Tell, Florian Hohmann gave a practical introduction to the most recent version of the software. He showed participants how to create an account, set up a new project, and create a team to work on projects collaboratively. Text content can be added manually, from documents, audio files, and soon even remote sources. These texts can then be annotated/coded using separate, color-coded categories, and it is possible to set up sub-categories for further refinement. The results can be exported in CSV format. In addition, users can create a code portrait, which illustrates the distribution of categories across the text, and a word cloud for quick visual analysis.
At the end of the Show and Tell, participants provided feedback and suggestions for future implementation. For example, the automated conversion of scanned documents to plain text using OCR, and functions like counting and automatic coding, were discussed. Some participants were willing to stay and provide further feedback even after the main event ended. Finally, the team from Bremen, the Methods Lab, and the Weizenbaum Institute IT department discussed the installation of OpenQDA on the Institute’s servers in 2025 to provide a local instance to Weizenbaum Institute researchers.
The Methods Lab would like to thank the colleagues from Bremen for their work, and all participants for providing useful feedback!
On November 26, 2024, Maximilian Heimstädt, Professor of Digital Governance & Service Design at the Helmut Schmidt University in Hamburg, shared his experiences and expertise in applying qualitative methods to studying algorithms in organizations. This workshop was co-organized by the Methods Lab and the Research in Practice – PhD Network for Qualitative Research, coordinated by Katharina Berr and Jana Pannier.
The workshop focused on the complexities of studying algorithms from an interpretivist social science perspective; not only the potentials and risks people ascribe to them, but how they are made sense of, enacted, negotiated and integrated into everyday work settings. Drawing on joint research with Simon Egbert on predictive policing, Max shared how he gained access to public sector organizations, approached team-based multi-sited ethnographic fieldwork and learned to understand complex technologies developed and implemented across different empirical sites and over time.
Maximilian Heimstädt presents theoretical approaches to research algorithms in practice
Max introduced three central theoretical approaches from organization studies and critical data studies for researching algorithms in practice: technology trajectories, biographies of algorithms, and data journeys, all of which afford different analytical lenses and offer more nuanced understandings of algorithmic systems. The technology trajectories approach expands research on the design and use of technologies by integrating broader questions of power, ideology, and institutional change (Bailey & Barley, 2020). Approaching digitalization research from a biographies perspective draws attention to the dynamic development of digital technologies, understood as ‘entangled, relational, emergent, and nested assemblages’ across different organizational contexts and time (Glaser, Pollock, & D’Adderio, 2021). Finally, the data journeys approach allows researchers to ‘focus attention on the life of data as they move through space and time, through different sites and cultures of data practice’, and offers a perspective attentive to the frictions of such journeys (Bates, Lin, & Goodale, 2016). Following the introduction of these approaches, the workshop participants explored how their own research has been (both implicitly and explicitly) informed by them, and discussed their practical and epistemic potentials and limits.
The Idea Behind the ‘Research in Practice’ Workshop Series
Qualitative research often feels polished in academic publications, but the reality is that the process can be quite complex at times, and full of twists and turns. We have created this workshop series to center the ‘backstage’ of qualitative research. The goal is to hear directly from scholars about how they conduct their work – the challenges, the unexpected discoveries and unplanned adaptations, the specific methods and digital tools used, and the strategies that help them arrive at interesting and valuable findings. With this workshop format and research network, we aim to create a space for qualitative researchers within and beyond the Weizenbaum Institute to connect, collaborate, and learn from one another.
What to Expect
Each workshop session in the series brings a new perspective on qualitative (digital) research. Invited scholars walk us through their research processes, focusing on how they have handled the challenges of their work. This includes designing studies, building rapport with research participants, analyzing different kinds of qualitative data, theorizing as method, and navigating ethical considerations. The sessions are interactive, offering opportunities to ask questions, share ideas, and discuss in depth. By opening up the processes behind qualitative research, we hope to demystify the work and facilitate conversations that help researchers at all levels.
If you would like to join our network and to be informed about upcoming events, reach out to Katharina Berr and Jana Pannier.
AI applications are growing in popularity, everyday digital tasks are being intuitively streamlined, and social media platforms are flooded with automated media that convincingly emulate real events. Naturally, this inspires discussion of future opportunities and concerns, such as the possibility of computers taking over jobs that once relied on humans. But amid this integration of AI into our routine behaviors, how much do we really know about the foundations of these tools? What are the invisible costs of this innovation, and who bears the consequences? This article presents answers: unsettling accounts from behind the scenes of our usage, brought to light by the Data Workers’ Inquiry.
This community-based initiative fights for fair working conditions and adequate recognition of data workers’ expertise. Since 2022, workers behind AI applications have been investigating their own workplaces to address labor conditions and build workplace power. Drawing on the principles of the Marxist workers’ inquiry of the 1880s, workers conduct research tailored to their political and environmental concerns, with support from trained qualitative researchers. This team includes lead researcher Milagros Miceli of the Weizenbaum Institute, Adio Dinika, Krystal Kauffman, Camilla Salim Wagner, and Laurenz Sachenbacher. Without compromising the workers’ epistemic authority, the researchers provide training in methods for data collection and analysis, creating a methodology the workers can use in their investigations. They also diligently monitor ethical and legal boundaries throughout each project.
The inquiries take place across Venezuela, Kenya, Syria, and Germany. Whether in essays, artwork, or documentaries, data workers creatively share their perspectives on working in various AI industries. The striking truths are outlined in the inquiries below. Ultimately, this research will provide structure for collective action, establishing future ethical guidelines regarding the treatment of data workers.
This blog post discusses when (and when not) to use the official TikTok API, and provides step-by-step instructions for a typical research scenario to inform aspiring researchers about using it.
When and when not to use it
While it is the official route to data access, the official TikTok API is by no means the only way to collect TikTok data in an automated fashion. Depending on the research endeavour, one of the following alternatives might be the better fit:
4CAT + Zeeschuimer: sensible if you want to collect limited data on one or more actors, hashtags, or keywords, and/or are not confident programming the subsequent analysis yourself.
An unofficial TikTok API (pyktok or the Unofficial TikTok API in Python): both are great projects that provide significantly more data points than the official API. However, this comes at a cost: stability, and dependency on developers reacting to changes on TikTok’s site.
But why should you use the official TikTok API if those two options are available?
Reliability. In theory, the official API provides more stable data access than the other solutions.
Legality. Depending on your country or home institution, unofficial data access might be a problem for legal reasons; with official data access, you are on the safer side. Please consult your institution regarding data access.
User-level data. Other data collection methods are often superior in terms of data points at the video level (Ruz et al. 2023). However, the official TikTok API offers a set of user-level data (user info, liked videos, pinned videos, followers, following, reposted videos) that is not as conveniently available through other data collection methods.
One fundamental limitation must be kept in mind: only 1,000 requests can be made per day, each returning at most 100 records (e.g., videos, comments). Even if every request returns the full 100 records (rarely possible), a maximum of 100,000 records can therefore be retrieved per day.
To get started with the official TikTok Research API, visit the Research API page. To gain access, you need to create a developer account and submit an application form. When doing so, please record your access request in the DSA40 Data Access Tracker to contribute to the effort of tracking the data access that platforms provide under the DSA.
The official documentation of the Research API is not intuitive, especially for newcomers (Documentation). Using the API from Python or R, the typical research languages, may still pose a challenge, especially for researchers working with an API for the first time. The current scarcity of such guidance motivates this blog post, which provides it without a paywall.
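As a first orientation, the request body for the Research API’s video-query endpoint can be assembled in a few lines of Python. The endpoint, field names, and query operators in the comments follow the official documentation at the time of writing and should be verified before use; the helper functions themselves are our own illustration, not part of any official client:

```python
# Sketch of a Research API query body. Per the official docs (verify before
# use), the body is sent as JSON via
#   POST https://open.tiktokapis.com/v2/research/video/query/
# with an OAuth bearer token in the Authorization header and the requested
# fields passed as a query parameter.

def build_video_query(keyword, start_date, end_date, max_count=100):
    """Build the JSON body for a keyword search within a date range (YYYYMMDD)."""
    return {
        "query": {
            "and": [
                {"operation": "IN", "field_name": "keyword", "field_values": [keyword]}
            ]
        },
        "start_date": start_date,
        "end_date": end_date,
        "max_count": min(max_count, 100),  # the API caps each request at 100 records
    }

def daily_record_ceiling(requests_per_day=1000, records_per_request=100):
    """Upper bound on records per day implied by the documented rate limit."""
    return requests_per_day * records_per_request

payload = build_video_query("election", "20240101", "20240131")
print(payload["max_count"])      # 100
print(daily_record_ceiling())    # 100000
```

The resulting dictionary would then be serialized as JSON and sent with any HTTP client (e.g., the `requests` library), together with a bearer token obtained through the developer account mentioned above.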