Recap: Networking Event for Digitalization Research in Berlin

On May 31st, the Methods Lab of the Weizenbaum Institute and the Interdisciplinary Center for Digitality and Digital Methods of the Humboldt University Berlin (IZ D2MCM) organized a networking event to which they invited various institutions, institutionalized teams, and centers that are actively engaged in digital research in the humanities, social sciences, and cultural studies in Berlin. On this Friday, about 50 scientists met in the Auditorium of the Grimm Center to present their work, future needs, and opportunities for cooperation, and thus to improve the networking of the Berlin research landscape. For this purpose, the event was divided into two parts:

In the first part, the following institutions, teams, and initiatives introduced themselves in short presentations:

  • The Data-Methods-Monitoring Cluster at DeZIM Institute is a cross-disciplinary facility that uses and adapts proven data evaluation methods. Their work includes the development of experimental designs (DeZIM.lab) and the creation and adaptation of survey methods and survey designs (DeZIM.methods). In addition, they offer training on quantitative and qualitative methods via the DeZIM Summer School. https://www.dezim-institut.de/en/institute/data-methods-monitoring-cluster/
  • The Digital History of Education Lab (DHELab) at Bibliothek für Bildungsgeschichtliche Forschung (BBF) offers training and lectures on digital history and 3D research data, visualization, text mining and AI-supported literature research through initiatives such as Last Friday’s Lab Talk (LFLT). The lab also develops services to support digital research practice in historical educational research. https://bbf.dipf.de/de/arbeiten-lernen/dhelab
  • The Digital Humanities Network at University of Potsdam focuses on research and teaching collaborations in the digital humanities. It offers events and courses such as the “Code & Culture Lecture Series”, the “Henriette Herz Humanities Hackathons” and the “Python 4 Poets Course”. https://www.uni-potsdam.de/en/digital-humanities/
  • The Department of Audio Communication at TU Berlin conducts transdisciplinary research and development in the areas of music, sound and language. They have developed digital research tools such as “PLAY” and “Spotivey”, which are used in areas such as virtual acoustic reality and music and media reception research. https://www.tu.berlin/ak
  • The Alexander von Humboldt Institute for Internet and Society (HIIG) focuses on how digital methods impact internet and society research, including the development of digital tools and collaborative platforms. They have worked on projects such as the “Open Knowledge Maps” for academic content visualization, and tools to enhance rainforest protection in Indonesia using remote sensing and geo-tracking technologies. https://www.hiig.de/das-institut/
  • Prof. Dr. Helena Mihaljević, Professor of Data Science and Analytics at the University of Applied Sciences (HTW) and the Einstein Center for Digital Future (ECDF), organizes the “Social and Political Data Science Seminar.” The seminar brings together PhD students, postdocs, senior researchers, and NGO professionals and serves as a platform to present and discuss ongoing projects, providing valuable feedback on research questions, methods and other aspects. https://www.htw-berlin.de/hochschule/personen/person/?eid=11889
  • The Interdisciplinary Center for Digitality and Digital Methods at Campus Mitte (IZ D2MCM) at HU Berlin fosters collaboration among different faculties in the areas of digitality and digital methods. They provide innovative infrastructure to support excellent research, and have working groups focused on Data Literacy, Reading and Writing Workshops, Large Language Models, Git & Software, Statistics, and Mapping. https://izd2m.hu-berlin.de/
  • QUADRIGA, a data competence center for Digital Humanities, Administrative Sciences, Computer Science, and Information Science, specializes in pioneering digital research methods. They support researchers across disciplines by facilitating studies on data flow, international standards like FAIR and CARE, and digital methods development. Their platform, QUADRIGA Space, hosts educational resources, including the QUADRIGA Navigator tool, to aid in navigating the digital research landscape. https://quadriga-dk.github.io/
  • The DM4 ‘Data Science’ at Berlin State Library develops machine learning and artificial intelligence technologies to enhance the accessibility of digitized historical cultural data, encompassing both text and image formats. Its primary objective is to enable efficient search capabilities and facilitate the organization and structuring of data, making it openly available and machine-readable as “Collections as Data”. The STABI Lab focuses on research and development initiatives to advance digital projects and enhance access to cultural heritage materials. https://lab.sbb.berlin/?lang=en
  • TELOTA is an initiative by the Berlin-Brandenburg Academy of Sciences and Humanities (BBAW) dedicated to digital research methods, focusing on projects and tools that facilitate scholarly inquiry in digital environments. It offers resources like digital editions, training, text analysis tools, and research infrastructure to support scholars in various disciplines engaging in digital research. https://www.bbaw.de/bbaw-digital/telota
  • The Research Lab “Culture, Society and the Digital” at HU Berlin combines research and teaching in the field of digital anthropology at the Institute for European Ethnology at Humboldt University. Theoretically and empirically, it combines various traditions of ethnographic and multi-modal digitization research based on social theory and analysis. Projects include “The Social Life of XG – Digital infrastructures and the reconfiguration of sovereignty and imagined communities”, “Packet Politics: Automation, Labour, Data”, “Cultures of Rejection; Digitalization of Work and Migration”, and more. https://www.euroethno.hu-berlin.de/de/forschung-1/labore/digilab
  • The Methods Lab at the Weizenbaum Institute is the central unit for supporting, connecting, and coordinating methods training and consulting. It conducts methods research and has developed tools for the collection and analysis of digital data, such as the “Weizenbaum Panel Data Explorer” and the “MART” app. https://www.weizenbaum-institut.de/en/research/wdsc/methods-lab/
  • The Social Science Research Center Berlin (WZB) has PostDoc networks that focus on specific (digital) methods. For example, a new group on AI Tools facilitates discussions on their utilization and potential acquisitions by the WZB. In addition, the WZB organizes the Summer Institute in Computational Social Science (SICSS) with an emphasis on text as data. The IT & eScience department of WZB supports scientific IT and data science consulting. They offer infrastructures for computing power and geodata and expertise in research software engineering. Projects include the digital version of the DIW Weekly Report. https://wzb.eu/
  • The Digital History Department at Humboldt University is involved in projects such as H-Soz-Kult and JupyterBook for Historians. Their main areas of expertise include modeling historical information, the impact of datafication, and the application of AI-based methods in historical research. They also offer workshops, support, and educational resources for digital history tools. https://www.geschichte.hu-berlin.de/de/bereiche-und-lehrstuehle/digital-history

In the second part of the event, we broke into seven tables to discuss selected topics in small groups in a World Café format.

  • At the “Training on digitality and digital methods” table, we discussed digitality as a central epistemic problem of our research. For example, it seems to go hand in hand with a kind of compulsive quantification that needs to be reflected in both the application and the teaching of digital methods. Participants named several future needs that the network could address: the creation of interdisciplinary networks for self-learning and training, a better exchange of basic teaching modules (Open Educational Resources), a level system for methods training like the one already used for learning foreign languages (A1, A2, B1…), and the sharing of technical infrastructures such as HPC clusters.
  • The table on “Collaboration—governance and best practices” explored different formats of collaboration and networking. Starting points are e.g. networks around topics or around similar data domains. It is important that networking is supported at an early stage, that existing networks are used to avoid duplication of effort, that they stem from bottom-up initiatives, have a long-term perspective, and that they are supported with resources wherever possible. Impulse budgets, for example, are particularly suitable for this purpose. To ensure that collaborations and networks do not become an end in themselves, it is also important to define goals and constantly monitor their success. 
  • At the “Services in teams and centers” table, we collected various areas in which services are needed and offered. These include the provision of infrastructure, interfaces, software, and other resources, as well as consulting, training, and networking. The central goal of services is skill development in any given institution, while important challenges include ensuring their sustainability, evaluating their functionality and needs to avoid oversupply, and clearly targeting previously defined audiences. 
  • The “Cooperation with external services” table discussed various areas in which purchasing external services can be useful. These include OCR, digitization and quality assurance, less frequently data conversion, GUIs, and web applications, and increasingly data analytics. However, most services are still performed in-house. The reasons for this are that there are few positive reports, the cost of tendering is high, small projects are often not economically viable for external providers, and there is little co-development and few real collaborations. One benefit of such collaborations is that external providers can help produce realistic cost estimates. In addition, public-private partnerships could open up new sources of funding through economic development.
  • At the “Digitality as a concept between society and science” table, we discussed the problems that emerge from different approaches and perspectives science and the broad public have regarding digitality and digitalization. As these play an increasingly important role for more and more people in everyday life, education, and at work, science communication is tasked with informing the public about current developments. Past events (e.g., data leaks) proved that skepticism toward technology is certainly warranted, but projects and action practices meant to tackle this issue are oftentimes implemented rather poorly. As a result, technology such as artificial intelligence may appear overly powerful, which leads to fear rather than literacy. Future research and discussions should therefore assess how to best merge and guide digitality and digitalization to make the public more comfortable with them.
  • At the “Large Language Models” table, we discussed the current state of research on these types of models as well as future perspectives and desires regarding their use in research. Embracing reproducibility and open research is one of the most important requirements in this field. Further, participants noted the need to move beyond mere benchmarking, limitations in terms of resources and hardware, the necessity of establishing AI guidelines, and the need to consider the contexts in which Large Language Models are applied. Providing hardware and an infrastructure that can be used in research and teaching emerged as a crucial step to achieve these goals. Specifically, strengthening local ties between institutions (e.g., in and around Berlin) could be a first step. Finally, more research on how and for what the public uses such technologies could help develop measures to improve AI literacy.
  • At the “Research Software Engineers” table, we discussed the necessity of, as well as the challenges around, research staff who focus on developing software and tools. In practice, tools that are necessary for conducting (niche) research are either bought or developed by the researchers themselves. While the former depends on financial means and funding, the latter is often not possible at all because of a lack of skill and/or time. In teaching as well as interdisciplinary research contexts, acquiring programming knowledge is oftentimes neither desired nor feasible. Hiring research software engineers, however, comes with challenges of its own. They have to be situated in a way that their work is not too specific (e.g., on a research group level), but also not too unspecific (e.g., on an institution level). Standardization, documentation, and longevity of their work need to be ensured. Also, the inclusion of software or tool development in funding applications is not yet typical in many fields of research. Finally, the acknowledgment of software as a publication and a contribution to science is important but not yet satisfactory.

Since the networking event was so successful, the organizers will be meeting again soon to discuss how to continue and intensify the networking. Stay tuned!

Conference Recap: “Data, Archive & Tool Demos” at DGPuK 2024 (March 14, 2024)

Together with Johannes Breuer, Silke Fürst, Erik Koenen, Dimitri Prandner, and Christian Schwarzenegger, Methods Lab member Christian Strippel organized a “Data, Archive & Tool Demos” session as part of the DGPuK 2024 conference at the University of Erfurt on March 14, 2024. The idea behind this session was to provide space to present and discuss datasets, archives, and software to an interested audience. The event met with great interest, so that all the seats were taken. After a high-density session in which all 13 projects were presented in short talks, the individual projects were discussed in more detail in the following poster and demo session in the hallway.

The 13 contributions were:

CKIT: Construction KIT
Lisa Dieckmann, Maria Effinger, Anne Klammt, Fabian Offert, & Daniel Röwenstrunk
CKIT is a review journal for research tools and data services in the humanities, founded in 2022. The journal addresses the increasing use of digital tools and online databases across academic disciplines, highlighting the importance of understanding how these tools influence research design and outcomes. Despite their critical role, scholarly examination of these tools has been minimal. CKIT aims to fill this gap by providing a platform for reviews that appeal to both humanities scholars and technical experts, promoting interdisciplinary collaboration. For more details, see here.

Der Querdenken Telegram Datensatz 2020-2022 
Kilian Buehling, Heidi Schulze, & Maximilian Zehring
The Querdenken Telegram Datensatz is a dataset that represents the German-speaking anti-COVID-19 measures protest mobilization from 2020 to 2022. It includes public messages from 390 channels and 611 groups associated with the Querdenken movement and the broader COVID-19 protest movement. Unlike other datasets, it is manually classified and processed to provide a longitudinal view of this specific movement and its networking. 

DOCA – Database of Variables for Content Analysis
Franziska Oehmer-Pedrazzi, Sabrina H. Kessler, Edda Humprecht, Katharina Sommer, & Laia Castro
The DOCA database collects, systematizes, and evaluates operationalizations for standardized manual and automated content analysis in communication science. It helps researchers find suitable and established operationalizations and codebooks, making them freely accessible in line with Open Method and Open Access principles. This enhances the comparability of content analytical studies and emphasizes transparency in operationalizations and quality indicators. DOCA includes variables for various areas such as journalism, fictional content, strategic communication, and user-generated content. It is supported by an open-access handbook that consolidates current research. For more info, visit the project’s website here.

A “Community Data Trustee Model” for the Study of Far-Right Online Communication
Jan Rau, Nils Jungmann, Moritz Fürneisen, Gregor Wiedemann, Pascal Siegers, & Heidi Schulze
The community data trustee model is introduced for researching sensitive areas like digital right-wing extremism. This model involves sharing lists of relevant actors and their online presences across various projects to reduce the labor-intensive data collection process. It proposes creating and maintaining these lists as a community effort, with users contributing updates back into a shared repository, facilitated by an online portal. The model aims to incentivize data sharing, ensure legal security and trust, and improve data quality through collaborative efforts.

Development and Publication of Individual Research Apps Using DIKI as an Example
Anke Stoll
DIKI is a dictionary designed for the automated detection of incivility in German-language online discussions, accessible through a web application. Developed using the Streamlit framework, DIKI allows users to perform automated content analysis via a drag-and-drop interface without needing to install any software. This tool exemplifies how modern frameworks can transform complex analytical methods into user-friendly applications, enhancing the accessibility and reuse of research instruments. By providing an intuitive graphical user interface, DIKI makes advanced analytical capabilities available to those without programming expertise, thus broadening the scope and impact of computational communication science.

The FROG Tool for Gathering Telegram Data
Florian Primig & Fabian Fröschl
The FROG tool is designed to gather data from Telegram, a platform increasingly important for social science research due to its popularity and resilience against deplatforming. FROG addresses the challenges of data loss and the tedious collection process by providing a user-friendly interface capable of scraping multiple channels simultaneously. It allows users to select specific timeframes or perform full channel collections, making it suitable for both qualitative and quantitative research. The tool aims to facilitate data collection for researchers with limited coding skills and invites the community to contribute to its ongoing development. An introduction to the tool can be found here.

Mastodon-Toolbox – Decentralized Data Collection in the Fediverse
Tim Schatto-Eckrodt
The Mastodon Toolbox is a Python package designed for systematic analysis of user content and network structures on the decentralized social media platform Mastodon. Developed as an alternative to centralized platforms, Mastodon offers more privacy and control over data. The toolbox aids researchers in selecting relevant instances, filtering public posts by hashtags or keywords, collecting interactions such as replies, reblogs, and likes, and exporting data for further analysis. It is particularly useful for researchers with limited programming skills, enabling comprehensive data collection across Mastodon’s decentralized network. More info about the tool can be found here.

Open Source Transformer Models: A Simple Tool for Automated Content Analysis for (German-Speaking) Communication Science
Felix Dietrich, Daniel Possler, Anica Lammers, & Jule Scheper
The “Open Source Transformer Models” tool is designed for automated content analysis in German-language communication science. Leveraging advancements in natural language processing, it utilizes large transformer-based language models to interpret word meanings in context and adapt to specific applications like sentiment analysis and emotion classification. Hosted on the Open Source platform “Hugging Face,” the tool allows researchers to analyze diverse text types with minimal programming skills.

Meteor: A Research Platform for Political Text Data
Paul Balluff, Michele Scotto di Vettimo, Marvin Stecker, Susan Banducci, & Hajo G. Boomgaarden
Meteor is a comprehensive research platform designed to enhance the study of political texts by providing a wide range of resources, including datasets, tools, and scientific publications. It features a curated classification system and an interlinked graph structure to facilitate easy navigation and discoverability of resources. Users can contribute new resources, create personalized collections, and receive updates through a notification system. Additionally, Meteor integrates with AmCAT 4.0 to enable non-consumptive research, ensuring the protection of copyrighted materials. For more details, visit the project’s website here.

rufus – The Portal for Radio Search
Patricia F. Blume
The “rufus” tool is an online research platform developed by the Leipzig University Library (UBL) to provide easy access to broadcast information from the ZDF archive. This platform allows researchers to search production archive data from an external source for the first time, offering data from nearly 500,000 broadcasts and 2 million segments dating back to 1963. The tool features a versatile user interface with specific search instruments, enabling straightforward viewing requests to the ZDF archive. Built with open-source components, rufus not only facilitates access to valuable audiovisual heritage for communication and media researchers but also supports the integration of additional data providers. For more details, visit the project’s website here.

Weizenbaum Panel
Martin Emmer, Katharina Heger, Sofie Jokerst, Roland Toth, & Christian Strippel
The Weizenbaum Panel is an annual, representative telephone survey conducted by the Weizenbaum Institute for the Networked Society and the Institute for Journalism and Communication Studies at the Free University of Berlin. Since 2019, around 2,000 German-speaking individuals over the age of 16 have been surveyed each year about their media usage, democratic attitudes, civic norms, and social and political engagement, with a special focus on online civic interventions. The survey allows for longitudinal intra-individual analyses, and the data is made available for scientific reuse shortly after collection. More information about the panel can be found here.

WhatsR – An R Package for Processing and Analyzing WhatsApp Chat Logs
Julian Kohne
The WhatsR R package enables researchers to process and analyze WhatsApp chat logs, addressing the gap in studying private interpersonal communication. It supports parsing, preprocessing, and anonymizing chat data from exported logs, while allowing researchers to analyze either their own data or data voluntarily donated by participants. The package includes a function to exclude data from non-consenting participants and is complemented by the ChatDashboard, an interactive R Shiny app for transparent data donation and participant feedback. The package can be found here.

OpenQDA
Andreas Hepp & Florian Hohmann
OpenQDA is an open-source qualitative data analysis tool and the latest product developed at the ZeMKI institute in Bremen. It is provided as free-to-use research software that enables collaborative text analysis and offers all the basic functions of other QDA software. The tool, which is currently still in beta, can be found here.