Workshop Recap: Introduction to High-Performance Computing (HPC)

On May 6, 2024, Dr. Loris Bennett from FUB-IT at Freie Universität Berlin held the workshop Introduction to High-Performance Computing (HPC) at WI. In this workshop, he gave an overview of the mechanics of HPC and enabled participants to try it out themselves. While the workshop used the HPC cluster provided by FUB-IT as a practical example, most of the contents apply to HPC in general.

Dr. Bennett began with definitions of HPC and core concepts. He described an HPC system as a cluster of servers providing cores, memory, and storage, connected by high-speed interconnects. These resources are shared between users and allocated by the system itself. Users submit jobs consisting of one or more tasks to the HPC cluster. Each task runs on a single compute server, also called a node, and can make use of multiple cores up to the maximum available on a node. The number of tasks per node can be set for each job, but defaults to one. Lastly, an HPC cluster may provide different file systems for different purposes. For example, the file system /home is optimized for large numbers of small files used for programs, scripts, and results, while /scratch is optimized for temporary storage of small numbers of large files.
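
The relationship between tasks, cores, and nodes described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: all nodes are identical, every task needs the same number of cores, and the function name nodes_needed and its parameters are hypothetical and do not belong to any scheduler. A real scheduler also takes memory, other users' jobs, and site policies into account.

```python
import math


def nodes_needed(n_tasks, cpus_per_task, cores_per_node, tasks_per_node=1):
    """Estimate how many nodes a job occupies under a simplified model.

    Hypothetical helper for illustration only; not part of any scheduler API.
    """
    if cpus_per_task > cores_per_node:
        # A task runs on a single node, so it cannot use more cores
        # than one node provides.
        raise ValueError("a task cannot use more cores than one node provides")
    # Upper bound on tasks per node imposed by the core count alone.
    fit_by_cores = cores_per_node // cpus_per_task
    # The job setting (default: one task per node) may be stricter.
    effective_tasks_per_node = min(tasks_per_node, fit_by_cores)
    return math.ceil(n_tasks / effective_tasks_per_node)


# Four tasks with 8 cores each on 32-core nodes, default of one task per node.
print(nodes_needed(n_tasks=4, cpus_per_task=8, cores_per_node=32))
# The same job when four tasks are allowed to share a node.
print(nodes_needed(n_tasks=4, cpus_per_task=8, cores_per_node=32, tasks_per_node=4))
```

With the default of one task per node, each task ends up on its own node; raising the tasks-per-node setting lets the same job fit on fewer nodes.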

Next, Dr. Bennett proceeded with resource management. When launching a job, many parameters can be set, such as the number of CPU and GPU cores, the amount of memory, and the time required. To determine the resources a job requires, users need to run a few jobs and check what was actually used. This information can then be used to set the requirements for future jobs and thus ensure that the resources are used efficiently. The priority of a job dictates when it is likely to start and depends mainly on the amount of resources consumed by the user in the last month. A Quality of Service (QoS) can be set per job, which will increase the priority of the job, but the jobs within a given QoS are restricted in the total amount of resources they can use. In addition, it is possible to parallelize tasks by splitting them into subtasks that can be performed simultaneously. Likewise, many similar jobs can be scheduled efficiently using job arrays.
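
The idea of running a few test jobs and then adjusting the request can be illustrated with a small calculation. The following is a minimal sketch, not the procedure shown in the workshop: the function names, the example memory values, and the 25% safety margin are all assumptions chosen for illustration.

```python
def efficiency(requested, used):
    """Fraction of a requested resource that a job actually used."""
    if requested <= 0:
        raise ValueError("requested amount must be positive")
    return used / requested


def suggest_request(observed_peaks, headroom=0.2):
    """Suggest a future request: the highest observed usage plus a safety margin.

    The headroom value is an arbitrary illustrative default, not a recommendation.
    """
    if not observed_peaks:
        raise ValueError("need at least one observed value")
    return max(observed_peaks) * (1 + headroom)


# Hypothetical peak memory usage (in GB) of three test jobs that each requested 16 GB.
memory_peaks_gb = [3.0, 4.0, 3.5]

print(efficiency(requested=16.0, used=max(memory_peaks_gb)))
print(suggest_request(memory_peaks_gb, headroom=0.25))
```

In this made-up example, the test jobs used only a quarter of the requested memory, so a much smaller request would suffice for future jobs and leave more resources for other users.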

Finally, participants could log into the FUB-IT HPC cluster themselves either using the command line or graphical interface tools and request first sample jobs. They were shown how to write batch files defining job parameters, use commands to submit, show, or cancel jobs, and check the results and efficiency of a completed job.

The Methods Lab would like to thank Dr. Bennett for his concise but comprehensive introduction to HPC!

Recap: Networking Event for Digitalization Research in Berlin

On May 31st, the Methods Lab of the Weizenbaum Institute and the Interdisciplinary Center for Digitality and Digital Methods of the Humboldt University Berlin (IZ D2MCM) organized a networking event to which they invited various institutions, institutionalized teams, and centers that are actively engaged in digital research in the humanities, social sciences, and cultural studies in Berlin. On this Friday, about 50 scientists met in the Auditorium of the Grimm Center to present their work, future needs, and opportunities for cooperation, and thus to improve the networking of the Berlin research landscape. For this purpose, the event was divided into two parts:

In the first part, the following institutions, teams, and initiatives introduced themselves in short presentations:

  • The Data-Methods-Monitoring Cluster at DeZIM Institute is a cross-disciplinary facility that uses and adapts proven data evaluation methods. Their work includes the development of experimental designs (DeZIM.lab) and the creation and adaptation of survey methods and survey designs (DeZIM.methods). In addition, they offer training on quantitative and qualitative methods via the DeZIM Summer School.
  • The Digital History of Education Lab (DHELab) at Bibliothek für Bildungsgeschichtliche Forschung (BBF) offers training and lectures on digital history and 3D research data, visualization, text mining and AI-supported literature research through initiatives such as Last Friday’s Lab Talk (LFLT). The lab also develops services to support digital research practice in historical educational research.
  • The Digital Humanities Network at University of Potsdam focuses on research and teaching collaborations in the digital humanities. It offers events and courses such as the “Code & Culture Lecture Series”, the “Henriette Herz Humanities Hackathons” and the “Python 4 Poets Course”.
  • The Department of Audio Communication at TU Berlin conducts transdisciplinary research and development in the areas of music, sound and language. They have developed digital research tools such as “PLAY” and “Spotivey”, which are used in areas such as virtual acoustic reality and music and media reception research.
  • The Alexander von Humboldt Institute for Internet and Society (HIIG) focuses on how digital methods impact internet and society research, including the development of digital tools and collaborative platforms. They have worked on projects such as the “Open Knowledge Maps” for academic content visualization, and tools to enhance rainforest protection in Indonesia using remote sensing and geo-tracking technologies.
  • Prof. Dr. Helena Mihaljević, Professor of Data Science and Analytics at the University of Applied Sciences (HTW) and the Einstein Center for Digital Future (ECDF), organizes the “Social and Political Data Science Seminar.” The seminar brings together PhD students, postdocs, senior researchers, and NGO professionals and serves as a platform to present and discuss ongoing projects, providing valuable feedback on research questions, methods and other aspects.
  • The Interdisciplinary Center for Digitality and Digital Methods at Campus Mitte (IZ D2MCM) at HU Berlin fosters collaboration among different faculties in the areas of digitality and digital methods. They provide innovative infrastructure to support excellent research, and have working groups focused on Data Literacy, Reading and Writing Workshops, Large Language Models, Git & Software, Statistics, and Mapping.
  • QUADRIGA, a data competence center for Digital Humanities, Administrative Sciences, Computer Science, and Information Science, specializes in pioneering digital research methods. They support researchers across disciplines by facilitating studies on data flow, international standards like FAIR and CARE, and digital methods development. Their platform, QUADRIGA Space, hosts educational resources, including the QUADRIGA Navigator tool, to aid in navigating the digital research landscape.
  • The DM4 ‘Data Science’ at Berlin State Library develops machine learning and artificial intelligence technologies to enhance the accessibility of digitized historical cultural data, encompassing both text and image formats. Its primary objective is to enable efficient search capabilities and facilitate the organization and structuring of data, making it openly available and machine-readable as “Collections as Data”. The STABI Lab focuses on research and development initiatives to advance digital projects and enhance access to cultural heritage materials.
  • TELOTA is an initiative by the Berlin-Brandenburg Academy of Sciences and Humanities (BBAW) dedicated to digital research methods, focusing on projects and tools that facilitate scholarly inquiry in digital environments. It offers resources like digital editions, training, text analysis tools, and research infrastructure to support scholars in various disciplines engaging in digital research.
  • The Research Lab “Culture, Society and the Digital” at HU Berlin combines research and teaching in the field of digital anthropology at the Institute for European Ethnology at Humboldt University. Theoretically and empirically, it combines various traditions of ethnographic and multi-modal digitization research based on social theory and analysis. Projects include “The Social Life of XG – Digital infrastructures and the reconfiguration of sovereignty and imagined communities”, “Packet Politics: Automation, Labour, Data”, “Cultures of Rejection; Digitalization of Work and Migration”, and more.
  • The Methods Lab at the Weizenbaum Institute is the central unit for supporting, connecting, and coordinating methods training and consulting. It conducts methods research and has developed tools for the collection and analysis of digital data, such as the “Weizenbaum Panel Data Explorer” and the “MART” app.
  • The Social Science Research Center Berlin (WZB) has PostDoc networks that focus on specific (digital) methods. For example, a new group on AI Tools facilitates discussions on their utilization and potential acquisitions by the WZB. In addition, the WZB organizes the Summer Institute in Computational Social Science (SICSS) with an emphasis on text as data. The IT & eScience department of WZB supports scientific IT and data science consulting. They offer infrastructures for computing power and geodata and expertise in research software engineering. Projects include the digital version of the DIW Weekly Report.
  • The Digital History Department at Humboldt University is involved in projects such as H-Soz-Kult and JupyterBook for Historians. Their main areas of expertise include modeling historical information, the impact of datafication, and the application of AI-based methods in historical research. They also offer workshops, support, and educational resources for digital history tools.

In the second part of the event, we then broke into seven tables to discuss selected topics in small groups in a World Café format.

  • At the “Training on digitality and digital methods” table, we discussed digitality as a central epistemic problem of our research. For example, it seems to go hand in hand with a kind of compulsive quantification that needs to be reflected in both the application and the teaching of digital methods. Future needs that the network could address were mentioned: The creation of interdisciplinary networks for self-learning & trainings, a better exchange of basic teaching modules (Open Educational Resources), a sort of level system for methods training, as is already used for learning foreign languages (A1, A2, B1…), and the sharing of technical infrastructures such as HPC clusters.
  • The table on “Collaboration—governance and best practices” explored different formats of collaboration and networking. Starting points are e.g. networks around topics or around similar data domains. It is important that networking is supported at an early stage, that existing networks are used to avoid duplication of effort, that they stem from bottom-up initiatives, have a long-term perspective, and that they are supported with resources wherever possible. Impulse budgets, for example, are particularly suitable for this purpose. To ensure that collaborations and networks do not become an end in themselves, it is also important to define goals and constantly monitor their success. 
  • At the “Services in teams and centers” table, we collected various areas in which services are needed and offered. These include the provision of infrastructure, interfaces, software, and other resources, as well as consulting, training, and networking. The central goal of services is skill development in any given institution, while important challenges include ensuring their sustainability, evaluating their functionality and needs to avoid oversupply, and clearly targeting previously defined audiences. 
  • The “Cooperation with external services” table discussed various areas in which purchasing external services can be useful. These include: OCR, digitization & quality assurance, less frequently data conversion, GUIs, web applications, and increasingly data analytics. However, most services are still performed in-house. The reasons for this are that there are few positive reports, the cost of tendering is high, small projects are often not economically viable for external providers, and there is little co-development and few real collaborations. The benefits of such collaborations are that external providers can be helpful for realistic cost estimates. In addition, public-private partnerships could open up new sources of funding through economic development.
  • At the “Digitality as a concept between society and science” table, we discussed the problems that emerge from different approaches and perspectives science and the broad public have regarding digitality and digitalization. As these play an increasingly important role for more and more people in everyday life, education, and at work, science communication is tasked with informing the public about current developments. Past events (e.g., data leaks) proved that skepticism toward technology is certainly warranted, but projects and action practices meant to tackle this issue are oftentimes implemented rather poorly. As a result, technology such as artificial intelligence may appear overly powerful, which leads to fear rather than literacy. Future research and discussions should therefore assess how to best merge and guide digitality and digitalization to make the public more comfortable with them.
  • At the “Large Language Models” table, we discussed the current state of research on these types of models as well as future perspectives and desires regarding their use in research. Embracing reproducibility and open research is one of the most important requirements in this field. Further, participants noted the requirement to move beyond mere benchmarking, limitations in terms of resources and hardware, the necessity of establishing AI guidelines, and considering the contexts of Large Language Model application. Providing hardware and an infrastructure that can be used in research and teaching emerged as a crucial step to achieve these goals. Specifically, strengthening local ties between institutions (e.g., in and around Berlin) could be a first step. Finally, more research on how and for what the public uses such technologies could help develop measures to improve AI literacy.
  • At the “Research Software Engineers” table, we discussed the need for, and the challenges surrounding, research staff who focus on developing software and tools. In practice, tools that are necessary for conducting (niche) research are either bought or developed by the researchers themselves. While the former is dependent on financial means and funding, the latter is often not possible at all because of a lack of skill and/or time. In teaching as well as interdisciplinary research contexts, acquiring programming knowledge is oftentimes neither desired nor feasible. Hiring research software engineers, however, comes with challenges of its own. They have to be situated in a way that their work is not too specific (e.g., on a research group level), but also not too unspecific (e.g., on an institution level). Standardization, documentation, and longevity of their work need to be ensured. Also, the inclusion of software or tool development in funding applications is not yet typical in many fields of research. Finally, the acknowledgment of software as a publication and contribution to science is not yet satisfactory, but important.

Since the networking event was so successful, the organizers will be meeting again soon to discuss how to continue and intensify the networking. Stay tuned!

Workshop Recap: Research Ethics – Principles and Practice in Digitalization Research

On April 18, 2024, the Methods Lab organized the workshop Research Ethics – Principles and Practice in Digitalization Research to meet the increasing relevance and complexity of ethics in digitalization research.

In the first part of the workshop, Christine Normann (WZB) introduced participants to good research practice and research ethics in alignment with the guidelines of the German Research Foundation (DFG). Besides addressing the need to balance the freedom of research and data protection, she gave an overview of important institutions, noted the difficulties of formulating ethics statements for funding applications before study designs are finalized, and provided some practical tips on where to find guidance when planning research.

Next, Julian Vuorimäki (WI) guided participants through the handling of research ethics at the Weizenbaum Institute. He focused on the code of conduct, the ombudspersons, the guideline for handling research data, and the newly founded review board. The latter is in charge of providing ethics reviews for individual projects and studies, which can be applied for through a questionnaire on the institute website.

In the second part of the workshop, three researchers presented practical ethical implications and learnings from research projects. Methods Lab lead Christian Strippel reported on a study where user comments were annotated to allow for the automatic detection of hate speech. He focused on possible misuse for censorship, the confrontation of coders with questionable content, and the challenges of publishing the results and data regarding copyright and framing. Tianling Yang (WI) presented ethical considerations and challenges in qualitative research. The focus lay on consent acquisition, anonymity and confidentiality, power relations, reciprocity (i.e., incentives and support), and the protection of the researchers themselves due to the physical and emotional impact of qualitative field work. Finally, Maximilian Heimstädt (Helmut Schmidt University Hamburg) talked about ambiguous consent in ethnographic research. He gave insights into a study in cooperation with the state criminal police office to predict crime for regional police agencies. Not all individuals in this research could be informed about the research endeavor, especially when the researchers accompanied the police during their shifts, which raised the question of how to find a balance between overt and covert research.

The Methods Lab thanks all presenters and participants for this insightful workshop!

Networking Event: Humanities, Societies and the Digital (May 31, 2024)

Berlin’s academic landscape is rich with diverse research endeavors, particularly in the realms of digital cultural, social, and humanities studies. However, there’s a notable gap in structured and sustained networking among key players in these fields. To address this gap, the Weizenbaum Institute and the Interdisciplinary Center for Digitality and Digital Methods at HU Berlin’s Campus Mitte are organizing a networking meeting.

Scheduled for May 31, 2024, at the Auditorium in the Grimm-Zentrum, HU Berlin, this event is open to institutions, institutionalized teams, and centers actively engaged in digital research within the humanities, social sciences, and cultural studies in Berlin. The aim of the meeting is to strengthen existing connections, identify potential common interests and goals, and spotlight further avenues for collaboration and exchange within Berlin’s vibrant digital research community.

In the first part of the meeting, every team will introduce themselves through short highlighting talks. The second part will facilitate a casual, direct exchange among all participants in a World Café format, covering various questions and cross-cutting themes related to digitality and digital methods in the humanities and social sciences.

Event Details:

  • Date: Friday, May 31, 2024
  • Time: 13:00–16:00
  • Location: Auditorium at the Grimm-Zentrum, HU Berlin, Geschwister-Scholl-Str. 1/3
  • Registration Deadline: April 30, 2024
  • Registration: RSVP to

This event is co-organized by the Methods Lab, with contributions from Roland Toth, Martin Emmer, and Christian Strippel.

Workshop: Introduction to High-Performance Computing (HPC) (May 6, 2024)

We’re excited to announce our upcoming workshop Introduction to High-Performance Computing (HPC), scheduled for Monday, May 6th at the Weizenbaum Institute. Led by Loris Bennett (FU) from the HPC service at Freie Universität Berlin, the workshop is open to members of the Weizenbaum Institute with an FU account and access to HPC resources at FU. It aims to teach the fundamentals of using HPC resources in general, using those offered by FU Berlin as an example.

For further details about the workshop, please visit our program page.

Workshop: Research Ethics – Principles and Practice in Digitalization Research

We are excited to announce our next workshop, “Research Ethics – Principles and Practice in Digitalization Research”, which will take place on Thursday, April 18. This workshop will be conducted both at the Weizenbaum Institute and online, and is open to Weizenbaum Institute members as well as external participants (and the QPD). Led by Christine Normann (WZB), Julian Vuorimäki (WI), Maximilian Heimstädt (HSU), and Tianling Yang (WI), the workshop will focus on principles and best practices of ethics in research. After a general introduction and overview of principles according to the German Research Foundation (DFG), current plans regarding an ethics board at the Weizenbaum Institute will be presented, and finally, three separate examples of ethical considerations in research practice will be shown.

For detailed information about the workshop, please visit our program page. We are looking forward to your participation!

Workshop: Introduction to Programming and Data Analysis with R (April 10-11, 2024)

Level: Beginner/Intermediate
Category: Data Analysis

After being well received last year, we’re happy to announce the return of our workshop Programming and Data Analysis with R for its second edition. This two-day intensive workshop led by Roland Toth (WI) will take place on Wednesday, April 10, and Thursday, April 11, at the Weizenbaum Institute.

During the first day, attendees will receive comprehensive training in programming fundamentals, essential data wrangling techniques, and Markdown integration. The second day will center around data analysis, providing participants with the chance to engage directly with a dataset and address a research topic independently. A blend of concepts, coding techniques, and smaller practical tasks will be interspersed throughout both days to reinforce hands-on learning.

For more information, check out the program page!

Workshop: Introduction to Online Surveys

We are excited to announce the Methods Lab’s first workshop of the year, “Introduction to Online Surveys”, which will take place on Thursday, February 22. This workshop will be conducted both at the Weizenbaum Institute and online, and is open to Weizenbaum Institute members as well as external participants. Led by members of the Methods Lab, Martin Emmer, Christian Strippel, and Roland Toth, the workshop will focus on the use of online surveys in the context of social science research, providing participants with a theoretical foundation as well as a hands-on guide. We will cover aspects such as the logic and design of online surveys, how to work with access panel providers, and demonstrate how to effectively set up an online survey using the versatile survey tool LimeSurvey. Crucial topics such as ethics and data protection will also be discussed.

For detailed information about the workshop, please visit our program page. We look forward to your participation!

First Research Fellow at the Methods Lab

The Methods Lab is excited to welcome its first research fellow who arrived at the Weizenbaum Institute on November 20: Douglas Parry from Stellenbosch University, South Africa. His research focus lies on Socio-Informatics in the area of Communication Science, Human-Computer Interaction, and Media/CyberPsychology.

During his 4-week stay, Douglas Parry will contribute to work at the Methods Lab in different ways. On November 30, he will hold the workshop A Practical Introduction to Text Analysis, where he covers all important steps, from pre-processing text to visualizing results of topic modeling in a single day. On December 7, he will host a Digital Methods Colloquium together with Roland Toth, where German researchers focusing on digital methods will get together, present recent work, and discuss challenges and opportunities in the field.

Furthermore, Douglas Parry is collaborating on two research projects with the Methods Lab during his stay, both of which involve the processing of complex data surrounding smartphone usage that were collected using multiple methods earlier this year.

The Methods Lab is happy to host Douglas Parry and is looking forward to the results of this exciting partnership – stay tuned!