Blog

Tutorial: When and how to use the official TikTok API

This blog post discusses when (and when not) to use the official TikTok API. It also provides step-by-step instructions for a typical research scenario to help aspiring researchers get started with the API.

When and when not to use it

While it is the official route to data access, the official TikTok API is by no means the only way to collect TikTok data in an automated fashion. Depending on the research endeavour, one of the other options might be the way to go:

  1. 4CAT + Zeeschuimer: Sensible if you want to collect limited data on one or more actors, hashtags, or keywords and/or are not confident in programming for the subsequent analysis.
  2. An unofficial TikTok API (pyktok or the Unofficial TikTok API in Python): Both are great projects that provide significantly more data points than the official API. However, this comes at a cost: less stability and a dependency on developers reacting to changes on TikTok’s site.

But why should you use the official TikTok API if those two options are available?

  • Reliability. In theory, the official API data access provides more stable access than other solutions.
  • Legality. Depending on your country or home institution, unofficial data access might be a problem for legal reasons. With official data access, however, you are on the safer side. Please consult your institution regarding data access.
  • User-level data. Other data collection methods are often superior in terms of data points on the video level (Ruz et al. 2023). However, the official TikTok API offers a set of user-level data (User info, liked videos, pinned videos, followers, following, reposted videos), which is not as conveniently available through other data collection methods.

One fundamental limitation still needs to be kept in mind: one can make only 1,000 requests per day, each returning at most 100 records (e.g., videos, comments). Even if every request returns the full 100 records (which is rarely possible), this caps retrieval at 100,000 records per day.
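To make this budget concrete, here is a minimal client-side sketch of capped, cursor-based pagination. The `fetch_page` stub, the cursor mechanics, and the record shape are illustrative assumptions for the sketch, not the official API surface:

```python
# Sketch: collect records from a capped API, spending at most
# MAX_REQUESTS_PER_DAY requests of up to PAGE_SIZE records each.
# fetch_page is a stub standing in for one (hypothetical) API request.

MAX_REQUESTS_PER_DAY = 1_000
PAGE_SIZE = 100          # records per request, at most
TOTAL_AVAILABLE = 250    # pretend the query matches 250 records

def fetch_page(cursor):
    """Stub for one request. Returns (records, next_cursor);
    next_cursor is None once the result set is exhausted."""
    records = [{"id": i} for i in range(cursor, min(cursor + PAGE_SIZE, TOTAL_AVAILABLE))]
    next_cursor = cursor + PAGE_SIZE if cursor + PAGE_SIZE < TOTAL_AVAILABLE else None
    return records, next_cursor

def collect(max_requests=MAX_REQUESTS_PER_DAY):
    """Page through results until exhausted or the daily cap is hit."""
    records, cursor, used = [], 0, 0
    while used < max_requests:
        page, cursor = fetch_page(cursor)
        records.extend(page)
        used += 1
        if cursor is None:  # no more pages for this query
            break
    return records, used

data, requests_used = collect()
print(len(data), requests_used)  # 250 records in 3 requests
```

Note how the hard upper bound follows directly from the two limits: 1,000 requests × 100 records = 100,000 records per day, and only if every page comes back full.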

To get started with the official TikTok Research API, visit Research API. To gain access, you need to create a developer account and submit an application form. When doing so, please record your access request in the DSA40 Data Access Tracker to contribute to an effort to track the data access provided by platforms under DSA40.

The official documentation on Research API usage is not intuitive, especially for newcomers (Documentation). Using the API from the typical research programming languages Python or R can still pose a challenge, especially for researchers working with an API for the first time. Such guidance is currently scarce, which motivates this blog post to provide it without a paywall.


Show and Tell: OpenQDA – A Sustainable and Open Research Software for Collaborative Qualitative Data Analysis

The Methods Lab is eager to present the upcoming Show and Tell: OpenQDA – A Sustainable and Open Research Software for Collaborative Qualitative Data Analysis, scheduled for Monday, November 18th. This Show and Tell will take place at the Weizenbaum Institute, but is also open to members joining online.

Florian Hohmann, from the ZeMKI research unit at the University of Bremen, will facilitate the event. The workshop offers a live demonstration and open discussion that introduces researchers to the structure of this web-based, collaborative, and open-source tool for qualitative data analysis. The ZeMKI team will present a thorough guide to using OpenQDA, as well as the project’s goals and implications. Following this, all participants are encouraged to ask questions in the allotted Q&A period. The second segment of the workshop invites on-site participants to engage in a research community session, an exchange of knowledge that supports the future development of the application. To get a sense of prospective demands, we ask researchers to bring examples of data-analysis formats from their published works, such as tables or visualizations.

To learn more, visit our program page. We are looking forward to your participation!

Special Issue: Open Research Infrastructures and Resources for Communication and Media Studies

Despite the advantages of accessible and reproducible research practices for scholars in media and communication research, few journals offer opportunities to examine these resources. The journal Media and Communication therefore plans to publish a Special Issue on “Open Research Infrastructures and Resources for Communication and Media Studies” in 2026 to encourage an exchange among researchers on the implications of relevant resources and infrastructures. The Call for Papers invites contributions that discuss and pursue resources adhering to open science principles. Methods Lab lead Christian Strippel is a co-editor of this issue.

With regard to submissions, open science principles emphasize non-commercial tools, which may apply to both quantitative and qualitative methods. Articles that present datasets, evaluate research software, or compare instruments involved in data analysis are encouraged. The scope also extends to papers discussing developments in or challenges to the operation of open research infrastructures, and investigating potential areas for improvement. Notably, the issue considers implications for researchers of different socioeconomic and cultural backgrounds in order to address research inequalities and promote sustainability; papers are encouraged to reflect this dimension of diversity. Ultimately, contributions to this issue will give researchers greater access to, and ease of use of, these valuable resources, advancing and promoting inclusivity within open research practices.

Submission of Abstracts: 1-15 September 2025

Submission of Full Papers: 15-31 January 2026

Publication of the Issue: July/December 2026

Workshop: Research in Practice – Attending to Algorithms in and around Organizations (November 26, 2024)

We’re excited to announce our upcoming workshop Research in Practice – Attending to Algorithms in and around Organizations, which will take place on Tuesday, November 26. This workshop will be conducted at the Weizenbaum Institute and is open to Weizenbaum Institute members.

Led by Maximilian Heimstädt, Professor of Digital Governance & Service Design at the Helmut Schmidt University in Hamburg, this workshop sheds light on how to research the role of algorithms for work and workers in and around organizations. As the anthropologist Nick Seaver has aptly put it: “Just as critical scholars picked them up, algorithms seemed to break apart.” The aim of the workshop is to understand the breaking apart of algorithms as an opportunity for creative research questions, designs and methods. In the first part of the workshop, Maximilian presents different ways in which algorithms in organizations lend themselves to study. In the second part, participants are invited to present their own research projects and ideas, and to discuss methodological challenges with the group.

The workshop is co-organized by the Methods Lab and the Research in Practice – PhD Network for Qualitative Research, coordinated by Katharina Berr and Jana Pannier.

For further details, visit our program page. We are looking forward to your participation!

New preprint article: Extracting smartphone use from Android event log data

With smartphones now more prevalent in everyday life than ever before, understanding their use and its implications becomes increasingly necessary. While self-reporting in surveys is the method typically used to assess smartphone use, it is affected by various problems such as distorted retrospection, social desirability bias, and high aggregation. More advanced methods include the Experience Sampling Method (ESM), which presents multiple short surveys per day to limit the degree of retrospection, and logging (Android only), which accesses an internal log on the device itself that documents each user activity in extremely high resolution. Although the latter is the most precise and objective method available for assessing smartphone use, the raw data received from the log file requires extensive transformation to extract actual human behavior rather than technical artifacts. Still, this transformation has never been documented systematically, and researchers working with this input have implemented ad-hoc steps to extract the data they required.

The preprint article Extracting Meaningful Measures of Smartphone Usage from Android Event Log Data: A Methodological Primer, authored by former Methods Lab fellow Douglas Parry and Methods Lab member Roland Toth, aims to provide a detailed step-by-step guide to extracting different levels of smartphone use from Android log data. Specifically, the guide helps identify glances (short checks without unlocking the device), sessions (uses from unlocking to locking), and episodes (single app uses) from such log files, allowing for further investigation. All steps are presented as pseudo-code as well as described in text. In addition, the Online Supplementary Material (OSM) contains the full pseudo-code, a rendition in the R programming language, a sample data set containing raw log data, and more helpful material.
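To illustrate the kind of transformation the article describes, here is a minimal sketch (not the authors’ pseudo-code) that separates glances from sessions in a simplified event stream. The event names and the data shape are assumptions made for this illustration, not the actual Android log format:

```python
# Simplified sketch: classify each screen-on period as a "glance"
# (screen on without unlocking) or a "session" (device unlocked),
# given an ordered list of (timestamp, event) pairs.
# Event names are illustrative, not real Android log event codes.

def classify_usage(events):
    """events: list of (seconds, name) pairs, sorted by time,
    with name in {"screen_on", "unlocked", "screen_off"}."""
    periods = []
    start, unlocked = None, False
    for ts, name in events:
        if name == "screen_on":
            start, unlocked = ts, False
        elif name == "unlocked":
            unlocked = True
        elif name == "screen_off" and start is not None:
            kind = "session" if unlocked else "glance"
            periods.append((kind, start, ts))
            start, unlocked = None, False
    return periods

log = [
    (0, "screen_on"), (4, "screen_off"),                  # a quick check
    (60, "screen_on"), (62, "unlocked"), (300, "screen_off"),
]
print(classify_usage(log))
# [('glance', 0, 4), ('session', 60, 300)]
```

The real log data requires considerably more cleaning (e.g., handling missing lock events and technical artifacts); for the full procedure, see the article’s pseudo-code and the R implementation in its Online Supplementary Material.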

This guide ultimately enhances our understanding of how humans interact with these versatile devices, which is particularly beneficial for projects within the social sciences and neighboring disciplines. While survey methods are recognized for their economy and ease of administration, access to objective high-resolution data contributes a more refined perspective. We hope this article helps researchers identify valuable measures from raw Android event log data, thereby making this rich data source more accessible and manageable than it has previously been.

Workshop Recap: Open Research – Principles, Practices, and Implementation

On September 3, 2024, Tobias Dienlin from the University of Vienna held the workshop Open Research – Principles, Practices, and Implementation at WI. In this workshop, he gave an overview of Open Research and its motivations, relevance, and formal and technical implementation.

In the first part of the workshop, Tobias argued that certain problems and values in science are the main reasons why researchers should practice Open Research. The problems included the replication crisis (a lack of or low quality of replication studies, especially in the social sciences), questionable research practices (p-hacking, HARKing, errors), and publication bias (journals prefer exciting, expected, and significant results). The values in question included openness as a foundation of science itself and the dedication to scientific advancement instead of emphasizing individuals that achieve it.

In the second part, the formal practices of Open Research were discussed. Tobias first clarified the differences between the terms Open Science, Open Research, and Open Scholarship. To achieve a culture of Open Research, he suggested aiming for open access, pre-/post-printing, open reviews, author contribution statements, open teaching, and citizen science. While these practices usually require additional work, the burden can be lowered by considering and preparing them in the initial stages of a research project. For instance, by implementing two of the most important Open Research practices: preregistrations and registered reports.

  • In a preregistration, any details of a study that are already fixed (e.g., theoretical foundation, research questions, hypotheses, analysis methods, …) are published before conducting the study itself. After conducting the study, the preregistration is referred to in the manuscript, and possible deviations from it are explained. This procedure reduces the possibility and risk of p-hacking and HARKing, and under specific circumstances a preregistration can even take place after the data have already been collected.
  • A registered report is a more elaborate version of a preregistration. It consists of all parts of a submission that do not involve the analysis and the results. The submission can therefore be reviewed before the data and results even exist. This way, reviewers are not influenced by results and publication bias can be avoided. While a preregistration can be published anywhere, the registered report format needs to be offered by the journal itself.

In the last part of the workshop, the focus was on tools and software that help implement Open Research practices. For example, the free-to-use repository OSF can be used for pre-/post-prints, preregistrations, and online supplementary materials such as data, analysis code, or questionnaires. As an exercise, Tobias gave participants the opportunity to implement a basic preregistration or registered report on OSF for a research project they were working on already and try different features, such as linking it to a repository on GitHub. After summarizing the insights of the workshop, Tobias concluded by showing a fitting statement:

Open Science: Just Science Done Right.

During the workshop, participants had plenty of space to ask questions, discuss with everyone or in separate breakout rooms, and interact in various ways. We would like to thank Tobias for this insightful workshop and strongly encourage the implementation of Open Research.

Workshop: Open Research – Principles, Practices, and Implementation (September 3, 2024)

We’re excited to announce our upcoming workshop Open Research – Principles, Practices, and Implementation, which will take place on Tuesday, September 3. This workshop will be conducted both at the Weizenbaum Institute and online, and is open to Weizenbaum Institute members as well as external participants (and the QPD).

Led by Tobias Dienlin, Assistant Professor of Interactive Communication at the University of Vienna, this workshop will equip participants with skills in open research by covering principles of transparency, reproducibility, the replication crisis, and practical sessions on sharing research materials, data, and analyses. It will also include preregistrations, registered reports, preprints, postprints, TOP Guidelines, and initiatives like DORA, CORA, and RESQUE. Participants will engage in drafting preregistration plans and discussing the incentives and challenges of open research, aiming to integrate these practices into their work for a more transparent and robust research community.

For further details, visit our program page. We are looking forward to your participation!

Short Project: Ethics of Data Work

AI systems rely heavily on workers who face precarious conditions. Data work, clickwork, and crowdwork—essential for validating algorithms and creating datasets to train and refine AI systems—are frequently outsourced by commercial entities and academic institutions. Despite the vast and growing workforce of 435 million data workers enabling machine learning, their working conditions remain largely unaddressed, resulting in exploitative practices. Academic clients, in particular, lack clear guidance on how to outsource data work ethically and responsibly.

To address this issue, Christian Strippel from the Methods Lab is part of the short project “Ethics of Data Work”, together with Milagros Miceli and Tianling Yang from the research group “Data, Algorithmic Systems and Ethics”, Bianca Herlo and Corinna Canali from the research group “Design, Diversity and New Commons”, and Alexandra Keiner from the research group “Norm Setting and Decision Processes”. Together, they aim to create equitable working systems grounded in the real knowledge and experience of data workers. The project will gather valuable insights about the challenges and needs data workers face, with the objective of developing ethical guidelines for researchers to ensure responsible and ethical treatment in the future.

Workshop Recap: Introduction to High-Performance Computing (HPC)

On May 6, 2024, Dr. Loris Bennett from FUB-IT at Freie Universität Berlin held the workshop Introduction to High-Performance Computing (HPC) at WI. In this workshop, he gave an overview of the mechanics of HPC and enabled participants to try it out themselves. While the workshop used the HPC cluster provided by FUB-IT as a practical example, most of the contents applied to HPC in general.

Dr. Bennett began with definitions of HPC and core concepts. He described HPC as a cluster of servers providing cores, memory, and storage with high-speed interconnections. These resources are shared between users and distributed by the system itself. Users send jobs consisting of one or more tasks to the HPC cluster. Each task runs on a single compute server, also called a node, and can make use of multiple cores up to the maximum available on a node. The number of tasks per node can be set for each job, but defaults to one. Lastly, an HPC cluster may provide different file systems for different purposes. For example, the file system /home is optimized for large numbers of small files used for programs, scripts, and results, while /scratch is optimized for temporary storage of small numbers of large files.

Next, Dr. Bennett proceeded with resource management. When launching a job, many parameters can be set, such as the number of CPU and GPU cores, the amount of memory, and the time used. In order to determine the resources required for jobs, users need to run a few jobs and check what was actually used. This information can then be used to set the requirements for future jobs and thus ensure that the resources are used efficiently. The priority of a job dictates when a job is likely to start and depends mainly on the amount of resources consumed by the user in the last month. A Quality of Service (QoS) can be set per job which will increase the priority of a job, but the jobs within a given QoS will be restricted in the total amount of resources they can use. In addition, it is possible to parallelize tasks by splitting them into subtasks that can be performed simultaneously. Likewise, many similar jobs can be planned efficiently using job arrays.
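The job-array idea, namely that many independent subtasks of the same job can be scheduled across workers, can be illustrated in miniature with Python’s standard library. This is a conceptual analogue on a single machine, not Slurm- or FUB-IT-specific code:

```python
# Conceptual analogue of a job array: the same function runs over many
# independent inputs, and a pool distributes them across workers.
from concurrent.futures import ThreadPoolExecutor

def subtask(n):
    """One independent unit of work (here: a toy computation)."""
    return n * n

inputs = range(8)  # like array indices 0..7 in a job array

# The pool plays the scheduler's role, handing subtasks to 4 workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(subtask, inputs))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

On a real cluster, the scheduler does this distribution across nodes, and each array element becomes its own job with its own resource request.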

Finally, participants could log into the FUB-IT HPC cluster themselves either using the command line or graphical interface tools and request first sample jobs. They were shown how to write batch files defining job parameters, use commands to submit, show, or cancel jobs, and check the results and efficiency of a completed job.

The Methods Lab would like to thank Dr. Bennett for his concise but comprehensive introduction to HPC!

Recap: Networking Event for Digitalization Research in Berlin

On May 31st, the Methods Lab of the Weizenbaum Institute and the Interdisciplinary Center for Digitality and Digital Methods of the Humboldt University Berlin (IZ D2MCM) organized a networking event to which they invited various institutions, institutionalized teams, and centers that are actively engaged in digital research in the humanities, social sciences, and cultural studies in Berlin. On this Friday, about 50 scientists met in the Auditorium of the Grimm Center to present their work, future needs, and opportunities for cooperation, and thus to improve the networking of the Berlin research landscape. For this purpose, the event was divided into two parts:

In the first part, the participating institutions, teams, and initiatives introduced themselves in short presentations:

  • The Data-Methods-Monitoring Cluster at DeZIM Institute is a cross-disciplinary facility that uses and adapts proven data evaluation methods. Their work includes the development of experimental designs (DeZIM.lab), the creation and adaptation of survey methods and survey designs (DeZIM.methods). In addition, they offer training on quantitative and qualitative methods via the DeZIM Summer School. https://www.dezim-institut.de/en/institute/data-methods-monitoring-cluster/
  • The Digital History of Education Lab (DHELab) at Bibliothek für Bildungsgeschichtliche Forschung (BBF) offers training and lectures on digital history and 3D research data, visualization, text mining and AI-supported literature research through initiatives such as Last Friday’s Lab Talk (LFLT). The lab also develops services to support digital research practice in historical educational research. https://bbf.dipf.de/de/arbeiten-lernen/dhelab
  • The Digital Humanities Network at University of Potsdam focuses on research and teaching collaborations in the digital humanities. It offers events and courses such as the “Code & Culture Lecture Series”, the “Henriette Herz Humanities Hackathons” and the “Python 4 Poets Course”. https://www.uni-potsdam.de/en/digital-humanities/
  • The Department of Audio Communication at TU Berlin conducts transdisciplinary research and development in the areas of music, sound and language. They have developed digital research tools such as „PLAY“ and „Spotivey“, which are used in areas such as virtual acoustic reality and music and media reception research. https://www.tu.berlin/ak
  • The Alexander von Humboldt Institute for Internet and Society (HIIG) focuses on how digital methods impact internet and society research, including the development of digital tools and collaborative platforms. They have worked on projects such as the „Open Knowledge Maps“ for academic content visualization, and tools to enhance rainforest protection in Indonesia using remote sensing and geo-tracking technologies. https://www.hiig.de/das-institut/
  • Prof. Dr. Helena Mihaljević, Professor of Data Science and Analytics at the University of Applied Sciences (HTW) and the Einstein Center for Digital Future (ECDF), organizes the “Social and Political Data Science Seminar.” The seminar brings together PhD students, postdocs, senior researchers, and NGO professionals and serves as a platform to present and discuss ongoing projects, providing valuable feedback on research questions, methods and other aspects. https://www.htw-berlin.de/hochschule/personen/person/?eid=11889
  • The Interdisciplinary Center for Digitality and Digital Methods at Campus Mitte (IZ D2MCM) at HU Berlin fosters collaboration among different faculties in the areas of digitality and digital methods. They provide innovative infrastructure to support excellent research, and have working groups focused on Data Literacy, Reading and Writing Workshops, Large Language Models, Git & Software, Statistics, and Mapping. https://izd2m.hu-berlin.de/
  • QUADRIGA, a data competence center for Digital Humanities, Administrative Sciences, Computer Science, and Information Science, specializes in pioneering digital research methods. They support researchers across disciplines by facilitating studies on data flow, international standards like FAIR and CARE, and digital methods development. Their platform, QUADRIGA Space, hosts educational resources, including the QUADRIGA Navigator tool, to aid in navigating the digital research landscape. https://quadriga-dk.github.io/
  • The DM4 ‘Data Science’ at Berlin State Library develops machine learning and artificial intelligence technologies to enhance the accessibility of digitized historical cultural data, encompassing both text and image formats. Its primary objective is to enable efficient search capabilities and facilitate the organization and structuring of data, making it openly available and machine-readable as „Collections as Data“. The STABI Lab focuses on research and development initiatives to advance digital projects and enhance access to cultural heritage materials. https://lab.sbb.berlin/?lang=en
  • TELOTA is an initiative by the Berlin-Brandenburg Academy of Sciences and Humanities (BBAW) dedicated to digital research methods, focusing on projects and tools that facilitate scholarly inquiry in digital environments. It offers resources like digital editions, training, text analysis tools, and research infrastructure to support scholars in various disciplines engaging in digital research. https://www.bbaw.de/bbaw-digital/telota
  • The Research Lab “Culture, Society” and the Digital at HU Berlin combines research and teaching in the field of digital anthropology at the Institute for European Ethnology at Humboldt University. Theoretically and empirically, it combines various traditions of ethnographic and multi-modal digitization research based on social theory and analysis. Projects include „The Social Life of XG – Digital infrastructures and the reconfiguration of sovereignty and imagined communities“, „Packet Politics: Automation, Labour, Data“, „Cultures of Rejection; Digitalization of Work and Migration“, and more. https://www.euroethno.hu-berlin.de/de/forschung-1/labore/digilab
  • The Methods Lab at the Weizenbaum Institute is the central unit for supporting, connecting, and coordinating methods training and consulting at the institute. It conducts methods research and has developed tools for the collection and analysis of digital data, such as the „Weizenbaum Panel Data Explorer“ and the „MART“ app. https://www.weizenbaum-institut.de/en/research/wdsc/methods-lab/
  • The Social Science Research Center Berlin (WZB) has PostDoc networks that focus on specific (digital) methods. For example, a new group on AI Tools facilitates discussions on their utilization and potential acquisitions by the WZB. In addition, the WZB organizes the Summer Institute in Computational Social Science (SICSS) with an emphasis on text as data. The IT & eScience department of WZB supports scientific IT and data science consulting. They offer infrastructures for computing power and geodata and expertise in research software engineering. Projects include the digital version of the DIW Weekly Report. https://wzb.eu/
  • The Digital History Department at Humboldt University is involved in projects such as H-Soz-Kult and JupyterBook for Historians. Their main areas of expertise include modeling historical information, the impact of dataification, and the application of AI-based methods in historical research. They also offer workshops, support, and educational resources for digital history tools. https://www.geschichte.hu-berlin.de/de/bereiche-und-lehrstuehle/digital-history

In the second part of the event, we then broke into seven tables to discuss selected topics in small groups in a World Café format.

  • At the “Training on digitality and digital methods” table, we discussed digitality as a central epistemic problem of our research. For example, it seems to go hand in hand with a kind of compulsive quantification that needs to be reflected in both the application and the teaching of digital methods. Future needs that the network could address were mentioned: The creation of interdisciplinary networks for self-learning & trainings, a better exchange of basic teaching modules (Open Educational Resources), a sort of level system for methods training, as is already used for learning foreign languages (A1, A2, B1…), and the sharing of technical infrastructures such as HPC clusters.
  • The table on “Collaboration—governance and best practices” explored different formats of collaboration and networking. Starting points are e.g. networks around topics or around similar data domains. It is important that networking is supported at an early stage, that existing networks are used to avoid duplication of effort, that they stem from bottom-up initiatives, have a long-term perspective, and that they are supported with resources wherever possible. Impulse budgets, for example, are particularly suitable for this purpose. To ensure that collaborations and networks do not become an end in themselves, it is also important to define goals and constantly monitor their success. 
  • At the “Services in teams and centers” table, we collected various areas in which services are needed and offered. These include the provision of infrastructure, interfaces, software, and other resources, as well as consulting, training, and networking. The central goal of services is skill development in any given institution, while important challenges include ensuring their sustainability, evaluating their functionality and needs to avoid oversupply, and clearly targeting previously defined audiences. 
  • The “Cooperation with external services” table discussed various areas in which purchasing external services can be useful. These include: OCR, digitization & quality assurance, less frequently data conversion, GUIs, web applications, and increasingly data analytics. However, most services are still performed in-house. The reasons for this are that there are few positive reports, the cost of tendering is high, small projects are often not economically viable for external providers, and there is little co-development and few real collaborations. The benefits of such collaborations are that external providers can be helpful for realistic cost estimates. In addition, public-private partnerships could open up new sources of funding through economic development.
  • At the “Digitality as a concept between society and science” table, we discussed the problems that emerge from different approaches and perspectives science and the broad public have regarding digitality and digitalization. As these play an increasingly important role for more and more people in everyday life, education, and at work, science communication is tasked with informing the public about current developments. Past events (e.g., data leaks) proved that skepticism toward technology is certainly warranted, but projects and action practices meant to tackle this issue are oftentimes implemented rather poorly. As a result, technology such as artificial intelligence may appear overly powerful, which leads to fear rather than literacy. Future research and discussions should therefore assess how to best merge and guide digitality and digitalization to make the public more comfortable with them.
  • At the “Large Language Models” table, we discussed the current state of research on these types of models as well as future perspectives and desires regarding their use in research. Embracing reproducibility and open research is one of the most important requirements in this field. Further, participants noted the requirement to move beyond mere benchmarking, limitations in terms of resources and hardware, the necessity of establishing AI guidelines, and considering the contexts of Large Language Model application. Providing hardware and an infrastructure that can be used in research and teaching emerged as a crucial step to achieve these goals. Specifically, strengthening local ties between institutions (e.g., in and around Berlin) could be a first step. Finally, more research on how and for what the public uses such technologies could help develop measures to improve AI literacy.
  • At the “Research Software Engineers” table, we discussed the necessity of, as well as the challenges around, research staff who focus on developing software and tools. In practice, tools that are necessary for conducting (niche) research are either bought or developed by the researchers themselves. While the former depends on financial means and funding, the latter is often not possible at all because of a lack of skill and/or time. In teaching as well as interdisciplinary research contexts, acquiring programming knowledge is oftentimes neither desired nor feasible. Hiring research software engineers, however, comes with its own challenges. They have to be situated so that their work is neither too specific (e.g., at the research-group level) nor too unspecific (e.g., at the institution level). Standardization, documentation, and longevity of their work need to be ensured. Also, including software or tool development in funding applications is not yet typical in many fields of research. Finally, the acknowledgment of software as a publication and contribution to science is not yet satisfactory, but important.

Since the networking event was so successful, the organizers will be meeting again soon to discuss how to continue and intensify the networking. Stay tuned!