Tutorial – WI Methods Lab

This blog post discusses when (and when not) to use the official TikTok Research API. It also provides step-by-step instructions for a typical research scenario to help aspiring researchers get started with the API.

When and when not to use it

Although it is the official route to data access, the TikTok Research API is by no means the only way to collect TikTok data automatically. Depending on the research project, one of the following alternatives might be the better choice:

4CAT + Zeeschuimer: Sensible if you want to collect a limited amount of data on one or more actors, hashtags, or keywords and/or are not confident programming the subsequent analysis yourself.
Unofficial TikTok APIs (pyktok or the Unofficial TikTok API in Python): Both are great projects that provide significantly more data points than the official API. However, this comes at a cost: less stability and dependence on developers reacting to changes on TikTok's site.

But why should you use the official TikTok API if those two options are available?

Reliability. In theory, official API access is more stable than the other solutions.
Legality. Depending on your country or home institution, unofficial data access might be problematic for legal reasons. With official data access, you are on the safer side. Please consult your institution regarding data access.
User-level data. Other data collection methods often offer more data points at the video level (Ruz et al. 2023). However, the official TikTok API provides a set of user-level data (user info, liked videos, pinned videos, followers, following, reposted videos) that is not as conveniently available through other methods.
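To illustrate what a user-level query can look like in Python, here is a minimal sketch that uses only the standard library. It assumes you already hold a client access token. The endpoint path and field names reflect our reading of TikTok's Research API documentation at the time of writing and may change, so check them against the current docs; the function names are our own.

```python
import json
import urllib.request

# Endpoint and field names as listed in TikTok's Research API docs at the
# time of writing (an assumption to verify against the current docs).
USER_INFO_URL = "https://open.tiktokapis.com/v2/research/user/info/"
USER_FIELDS = [
    "display_name", "bio_description", "is_verified",
    "follower_count", "following_count", "likes_count", "video_count",
]


def build_user_info_request(access_token: str, username: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request for one user's profile data."""
    url = USER_INFO_URL + "?fields=" + ",".join(USER_FIELDS)
    body = json.dumps({"username": username}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_user_info(access_token: str, username: str) -> dict:
    """Send the request (this uses one request of the daily quota)."""
    req = build_user_info_request(access_token, username)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Separating request construction from sending makes it easy to inspect a request before spending part of the daily quota on it. The other user-level endpoints (liked, pinned, and reposted videos, followers, following) appear to follow the same pattern of a POST request with a JSON body under different paths.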

One fundamental limitation must still be kept in mind: you can make only 1,000 requests per day, each returning at most 100 records (e.g., videos, comments). Even if every request returned the full 100 records (which is rarely possible), you could retrieve at most 100,000 records per day.
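Because the quota is counted in requests rather than records, it is worth estimating early how long a collection will take. The sketch below assumes the limits quoted above and a hypothetical average yield per request, which in practice is often well below 100.

```python
import math

# Limits as stated above; TikTok may change them, so treat them as parameters.
DAILY_REQUESTS = 1_000
MAX_RECORDS_PER_REQUEST = 100


def days_needed(target_records: int, avg_records_per_request: float) -> int:
    """Estimate how many days of quota a collection will consume.

    avg_records_per_request is how many records a request actually returns
    on average (an assumption you have to estimate for your own query).
    """
    if not 0 < avg_records_per_request <= MAX_RECORDS_PER_REQUEST:
        raise ValueError("average must be in (0, 100]")
    requests = math.ceil(target_records / avg_records_per_request)
    return math.ceil(requests / DAILY_REQUESTS)


print(DAILY_REQUESTS * MAX_RECORDS_PER_REQUEST)  # 100000, the daily ceiling
print(days_needed(250_000, 100))  # 3
print(days_needed(250_000, 40))   # 7
```

The second call shows why the full 100 records per request matter: at an average of 40 records, the same collection takes more than twice as long.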

To get started with the official TikTok Research API, visit the Research API page. To gain access, you need to create a developer account and submit an application form. When doing so, please also record your access request in the DSA40 Data Access Tracker; this contributes to an effort to track the data access that platforms provide under Article 40 of the Digital Services Act (DSA40).
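Once an application is approved, TikTok issues a client key and client secret, which are exchanged for a short-lived client access token before any data request. A minimal sketch of this step with Python's standard library follows; the endpoint and form fields reflect TikTok's documentation at the time of writing (verify them), and the function names are our own.

```python
import json
import urllib.parse
import urllib.request

# Token endpoint as documented by TikTok at the time of writing (verify it).
TOKEN_URL = "https://open.tiktokapis.com/v2/oauth/token/"


def build_token_request(client_key: str, client_secret: str) -> urllib.request.Request:
    """Prepare a form-encoded POST using the client-credentials grant."""
    form = urllib.parse.urlencode({
        "client_key": client_key,
        "client_secret": client_secret,
        "grant_type": "client_credentials",
    }).encode("utf-8")
    return urllib.request.Request(
        TOKEN_URL,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def get_access_token(client_key: str, client_secret: str) -> str:
    """Send the request and return the token from the JSON response."""
    with urllib.request.urlopen(build_token_request(client_key, client_secret)) as resp:
        payload = json.load(resp)
    # The response also carries "expires_in" (seconds); request a new token
    # when it runs out during long collections.
    return payload["access_token"]
```

To avoid leaking credentials in shared scripts, read the key and secret from environment variables (e.g., via `os.environ`) rather than hard-coding them.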

The official documentation on Research API usage is not intuitive, especially for newcomers. Using the API from a common programming language such as Python or R can still be challenging, particularly for researchers working with an API for the first time. The current scarcity of guidance on the API motivated this blog post, which provides such guidance without a paywall.
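As a starting point, the sketch below queries videos by hashtag and pages through the results using the `cursor` and `search_id` values the API returns. It uses only Python's standard library. The endpoint, field names, and body keys reflect our reading of TikTok's documentation at the time of writing, the hashtag filter is just one example query, and the helper names are hypothetical. The documentation also limits the date range of a single query (30 days at the time of writing).

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional

# Endpoint, fields and body keys follow TikTok's Research API docs at the
# time of writing (assumptions to verify against the current docs).
VIDEO_QUERY_URL = "https://open.tiktokapis.com/v2/research/video/query/"
FIELDS = "id,create_time,username,video_description,view_count,like_count"


def build_video_body(hashtag: str, start: str, end: str,
                     cursor: int = 0, search_id: Optional[str] = None) -> dict:
    """Request body for videos with one hashtag between two dates (YYYYMMDD)."""
    body = {
        "query": {"and": [{"operation": "EQ", "field_name": "hashtag_name",
                           "field_values": [hashtag]}]},
        "start_date": start,
        "end_date": end,
        "max_count": 100,  # at most 100 records per request
        "cursor": cursor,
    }
    if search_id is not None:
        body["search_id"] = search_id  # ties follow-up pages to the first query
    return body


def post_json(token: str, body: dict) -> dict:
    """Send one request to the video endpoint and decode the JSON reply."""
    req = urllib.request.Request(
        f"{VIDEO_QUERY_URL}?fields={FIELDS}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def iter_videos(send: Callable[[dict], dict], hashtag: str, start: str,
                end: str, max_requests: int = 10) -> Iterator[dict]:
    """Yield videos page by page, spending at most max_requests requests."""
    cursor, search_id = 0, None
    for _ in range(max_requests):
        data = send(build_video_body(hashtag, start, end, cursor, search_id))["data"]
        yield from data.get("videos", [])
        if not data.get("has_more"):
            break
        cursor, search_id = data["cursor"], data.get("search_id")


# Usage, with `token` being a client access token obtained after approval:
# videos = list(iter_videos(lambda b: post_json(token, b),
#                           "climate", "20240101", "20240115"))
```

Passing the sending function in as a parameter keeps the paging logic testable without network access, and `max_requests` acts as an explicit guard on the daily quota.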

With smartphones now more prevalent in everyday life than ever before, understanding how they are used, and with what implications, is increasingly important. Self-reports in surveys are the method typically used to assess smartphone use, but they suffer from various problems such as distorted retrospection, social desirability bias, and high aggregation. More advanced methods include the Experience Sampling Method (ESM), which presents multiple short surveys per day to limit retrospection, and logging (Android only), which accesses an internal log on the device that documents each user activity at very high resolution. Although logging is the most precise and objective method available for assessing smartphone use, the raw log data require extensive transformation to separate actual human behavior from technical artifacts. Yet this transformation has never been documented systematically, and researchers working with such data have implemented arbitrary steps to extract the measures they needed.

The preprint article Extracting Meaningful Measures of Smartphone Usage from Android Event Log Data: A Methodological Primer, by former Methods Lab fellow Douglas Parry and Methods Lab member Roland Toth, provides a detailed step-by-step guide to extracting different levels of smartphone use from Android log data. Specifically, the guide shows how to identify glances (short checks without unlocking the device), sessions (use from unlocking to locking), and episodes (single app uses) in such log files, enabling further analysis. All steps are presented both as pseudo-code and in prose. In addition, the Online Supplementary Material (OSM) contains the full pseudo-code, an implementation in the R programming language, a sample data set of raw log data, and further helpful material.
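To make the three levels concrete, the sketch below derives glances, sessions, and episodes from a toy event stream. It is a deliberately simplified illustration, not the primer's algorithm: the event labels are hypothetical stand-ins rather than Android's actual event types, and it ignores the many edge cases and technical artifacts that real log data contain.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical, simplified event labels (NOT Android's actual event types).
SCREEN_ON = "SCREEN_ON"
UNLOCK = "UNLOCK"
SCREEN_OFF = "SCREEN_OFF"
APP = "APP_FOREGROUND"  # an app moved to the foreground


@dataclass
class Event:
    t: float                   # timestamp in seconds
    kind: str                  # one of the labels above
    app: Optional[str] = None  # package name for APP events


@dataclass
class Usage:
    glances: List[Tuple[float, float]]         # (start, end), never unlocked
    sessions: List[Tuple[float, float]]        # (unlock, lock)
    episodes: List[Tuple[str, float, float]]   # (app, start, end)


def extract_usage(events: List[Event]) -> Usage:
    """Split a toy event log into glances, sessions and app episodes."""
    out = Usage([], [], [])
    screen_on_at: Optional[float] = None  # start of current screen-on period
    unlocked_at: Optional[float] = None   # start of current session
    current_app: Optional[Tuple[str, float]] = None  # (app, start)

    for ev in sorted(events, key=lambda e: e.t):
        if ev.kind == SCREEN_ON:
            screen_on_at, unlocked_at, current_app = ev.t, None, None
        elif ev.kind == UNLOCK and screen_on_at is not None and unlocked_at is None:
            unlocked_at = ev.t
        elif ev.kind == APP and unlocked_at is not None:
            # A new foreground app closes the previous episode.
            if current_app is not None:
                out.episodes.append((current_app[0], current_app[1], ev.t))
            current_app = (ev.app, ev.t)
        elif ev.kind == SCREEN_OFF and screen_on_at is not None:
            if unlocked_at is None:
                out.glances.append((screen_on_at, ev.t))
            else:
                out.sessions.append((unlocked_at, ev.t))
                if current_app is not None:
                    out.episodes.append((current_app[0], current_app[1], ev.t))
            screen_on_at, unlocked_at, current_app = None, None, None
    # A screen-on period still open at the end of the log is dropped here.
    return out
```

A state machine like this makes explicit the decisions that otherwise stay implicit, such as what counts as the end of a session or how an episode is closed.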

Ultimately, this guide contributes to our understanding of how humans interact with these versatile devices, which is particularly beneficial for projects in the social sciences and neighboring disciplines. While surveys are valued for their low cost and ease of administration, objective high-resolution data offer a more refined perspective. We hope this article helps researchers extract valuable measures from raw Android event log data, making this rich data source more accessible and manageable than it has been so far.


Tutorial: When and how to use the official TikTok API


New preprint article: Extracting smartphone use from Android event log data