Workshop: Web Scraping and API-based Data Collection

— with Florian Primig (FU Berlin), Steffen Lepa (TU Berlin), Felix Gaisbauer, and Lion Wedel (both Weizenbaum Institute)

When: Thursday, March 2nd, 2023, 10am–1pm
Where: WI Flexroom (A1 04) + Collocall (hybrid)

Abstract: Gathering data is the first step in answering questions in empirical research. If this data is owned by platforms or companies, we as researchers face a crucial issue: How can we access this data in order to analyze it? Many platforms offer direct access to (parts of) their databases via so-called Application Programming Interfaces (APIs). However, as we can currently see from the example of Twitter, this data collection option only exists as long as companies choose to enable it. This leads us to web scraping, an alternative data collection technique that has gained popularity in recent years. It allows for the collection of data by automating access to websites, effectively simulating a regular user.

For this workshop, we asked four colleagues to provide us with a “Show & Tell” introduction to various data collection methods involving API and web scraping techniques applied to different platforms. Florian Primig (FU Berlin) presents the use case of the Telegram API, which allows for harvesting data from numerous Telegram groups. Steffen Lepa (TU Berlin) gives us a virtual tour through Spotivey, a GDPR-compliant web application for retrieving Spotify user data within online surveys. Felix Gaisbauer (WI) demonstrates how to use the Twitter API and twitter explorer, which allows for the investigation of networks surrounding specific topics. Finally, Lion Wedel (WI) shows us how to apply web scraping with the Python library Beautiful Soup, which he used in a study of a forum of so-called “incels” (“involuntary celibates”). The four presentations will be framed by a general introduction to web scraping and API-based data collection by the Methods Lab team as well as a Q&A at the end of the workshop.
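
To give a first impression of what scraping with Beautiful Soup can look like, here is a minimal sketch. It is not taken from any of the presentations: the HTML snippet, the class names (post, author, text) and the function name extract_posts are hypothetical and only serve as an illustration. It assumes the beautifulsoup4 package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical forum page, inlined so the example runs without network access.
# The structure and class names are assumptions made for illustration only.
SAMPLE_HTML = """
<html><body>
  <div class="post"><span class="author">user_a</span><p class="text">First post.</p></div>
  <div class="post"><span class="author">user_b</span><p class="text">Second post.</p></div>
</body></html>
"""


def extract_posts(html):
    """Return one dict per post, holding its author and its text."""
    # Parse the markup with Python's built-in HTML parser.
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    # Every <div class="post"> is treated as one forum post.
    for node in soup.find_all("div", class_="post"):
        posts.append({
            "author": node.find("span", class_="author").get_text(strip=True),
            "text": node.find("p", class_="text").get_text(strip=True),
        })
    return posts


posts = extract_posts(SAMPLE_HTML)
for post in posts:
    print(post["author"], "wrote:", post["text"])
```

In an actual study, the HTML would be downloaded from the website first (for example with an HTTP client library) and the selectors would have to match the real page structure.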

Florian Primig is a doctoral researcher in the Digitalization and Participation division at the Institute for Media and Communication Studies at the Free University of Berlin. He conducts research on platform publics, disinformation and (fringe) discourse. 

Steffen Lepa is a postdoctoral researcher in the Audio Communication Group at TU Berlin. He conducts research on mediatization, digital media change, media use and reception (focus on sound & music), audio branding, music information retrieval and empirical research methods.

Felix Gaisbauer is a postdoctoral researcher in the research group “Digital News Dynamics” at Weizenbaum Institute, where he studies the use of computational methods to capture news-driven public debate online. His core research areas are network theory, modeling of complex systems and quantitative analysis of platform communication, e.g. on Twitter.

Lion Wedel is a doctoral researcher in the research group “Digital News Dynamics” at Weizenbaum Institute. He conducts research on TikTok, news dynamics and fringe communities.