Workshop Recap: A Practical Introduction to Text Analysis (November 30, 2023)

On November 30th, 2023, the Methods Lab organized a workshop on quantitative text analysis. The workshop was conducted by Douglas Parry (Stellenbosch University) and covered the whole process of text analysis from data preparation to the visualization of sentiments or topics identified.

In the first half of the workshop, Douglas covered the first steps involved in text analysis, such as tokenization (the transformation of texts into smaller parts like single words or consecutive words), the removal of “stop words” (words that do not contain meaningful information), and the aggregation of content by meta-information (authors, books, chapters, etc.). Apart from the investigation of the frequency with which terms occur, sentiment analysis using existing dictionaries was also addressed. This technique involves assigning values to each word representing certain targeted characteristics (e.g., emotionality/polarity), which in turn allows for comparing overall sentiments between different corpora. Finally, the visualization of word occurrences and sentiments was covered. After this introduction, participants had the chance to apply their knowledge using the programming language R by solving tasks with texts Douglas provided.
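The steps described above can be illustrated with a minimal sketch. The workshop itself used R and Douglas's own materials; the Python below is only an illustration under stated assumptions. The stop-word list, the polarity dictionary, the sample texts, and all function names are hypothetical and are not taken from the workshop.

```python
import re
from collections import Counter

# Hypothetical mini stop-word list and polarity dictionary (illustration only;
# real analyses would use an established stop-word list and sentiment lexicon).
STOP_WORDS = {"the", "a", "an", "and", "is", "was", "of", "to", "in", "it", "but"}
POLARITY = {"good": 1, "great": 1, "happy": 1, "bad": -1, "sad": -1, "terrible": -1}


def tokenize(text, n=1):
    """Split text into lowercase word tokens; n > 1 yields consecutive-word n-grams."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def tokens_by_group(docs):
    """Aggregate cleaned tokens by meta-information, e.g. author or chapter."""
    grouped = {}
    for group, text in docs:
        grouped.setdefault(group, []).extend(remove_stop_words(tokenize(text)))
    return grouped


def sentiment_score(tokens):
    """Mean polarity of the tokens found in the dictionary; 0.0 if none match."""
    hits = [POLARITY[t] for t in tokens if t in POLARITY]
    return sum(hits) / len(hits) if hits else 0.0


# Made-up sample corpus: (author, text) pairs.
docs = [
    ("author_a", "The day was good and the food was great."),
    ("author_a", "It is a happy story."),
    ("author_b", "The weather was bad, but the ending was terrible."),
]

grouped = tokens_by_group(docs)
frequencies = {group: Counter(tokens) for group, tokens in grouped.items()}
scores = {group: sentiment_score(tokens) for group, tokens in grouped.items()}

for group in grouped:
    print(group, frequencies[group].most_common(3), scores[group])
```

Averaging dictionary values per group is one simple way to compare overall sentiment between corpora; other aggregation schemes (sums, weighting by term frequency) are equally plausible.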

In the second half of the workshop, Douglas focused on different methods of topic modeling, which attempt to assign texts to latent topics based on the words they contain. In contrast to the simpler procedures covered in the first half of the workshop, topic models also take into account which words tend to occur together within texts. Specifically, Douglas introduced participants to Latent Dirichlet Allocation (LDA), Correlated Topic Models (CTM), and Structural Topic Models (STM). One of the most important decisions to be made for any such model is the number of topics to estimate: too few may blur distinct themes into one another, and too many may lead to redundant topics. The visualization and, most importantly, the limitations of topic modeling were also discussed before participants performed topic modeling themselves with the data provided earlier. Finally, Douglas concluded with a summary of everything covered and an overview of advanced subjects in text analysis.
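To make the idea of latent topics more concrete, the following is a minimal sketch of LDA fitted with collapsed Gibbs sampling, written in plain Python. It is not the workshop's code (the hands-on part used R), and the function name, the toy corpus, and the hyperparameter values (alpha, beta, number of iterations) are assumptions chosen for illustration. The num_topics argument is the number of topics that has to be chosen in advance.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of token lists. Returns (doc_topic, topic_word, vocab), where
    doc_topic[d][k] and topic_word[k][v] are smoothed probability estimates.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic_counts = [[0] * num_topics for _ in docs]
    topic_word_counts = [[0] * vocab_size for _ in range(num_topics)]
    topic_totals = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic_counts[d][k] += 1
            topic_word_counts[k][word_id[w]] += 1
            topic_totals[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic_counts[d][k] -= 1
                topic_word_counts[k][v] -= 1
                topic_totals[k] -= 1
                # Resample its topic given all other assignments.
                weights = [
                    (doc_topic_counts[d][t] + alpha)
                    * (topic_word_counts[t][v] + beta)
                    / (topic_totals[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic_counts[d][k] += 1
                topic_word_counts[k][v] += 1
                topic_totals[k] += 1

    doc_topic = [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic_counts, docs)
    ]
    topic_word = [
        [(c + beta) / (topic_totals[k] + vocab_size * beta) for c in topic_word_counts[k]]
        for k in range(num_topics)
    ]
    return doc_topic, topic_word, vocab


# Made-up toy corpus of already tokenized, stop-word-free documents.
toy_docs = [
    ["phone", "screen", "app", "phone", "battery"],
    ["app", "screen", "battery", "phone"],
    ["novel", "author", "chapter", "novel", "reader"],
    ["chapter", "reader", "author", "novel"],
]

doc_topic, topic_word, vocab = lda_gibbs(toy_docs, num_topics=2)

for k, row in enumerate(topic_word):
    top_words = [vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:3]]
    print("topic", k, top_words)
```

Rerunning the sketch with a different num_topics or seed is a quick way to see how sensitive the resulting topics are to these choices. For real corpora, an established implementation is preferable to a hand-written sampler like this one.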

The workshop was very well received and gave participants a solid foundation for conducting their own text analyses. Douglas struck a good balance between lecture-style sections and well-prepared, hands-on exercises. He followed a logical structure throughout and provided all materials in a way that allowed participants to focus on the tasks at hand. We would like to thank him for this great introduction to text analysis!

First Research Fellow at the Methods Lab

The Methods Lab is excited to welcome its first research fellow, who arrived at the Weizenbaum Institute on November 20: Douglas Parry from Stellenbosch University, South Africa. His research focuses on Socio-Informatics in the areas of Communication Science, Human-Computer Interaction, and Media/CyberPsychology.

During his four-week stay, Douglas Parry will contribute to the work of the Methods Lab in different ways. On November 30, he will hold the workshop "A Practical Introduction to Text Analysis", in which he covers all important steps, from pre-processing text to visualizing the results of topic modeling, in a single day. On December 7, he will host a Digital Methods Colloquium together with Roland Toth, where German researchers focusing on digital methods will get together, present recent work, and discuss challenges and opportunities in the field.

Furthermore, Douglas Parry is collaborating on two research projects with the Methods Lab during his stay, both of which involve the processing of complex data surrounding smartphone usage that were collected using multiple methods earlier this year.

The Methods Lab is happy to host Douglas Parry and is looking forward to the results of this exciting partnership – stay tuned!