From November 11 to November 29, I visited the Natural Language Processing group at the University of Sheffield, in the United Kingdom. The Short-Term Scientific Mission was part of the Transnational Access program of the SoBigData project, funded by the European Union under grant agreement No. 654024. The program supports researchers traveling to one of the SoBigData partner centers, where they can engage with researchers who are experts in their fields.
I visited Prof. Kalina Bontcheva at the University of Sheffield, who has done extensive research on Natural Language Processing for social media text and, more recently, on rumor veracity prediction, hyperpartisan news detection, and the spread of misinformation. During the research visit, I continued my ITU research on multi-lingual stance classification. Stance classification is the task of classifying the stance of the author of a piece of text with respect to another piece of text. In my research I focus on social media content, predominantly from Twitter. For example, a Twitter user may post a claim such as "I heard that the Eiffel tower is made out of cheese". When other Twitter users react to such tweets, they often manifest a supportive stance (e.g. "I believe it's all made out of Gouda"), disagreement (e.g. "Been there, checked, it's definitely made of metal"), a request for clarification (e.g. "Do you have any link to support that?"), or simply comment without taking a clear stance. Automatically classifying the stance in social media text is an important Natural Language Processing task, since stance labels provide information for validating claims and help in identifying rumors and fake news. This has important implications for societal events such as democratic elections. The task is particularly challenging because it requires that NLP systems learn to recognize relation patterns between words in tweets and their replies. Moreover, the amount of labeled data is quite small, which poses a challenge for modern deep neural networks, such as recurrent neural networks with LSTMs, which are known to require large amounts of training data. Training such models on languages other than English is even more difficult, since there is even less labeled data for other languages.
Finally, label imbalance in the data can cause machine learning NLP systems to ignore the stances that appear less frequently.
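To make the task concrete, here is a minimal sketch of how such labeled claim-reply pairs might be represented; the tweets reuse the invented examples above, and the class and field names are my own illustration rather than any particular data set's schema.

```python
from dataclasses import dataclass

@dataclass
class StanceExample:
    source: str   # the claim being reacted to
    reply: str    # the reaction whose stance we want to classify
    stance: str   # one of: "support", "deny", "query", "comment"

# Invented examples, reusing the Eiffel tower tweets from the text.
claim = "I heard that the Eiffel tower is made out of cheese"
examples = [
    StanceExample(claim, "I believe it's all made out of Gouda", "support"),
    StanceExample(claim, "Been there, checked, it's definitely made of metal", "deny"),
    StanceExample(claim, "Do you have any link to support that?", "query"),
]
```

A classifier then learns to predict the `stance` field from the `source`/`reply` pair.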
With the assistance of Kalina Bontcheva and Carolina Scarton, I focused on understanding the difficulties of evaluating stance classification. Researchers at the University of Sheffield have extensive experience in running shared task competitions on stance classification and rumor identification, and in evaluating the participating systems. One of the major hurdles, both in training highly performing systems and in evaluation, is label imbalance. Most tweets in currently available data sets are comments (i.e. they do not manifest a strong supporting, denying, or querying stance). Because of this, traditional metrics such as accuracy, precision, recall, and F1 can be misleading about a system's true performance, and alternative metrics must be used. Macro-averaged metrics (versions of precision, recall, and F1 that assign equal importance to each kind of stance) are one such option. During the visit in Sheffield, I investigated the output of my own classifiers for stance classification in multiple languages, following Carolina's work on using confusion matrices to understand where systems fail and why. Confusion matrices make it relatively easy to observe whether a stance classifier has a tendency to ignore rare stances, and they show which stances are confused with each other. I found that using class weights allows my classifier to address some of the problems characteristic of label imbalance, and I came up with a strategy to perform transfer learning so the classifiers can learn from more data than is included in the stance classification data sets currently available. Transfer learning is a technique for training machine learning systems on two or more tasks that share some commonalities.
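Why macro-averaged metrics matter under label imbalance can be illustrated with a small toy example (the label counts below are invented for illustration, not taken from any real data set): a degenerate classifier that always predicts the majority "comment" class scores high accuracy yet a poor macro-F1, because the rare stances contribute zero F1 each.

```python
def per_class_f1(gold, pred, label):
    """F1 for a single class, computed from true/false positives and false negatives."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy gold labels with the imbalance typical of stance data sets:
# most tweets are comments, few support or deny.
gold = ["comment"] * 8 + ["support", "deny"]
# A degenerate classifier that always predicts the majority class.
pred = ["comment"] * 10

accuracy = sum(g == p for g, p in zip(gold, pred)) / len(gold)
labels = sorted(set(gold))
# Macro-averaging gives each class equal weight, regardless of frequency.
macro_f1 = sum(per_class_f1(gold, pred, lab) for lab in labels) / len(labels)

print(f"accuracy: {accuracy:.2f}")   # 0.80 -- looks deceptively good
print(f"macro-F1: {macro_f1:.2f}")   # 0.30 -- exposes the ignored classes
```

The gap between the two numbers is exactly the signal that a confusion matrix makes visible at a glance: the rows for "support" and "deny" would be entirely mapped onto the "comment" column.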
The goal is for the system to improve its performance on a target task by benefiting from the similarities with the related tasks.
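The class-weighting idea mentioned above can be sketched as follows. One common choice (an assumption here, not necessarily the scheme I used) is to weight each class inversely to its frequency, so that errors on rare stances cost more during training; the label distribution below is invented for illustration.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class inversely to its frequency, so that rare
    stances contribute as much to the training loss as frequent ones."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {label: n / (k * count) for label, count in counts.items()}

# Invented label distribution mirroring the imbalance described above.
train_labels = (["comment"] * 70 + ["support"] * 15
                + ["deny"] * 10 + ["query"] * 5)
weights = inverse_frequency_weights(train_labels)
for label, w in sorted(weights.items()):
    print(f"{label}: {w:.2f}")
# comment gets a weight below 1, the rare query class well above 1
```

These per-class weights can then be passed to the loss function of most neural network frameworks, so that a misclassified "query" tweet contributes more to the gradient than a misclassified "comment".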
Now, upon returning to ITU in Copenhagen, I will continue pursuing the several promising paths identified in Sheffield to improve the performance of my stance classification architecture.
~ Manuel R. Ciosici