ClinRead: Understanding clinical notes for new languages – Novo Nordisk Foundation grant for Leon Derczynski

A huge amount of information about patients is stored in text. Medical record systems have many different data fields, but patients and treatments vary hugely, and everything that doesn’t fit perfectly in a data field is instead typed up as a note. Some estimates indicate that around 40% of information about a patient is kept in the clinical note text.

Accessing this information automatically brings strong advantages. Advances mean patient histories can be automatically mined; signs of diseases can be detected in post-hoc cohort studies; records can be automatically summarised to allow medical professionals to quickly gain an overview of a patient’s history; and diseases can be automatically surveilled.

However, most languages lack tools for processing these clinical texts. While research on understanding English clinical notes has advanced, largely thanks to access available via private providers in the USA, the situation for other languages is often much worse. This denies patients in those countries the same technology-driven advantages enjoyed by patients treated in English-speaking countries.

This barrier is compounded by (a) regulatory requirements and (b) the cost and complexity of annotating data for training machine learning models.

ClinRead applies modern artificial intelligence techniques from the sub-field of natural language processing to map the techniques and advances available in privileged languages (e.g. English) onto clinical notes written in other languages. This will be done via a technique known as transfer learning. The approach will be validated in the first instance against de-identified clinical notes in Danish, an under-privileged language in terms of available technology. If successful, the resulting technology offers a route to significantly reducing the barriers to introducing clinical natural language processing in new languages.
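The grant text does not specify an implementation, but the core transfer-learning idea can be sketched in miniature. Everything below is invented for illustration and is not the project's actual pipeline: a simple classifier is first trained on a large "high-resource" dataset, and its learned weights then seed training on a much smaller "low-resource" one.

```python
import numpy as np

rng = np.random.default_rng(0)

def train(X, y, w=None, lr=0.5, steps=200):
    """Logistic regression via gradient descent; w seeds the weights."""
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1 / (1 + np.exp(-X @ w))          # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)       # cross-entropy gradient step
    return w

def accuracy(X, y, w):
    return np.mean((X @ w > 0) == y)

# A shared "true" decision rule, standing in for structure that is
# common to clinical language regardless of which language it is in.
w_true = np.array([2.0, -1.0, 0.5])

# Large high-resource corpus (stands in for English clinical notes).
X_src = rng.normal(size=(1000, 3))
y_src = (X_src @ w_true > 0).astype(float)

# Tiny low-resource corpus (stands in for Danish clinical notes).
X_tgt = rng.normal(size=(20, 3))
y_tgt = (X_tgt @ w_true > 0).astype(float)

X_test = rng.normal(size=(500, 3))
y_test = (X_test @ w_true > 0).astype(float)

# Transfer: train on the large source set, then fine-tune on the small
# target set, starting from the source weights.
w_src = train(X_src, y_src)
w_transfer = train(X_tgt, y_tgt, w=w_src.copy(), steps=50)

# Baseline: the small target set alone, from scratch.
w_scratch = train(X_tgt, y_tgt, steps=50)

print(f"scratch:  {accuracy(X_test, y_test, w_scratch):.2f}")
print(f"transfer: {accuracy(X_test, y_test, w_transfer):.2f}")
```

Because both "languages" here share the same underlying rule, the transferred model typically matches or beats the from-scratch one despite seeing only 20 target examples; the real project works with far richer models, but the principle of reusing knowledge from a privileged language is the same.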

The ClinRead project will last for 12 months, starting in spring 2020, and is supported by Novo Nordisk Foundation grant NF0059138.

Verif-AI: against misinformation – DFF grant for Leon Derczynski

Leon Derczynski has won a 30-month grant from DFF, the Independent Research Fund Denmark, which contributes 2.9M DKK of funding to the project. The project, Verif-AI, researches the automatic detection of fake news across many languages.

Misinformation and propaganda threaten our public discussions and try to change our attitudes and behaviours. This kind of manipulation is easy to produce at huge scales, making it hard for fact checkers to detect it and deal with it. This is a problem that we can use artificial intelligence against, and there have been some promising early results, but only really for English.

While there are many different kinds of misinformation and propaganda, one kind that is efficient to work against is false claims that can be countered with evidence. To improve the situation, Verif-AI therefore investigates cross-lingual ways of finding misinformation: detecting where claims have been made, and checking those claims against knowledge bases like Danmarks Statistik.

Using AI techniques called adversarial learning and multi-task learning, we can not only build fact-verification technology for multiple languages: we can also benefit by pooling data from many different languages. What’s more, once there are models for cross-language fact checking, the project will adapt these to languages for which there is no fact-checking data at all, bringing fact-checking technologies to many languages at once and assisting journalists across the world.
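The project text names the techniques without prescribing an implementation, but the multi-task part of the idea can be illustrated with a hand-built toy (all data, sizes, and the "da"/"en" task names below are invented): one shared encoder is trained jointly with a small per-language head, so examples from either language update the shared parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

# One shared "encoder" matrix plus a task-specific head per language.
d_in, d_hid = 8, 4
W_shared = rng.normal(scale=0.5, size=(d_in, d_hid))
heads = {"da": rng.normal(scale=0.5, size=d_hid),
         "en": rng.normal(scale=0.5, size=d_hid)}

def forward(X, task):
    h = np.tanh(X @ W_shared)                 # representation shared by all tasks
    p = 1 / (1 + np.exp(-(h @ heads[task])))  # task-specific prediction
    return h, p

# Synthetic data: both "languages" label points by the same hidden rule,
# so gradients from either task improve the shared encoder.
w_true = rng.normal(size=d_in)
data = {}
for task in ("da", "en"):
    X = rng.normal(size=(200, d_in))
    data[task] = (X, (X @ w_true > 0).astype(float))

lr = 0.5
for step in range(1000):
    task = ("da", "en")[step % 2]             # alternate between tasks
    X, y = data[task]
    h, p = forward(X, task)
    err = (p - y) / len(y)                    # cross-entropy error signal
    grad_head = h.T @ err
    back = np.outer(err, heads[task]) * (1 - h ** 2)  # backprop through tanh
    heads[task] -= lr * grad_head
    W_shared -= lr * (X.T @ back)             # both tasks update this matrix

accs = {}
for task in ("da", "en"):
    X, y = data[task]
    _, p = forward(X, task)
    accs[task] = float(np.mean((p > 0.5) == y))
print(accs)
```

Adversarial learning would add a further objective discouraging the shared representation from revealing which language an example came from; that part, and everything else that makes the real systems work, is omitted from this sketch.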

Journalists play an important role in dealing with misinformation, and this is a core part of the project too: we work with TjekDet at Mandag Morgen and the EU “WeVerify” project to share news and data with a broad network of journalists, giving Verif-AI real impact.

Verif-AI starts in 2020 and will run until mid-2022. You can reach the project lead at

Where to find us: NODALIDA 2019

Being a research group located in the Nordics, ITU NLP has a strong presence at NODALIDA this year, held in Turku. The conference’s general chair is Barbara Plank from ITU NLP, for whose efforts we are all very grateful. You can find us here:

Monday September 30

NLPL Workshop on Deep Learning for Natural Language Processing, 09:00-17:00 PUB2

  • Co-organiser: Leon Derczynski (also first session chair, 09:20-10:00)

Deep Transfer Learning: Learning across Languages, Modalities and Tasks

Barbara Plank. Keynote, 10:30-11:30, NLPL DL4NLP, PUB2

Tuesday October 1

Lexical Resources for Low-Resource PoS Tagging in Neural Times.

Barbara Plank and Sigrid Klerke. Talk: 11:25-11:50, Parallel session A, PUB1

Bornholmsk Natural Language Processing: Resources and Tools.

Leon Derczynski and Alex Speed Kjeldsen. Poster: 16:45-17:45, Poster and demo session, Entrance hall

We introduce language processing resources and tools for Bornholmsk, a language spoken on the island of Bornholm, with roots in Danish and closely related to Scanian. The paper presents an overview of the language and available data, together with the first NLP models for this living minority Nordic language.

The Lacunae of Danish Natural Language Processing.

Andreas Kirkedal, Barbara Plank, Leon Derczynski and Natalie Schluter. Poster: 16:45-17:45, Poster and demo session, Entrance hall

Danish has received relatively little attention from a technological perspective. In this paper, we review Natural Language Processing (NLP) research, digital resources and tools which have been developed for Danish. We find that availability of models and tools is limited, which calls for work that lifts Danish NLP a step closer to the privileged languages.

UniParse: A universal graph-based parsing toolkit.

Daniel Varab and Natalie Schluter. Demo: 16:45-17:45 

Come by for a chat on how UniParse works and how it may be useful for your research.

Wednesday October 2, 2019

Political Stance Detection for Danish.

Rasmus Lehmann and Leon Derczynski. Talk: 11:10-11:35, Parallel session A: Sentiment Analysis and Stance, PUB1

The presented research concerns identifying the stance towards immigration in quotes from politicians published in Danish newspapers. The presentation will cover the creation of a dataset of stance-annotated quotes from Danish politicians, the first of its kind; the creation of two deep learning-based stance detection models, one using an LSTM architecture and one a basic feed-forward architecture; and the results of testing these models.

Neural Cross-Lingual Transfer and Limited Annotated Data for Named Entity Recognition in Danish.

Barbara Plank. Talk: 11:10-11:35, Parallel session B: Named Entity Recognition, PUB3

Session chairing – Parallel session A: Text Generation and Language Model Applications
14:00-15:15, PUB1. Leon Derczynski

Joint Rumour Stance and Veracity Prediction.

Anders Edelbo Lillie, Emil Refsgaard Middelboe and Leon Derczynski. Talk: 11:35-12:00, Parallel session A: Sentiment Analysis and Stance, PUB1

We present an end-to-end stance and veracity prediction system that works at SotA level on Danish despite low data, and show that stance-based veracity prediction models can be transferred across languages and platforms with negligible performance drop.

NEALT business meeting

13:00-14:00, PUB1

Watch out for ✨exciting✨ items from ITU NLP here…



We hope to meet you in Turku!

Rob van der Goot joins NLP at ITU

Rob has a background in information science, but quickly became interested in the field of natural language processing, especially the problem of building robust models. His expertise lies in automatically deriving syntactic analyses of natural language (parsing), with a focus on low-resource settings. During his PhD, he improved the automatic syntactic analysis of social media texts by first translating them to a more ‘standard’ form. More broadly, he is interested in the automatic processing of all types of language varieties without having explicit training data.

Rob will be working at ITU as a postdoc under the supervision of Barbara Plank (partially funded by Amazon); together they will develop natural language processing models for low-resource languages and language varieties.
Rob van der Goot

Alan Ramponi joins NLP at ITU

Alan is a Ph.D. student in natural language processing at Fondazione The Microsoft Research – University of Trento COSBI, Italy. His research focuses on unsupervised domain adaptation and deep learning methods for biomedical information extraction from scientific publications. Broadly, his interests are centered on building robust language models which are resilient to domain shift, thus being readily applicable to real-world problems in which the target domain is not known in advance.

Alan will be doing his work as a visiting Ph.D. fellow with Barbara Plank, researching domain adaptation methods for all the stages of the task of biomedical event extraction.

Rasmus Lehmann joins NLP at ITU

We’re very happy to welcome Rasmus Lehmann to NLP at ITU!

Rasmus sits at the intersection of business, communication and technology, with a Bachelor’s degree in organizational communication and economics from CBS and a Master’s degree in software development, specializing in Business Intelligence and Machine Learning. Rasmus’ interest in NLP was sparked while implementing a deep learning-based model for rumour identification, and he went on to write his thesis, “Stance Detection in Danish Politics”. That project built a dataset of quotes from Danish politicians for stance detection in Danish, the first of its kind, and applied a deep learning-based approach to this classification task. The work was subsequently submitted to the NoDaLiDa 2019 conference on computational linguistics, where the paper was accepted.

Rasmus will be working closely with Leon Derczynski on creating tools for NLP in Danish.

Rasmus Lehmann

Daniel Varab joins NLP at ITU

We are delighted to welcome Daniel Varab back to NLP at ITU! Daniel introduces himself:

“I come from a traditional computer science background and somewhat by chance ended up writing my thesis titled “Contradiction Detection in Natural Languages” in the scope of natural language inference (NLI). This sparked my interest in NLP and has caused all my work since to revolve around the field. I am now two years out after graduation and have spent a year as a research assistant exploring NLI and graph-based dependency parsing, followed by a year at the Danish/Swedish company Karnov Group where I have worked on helping lawyers navigate the ever-growing pile of legislation with the use of NLP techniques. I am now excited to be heading back into academia where I will be working on text summarization together with Natalie Schluter and supporting courses of ITU’s data science bachelor degree.

With regards to interests, I genuinely enjoy work on simple models with well-founded inductive biases, work on so-called less privileged languages, and good old thorough research.”

Daniel Varab

Mateusz Jurewicz joins NLP at ITU

We are delighted to welcome Mateusz Jurewicz to NLP at ITU! Mateusz’ project is on Deep Learning Generative Models for Content Structuring. He introduces himself:

I’m currently working as Machine Learning Engineer at Tjek A/S (also known as ShopGun, eTilbudsavis & Mattilbud in other countries) and have just started my Industrial PhD.

I previously worked as a software engineer at Intel on their Nervana ML project, and as a business analyst at a number of other companies. I received my Master’s degree from the University of Warsaw in Poland, where I’m originally from.

I’ll be working on generative approaches towards structuring product catalogs, such as the one you can see here:

I really enjoy solving problems through code (particularly in Python), reading unusual books (e.g. Kim Stanley Robinson’s The Years of Rice and Salt), rock climbing, and Dungeons & Dragons.

If you’d like to check out some of my engineering projects, you can take a look at my github portfolio here:

I look forward to working with you 🙂 

Amrith Krishna joins NLP at ITU

We are delighted to welcome Amrith Krishna to NLP at ITU! Amrith introduces himself:

A Passepartout when it comes to my research interests: broadly, I am interested in anything that comes under computational linguistics and natural language processing. Specifically, my research interests lie in morphology, free word order languages, structured prediction, and program synthesis. My Ph.D. thesis, titled “Addressing Characteristics for Data-Driven Modelling of Lexical, Syntactic and Prosodic Tasks in Sanskrit” and currently under review, was supervised by Prof Pawan Goyal at the Dept. of Computer Science and Engineering, IIT Kharagpur. Currently, I work with Dr. Natalie Schluter, exploring research at the intersection of formal languages, algorithms, and machine learning.

Amrith Krishna

Misinformation on Twitter During the Danish National Election: A Case Study

Leon Derczynski, Torben Oskar Albert-Lindqvist, Marius Venø Bendsen, Nanna Inie, Viktor Due Pedersen and Jens Egholm Pedersen


Elections are a time when communication is important in democracies, including over social media. This paper describes a case study of applying NLP to determine the extent to which misinformation and external manipulation were present on Twitter during a national election. We use three methods to detect the spread of misinformation: analysing unusual spatial and temporal behaviours; detecting known false claims and using these to estimate the total prevalence; and detecting amplifiers through language use. We find that while present, detectable spread of misinformation on Twitter was remarkably low during the election period in Denmark.