A huge amount of information about patients is stored in text. Medical record systems have many different data fields, but patients and treatments vary widely, and anything that doesn’t fit neatly into a data field is instead typed up as a free-text note. Some estimates indicate that around 40% of the information about a patient is kept in the clinical note text.
Being able to access this information automatically brings strong advantages. Advances in natural language processing mean that patient histories can be mined automatically; signs of disease can be detected in post-hoc cohort studies; records can be summarised automatically, so that medical professionals can quickly gain an overview of a patient’s history; and diseases can be surveilled automatically.
However, for most languages there is a lack of tools for processing these clinical texts. While research on understanding English clinical notes has advanced, largely thanks to data access available through private providers in the USA, the situation for other languages is often much worse. This prevents patients treated in those languages from accessing the same technology-driven advantages as those treated in English-speaking countries.
This barrier is compounded by (a) regulatory requirements and (b) the cost and complexity of annotating data for training machine learning models.
ClinRead applies modern artificial intelligence techniques from the sub-field of natural language processing to map the techniques and advances available in privileged languages (e.g. English) to clinical notes written in other languages. This will be done via a technique known as transfer learning. The success of the approach will be validated in the first instance against de-identified clinical notes in Danish, a language that is under-privileged in terms of available technology. If successful, the resulting technology offers a route to significantly lowering the barriers to introducing clinical natural language processing to new languages.
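As a rough illustration of the idea behind cross-lingual transfer, the toy sketch below trains a classifier on labelled English notes only and then applies it unchanged to Danish notes. It is not the ClinRead method: a real system would typically rely on learned multilingual representations, whereas here a small hand-written lexicon stands in for the shared representation. The lexicon `SHARED_CONCEPTS`, the example notes, the labels, and the function names are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon: English and Danish tokens mapped to shared concept IDs.
# This hand-written table is a stand-in for a learned multilingual representation.
SHARED_CONCEPTS = {
    "fever": "C_FEVER", "feber": "C_FEVER",
    "cough": "C_COUGH", "hoste": "C_COUGH",
    "fracture": "C_FRACTURE", "fraktur": "C_FRACTURE",
    "pain": "C_PAIN", "smerte": "C_PAIN",
    "arm": "C_ARM",  # spelled the same in both languages
}


def to_concepts(text):
    """Map a note to language-independent concept IDs, dropping unknown tokens."""
    return [SHARED_CONCEPTS[t] for t in text.lower().split() if t in SHARED_CONCEPTS]


def train(labelled_notes):
    """Count how often each concept occurs under each label."""
    counts = {}
    for text, label in labelled_notes:
        counts.setdefault(label, Counter()).update(to_concepts(text))
    return counts


def predict(model, text):
    """Return the label whose training concept counts best match the note."""
    concepts = to_concepts(text)
    scores = {label: sum(c[k] for k in concepts) for label, c in model.items()}
    return max(scores, key=scores.get)


# Labelled data exists only in the well-resourced language (invented examples).
ENGLISH_NOTES = [
    ("patient has fever and cough", "infection"),
    ("high fever with cough", "infection"),
    ("fracture of the arm with pain", "injury"),
    ("arm pain after fracture", "injury"),
]

model = train(ENGLISH_NOTES)

# The model is applied to Danish notes without any Danish training labels.
print(predict(model, "patienten har feber og hoste"))
print(predict(model, "fraktur i arm med smerte"))
```

The point of the sketch is that annotation effort is spent once, in the language where labelled data is available, and reused through a representation shared between languages.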
The ClinRead project will last for 12 months, starting in spring 2020, and is supported by Novo Nordisk Foundation grant NF0059138.