Resources

Code

For all our research code, see the ITU NLP GitHub repository.

Danish pipeline

For automatic tokenization, part-of-speech tagging, and dependency parsing of Danish text.

Danish named entity recognition

For automatic recognition of location, person, and organization names in Danish text.

Danish word representations

  • dansk-brown.tar.bz2: Brown clusters induced on Danish text from Wikipedia and Common Crawl (input length |S| = 134M tokens; window a = 5000; vocabulary |V| = 778K word types). These are generalised Brown clusters, so a clustering of any size can be generated from the download instantly, without re-running induction (see the README).
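Before generating a clustering, it can help to inspect what the download contains and locate the README mentioned above. The following is a minimal sketch using only the Python standard library. It assumes dansk-brown.tar.bz2 has been saved in the current directory. The helper names list_members and find_readme are hypothetical, and nothing is assumed about the layout or format of the files inside the archive.

```python
import tarfile
from pathlib import Path


def list_members(archive_path):
    """Return the names of all entries in a bzip2-compressed tar archive."""
    with tarfile.open(archive_path, mode="r:bz2") as archive:
        return archive.getnames()


def find_readme(names):
    """Return entries whose base name starts with 'readme' (case-insensitive)."""
    return [n for n in names if Path(n).name.lower().startswith("readme")]


if __name__ == "__main__":
    # Assumption: the archive was downloaded to the current directory.
    archive = Path("dansk-brown.tar.bz2")
    if archive.exists():
        names = list_members(archive)
        print("\n".join(names))
        print("README candidates:", find_readme(names))
    else:
        print(f"{archive} not found; download it first")
```

The README found this way is the authoritative description of how to produce a clustering of a given size. The sketch only lists the archive contents and does not parse the cluster data.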

Bornholmsk resources

Resources for working with Bornholmsk.

Danish NLP mailing list

https://mailman.itu.dk/mailman/listinfo/dansknlp