The economics magazine brand eins wants to make its entire back catalogue of articles accessible to its readers. The brand eins online archive is to support searching and browsing by topic across more than 10,000 texts. A manually curated topic structure already exists, but the texts have not yet been mapped to it. LangTec is contributing its text analytics expertise to automate precisely this mapping.
The objective of this project is to assign the texts to these topic categories in an n-to-m mapping: each text may belong to several categories, and each category may contain many texts. For each text, the assigned topic categories and their respective association strengths are to be reported. LangTec aims to implement this mapping with machine learning. A central challenge will be to tune the approach so that all texts can be fully indexed without additional editorial effort, even in the absence of labelled training data.
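
To make the idea of an n-to-m assignment with association strengths more concrete, here is a minimal sketch of one possible approach that needs no labelled training data. It is not LangTec's actual method. It assumes, purely for illustration, that every topic category comes with a short keyword description, and it scores each text against each topic by TF-IDF cosine similarity. The topic names, keyword descriptions, sample texts, function names, and the `threshold` value are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict) per document."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    n_docs = len(docs)
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    # Smoothed inverse document frequency; always positive.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(a, b):
    """Cosine similarity of two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def assign_topics(texts, topics, threshold=0.05):
    """Map each text to zero or more topics with an association strength.

    `topics` maps a topic name to a keyword description (an assumption of
    this sketch). Topics scoring below `threshold` are dropped.
    """
    names = list(topics)
    vectors = tfidf_vectors(list(texts) + [topics[n] for n in names])
    text_vecs, topic_vecs = vectors[:len(texts)], vectors[len(texts):]
    assignments = []
    for text_vec in text_vecs:
        scored = [(name, cosine(text_vec, topic_vec))
                  for name, topic_vec in zip(names, topic_vecs)]
        kept = [(name, score) for name, score in scored if score >= threshold]
        assignments.append(sorted(kept, key=lambda pair: pair[1], reverse=True))
    return assignments


# Hypothetical topics and texts, for illustration only.
topics = {
    "Finance": "bank credit interest investment money market",
    "Labour": "work employees jobs wages workplace",
    "Technology": "software internet digital startup data",
}
texts = [
    "The bank raised interest rates, and employees worry about wages.",
    "A digital startup builds software for data analysis.",
]

for text, assigned in zip(texts, assign_topics(texts, topics)):
    print(text)
    for name, strength in assigned:
        print(f"  {name}: {strength:.2f}")
```

In this sketch the first sample text is assigned to two topics and the second to one, each with its own strength, which is the n-to-m behaviour described above. A production system would more likely rely on richer text representations and a carefully tuned threshold, but the shape of the output (a list of topic and strength pairs per text) would stay the same.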