Our client is aiming to prequalify suitable investment targets. In yet another project involving large transformer-based language models, LangTec has developed a solution to identify all relevant company types based on company website information. The key challenge in this task is dealing with huge amounts of website content whose length exceeds the typical sequence length limit imposed by transformer-based language models. LangTec’s solution was optimised for recall, i.e., it was designed to capture all potentially interesting companies in the training and test set.
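The text above does not say how the sequence length limit was handled. One common approach, shown here purely as an illustrative sketch and not as LangTec's actual method, is to split the website text into overlapping windows, score each window separately, and keep the maximum score, so that a single relevant passage is enough to flag a company (which favours recall). The function names, the window and stride sizes, and the toy `keyword_scorer` standing in for a transformer classifier are all assumptions made for this example.

```python
from typing import Callable, List


def split_into_windows(tokens: List[str], window: int = 512, stride: int = 384) -> List[List[str]]:
    """Split a token list into overlapping windows of at most `window` tokens."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if not tokens:
        return []
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + window])
        # Stop once the current window reaches the end of the document.
        if start + window >= len(tokens):
            break
        start += stride
    return windows


def score_document(
    tokens: List[str],
    score_window: Callable[[List[str]], float],
    window: int = 512,
    stride: int = 384,
) -> float:
    """Score every window and return the maximum (a recall-oriented aggregation)."""
    windows = split_into_windows(tokens, window, stride)
    if not windows:
        return 0.0
    return max(score_window(w) for w in windows)


def keyword_scorer(window_tokens: List[str]) -> float:
    """Hypothetical stand-in for a transformer classifier: 1.0 if a keyword occurs."""
    return 1.0 if "solar" in window_tokens else 0.0


# A 1000-token page whose only relevant token sits far beyond the first 512 tokens.
page = ["filler"] * 1000
page[900] = "solar"
document_score = score_document(page, keyword_scorer)
```

Taking the maximum over windows is only one possible aggregation; averaging or a learned pooling step would trade some recall for precision.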
In addition to designing, training and optimising the perfect-recall classifier, LangTec successfully trained a hybrid language model that uses features from another, non-neural-network statistical model along with features from the transformer-based model to arrive at a joint classification decision. This architecture makes it possible to combine transformer-based models with other machine-learning models.
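As a rough illustration of the hybrid idea, the sketch below concatenates a feature vector from a transformer-based model with features from a statistical model and passes the joint vector through a logistic layer. This is a minimal sketch under stated assumptions: the feature values, weights, bias and the low decision threshold are invented for the example, and the actual joint classifier used in the project is not described in the text.

```python
import math
from typing import List, Sequence


def combine_features(
    transformer_features: Sequence[float],
    statistical_features: Sequence[float],
) -> List[float]:
    """Concatenate both feature vectors into one joint representation."""
    return list(transformer_features) + list(statistical_features)


def predict_proba(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic layer over the joint features (weights are assumed to be trained elsewhere)."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    z = bias + sum(f * w for f, w in zip(features, weights))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical inputs: a 2-dimensional transformer embedding and one statistical feature.
joint = combine_features([0.5, -0.5], [1.0])
probability = predict_proba(joint, weights=[1.0, 1.0, 2.0], bias=-1.0)

# A deliberately low threshold (an assumption) keeps borderline companies, favouring recall.
is_candidate = probability >= 0.2
```

Concatenating features before a single classifier lets the joint decision weigh both models at once, in contrast to averaging two separate predictions.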