Evaluating the Performance of Large Language Models for Information Extraction: A Comparative Study

This article examines the performance of large language models (LLMs) like ChatGPT4-Turbo and ChatGPT4-Omni for information extraction tasks, comparing them with LangTec’s specialized E-MailParser. Our analysis reveals significant limitations of LLMs in this domain.

Benchmarking Accuracy Scores

To benchmark the accuracy of ChatGPT4-Turbo, ChatGPT4-Omni, and LangTec’s E-MailParser, we conducted a comprehensive evaluation using 20 documents across four extraction tasks:

  • Q88: Vetting Questionnaires for Tanker Information
  • Timesheet: Tanker Loading/Unloading Statements of Fact
  • Ship: Requests for Commercial Cargo Shipping
  • Cargo: Commercial Cargo Vessel Position Lists

From each of these documents we extracted about 20 target data points. For the evaluation, each document had predefined ground truth labels indicating the expected target value for each field. By comparing the extracted values to these ground truth labels, we were able to calculate accuracy scores for each model.
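The scoring step described above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used in the study: the matching rule (exact match after lower-casing and whitespace normalization) is an assumption, and the field names and values are hypothetical.

```python
from typing import Mapping, Optional


def normalize(value: Optional[str]) -> str:
    """Lower-case and collapse whitespace (assumed matching rule, not from the study)."""
    return " ".join((value or "").split()).lower()


def field_accuracy(
    extracted: Mapping[str, Optional[str]],
    truth: Mapping[str, Optional[str]],
) -> float:
    """Share of ground-truth fields whose extracted value matches after normalization."""
    if not truth:
        raise ValueError("ground truth must contain at least one field")
    correct = sum(
        1
        for field, expected in truth.items()
        if normalize(extracted.get(field)) == normalize(expected)
    )
    return correct / len(truth)


# Hypothetical ground truth and model output for a single document.
truth = {"vessel_name": "MT Example", "imo_number": "1234567", "dwt": "49999", "flag": "Malta"}
extracted = {"vessel_name": "mt  example", "imo_number": "1234567", "dwt": "50000", "flag": None}

score = field_accuracy(extracted, truth)
print(f"{score:.0%}")  # two of four fields match: 50%
```

A per-model score would then be the average of this value over all documents, again assuming every field carries equal weight.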

Benchmark Results: LLMs vs. LangTec’s E-MailParser

Both ChatGPT4-Turbo and ChatGPT4-Omni show some level of accuracy in information extraction tasks, achieving overall scores of 56 % and 49 % respectively. Notably, the newer model ChatGPT4-Omni performs worse than its predecessor ChatGPT4-Turbo on most document types. Another important observation is that model performance is impaired by inconsistency: for the same input text and the same extraction prompt, these models return different answers from one query to the next. This non-deterministic behavior renders them unreliable for scenarios where consistent and accurate information retrieval is essential.
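One simple way to put a number on this inconsistency is to send the same prompt several times and measure how often the most frequent answer occurs. The sketch below is an assumption about how such a check could look, not a metric reported in this study; the sample answers are invented.

```python
from collections import Counter
from typing import Sequence


def agreement_rate(answers: Sequence[str]) -> float:
    """Share of repeated runs that returned the most frequent answer."""
    if not answers:
        raise ValueError("need at least one answer")
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)


# Hypothetical answers for one field, obtained from five identical queries.
runs = ["2024-03-01", "2024-03-01", "01.03.2024", "2024-03-01", "2024-03-02"]
rate = agreement_rate(runs)
print(rate)  # 3 of 5 runs agree: 0.6
```

A fully deterministic extractor scores 1.0 on this measure by construction.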

In contrast, specialized parsers such as LangTec’s E-MailParser are fully deterministic in their behavior and consistently achieve 98 % extraction accuracy across various document formats, significantly higher than either LLM. This reliability makes a deterministic document-understanding solution like E-MailParser the more dependable choice for information extraction tasks, particularly when dealing with diverse e-mail content in business-critical applications.


While LLMs like ChatGPT are excellent for generating content, they have notable limitations in scenarios requiring deterministic output, such as information extraction tasks. For such applications, document-understanding solutions like LangTec’s E-MailParser are the more reliable and accurate choice.

Presenting at the Maritime Breakfast at the Business Club Hamburg: Talks and Networking for Logistics Practitioners

The topic of the Maritime Cluster Norddeutschland event on April 23 was “Innovative algorithms for shipping and logistics: How automatic text extraction and quantum computing will change business processes”. Forty participants met on the premises of the Business Club Hamburg, the Villa im Heinepark on Elbchaussee, to hear fascinating presentations about the promise of digitisation in the shipping industry.

Jan Herberg, CEO of our long-standing partner Herberg Systems GmbH, showed how automatic information extraction from email requests enables the digitisation of workflows for shipping logistics. Dr. Kilian Foth, Team Lead Text Analytics at LangTec, demonstrated LangTec’s EmailReader and showed how AI-based semantic text analysis can obtain structured business data from various kinds of unstructured documents and messages.

Oliver Szal and Joshua Dibbern from Fraunhofer CML showed that competitive quantum computing has already arrived: the audience chose the parameters of a “Maritime Inventory Routing Problem” that was then solved twice with the same time budget, once locally and once by a quantum annealer in Canada. The D-Wave quantum annealer found a higher-value solution than the classical CPLEX optimization.

LangTec attends FoldForum II

Protein folding is a prime example of how rapidly AI can bring technological advances in numerous fields. To witness this first-hand, two LangTec team members, Maximilian and Pat, attended the FoldForum II event.

FoldForum II was hosted by a cooperation of AUFBRUCH.Hamburg and Artificial Intelligence Center Hamburg (Aric e.V.) at the DeepTech Campus. As with the first FoldForum event, this cooperation has been a great host, and the DeepTech Campus alone is worth a visit for its striking appearance and new, comfortable interior.

After Dr. Natalie Rotermund from Aric e.V. and Dr. Dr. Alexander El Gammal from AUFBRUCH.Hamburg introduced the event and the speakers, Dr. Felix Tobola from Aric e.V. started the talks with an in-depth overview of what protein folding is and how it works. He created many moments of insight with his very visual presentation. His talk was followed by another great talk by Dr. Kilian Guse and Head of Bioinformatics Brian Dawson from GQ Bio Therapeutics. They showed how this new technology can be used creatively to design pharmaceutical products, presenting astonishing technology that would probably have been found only in science fiction novels just a few years ago.

A highlight was the subsequent panel discussion among the speakers and the audience. Topics ranged from technical details of the AI models to philosophical questions about the epistemic implications of AI-based protein folding for academic research, driven by the manifold interests and backgrounds of the audience.

We would like to thank Aric e.V. and AUFBRUCH.Hamburg for hosting such an inspiring event with so many knowledgeable speakers and are looking forward to future events in this series!

More Effective Project Acquisition by Automating Cross-Portal Search for New Public Tenders and Project Offers

Both public tenders and project offerings are published with high update frequency on a wide range of online portals. Companies therefore need to check multiple portals continually for new entries and updates if they do not want to miss out. Additionally, one usually performs multiple searches with different query terms, so the effective number of queries to each portal quickly multiplies. Scanning the results means ploughing through a large number of entries, many of them already seen in previous queries. To be effective, this manual search needs to be performed regularly, which makes it extremely tedious and time-consuming.

To help with that, LangTec has developed a crawler-based solution that fully automates this process. The result is a periodic e-mail update, which presents a clear summary of all new entries found across all portals for the user’s personalised search terms. This allows for a much faster response to new public tenders and project offers and removes the effort of manual search.
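The core idea of reporting only entries that have not been seen before can be illustrated with a short Python sketch. It is a simplified assumption of how such filtering might work, not LangTec’s implementation: the `Entry` type, the `select_new` and `format_digest` helpers, and the portal names are all hypothetical, and crawling and e-mail delivery are left out.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Entry:
    portal: str
    entry_id: str
    title: str


def select_new(entries: Iterable[Entry], seen: Set[Tuple[str, str]]) -> List[Entry]:
    """Return entries not reported before and record them in `seen`."""
    fresh: List[Entry] = []
    for entry in entries:
        key = (entry.portal, entry.entry_id)
        if key not in seen:
            seen.add(key)
            fresh.append(entry)
    return fresh


def format_digest(entries: List[Entry]) -> str:
    """Render a plain-text summary suitable for the body of an e-mail update."""
    lines = [f"{len(entries)} new entries:"]
    lines += [f"- [{e.portal}] {e.title}" for e in entries]
    return "\n".join(lines)


# Hypothetical results of several queries; the same entry may appear more than once.
results = [
    Entry("portal-a", "17", "Old tender"),
    Entry("portal-a", "18", "NLP framework agreement"),
    Entry("portal-b", "5", "Text analytics project"),
    Entry("portal-a", "18", "NLP framework agreement"),
]
seen = {("portal-a", "17")}  # already reported in an earlier update

new_entries = select_new(results, seen)
digest = format_digest(new_entries)
print(digest)
```

Keying on portal plus entry ID means an entry that turns up under several search terms is reported only once; in practice the `seen` set would have to be persisted between runs.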

Initially, LangTec developed this solution for internal use only. It is now also available to registered users as a commercial subscription service. The selection and number of search terms are fully customisable, and the same holds for the set of portals scraped and the e-mail update intervals. Feel free to reach out to us any time if this sounds interesting to you.

Joint Talk on AI at the Spring Convention of tekom Germany in Freiburg

At the tekom Germany Spring Convention in picturesque Freiburg, we gave an interesting talk together with our business partner parson AG on use cases of applied AI in technical documentation. The presentation focused on the many different tools and methods that AI offers these days, and how they can be applied to specific challenges in technical documentation.

We’d like to extend a big ‘Thank you!’ to all participants for the inspiring questions and discussions that followed the talk. After the presentation it was certainly clear to everyone that, when it comes to automation, AI is a sophisticated Swiss army knife rather than a crude mallet. Knowing precisely which AI tool to unfold for which use case empowers you to solve even complex automation challenges effectively and efficiently. It goes without saying that LangTec is always more than happy to support in such situations 🙂

An insight into the future of climate technology: Our workshop at KlimaInvest

Our recent journey took us to the vibrant heart of HafenCity, where we had the opportunity to participate in an inspiring workshop with our esteemed client, KlimaInvest. Nestled amidst breathtaking views of the surrounding harbor landscape, the company’s new office provided the perfect setting for gaining deep insights into their team dynamics and work processes. Accompanied by a hint of fresh sea breeze, the workshop focused on various aspects that KlimaInvest will advance in 2024. Here are some of the prominent points that were highlighted during the event:

An in-depth introduction and analysis of KlimaInvest’s products and solutions by the CEO, Johannes Schimler, contributing to a better understanding of market conditions and the company’s strategic direction.

Extensive adjustments in the CRM system are planned for KlimaInvest in the first half of 2024.

Furthermore, lively discussions were initiated on ensuring operational and process stability in increasingly integrated applications. Together with JaMoin, we also explored the potential for optimizing tests and test infrastructure to further enhance the efficiency and quality of processes.

In the images we share here, you can enjoy the stunning view from KlimaInvest’s office.

We thank KlimaInvest for the opportunity to participate in this enlightening workshop and look forward to further successful collaboration in the service of a greener future.

Celebrating 13 Years of Language Innovation at LangTec: A Delicious Anniversary Lunch

LangTec is a teenager! As we recently celebrated our 13th anniversary, we took the opportunity to reflect on our journey and to celebrate our achievements as a team. And what better way to commemorate this milestone than by indulging in a delightful team lunch at Bullerei Deli restaurant, nestled in the vibrant Schanze district and helmed by the renowned TV chef, Tim Mälzer?

The ambiance of Bullerei Deli provided the perfect backdrop for our celebration as we eagerly explored the restaurant’s new menu, boasting a fusion of Japanese influences, classic dishes, many poultry delights, and an array of vegetarian options.

Our culinary standouts were the Mushroom Pasta 24/7, crispy Karaage, steaming Tantanmen, and the Veggie Larb – something for everyone.

Thirteen years of innovation, collaboration, and growth have brought us to this moment, and we look forward to the challenges and successes that lie ahead.
Cheers to LangTec, and to the delicious memories created at the Bullerei Deli!

Visit to the German Stasi Records Archive: Participation in the expression of interest procedure for the virtual reconstruction of the Stasi files

The German Federal Archives store 40 to 55 million pages of torn Stasi files. These are to be restored through automatic virtual reconstruction. A previous pilot project was unable to complete the task adequately, as even the Tagesschau reported on 21.04.2023. An expression of interest procedure has now been launched for a two-part project consisting of a scanning process and virtual reconstruction. We are applying for the automatic virtual reconstruction. The core task is to develop an automated process for arranging scanned document snippets into full pages and complete documents.

At the end of January, we visited the German Stasi Records Archive to talk to the Vice President of the Federal Archives, Alexandra Titze, and present our approach. As a research-oriented technology provider, LangTec showcased an innovative AI-based approach that enables efficient processing of the large amounts of text and data that the volume of Stasi documents inevitably represents.

We are going to follow the topic with great interest and look forward to a possible future collaboration!


A journey from Bethlehem to South America – This year’s LangTec Winter Holiday Party

’Tis the season of joy, laughter, twinkling lights and delicious food, and this year’s LangTec Winter Holiday Party was no exception. We began on a snowy evening at the Hamburg Planetarium, where we travelled back 2000 years to investigate theories of the true origin of the celestial wonder of the “Star of Bethlehem”.
From there we took a Moia and sped forward in time to Yaku, a modern Peruvian/Mexican fusion restaurant in Hamburg’s Grindelviertel, where we indulged in course after course of new, wonderful and colourful flavour combinations and merry mescal cocktails.

Happy holidays from our LangTec family to yours!

parson and LangTec announce AI cooperation

The use of artificial intelligence is becoming increasingly important in technical communication. In order to offer our customers the best possible solutions for AI-based text and language technology applications, parson has started a cooperation with the Hamburg-based technology provider LangTec. LangTec develops innovative language technology solutions for the efficient processing of large amounts of text and data with special focus on AI and machine learning.

“LangTec’s know-how and many years of experience with machine learning and artificial intelligence perfectly complement our expertise in the field of technical documentation. Together we can develop the best possible solutions for the use of artificial intelligence in technical communication,” says Ulrike Parson, CEO of parson AG.

“We are very pleased about this close cooperation with parson AG. As one of the leading providers of AI-based language technology in the German-speaking market, it is particularly important for us to work with established players who are ready to put custom text analytics solutions into production. AI becomes valuable when it helps to gain concrete competitive advantages,” says Dr. Patrick McCrae, founder and managing director of LangTec.

About parson

parson is a leading service provider of smart content and intelligent information solutions. parson AG advises its customers on the digitalization of content processes and the introduction of a sustainable content strategy. For products, software and services, parson delivers semantically enriched, modular content such as user documentation, programming instructions, online help, eLearning content and specifications.


Fine-Tuning of a Language Model

To kick off the partnership with LangTec, parson presents a pilot project at this year’s tcworld conference 2023. This pilot project deals with the fine-tuning of a large language model (LLM) and was realized jointly with LangTec.

In their presentation, Helle Hannken-Illjes and Ulrike Parson show first results of the domain-specific fine-tuning of a pre-trained large language model (LLM) on customer-specific data. The presented model can be operated locally and is hence also perfectly suited for processing sensitive customer data:

AI yes, but not ChatGPT! How do I get my own language model? (in German)

Helle Hannken-Illjes and Ulrike Parson, presentation

tcworld 2023, November 15, 2023, 9.00 a.m., room C6.2
