Gone are the days when machine learning (ML) had to be limited by too little training data. Many use cases today require machine learning algorithms to learn complex patterns from huge amounts of training data. In most use cases, however, such data is hard to come by, especially when dealing with highly personal or confidential document types such as IDs, insurance contracts or social security cards.
To remedy this, LangTec has created DataGenerator, a customised AI solution for generating large, highly varied volumes of training data from a very small number of representative sample documents. DataGenerator can produce hundreds of thousands of unique document instances, giving even the most data-hungry learning algorithms plenty to munch on.