The internet has been buzzing for a while with shocking and amusing applications of the latest Artificial Intelligence platforms: dressing the Pope in rapper swag, refilming Harry Potter in vertical format, or asking an AI to pass a medical exam.
Real commercial applications of AI are much more mundane but nonetheless amazingly effective if done right.
Yes, an AI can replace human labor with higher throughput and better results. But Codemart focuses on AI systems handling tasks which are excessively boring and overwhelming for the human mind. 🙄
For our German health insurance client PBeaKK (Postbeamtenkrankenkasse) in Stuttgart, we delivered an AI which relieves 3 full-time employees of mind-numbing classification work.
Enter insurance bureaucracy
While there have been some success stories automating insurance claim processing or generating custom quotes for customers, the real challenges await downstream – in the convoluted bowels of internal document processing.
As a health insurer, PBeaKK receives hundreds of documents from its approx. 380,000 policyholders, every single day!
Reading, understanding, classifying and filing these documents accordingly has been an enormous manual task for over 100 years in this company. 🗄
Obviously, PBeaKK had installed digitization processes from OCR to queueing, scheduling and categorized archiving. But these processes had a bottleneck:
- There are over 200 archival categories, frequently changing and not all properly maintained.
- PBeaKK case workers leave archival hints in the form of unstructured, digital "post-it notes". These range from a single letter to 1,000 words.
- It's impossible to define a central authority of proper category assignments since there are too many edge cases and moving targets.
In the past, up to 3 employees were tasked with interpreting these post-it notes and assigning the proper archival category to every document.
On a good day, 1 employee could handle about 200 documents from this ever-growing pool.
Language-processing neural networks are failing
Codemart's team for this AI project consists of two developers with a background in cutting-edge neuroscience.
The available training data consisted of 200,000 example "post-it notes". For privacy and scalability reasons, we could not get access to the underlying documents or even document metadata.
Since this problem space was so unique and the data so incoherent, we set up a "battlefield" of competing AI technologies and architectures. And the results were very counter-intuitive:
- Established models for language processing (recurrent neural networks) failed miserably because they couldn't access contextual information.
- The best performance was delivered by models built for image analysis (convolutional neural networks), but applied to ASCII text!
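To make the idea of "image models on text" a bit more tangible, here is a minimal pure-Python sketch of a one-dimensional convolution over one-hot encoded characters, followed by max pooling. It is not the production model (which was built with PyTorch): the alphabet, the function names and the hand-built "arzt" kernel are assumptions chosen for illustration, and a trained network would learn its kernel weights from data instead.

```python
from typing import List

# Illustrative alphabet (an assumption); characters outside it encode as all zeros.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "


def one_hot(text: str) -> List[List[float]]:
    """Encode each character of the lowercased text as a one-hot row."""
    rows = []
    for ch in text.lower():
        row = [0.0] * len(ALPHABET)
        idx = ALPHABET.find(ch)
        if idx >= 0:
            row[idx] = 1.0
        rows.append(row)
    return rows


def conv1d_max(encoded: List[List[float]], kernel: List[List[float]]) -> float:
    """Slide the kernel along the character axis and return the highest activation."""
    width = len(kernel)
    if len(encoded) < width:
        return 0.0  # text is shorter than the kernel, nothing to match
    best = float("-inf")
    for start in range(len(encoded) - width + 1):
        activation = sum(
            encoded[start + offset][channel] * kernel[offset][channel]
            for offset in range(width)
            for channel in range(len(ALPHABET))
        )
        best = max(best, activation)
    return best


def ngram_kernel(ngram: str) -> List[List[float]]:
    """Hand-built kernel that responds most strongly to one character n-gram."""
    return one_hot(ngram)


kernel = ngram_kernel("arzt")
print(conv1d_max(one_hot("Rechnung Zahnarzt 2021"), kernel))  # 4.0
print(conv1d_max(one_hot("ARZTBRIEF"), kernel))               # 4.0
print(conv1d_max(one_hot("Rezept"), kernel))                  # 1.0
```

The kernel fires at full strength wherever its character pattern appears, regardless of position in the note. That is one plausible intuition for why convolutional models cope with short, unstructured notes, though the project did not establish this as the actual reason.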
Armed with this unexpected champion, Codemart could build a successful system for production in just 2 months.
To ensure high confidence in its judgment, the system consists of 2 very different models running in parallel – both built with PyTorch:
- A 3-layer neural network seeded with a pretrained Transformer from HuggingFace (providing baseline knowledge of German language).
- A language-neutral, classic machine learning model.
Both models together have to arrive at the same judgment with reasonable confidence.
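A consensus rule like this can be sketched in a few lines of Python. The names below (`Prediction`, `consensus`, the category labels) are hypothetical, and requiring each model to clear the threshold on its own is an assumption; the article does not say how AntonIA combines the two confidence values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Prediction:
    category: str      # archival category proposed by one model
    confidence: float  # that model's confidence, between 0 and 1


def consensus(a: Prediction, b: Prediction, threshold: float) -> Optional[str]:
    """Return the shared category if both models agree and both exceed the threshold."""
    if a.category != b.category:
        return None  # models disagree
    if min(a.confidence, b.confidence) <= threshold:
        return None  # at least one model is not confident enough
    return a.category


# Hypothetical categories and scores, for illustration only.
print(consensus(Prediction("dental", 0.95), Prediction("dental", 0.91), 0.88))   # dental
print(consensus(Prediction("dental", 0.95), Prediction("invoice", 0.97), 0.88))  # None
```

Because the two models are built very differently, agreement between them is a stronger signal than a high score from either one alone.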
The smartest office messenger around
Codemart's AI platform for PBeaKK (dubbed "AntonIA") has been in production since February 2022 and is retrained every 6 months.
Retraining runs in a cloud-based Jupyter notebook (Google Colaboratory) and takes 10 epochs of about 2 hours each. This computing load is covered by just one NVIDIA Tesla T4 GPU with 16 GB of memory.
To put the size of AntonIA into context:
GPT-3, the model family behind the original ChatGPT, has 175 billion parameters. AntonIA can do its job with only 900,000 parameters.
In production, the Codemart AI easily handles 600-1000 archival tasks per day, running on very modest hardware as a transparent web service. No fancy GPU farms necessary.
AntonIA is embedded in a structured internal document processing workflow:
- Initial archival tasks with case worker notes are handed over to "Anton": a classic, deterministic rule engine with regular expressions.
- Documents with notes outside of this rulebook are handed over to AntonIA.
- When AntonIA is able to classify a document with more than 88% confidence, it's filed automatically.
- Documents with lower categorization confidence are still handed over to human experts.
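The workflow above can be sketched as a small routing function. This is a simplified illustration under stated assumptions: the regular expressions, category names and the `classifier` callable are hypothetical stand-ins. Only the order of the stages and the 88% threshold come from the actual setup.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical rulebook; the real "Anton" rules are not public.
RULES: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"\brechnung\b", re.IGNORECASE), "invoice"),
    (re.compile(r"\bvollmacht\b", re.IGNORECASE), "power_of_attorney"),
]

CONFIDENCE_THRESHOLD = 0.88  # file automatically only above this confidence

# A classifier takes a note and returns (category, confidence).
Classifier = Callable[[str], Tuple[str, float]]


def rule_engine(note: str) -> Optional[str]:
    """Deterministic first stage: return a category if any rule matches."""
    for pattern, category in RULES:
        if pattern.search(note):
            return category
    return None


def route(note: str, classifier: Classifier) -> Tuple[str, Optional[str]]:
    """Return (handler, category); the category is None when a human must decide."""
    category = rule_engine(note)
    if category is not None:
        return ("rules", category)
    category, confidence = classifier(note)
    if confidence > CONFIDENCE_THRESHOLD:
        return ("model", category)
    return ("human", None)


def demo_classifier(note: str) -> Tuple[str, float]:
    """Stand-in for the trained model, for demonstration only."""
    if "zahn" in note.lower():
        return ("dental", 0.95)
    return ("unknown", 0.40)


print(route("Rechnung 2021", demo_classifier))  # ('rules', 'invoice')
print(route("Zahnarzt", demo_classifier))       # ('model', 'dental')
print(route("x", demo_classifier))              # ('human', None)
```

Putting the cheap, deterministic stage first keeps the model's workload small and makes the easy cases fully predictable.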
Our client PBeaKK is super happy with this achievement and has booked Codemart to add further enhancements: with extended access to metadata, AntonIA will be able to increase its confidence and also trigger more complex document routing actions.
And rest assured, the 3 employees who used to do this manually have been cheering ever since! They are now working on different tasks which are muuuuch more rewarding. 😅