CODEMART :: Software for Masterpieces


      Codemart delivers Artificial Intelligence for document processing in German health insurance

      15 Sep 2023, by Michael Burgstahler

      • Project
      • Middleware
      • Deployed

      [Image: Antonia - Laptop] Enterprise AI platforms have basically no UI. Just statistics 😁

      The internet has been buzzing for a while with shocking and amusing applications of the latest Artificial Intelligence platforms: dressing the Pope in rapper swag, refilming Harry Potter in vertical format, or asking an AI to pass a medical exam.

      Real commercial applications of AI are much more mundane but nonetheless amazingly effective if done right.

      Yes, an AI can replace human labor with higher throughput and better results. But Codemart focuses on AI systems handling tasks which are excessively boring and overwhelming for the human mind. 🙄

      For our German health insurance client PBeaKK (Postbeamtenkrankenkasse) in Stuttgart, we delivered an AI which relieves 3 full-time employees of mind-numbing classification work.

      Enter insurance bureaucracy

      While there have been some success stories automating insurance claim processing or generating custom quotes for customers, the real challenges await downstream – in the convoluted bowels of internal document processing.

      As a health insurer, PBeaKK receives hundreds of documents from its approx. 380,000 policyholders, every single day!

      Reading, understanding, classifying and filing these documents accordingly has been an enormous manual task for over 100 years in this company. 🗄

      Obviously, PBeaKK had installed digitization processes from OCR to queueing, scheduling and categorized archiving. But these processes had a bottleneck:

      • There are over 200 archival categories, frequently changing and not all properly maintained.
      • PBeaKK case workers leave archival hints in the form of unstructured, digital "post-it notes". These range from 1 letter up to 1000 words.
      • It's impossible to define a central authority of proper category assignments since there are too many edge cases and moving targets.

      In the past, up to 3 employees were tasked with interpreting these post-it notes and assigning the proper archival category to every document.

      On a good day, 1 employee could handle about 200 documents from this ever-growing pool.

      Language-processing neural networks are failing

      [Image: Paula & Romina, Codemart experts for neuroscience and Artificial Intelligence]

      Codemart's team for this AI project consists of two developers with a background in cutting-edge neuroscience.

      The available training data consisted of 200,000 sample "post-it notes". For privacy and scalability reasons, we could not get access to the underlying documents or even document metadata.

      Since this problem space was so unique and the data so incoherent, we set up a "battlefield" of competing AI technologies and architectures. And the results were very counter-intuitive:

      • Established models for language processing (Recurrent neural networks) failed miserably because they couldn't access contextual information.
      • Best performance was delivered by models built for image analysis (Convolutional neural networks) – but used for ASCII text!

      Armed with this unexpected champion, Codemart could build a successful system for production in just 2 months.
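
      Why would an image technique work on text at all? The following is only a toy sketch, not Codemart's model: it assumes character-level input and uses hand-set filters where a real convolutional network would learn its filter weights. The function names, filter patterns and sample notes are all invented for illustration. It shows the core mechanic: a filter slides over the characters, and max-pooling keeps its strongest response, so a note of any length (even a lone comma) ends up as a feature vector of the same size.

```python
from typing import List


def filter_response(note: str, pattern: str) -> float:
    """Max-pooled response of one filter that is hand-tuned to `pattern`.

    With one-hot encoded characters, the dot product between a window of the
    note and a one-hot filter equals the number of matching characters.
    """
    width = len(pattern)
    # Pad very short notes so that at least one full window exists.
    padded = note.ljust(width, "\0")
    best = 0.0
    for start in range(len(padded) - width + 1):
        window = padded[start:start + width]
        matches = sum(1 for a, b in zip(window, pattern) if a == b)
        best = max(best, matches / width)  # max-pooling over all positions
    return best


def features(note: str, patterns: List[str]) -> List[float]:
    """Fixed-length feature vector: one entry per filter, for any note length."""
    return [filter_response(note, p) for p in patterns]


# Hypothetical filters and notes, invented for illustration only.
PATTERNS = ["Rez", "AU ", "k.A"]
print(features("Rezept anbei", PATTERNS))
print(features(",", PATTERNS))
```

      In a trained network, many such filters of different widths feed a small classifier on top; the sketch only covers the feature-extraction step.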

      To ensure high confidence in its judgment, the system consists of 2 very different models running in parallel – both built with PyTorch:

      • A 3-layer neural network seeded with a pretrained Transformer from HuggingFace (providing baseline knowledge of German language).
      • A language-neutral, classic machine learning model.

      Both models together have to arrive at the same judgment with reasonable confidence.
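
      A minimal sketch of such an agreement gate could look as follows. It is an assumption about how the two judgments might be combined, not the production logic: the `Prediction` type, the function name and the rule "the weaker of the two confidences must exceed the threshold" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    category: str      # archival category proposed by one model
    confidence: float  # probability in [0, 1]


def agreed_category(a: Prediction, b: Prediction, threshold: float) -> Optional[str]:
    """Return the category only if both models agree and both are confident.

    Assumption: the weaker of the two confidences must exceed `threshold`.
    """
    if a.category != b.category:
        return None  # the models disagree
    if min(a.confidence, b.confidence) <= threshold:
        return None  # at least one model is not confident enough
    return a.category


# Hypothetical outputs of the neural network and the classic model.
print(agreed_category(Prediction("invoices", 0.97), Prediction("invoices", 0.91), 0.88))
print(agreed_category(Prediction("invoices", 0.97), Prediction("receipts", 0.99), 0.88))
```

      Because the two models are built so differently, they are less likely to make the same mistake, which is what makes their agreement meaningful.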

      [Image: Training data for the PBeaKK AI] A small snippet of the available training data demonstrates the extreme variance of post-it notes. Sometimes it's just a… comma 😱

      The smartest office messenger around

      Codemart's AI platform for PBeaKK (dubbed "AntonIA") has been in production since February 2022 and is retrained every 6 months.

      Retraining runs in a cloud-based Jupyter notebook (Google Colaboratory) and takes 10 epochs of 2 hours each. This computing load is covered by just one NVIDIA Tesla T4 GPU with 16 GB of memory.

      To put the size of AntonIA into context:
      GPT-3, the model family behind the well-known ChatGPT, has 175 billion parameters. AntonIA can do its job with only 900,000 parameters.

      In production, the Codemart AI easily handles 600-1000 archival tasks per day, running on very modest hardware as a transparent web service. No fancy GPU farms necessary.
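
      The figures quoted in this post can be put side by side with a quick back-of-the-envelope calculation. The variable names are ours, and one step is an assumption: the manual capacity simply multiplies the "good day" rate of 200 documents by all 3 employees.

```python
# Retraining effort: 10 epochs of 2 hours each on a single GPU.
EPOCHS = 10
HOURS_PER_EPOCH = 2
retraining_hours = EPOCHS * HOURS_PER_EPOCH

# Model size: the 175 billion parameters quoted above vs. AntonIA's 900,000.
LARGE_MODEL_PARAMS = 175_000_000_000
ANTONIA_PARAMS = 900_000
size_ratio = LARGE_MODEL_PARAMS / ANTONIA_PARAMS

# Manual capacity (assumption: all 3 employees reach the "good day" rate).
DOCS_PER_EMPLOYEE_PER_DAY = 200
EMPLOYEES = 3
manual_capacity = DOCS_PER_EMPLOYEE_PER_DAY * EMPLOYEES

print(f"GPU hours per retraining: {retraining_hours}")
print(f"Size ratio: about {size_ratio:,.0f} to 1")
print(f"Best-case manual capacity: {manual_capacity} documents per day")
```

      In other words: one retraining takes about 20 GPU hours, the model is roughly 200,000 times smaller than the large language model it is compared with, and its daily throughput starts where the best-case manual capacity ends.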

      AntonIA is embedded in a structured internal document processing workflow:

      • Initial archival tasks with case worker notes are handed over to "Anton": a classic, deterministic rule engine with regular expressions.
      • Documents with notes outside of this rulebook are handed over to AntonIA.
      • When AntonIA is able to classify a document with more than 88% confidence, it's filed automatically.
      • Documents with lower categorization confidence are still handed over to human experts.
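
      The workflow above can be sketched in a few lines. This is a hedged illustration, not the production code: the regular expressions, the category names and the `classify` callable are hypothetical stand-ins; only the order of the steps and the 88% threshold come from the description above.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical rulebook for "Anton": (pattern, archival category).
RULES = [
    (re.compile(r"\bRechnung\b", re.IGNORECASE), "invoices"),
    (re.compile(r"^\s*AU\b"), "sick-notes"),
]

CONFIDENCE_THRESHOLD = 0.88  # "more than 88% confidence"

Classifier = Callable[[str], Tuple[str, float]]


def route(note: str, classify: Classifier) -> Tuple[str, Optional[str]]:
    """Return (who handled the note, assigned category or None)."""
    # Step 1: the deterministic rule engine gets the first try.
    for pattern, category in RULES:
        if pattern.search(note):
            return ("Anton", category)
    # Step 2: notes outside the rulebook go to the AI.
    category, confidence = classify(note)
    # Step 3: file automatically only above the confidence threshold.
    if confidence > CONFIDENCE_THRESHOLD:
        return ("AntonIA", category)
    # Step 4: everything else still goes to a human expert.
    return ("human", None)


def dummy_classifier(note: str) -> Tuple[str, float]:
    """Stand-in for the real model; always answers with the same guess."""
    return ("correspondence", 0.93)


print(route("Rechnung Zahnarzt", dummy_classifier))
print(route("bitte ablegen", dummy_classifier))
```

      Putting the cheap, predictable rule engine first means the AI only sees the notes that rules cannot express, and the threshold keeps uncertain cases with people.
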

      [Image: Dr. Michael Schöpf of PBeaKK] Spearheading the AI revolution at PBeaKK: project lead Dr. Michael Schöpf

      Our client PBeaKK is super happy with this achievement and has booked Codemart to add further enhancements: with extended access to metadata, AntonIA will be able to increase its confidence and also trigger more complex document routing actions.

      And rest assured, the displaced 3 employees who were doing this manually have been cheering ever since! They are now working on different tasks which are muuuuch more rewarding. 😅