Machine Learning - How we train a “Black Box” to automate accounts
We’re running a series of conversations with colleagues in AutoEntry. Previously, Ciara McDaniel wrote about launching a software product and we talked to Wayne Thompson about running a successful sales team.
This blog is by Lucian Ciobanu, senior data scientist in AutoEntry. He discusses his move to Ireland, his career and how he and his colleagues work on AutoEntry’s machine learning technology.
Pursuing Excellence - from Romania to Ireland via Portugal
One of the biggest moments of my life was moving to Ireland. I’m originally from Romania but I previously lived in Portugal for 13 years.
I was 25 when I moved to Portugal to do a Master’s degree in distributed systems. I stayed on to do research, then a PhD and postdoctoral research.
In Portugal I studied machine learning and computer vision. I worked on European research projects and published research papers, some of them in academic journals. I loved what I did, but at some point I felt I needed to move on, so I came to Ireland.
Starting up at a Startup
I’ve been in Ireland for five years, and I love living and working here. I’ve worked for two startups, AutoEntry being the second. I spent a year and nine months at the first, which followed a similar path to AutoEntry: building a machine learning product.
Then I moved on to AutoEntry. I’ve been here from the beginning, when we built the machine learning application, and I’d say it has been successful. Of course, we’re not really a startup any more: we’re growing very fast, and we doubled in size over the past year.
Machine Learning is Dynamic
We used to have software to extract data from invoices, such as the net amount and the total amount. But it was rudimentary compared to what we have now. When we measured its extraction accuracy, there was plenty of room for improvement: it was only 30-something percent accurate, and the approach was rigid.
But machine learning is dynamic. It’s a sort of black box: you give it documents as input, you tell it what you want as the expected output, and the black box adapts itself. That dynamism, this internal ability to adapt and learn, greatly improved our data extraction.
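To make the “black box” idea concrete, here is a minimal sketch of supervised learning: the model is handed inputs paired with expected outputs and adjusts itself until its answers match. It uses a perceptron only because it is about the simplest learner there is; the post does not say which model AutoEntry uses, and the two features (is the amount near the bottom, is the word “total” nearby) and the data are hypothetical.

```python
# Minimal supervised-learning sketch: a perceptron learns from
# (input, expected output) pairs. Features and data are hypothetical.

# Each input: [amount is near the bottom, word "total" is nearby] -> is it the total?
EXAMPLES = [
    ([1, 1], 1),
    ([1, 0], 0),
    ([0, 1], 0),
    ([0, 0], 0),
]


def predict(weights, bias, x):
    """Return 1 if the weighted sum of the features is positive, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def train_perceptron(examples, epochs=20, lr=1.0):
    """Nudge the weights towards the expected output whenever a prediction is wrong."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, expected in examples:
            error = expected - predict(weights, bias, x)  # 0 if right, +1 or -1 if wrong
            if error:
                weights = [w + lr * error * xi for w, xi in zip(weights, x)]
                bias += lr * error
    return weights, bias


weights, bias = train_perceptron(EXAMPLES)
```

Nobody writes a rule like “near the bottom and next to ‘total’ means total”; the weights that encode it come entirely from the labelled examples.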
When we first started using machine learning, three years ago, our extraction accuracy was 25%. Now it’s 85%, and we think there’s still room for improvement.
And beyond being good at specific jobs, the ability to adapt by itself is key. Sooner or later you’ll get invoices from different suppliers, and the black box has to adapt to each new pattern. Whenever a new pattern appears, it adapts to it.
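One simple way to picture adapting to a new pattern is a memory-based model that stores verified examples and labels a new token by its nearest stored example: adding a single corrected example from a new supplier changes its answer for that layout immediately, without retraining. This 1-nearest-neighbour sketch is an illustration only, not a description of how AutoEntry adapts; the features (relative position on the page, bold, next to the word “total”) and the data are hypothetical.

```python
# Toy 1-nearest-neighbour model: it labels a token by its closest verified example.
# Features are hypothetical: (relative_y, is_bold, next_to_word_total).


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


class NearestNeighbourModel:
    def __init__(self):
        self.memory = []  # verified (features, label) pairs

    def learn(self, features, label):
        # Adapting here just means remembering one more verified example.
        self.memory.append((features, label))

    def predict(self, features):
        _, label = min(self.memory, key=lambda ex: squared_distance(ex[0], features))
        return label


model = NearestNeighbourModel()
# Layouts seen so far: the total is near the bottom, next to the word "total".
for features, label in [
    ((0.05, 1, 0), "other"),  # bold supplier name at the top
    ((0.90, 1, 1), "total"),  # bold total at the bottom
    ((0.50, 0, 0), "other"),  # a line item
    ((0.95, 0, 0), "other"),  # small print in the footer
]:
    model.learn(features, label)

# A new supplier prints its total in bold at the top, labelled "Amount due".
new_layout_total = (0.10, 1, 0)
before = model.predict(new_layout_total)  # "other": it looks like the supplier name
model.learn((0.12, 1, 0), "total")        # one verified example from the new supplier
after = model.predict(new_layout_total)   # "total"
```

A production system would use many more features and examples, but the principle is the same: new verified data shifts the model’s behaviour for the new pattern.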
Constant improvement is standard in machine learning. Feature extraction, or feature engineering, is where you teach the black box (the machine learning model) what to look for. Specific features in an invoice are important for doing a good job, and most of our team’s work in those first three years was analyzing which features contribute the most.
What “comes naturally” to a machine?
So how do you “teach” a machine?
Let’s say you want the total amount on an invoice. In most cases it’s at the bottom of the invoice, next to the word “total”. Some things come naturally to humans, and that knowledge needs to be transferred to machines: we teach them. One of those things is that the total amount is at the bottom of the document.
In this example, “bottom” is actually the Y coordinate of the total amount. Next, the machine looks for a keyword nearby (“total”, for instance). If it finds that keyword, that’s an indication, a little sign for the machine to look out for. So we’re already talking about two features: location and meaning. Totals are also usually printed in a bigger or bold font, so now we have three features: location, meaning and writing style.
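The three features described above can be computed for each candidate amount roughly as follows. This is a sketch under stated assumptions: the `Token` structure, coordinates measured in points from the top of the page, the five-point same-line tolerance and the use of the median font size as the “typical” size are all hypothetical choices for illustration, and writing style is split into two numbers (relative font size and bold).

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Token:
    """A piece of text found on the page (hypothetical structure)."""
    text: str
    x: float          # distance from the left edge
    y: float          # distance from the top of the page
    font_size: float
    bold: bool


def candidate_features(candidate, tokens, page_height, line_tolerance=5.0):
    """Turn one candidate amount into location, meaning and style features."""
    # 1. Location: 0.0 at the top of the page, 1.0 at the bottom.
    relative_y = candidate.y / page_height
    # 2. Meaning: is the word "total" on the same line, to the left of the amount?
    keyword_nearby = any(
        "total" in t.text.lower()
        and abs(t.y - candidate.y) <= line_tolerance
        and t.x < candidate.x
        for t in tokens
    )
    # 3. Writing style: font size relative to the page's typical size, plus bold.
    typical_size = median(t.font_size for t in tokens)
    return {
        "relative_y": relative_y,
        "keyword_nearby": 1.0 if keyword_nearby else 0.0,
        "font_ratio": candidate.font_size / typical_size,
        "bold": 1.0 if candidate.bold else 0.0,
    }


tokens = [
    Token("ACME Ltd", x=50, y=40, font_size=14, bold=True),
    Token("Widgets", x=50, y=300, font_size=10, bold=False),
    Token("100.00", x=400, y=300, font_size=10, bold=False),
    Token("Delivery", x=50, y=320, font_size=10, bold=False),
    Token("20.00", x=400, y=320, font_size=10, bold=False),
    Token("Total", x=300, y=760, font_size=12, bold=True),
    Token("120.00", x=400, y=760, font_size=12, bold=True),
]
total_features = candidate_features(tokens[-1], tokens, page_height=800)
```

For the “120.00” candidate this gives a relative position of 0.95, a nearby keyword, a font 1.2 times the typical size and bold text, while a line item such as “100.00” scores low on all of them.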
Again, this is the beauty of machine learning: I don’t have to tell it how important each feature is. It learns the weight (or importance) of each feature automatically.
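Here is a minimal sketch of a model learning feature weights by itself. It uses logistic regression trained by batch gradient descent, a standard technique chosen only for illustration (the post does not name AutoEntry’s model), on a handful of hypothetical candidate amounts. No weight is set by hand: each starts at zero and is adjusted to fit the labels.

```python
import math

FEATURES = ["relative_y", "keyword_nearby", "bold"]

# Hypothetical training data: one row per candidate amount, label 1 if it is the total.
X = [
    [0.95, 1, 1],  # bold amount next to "Total" at the bottom
    [0.90, 1, 0],  # plain amount next to "Total" at the bottom
    [0.85, 0, 1],  # bold VAT amount near the bottom
    [0.10, 0, 1],  # bold invoice number at the top
    [0.50, 0, 0],  # a line item
    [0.92, 0, 0],  # a bank account number in the footer
]
y = [1, 1, 0, 0, 0, 0]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.5, iterations=5000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iterations):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += error * xi[j] / n
            grad_b += error / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


weights, bias = train_logistic_regression(X, y)
learned = dict(zip(FEATURES, weights))  # the importance the model assigned to each feature


def predict(x):
    """1 if the model thinks this candidate is the total, else 0."""
    return 1 if sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) > 0.5 else 0
```

On this toy data the word “total” is the only feature that cleanly separates totals from everything else, so training pushes `learned["keyword_nearby"]` up; printing `learned` shows how much weight the model gave each feature without anyone specifying it.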
Trial and error
So we carry out manual inspection. Hours and hours of it go into a machine learning model, and it’s most of what we do. We also have processes in place to monitor accuracy over time, and we run every experiment we can to measure it.
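Monitoring accuracy over time comes down to comparing what the model extracted with values a person has checked. A minimal sketch, assuming exact string matches per field and a hypothetical layout where both inputs map a document ID to its fields; running it on each week’s manually verified sample would give an accuracy trend.

```python
from collections import defaultdict


def field_accuracy(extracted, verified):
    """Share of fields whose extracted value matches the manually verified one.

    Both arguments map document id -> {field name -> value}.
    Returns (overall accuracy, per-field accuracy).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for doc_id, truth in verified.items():
        predicted = extracted.get(doc_id, {})
        for field, true_value in truth.items():
            total[field] += 1
            if predicted.get(field) == true_value:
                correct[field] += 1
    if not total:
        return 0.0, {}
    per_field = {f: correct[f] / total[f] for f in total}
    overall = sum(correct.values()) / sum(total.values())
    return overall, per_field


# Hypothetical sample: two invoices checked by hand.
verified = {
    "inv-1": {"total": "120.00", "net": "100.00"},
    "inv-2": {"total": "56.40", "net": "47.00"},
}
extracted = {
    "inv-1": {"total": "120.00", "net": "100.00"},
    "inv-2": {"total": "56.40", "net": "4.70"},  # a misread net amount
}
overall, per_field = field_accuracy(extracted, verified)
```

Here three of the four fields match, so overall accuracy is 0.75; the per-field breakdown shows the net amount is the weak spot.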
It involves lots of experimentation, as if we were in a lab! We feed in lots of data and experiment with new ideas. We explore many ideas, and not all of them find their way into production. Much of this experimentation is about feature extraction in particular.
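A common experiment for deciding which features earn their place is an ablation: retrain without one feature at a time and see how much held-out accuracy drops. The sketch below uses a nearest-centroid classifier purely because it fits in a few lines; the model, the features and the data are hypothetical, not AutoEntry’s.

```python
FEATURES = ["relative_y", "keyword_nearby", "bold"]

# Hypothetical labelled candidates: (features, label).
TRAIN = [
    ((0.95, 1, 1), "total"), ((0.90, 1, 0), "total"),
    ((0.85, 0, 1), "other"), ((0.10, 0, 1), "other"),
    ((0.50, 0, 0), "other"), ((0.92, 0, 0), "other"),
]
TEST = [
    ((0.93, 1, 1), "total"), ((0.88, 0, 0), "other"),
    ((0.20, 0, 1), "other"), ((0.97, 1, 0), "total"),
]


def keep(x, columns):
    """Select only the feature columns being tested."""
    return tuple(x[c] for c in columns)


def fit_centroids(rows, columns):
    """Nearest-centroid classifier: the mean feature vector of each label."""
    sums, counts = {}, {}
    for x, label in rows:
        v = keep(x, columns)
        acc = sums.setdefault(label, [0.0] * len(v))
        for i, value in enumerate(v):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def accuracy(centroids, rows, columns):
    """Fraction of rows whose nearest centroid has the right label."""
    correct = 0
    for x, label in rows:
        v = keep(x, columns)
        predicted = min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(v, centroids[lab])),
        )
        correct += predicted == label
    return correct / len(rows)


def ablation(train, test, n_features):
    """Test accuracy with all features, then with each feature left out in turn."""
    all_cols = list(range(n_features))
    baseline = accuracy(fit_centroids(train, all_cols), test, all_cols)
    dropped = {}
    for f in all_cols:
        cols = [c for c in all_cols if c != f]
        dropped[FEATURES[f]] = accuracy(fit_centroids(train, cols), test, cols)
    return baseline, dropped


baseline, dropped = ablation(TRAIN, TEST, len(FEATURES))
```

Here, leaving out `keyword_nearby` drops test accuracy from 1.0 to 0.75, while leaving out either of the other features changes nothing, which would point to the keyword as the feature contributing the most on this toy data.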
We work with invoices and receipts the most, and we improve accuracy at a fast pace. At the beginning of 2021 we were at 80% accuracy. There is a limit: we can never reach 100%. That said, five months into 2021 we’re already at 85%. That fast pace is something we’ve excelled at, and we’re proud of it.
We are preparing the next stage of machine learning. Over the last six months we have already identified many pathways forward. These are totally new research mini-projects aimed at enriching the product, in addition to improving accuracy, which is the core of what we do.
AutoEntry - cutting-edge accounts automation
AutoEntry is making working life easier for accountants, bookkeepers and business owners around the world. Find out how you can benefit from this technology with a free trial.