Text extraction from ID card using deep learning
In this article, we’ll detail the methodology behind a Proof of Concept (POC) workflow for ID card text extraction. The solution was developed using five open source deep learning models implemented in Python. We will offer insights into the use of instance segmentation, classification and Optical Character Recognition (OCR) techniques.

In a world that’s rapidly going digital, the ability to extract information quickly and accurately from physical documents is becoming indispensable. Whether it’s for customer onboarding in the banking sector, verifying identity in online services, or streamlining administrative tasks in various industries, ID card information extraction plays a pivotal role. But as anyone who has entered data by hand can attest, manual extraction is error-prone, tedious, and time-consuming. With advances in machine learning and Computer Vision, we now have the tools to automate this process, making it faster, more accurate, and adaptable to a wide range of ID card formats.

This work was conducted by Ambroise Berthe, an R&D Computer Vision Engineer at Ikomia, in early 2022. The insights shared in this article draw on his comprehensive report.

Overview of the solution

The solution we’ve designed comprises a series of independent open source algorithms capable of:

- Detecting and outlining all identification documents present in an image using an instance segmentation algorithm.
- Cropping and straightening each detected object to ensure the text is always horizontal and readable from left to right.
- Text detection: identifying the positions of all words in the identification document.
- Text recognition: recognizing the characters in all the previously detected words.
- Classifying these character strings based on their position and content to extract the main information, such as name(s), date, and place of birth.
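To make the chaining of these five steps concrete, here is a minimal sketch of how independent stages could be composed. It is not the article's implementation: the `Word` class, the `read_documents` function, and the five callables passed to it are hypothetical names introduced for illustration, and each callable stands in for one of the trained models.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # assumed format: (x, y, width, height)


@dataclass
class Word:
    """One detected word: where it is, what it says, and its class."""
    box: Box
    text: str = ""
    label: str = "Other"


def read_documents(
    image: Any,
    segment: Callable[[Any], List[Any]],            # image -> one crop per document
    straighten: Callable[[Any], Any],               # crop -> upright crop
    detect_words: Callable[[Any], List[Box]],       # upright crop -> word boxes
    recognize: Callable[[Any, Box], str],           # upright crop + box -> string
    classify: Callable[[List[Word]], List[Word]],   # words -> words with labels
) -> List[Dict[str, str]]:
    """Run the five stages in order and return one field dict per document."""
    results = []
    for crop in segment(image):
        upright = straighten(crop)
        boxes = detect_words(upright)
        words = [Word(box=b, text=recognize(upright, b)) for b in boxes]
        fields: Dict[str, List[str]] = {}
        for word in classify(words):
            if word.label != "Other":
                fields.setdefault(word.label, []).append(word.text)
        # Join the words of each class into a single string per field.
        results.append({label: " ".join(texts) for label, texts in fields.items()})
    return results
```

Because every stage is passed in as a plain callable, any one of them can be swapped for a different model as long as its inputs and outputs keep the same shape.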
Building the algorithm

In this section, we will delve into the components that make up the identity document reading system. For this POC, our goal is to fine-tune the algorithms for several document variants, including:

- French ID card (old and new versions)
- French driving license (old and new versions)
- Passport
- French Resident card

Nevertheless, with the right dataset, this solution can be tailored to accommodate any kind of document.

Building the dataset

In today’s world, the most effective methods for creating algorithms to perform complex tasks are based on deep learning. This supervised technique requires a substantial amount of reliable data to operate accurately. Therefore, dataset creation was the first step in this project. Several datasets were needed, as the project involves models for several different tasks.

Database for Document Segmentation

We needed a database to train a model capable of segmenting identification documents within an image. We chose segmentation over simple detection to precisely extract the image area for processing. This decision was crucial, as the photographed documents might be surrounded by extra text that could disrupt the subsequent steps of the algorithm.

Example of the reverse side of a driving license with undesired text in the background.

For each image, a file was produced detailing the class and the polygon outlining each identification document. This dataset comprises approximately 100 images per document type, totaling nearly 1100 annotated images.

Example of labeled image for instance segmentation. Top right: recto of old French ID card, top left: verso of old French ID card, bottom: recto of an old French driving license.

From this initial dataset, we cropped and straightened all the images. This created the image database that would be annotated for OCR and Key Information Extraction (KIE). This image set was also used to train the model to straighten the images so that the text is horizontal.
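As an illustration of the class-plus-polygon annotation described above, the sketch below shows one possible record layout and a purely geometric way to estimate how far a document is rotated. This is not the trained straightening model mentioned in the article. The `DocumentAnnotation` class, the class name string, and the assumption that the polygon has four corners ordered top-left, top-right, bottom-right, bottom-left are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates, y pointing down


@dataclass
class DocumentAnnotation:
    label: str            # document class, e.g. "id_card_old_recto" (made-up name)
    polygon: List[Point]  # assumed order: top-left, top-right, bottom-right, bottom-left


def top_edge_angle(ann: DocumentAnnotation) -> float:
    """Angle in degrees between the document's top edge and the image x-axis."""
    (x0, y0), (x1, y1) = ann.polygon[0], ann.polygon[1]
    return math.degrees(math.atan2(y1 - y0, x1 - x0))


def straighten_polygon(ann: DocumentAnnotation) -> List[Point]:
    """Rotate the polygon about its first corner so the top edge is horizontal."""
    theta = -math.radians(top_edge_angle(ann))
    cx, cy = ann.polygon[0]
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [
        (
            cx + (x - cx) * cos_t - (y - cy) * sin_t,
            cy + (x - cx) * sin_t + (y - cy) * cos_t,
        )
        for x, y in ann.polygon
    ]
```

The same angle could be fed to an image rotation routine to straighten the pixels of the crop. A real photograph also involves perspective distortion, which a simple rotation like this does not correct.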
Database for OCR & KIE

We decided to annotate text at the word level rather than at the sentence level. While annotating at the word level is more time-consuming, it allows for easier subsequent manipulation of the database. Specifically, it’s simpler to merge detections than to split one. Annotation involves assigning to every word in the image:

- A bounding box surrounding the word.
- The corresponding character string.
- The word’s class (e.g., First Name, Last Name, Date of Birth, etc.).

Example of labeled image for OCR and KIE. A box is drawn over each word along with its associated class. Purple: ‘Other’, green: ‘Surname’, yellow: ‘First name’, blue: ‘Date of birth’ and orange: ‘Place of birth’.

Model selection

This project incorporates five open source deep learning models implemented in Python. Choosing the right algorithm for each task is crucial. Apart from an algorithm’s inherent performance, it must also be compatible with the others, both in terms of the nature of its inputs and outputs and in terms of the execution environment. …
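Returning to the word-level annotation scheme described for the OCR and KIE database, the sketch below illustrates why merging is the easy direction: combining words only needs a union of boxes and a join of strings, whereas splitting a sentence-level box would require knowing where each word starts and ends. The `WordAnnotation` class, the `(x_min, y_min, x_max, y_max)` box format, and the `merge_words` helper are assumptions made for illustration, not the article's annotation format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WordAnnotation:
    box: Tuple[int, int, int, int]  # assumed format: (x_min, y_min, x_max, y_max)
    text: str                       # the character string of the word
    label: str                      # the word's class, e.g. "First name"


def merge_words(words: List[WordAnnotation]) -> WordAnnotation:
    """Merge words of one line into a single annotation.

    The merged box is the union of the word boxes, and the texts are joined
    from left to right. The label of the leftmost word is kept, so callers
    are expected to pass words that share the same class.
    """
    if not words:
        raise ValueError("cannot merge an empty list of words")
    ordered = sorted(words, key=lambda w: w.box[0])
    box = (
        min(w.box[0] for w in ordered),
        min(w.box[1] for w in ordered),
        max(w.box[2] for w in ordered),
        max(w.box[3] for w in ordered),
    )
    return WordAnnotation(box, " ".join(w.text for w in ordered), ordered[0].label)
```

For example, merging two 'First name' words such as "Jean" and "Pierre" yields one annotation whose box covers both words and whose text is "Jean Pierre".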
Posted by: admin - Nov. 13, 2023, 7:59 a.m.