Unless the writing is fully legible and, in most cases, modern, even advanced Optical Character Recognition (OCR) systems make transcription errors, and the results must be edited afterwards, which is a time-consuming process.

The Computational Perception and Learning Research Group in the Computer Languages and Systems Department at the Universitat Jaume I, in collaboration with the Universidad Politécnica de Valencia, has developed a new assisted system for the transcription of written text called 'State'. It integrates a series of tools for processing images in order to remove noise and clean up the original image.

State is a new assisted system for the transcription of written text. Credit: Universitat Jaume I
With State, the page structure can be detected, the text can be recognized and mistakes can be quickly and easily edited with interactive tools such as an electronic pen applied directly to the text. Andrés Marzal, one of the researchers in the project, says, “It is a practical solution to the problem of a supervised transcription, since it shortens the most time-consuming phase, that is, editing the automatic transcription so that it is true to the original.”

The researchers say State makes it possible to save up to 50% of the time devoted to transcribing and correcting ancient texts and manuscripts, depending on the error rate of the OCR engine used. When large documentary resources are digitised, that amounts to many hours.

One of the main contributions of State is the architecture of the system: the recognition engine runs on a different machine from those of the users, who can connect to it simultaneously over the Internet. They access the engine through a web service to which they subscribe in order to obtain transcriptions on demand. Another advantage of this new system is its adaptive server, that is, one that learns from examples. In a natural way of working, a transcriber sends the server whatever he or she considers worth learning, and the server can then generate an improved version of the OCR that is immediately made available to all users.
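The shared, adaptive engine described above can be illustrated with a minimal Python sketch. It is not State's actual implementation: the class names `RecognitionServer` and `Transcriber`, the token-substitution "learning" and the version counter are all hypothetical, and everything runs in a single process, whereas the real system exposes the engine through a web service. The sketch only shows the idea that a correction sent by one transcriber immediately benefits every other user.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RecognitionServer:
    """Hypothetical stand-in for the shared, adaptive recognition engine."""

    corrections: Dict[str, str] = field(default_factory=dict)
    version: int = 1

    def transcribe(self, raw_tokens: List[str]) -> List[str]:
        # Apply every correction learned so far to the raw OCR output.
        return [self.corrections.get(tok, tok) for tok in raw_tokens]

    def learn(self, wrong: str, right: str) -> int:
        # Store a transcriber's example and publish a new engine version.
        self.corrections[wrong] = right
        self.version += 1
        return self.version


@dataclass
class Transcriber:
    """Hypothetical client; in practice it would call a remote web service."""

    name: str
    server: RecognitionServer

    def request(self, raw_tokens: List[str]) -> List[str]:
        return self.server.transcribe(raw_tokens)

    def send_example(self, wrong: str, right: str) -> int:
        return self.server.learn(wrong, right)


# Two transcribers share one engine.
server = RecognitionServer()
ana = Transcriber("ana", server)
ben = Transcriber("ben", server)

raw = ["tbe", "olde", "booke"]      # made-up raw OCR output
before = ben.request(raw)           # engine has learned nothing yet
ana.send_example("tbe", "the")      # ana teaches the engine one correction
after = ben.request(raw)            # ben sees the improved engine at once
```

Keeping the learned state on the server rather than on each client is what lets a single correction reach all connected users without any of them updating their own software.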

“This is a very flexible and changeable tool because during one single session it allows users to connect to several recognition engines or to adapt the engine to the features of a specific document. It also offers business models in which invoicing is carried out in terms of transcription workflow,” Andrés Marzal points out.

Finally, researchers have also worked on a multimodal interface to make it easier for transcribers to handle this tool. Although the current version is used with a mouse, a keyboard and a pen-sensitive screen, plans have been developed for other interaction devices to be incorporated. “Interaction must be as natural as possible, particularly given the fact that users can work for several hours a day. Offering an intuitive interface is a must,” the researcher explains.

The prototype designed by the researchers is an alpha version, but it can already be used. In fact, it has recently been installed in the Miguel de Cervantes Virtual Library and will be used in the Jaume I Archive for transcribing ancient documents.

As a new line of work, the Computational Perception and Learning Research Group at the UJI is considering the integration of other OCR engines specialised in certain kinds of typography that are frequent in ancient texts. In the medium term, new input devices are also expected to be incorporated into the application, including touch or multitouch screens and voice.

“We have to head towards what technology offers us at a reasonable price: touch-sensitive screens or the incorporation of voice for executing commands,” says Marzal.