Indexing and automatic recognition of handwritten text

Authors

  • Celio Hernández Tornero
  • Verónica Romero Gómez
  • Joan Andreu Sánchez Peiró
  • Alejandro Héctor Toselli Rossi
  • Enrique Vidal Ruiz

DOI:

https://doi.org/10.14672/0.2018.1432

Abstract

It is speculated that the number of manuscripts accumulated in libraries and archives around the world far exceeds the amount of (original) text printed or typed to date. Only a small fraction of these documents has been digitized so far, and only part of that fraction has been transcribed. As a result, the most interesting information contained in the vast majority of the digital images (i.e., the information conveyed by the text) remains inaccessible for easy reading, editing, indexing, and search. This article presents several projects, together with effective solutions recently developed within their frameworks, for both information search in and complete transcription of historical handwritten documents.

Published

2023-06-06