A data model and a cataloguing, storage and retrieval system for ancient document archives

By Pasquale Savino, Anna Tonazzini, Franca Debole


Digitization of ancient manuscripts is becoming common practice in many archives and libraries, mainly for preservation purposes. It also opens many new opportunities for disseminating these precious cultural assets, since scholars, researchers and the general public can access and use them for research, study and general information. This is possible only if the documents, their descriptions, and the results of all processing performed on them are acquired at good quality and can be easily accessed through simple yet powerful retrieval mechanisms.

Acquired manuscripts suffer from degradations that may require different types of processing of the digital images, either to improve their visual quality and legibility or to reveal hidden text that is not otherwise visible. Natural Language Processing, in turn, requires transcriptions of the text contained in the manuscript, as well as an encoding of the document structure and user annotations.

This paper presents a document management system and a metadata schema that enable the storage and content-based retrieval of original documents, of the processed images produced to improve their readability, of textual transcriptions, and of linguistic annotations. The archive will make it possible to describe, store and access all available versions of a manuscript, together with its transcriptions and annotations, and to search and retrieve documents based on all of this information.
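The abstract does not spell out the metadata schema itself. As a rough illustration only, the kind of record described above (an original acquisition, processed versions derived from it, transcriptions and annotations, all searchable together) might be sketched as follows. The class names, fields, the `derived_from` link, the sample identifiers and the naive substring search are all hypothetical; this is not the authors' schema or retrieval engine.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ImageVersion:
    """One digital image of a manuscript: the original acquisition or a processed version (hypothetical model)."""
    uri: str
    kind: str                           # hypothetical values: "original", "enhanced", ...
    derived_from: Optional[str] = None  # uri of the source image if this is a processed version
    method: str = ""                    # free-text description of the processing applied


@dataclass
class Transcription:
    text: str
    author: str


@dataclass
class Annotation:
    target: str  # hypothetical locator, e.g. "line 1"
    body: str
    author: str


@dataclass
class ManuscriptRecord:
    identifier: str
    title: str
    images: List[ImageVersion] = field(default_factory=list)
    transcriptions: List[Transcription] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)


def lineage(record: ManuscriptRecord, uri: str) -> List[str]:
    """Follow derived_from links from an image back to its original acquisition."""
    by_uri = {img.uri: img for img in record.images}
    chain = [uri]
    while by_uri[chain[-1]].derived_from is not None:
        chain.append(by_uri[chain[-1]].derived_from)
    return chain


def search(records: List[ManuscriptRecord], term: str) -> List[str]:
    """Return identifiers of records whose title, processing notes, transcriptions or annotations mention term."""
    term = term.lower()
    hits = []
    for r in records:
        texts = [r.title]
        texts += [img.method for img in r.images]
        texts += [t.text for t in r.transcriptions]
        texts += [a.body for a in r.annotations]
        if any(term in t.lower() for t in texts):
            hits.append(r.identifier)
    return hits


# Hypothetical sample record: one original image, one processed version, one transcription, one annotation.
rec = ManuscriptRecord("ms-001", "Sample manuscript, fol. 12r")
rec.images.append(ImageVersion("img/ms-001/12r.tif", "original"))
rec.images.append(ImageVersion("img/ms-001/12r-enh.tif", "enhanced",
                               derived_from="img/ms-001/12r.tif",
                               method="bleed-through removal"))
rec.transcriptions.append(Transcription("In principio erat verbum", "editor A"))
rec.annotations.append(Annotation("line 1", "lemma: verbum", "editor B"))

print(search([rec], "verbum"))                      # ['ms-001']
print(lineage(rec, "img/ms-001/12r-enh.tif"))       # ['img/ms-001/12r-enh.tif', 'img/ms-001/12r.tif']
```

Keeping processed images as separate versions linked to their source, rather than overwriting the original, is what lets every stage (acquisition, processing, transcription, annotation) be described and retrieved on its own; a real system would replace the substring search with a proper index.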
