A data model and a cataloguing, storage and retrieval system for ancient document archives

Pasquale Savino; Anna Tonazzini; Franca Debole

doi:10.57675/IMIST.PRSM/ijist-v3i5.132

Pasquale Savino ISTI-CNR - Italian Research Council - Information Science and Technology Institute http://orcid.org/0000-0002-8841-5440
Anna Tonazzini ISTI-CNR - Italian Research Council - Information Science and Technology Institute
Franca Debole ISTI-CNR - Italian Research Council - Information Science and Technology Institute

Abstract

Digitization of ancient manuscripts is becoming common practice in many archives and libraries, mainly for preservation purposes. It also opens new opportunities for the diffusion of these precious cultural assets, since scholars and researchers, as well as the general public, may access and use them for research, for study, and for general information. This is possible if the documents, their descriptions, and the results of all processing activities performed on them are acquired at good quality and can be easily accessed through simple and powerful retrieval mechanisms.

Acquired manuscripts suffer from degradations that may require different types of elaborations on the digital images, to improve their visual quality and legibility or to discover hidden text that is not visible. Natural Language Processing requires the creation of transcriptions of the text contained in the manuscript, as well as the encoding of the document structure and the creation of user annotations.

This paper presents a document management system and a metadata schema that make possible the storage and content-based retrieval of original documents, elaborations performed to improve their readability, textual transcriptions, and linguistic annotations. The archive will offer the possibility of describing, storing and accessing all the available manuscript versions, document transcriptions and annotations, and of searching and retrieving documents based on all this information.
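The kind of record the abstract describes (a manuscript with its image versions, transcriptions and annotations, all searchable together) can be illustrated with a minimal Python sketch. This is not the metadata schema or the system presented in the paper: every class name, field name and the naive keyword search below are assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ImageVersion:
    """One digital image of a manuscript: the original acquisition or a
    processed elaboration. The 'kind' labels are hypothetical."""
    uri: str
    kind: str              # e.g. "original" or "elaboration" (assumed labels)
    description: str = ""  # free-text note on the processing applied, if any


@dataclass
class Annotation:
    """A user or linguistic annotation attached to a manuscript (assumed shape)."""
    author: str
    text: str


@dataclass
class ManuscriptRecord:
    """Hypothetical catalogue entry grouping everything known about one manuscript."""
    identifier: str
    title: str
    versions: List[ImageVersion] = field(default_factory=list)
    transcriptions: List[str] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)

    def searchable_text(self) -> str:
        # Concatenate every textual field so that one query covers
        # descriptions, transcriptions and annotations alike.
        parts = [self.title]
        parts += [v.description for v in self.versions]
        parts += self.transcriptions
        parts += [a.text for a in self.annotations]
        return " ".join(parts).lower()


def search(archive: List[ManuscriptRecord], query: str) -> List[ManuscriptRecord]:
    """Return the records whose combined text contains every query term.

    A plain substring match stands in for the content-based retrieval
    mentioned in the abstract; it is an illustrative simplification.
    """
    terms = query.lower().split()
    return [r for r in archive if all(t in r.searchable_text() for t in terms)]
```

Under these assumptions, a query term that occurs only in a transcription or only in an annotation still retrieves the manuscript it belongs to, which is the behaviour the abstract asks of the archive.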

Published

Sep 14, 2019

How to Cite

SAVINO, Pasquale; TONAZZINI, Anna; DEBOLE, Franca. A data model and a cataloguing, storage and retrieval system for ancient document archives. International Journal of Information Science and Technology, [S.l.], v. 3, n. 5, p. 6 - 15, sep. 2019. ISSN 2550-5114. Available at: <https://innove.org/ijist/index.php/ijist/article/view/132>. Date accessed: 12 july 2025. doi: http://dx.doi.org/10.57675/IMIST.PRSM/ijist-v3i5.132.

Issue

Vol 3 No 5 (2019)

Section

Special Issue : Machine Learning and Natural Language Processing

The submitting author warrants that the submission is original and that she/he is the author of the submission together with the named co-authors; to the extent that the submission incorporates text passages, figures, data or other material from the work of others, the submitting author has obtained any necessary permission.

Articles in this journal are published under the Creative Commons Attribution Licence (CC-BY). The licence gives readers legal certainty about what they can do with published articles, which supports wider dissemination and archiving and in turn makes publishing with this journal more valuable for you, the authors.

In order for iJIST to publish and disseminate research articles, we need publishing rights. This is determined by a publishing agreement between the author and iJIST.

By submitting an article, the author grants this journal the non-exclusive right to publish it. The author retains the copyright and the publishing rights for the article without any restrictions.
