Virtual restoration and content analysis of ancient degraded manuscripts

By Anna Tonazzini, Pasquale Savino, Emanuele Salerno, Muhammad Hanif, Franca Debole

Abstract


In recent years, libraries and archives have carried out extensive digitization campaigns of their documentary heritage, with the primary goal of ensuring the preservation and accessibility of this important part of humanity's cultural and historical patrimony. Besides supporting conservation, the availability of high-quality digital copies has increasingly stimulated the use of image processing techniques to perform a range of operations on documents and manuscripts without harming the often precious and fragile originals. Among these operations, virtual restoration is crucial: it facilitates the traditional work of philologists and paleographers and constitutes a first step towards automatic analysis of the written contents. Here we report our experience in this field, taking as a case study the removal of bleed-through distortion, one of the most frequent and damaging degradations affecting ancient manuscripts. We show that blind source separation techniques are versatile tools for either cancelling these unwanted interferences or isolating specific features for content analysis. Specialized algorithms, based on recto-verso models and sparse image representation, are then shown to perform a fine and selective removal of the degradation while preserving the original appearance of the manuscript.
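
To make the idea of blind source separation on a recto-verso pair concrete, the following is a minimal sketch, not the specific algorithms of the paper. It assumes that the two sides have already been registered (verso mirrored and aligned to the recto), that each observed side is a linear mixture of the two underlying texts, and that the mixing is roughly symmetric. Under those assumptions, decorrelating the two observations with a symmetric whitening matrix approximately recovers the two texts. The function name `symmetric_whitening`, the synthetic data, and the mixing coefficients are all illustrative assumptions.

```python
import numpy as np

def symmetric_whitening(recto, verso):
    """Decorrelate a registered recto/verso pair (hypothetical helper).

    Assumes a linear, roughly symmetric mixing of two uncorrelated
    source images with similar variance.
    """
    shape = recto.shape
    # Stack the two observations as rows of a 2 x N data matrix.
    x = np.stack([recto.ravel(), verso.ravel()]).astype(float)
    x -= x.mean(axis=1, keepdims=True)
    # 2 x 2 sample covariance of the observations.
    cov = x @ x.T / x.shape[1]
    # Inverse square root of the covariance via eigendecomposition.
    eigval, eigvec = np.linalg.eigh(cov)
    w = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    y = w @ x
    return y[0].reshape(shape), y[1].reshape(shape)

# Synthetic demo: two independent "texts" mixed by an assumed
# symmetric bleed-through matrix (values chosen for illustration).
rng = np.random.default_rng(0)
s1 = rng.random((64, 64))
s2 = rng.random((64, 64))
mixing = np.array([[1.0, 0.4],
                   [0.4, 1.0]])
recto = mixing[0, 0] * s1 + mixing[0, 1] * s2
verso = mixing[1, 0] * s1 + mixing[1, 1] * s2

est1, est2 = symmetric_whitening(recto, verso)
```

In this toy setting the covariance of the observations is approximately proportional to the square of the mixing matrix, so its inverse square root approximates the inverse mixing. Real manuscripts generally violate these assumptions (nonlinear, spatially varying bleed-through, imperfect registration), which is why the more specialized models mentioned in the abstract are needed.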


