PREPROCESSING OF HISTORICAL MANUSCRIPTS USING PHASE CONGRUENCY FEATURES AND GAUSSIAN MIXTURE MODEL
Epigraphs are important sources for reconstructing our history and culture, and they must be preserved for future generations. Modern paleographers find it difficult to decipher epigraphs for several reasons: the document material has eroded over time, the images contain different types of noise, and the character sets of ancient times are often unknown. To read such documents, the characters must first be extracted. We propose a model for character extraction through binarization and background noise removal. It consists of phase-feature-based preprocessing and Gaussian mixture model based background elimination using the expectation maximization (EM) algorithm. Enhancement and preprocessing are carried out with specialized filters that aid character extraction. Two phase features, the weighted mean phase angle (PAmean) and the maximum moment of phase congruency covariance (PCCmax), are calculated to differentiate the foreground from the background. The image is binarized using phase-coherence-based algorithms, and the EM algorithm then eliminates the background noise while leaving the foreground characters untouched. The proposed algorithm is tested on different historical documents, and the experimental results show its robustness on various inscriptions. The results are compared with those of many classical algorithms currently in use.
inscription, binarization, phase congruency, expectation maximization.
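
As a rough illustration of the EM-based foreground/background separation mentioned in the abstract, the following minimal sketch fits a two-component one-dimensional Gaussian mixture and labels the darker component as foreground. It is not the proposed method: it assumes raw grey levels as input instead of the PAmean and PCCmax phase features, it assumes dark characters on a lighter background, and the function names fit_two_gaussians and binarize, the quartile initialisation, and the iteration count are all hypothetical choices made for this example.

```python
import numpy as np

def fit_two_gaussians(values, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture (illustrative only)."""
    x = np.asarray(values, dtype=float).ravel()
    # Assumed initialisation: means at the lower and upper quartiles.
    mu = np.percentile(x, [25, 75]).astype(float)
    var = np.full(2, x.var() + 1e-6)
    weight = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each pixel,
        # computed in the log domain for numerical stability.
        diff = x[:, None] - mu[None, :]
        log_p = -0.5 * (diff ** 2 / var + np.log(2 * np.pi * var)) + np.log(weight)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate mixing weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        weight = nk / x.size
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return mu, var, weight, resp

def binarize(image):
    """Return a boolean mask that is True where a pixel is assigned to the
    component with the lower mean (assumed here to be the characters)."""
    mu, _, _, resp = fit_two_gaussians(image)
    dark = int(np.argmin(mu))
    return (resp.argmax(axis=1) == dark).reshape(np.shape(image))
```

In a pipeline like the one described, the per-pixel input would presumably be a phase-derived feature map and not intensity, with the same E-step and M-step updates applied to it.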