Title: A Preprocessing and Analyzing Method of Images in PDF Documents for Mathematical Expression Retrieval
Authors: Tian, Xuedong; Hebei University
Yu, Botao; Hebei University
Sun, Jing; Hebei University
Date: 2013-12-29
Publisher: TELKOMNIKA: Indonesian journal of electrical engineering
Source:
Type: info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
Subject: Not applicable
Description: PDF documents are an important information resource for mathematical expression retrieval systems. Image objects are a major component of PDF documents, and they must first be converted to coded form using character recognition and document analysis before content-based searching is possible. The quality of these images is therefore the key factor that determines the correctness of the conversion. Considering the characteristics of PDF images and mathematical expressions, a preprocessing and analysis method is proposed that comprises modules for PDF image extraction, graying, binarization, denoising, skew correction, and layout parameter detection. The features of mathematical expressions are taken into account to avoid information loss during image conversion and to keep formulas from interfering with the analysis and correction steps. The experimental results show that the method is effective in improving the accuracy and efficiency of document image recognition, analysis, and retrieval.
Language: Not applicable
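
The abstract names graying and binarization among the preprocessing modules but does not say which algorithms the authors use. The following is only a minimal sketch of those two steps under stated assumptions: the luminance weights (ITU-R BT.601) and Otsu's thresholding are stand-ins chosen for illustration, not the paper's method, and the function names `to_gray`, `otsu_threshold`, and `binarize` are hypothetical. Image extraction, denoising, skew correction, and layout parameter detection are not covered.

```python
def to_gray(rgb):
    """Convert rows of (r, g, b) tuples to rows of 0-255 gray levels.

    Assumption: BT.601 luminance weights; the paper's graying rule is not given.
    """
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in rgb]


def otsu_threshold(gray):
    """Return the level t that maximizes between-class variance (Otsu's method).

    Pixels with value <= t form the dark class. Assumption: Otsu is used here
    only as a common global binarization technique.
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of gray levels in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break               # bright class empty from here on
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2   # proportional to between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(gray, t):
    """Map each pixel to 1 (ink, value <= t) or 0 (background)."""
    return [[1 if v <= t else 0 for v in row] for row in gray]


if __name__ == "__main__":
    # Tiny 2x2 checkerboard of black and white pixels.
    rgb = [[(0, 0, 0), (255, 255, 255)],
           [(255, 255, 255), (0, 0, 0)]]
    gray = to_gray(rgb)
    t = otsu_threshold(gray)
    print(gray, t, binarize(gray, t))
```

A global threshold like this can erase thin strokes such as fraction bars, superscripts, or dots in formulas, which is the kind of information loss the abstract says the method is designed to avoid; a real pipeline would likely need a more careful, possibly local, rule.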