Text Image Extraction and Summarization

  • Neha Joshi

Abstract

With the rapid growth in the volume of data these days, data
processing has become very important, since it filters large
amounts of data. A text mining tool finds relations between the
words in text content and analyzes the results as well. Deriving
quality information from text forms the crux of data analysis or
text mining. This paper focuses on a text mining application. Text
present in images is recognized and summarized according to the
user's requirement, i.e. the number of lines to which the text is
summarized is chosen by the user. Text mining is thus used to save
the user's time and to increase the efficiency of data handling. It
performs computations that a human could not practically carry out
by hand, namely analytics over large volumes of data. Hence text
image extraction and summarization is a necessity in the current
scenario. If the efficiency of the proposed model is optimized,
this model of Text Image Extraction and Summarization can be very
beneficial. It can be used as a ready-to-go "click an image and get
a summary" application in a variety of situations. In the proposed
model, the number of lines in the summary has to be specified by
the user, thus ensuring that the extent of summarization is
completely user controlled.

Keywords: Text mining, Natural Language Processing, Optical Character Recognition, Summarization, Image Processing
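
The summarization step described in the abstract, where the user chooses how many lines the summary contains, can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the text has already been recovered from the image by an OCR engine (for example via `pytesseract.image_to_string`, which is not called here), and it uses a simple word-frequency scoring scheme chosen only for illustration. The names `split_sentences`, `summarize`, and `STOPWORDS` are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "is", "are",
    "it", "that", "this", "as", "for", "with", "be", "by", "at",
}


def _content_words(text):
    """Lowercase the text and return its non-stopword tokens."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, num_sentences):
    """Return up to num_sentences sentences, kept in their original order.

    Each sentence is scored by the average document-wide frequency of its
    content words; the highest-scoring sentences are selected.
    """
    if num_sentences <= 0:
        return []
    sentences = split_sentences(text)
    freq = Counter(_content_words(text))

    def score(sentence):
        tokens = _content_words(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])  # restore document order
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    # In the full pipeline this string would come from the OCR step.
    ocr_text = (
        "Text mining finds relations between words. "
        "Images often contain printed text. "
        "Text in images can be recognized and then summarized. "
        "The user decides how long the summary is."
    )
    for line in summarize(ocr_text, 2):
        print(line)
```

Passing the desired line count as `num_sentences` keeps the extent of summarization under user control, which is the property the abstract emphasizes; a real system would likely replace the naive splitter and scoring with an NLP toolkit.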

How to Cite
Joshi, N. (2019). Text Image Extraction and Summarization. Asian Journal For Convergence In Technology (AJCT). Retrieved from http://asianssr.org/index.php/ajct/article/view/750