Text Image Extraction and Summarization

Neha Joshi

Keywords: Text mining, Natural Language Processing, Optical Character Recognition, Summarization, Image Processing

Abstract

With the huge increase in the amount of data these days, data processing has become very important, since it filters large volumes of data. A text mining tool finds relations between the words in text content and analyzes the results as well. Deriving quality information from text forms the crux of data analysis or text mining. This paper focuses on a text mining application: text present in images is recognized and summarized according to the user's requirement, i.e. the number of lines to which the text is summarized is chosen by the user. Text mining is thus used to save the user's time and to make working with data more efficient. It also performs computations that a human could not practically carry out, such as analytics on large volumes of data. Hence text image extraction and summarization is a necessity in the current scenario. If the efficiency of the proposed model is optimized, this model of Text Image Extraction and Summarization can be very beneficial. It can be used as a ready-to-go, click-an-image-and-get-a-summary application in a variety of situations. In the proposed model, the number of lines in the summary has to be specified by the user, ensuring that the extent of summarization is completely user controlled.
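
The pipeline described in the abstract (recognize the text in an image, then reduce it to a user-chosen number of lines) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the paper's implementation: the abstract does not name a scoring method, so a simple word-frequency extractive scorer is assumed; the OCR engine (Tesseract through the `pytesseract` package) is likewise an assumption; and the names `extract_text`, `split_sentences`, `summarize`, and `STOP_WORDS` are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not taken from the paper).
STOP_WORDS = {
    "a", "an", "the", "is", "are", "of", "to", "in", "and",
    "it", "that", "this", "for", "on", "with", "as", "be", "by",
}


def extract_text(image_path):
    """OCR step (assumed engine: Tesseract via pytesseract).

    Needs the third-party packages pytesseract and Pillow plus a local
    Tesseract install; imported lazily so summarize() works without them.
    """
    from PIL import Image
    import pytesseract
    return pytesseract.image_to_string(Image.open(image_path))


def split_sentences(text):
    # Naive splitter: break on whitespace that follows '.', '!' or '?'.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def _content_words(text):
    # Lowercased word tokens with stop words removed.
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def summarize(text, num_sentences):
    """Return the num_sentences highest-scoring sentences, in original order."""
    if num_sentences <= 0:
        return []
    sentences = split_sentences(text)
    freq = Counter(_content_words(text))

    def score(sentence):
        # Average document frequency of the sentence's content words.
        tokens = _content_words(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:num_sentences])  # restore reading order
    return [sentences[i] for i in keep]
```

With this sketch, a call such as `summarize(extract_text("page.png"), 3)` would return up to three sentences, where `"page.png"` is a placeholder path and the second argument plays the role of the user-specified number of lines.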

Published

2019-04-11

How to Cite

Joshi, N. (2019). Text Image Extraction and Summarization. Asian Journal For Convergence In Technology (AJCT), ISSN 2350-1146. Retrieved from http://asianssr.org/index.php/ajct/article/view/750

Issue

VOL V Issue I

Section

Article

Licensed by:

Creative Commons Attribution 4.0 International License.
