Application of data mining techniques for avoiding underestimation of an event

  • Avijit Kumar Chaudhuri
  • Dr. Anirban Das
  • Dr. Deepankar Sinha
  • Dr. Dilip K. Banerjee
Keywords: Data mining techniques; cervical cancer; Type 1 and Type 2 errors; integrated approach; underestimation


Medical records comprise varied data types, and artificial intelligence and data mining techniques (DMTs) are useful for drawing insights and patterns from them. Several scholars claim that there is no universal way of addressing diagnosis problems and that a mixed model is desirable. In this paper, the authors compare established approaches and propose a framework that integrates the findings from various techniques to avoid Type 1 and Type 2 errors. The dataset chosen for this purpose contains medical data on HPV disease. Two variants of the dataset were used to predict the disease: the disease and treatment dataset, and the subset of features found significant by an ensemble method, Random Forest (RF). The results show that traditional methods such as Logistic Regression (LR) performed better with the features found significant by RF. However, this approach fails when the dichotomy of the data (i.e., disease or no disease) is not distinct. Decision Tree (DT) analysis shows consistent performance across all variants of the dataset used in this paper. The paper suggests that combining association rules with a prediction approach (with or without integration) provides higher accuracy.
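As a rough illustration of the error types discussed in the abstract, the sketch below counts Type 1 errors (false positives) and Type 2 errors (false negatives) for two classifiers and for a simple combination of their outputs. It is not the authors' framework: the function names `error_counts` and `union_vote`, the "flag disease if any model flags it" rule, and the toy labels are all assumptions made for this example.

```python
from typing import Dict, List, Sequence


def error_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count Type 1 (false positive) and Type 2 (false negative) errors.

    Labels are assumed to be 1 = disease, 0 = no disease.
    """
    type1 = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    type2 = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {"type1": type1, "type2": type2}


def union_vote(*predictions: Sequence[int]) -> List[int]:
    """Hypothetical integration rule: predict disease if any model does."""
    return [int(any(votes)) for votes in zip(*predictions)]


# Toy data, invented for illustration; not taken from the paper's dataset.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
lr_pred = [1, 0, 1, 0, 0, 1, 0, 0]  # stand-in for a Logistic Regression output
dt_pred = [1, 1, 0, 0, 0, 0, 0, 1]  # stand-in for a Decision Tree output

combined = union_vote(lr_pred, dt_pred)

print("LR:      ", error_counts(y_true, lr_pred))
print("DT:      ", error_counts(y_true, dt_pred))
print("Combined:", error_counts(y_true, combined))
```

On this toy data each model alone misses two disease cases, while the combined prediction misses one at the cost of one extra false alarm. A union rule of this kind can never add Type 2 errors relative to its inputs, which is one simple way to guard against underestimation; whether the trade against Type 1 errors is acceptable depends on the application.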


How to Cite
Chaudhuri, A. K., Das, A., Sinha, D., & Banerjee, D. K. (2021). Application of data mining techniques for avoiding underestimation of an event. Asian Journal For Convergence In Technology (AJCT), ISSN 2350-1146, 7(1), 179-189.
