Early prediction of heart disease using the most significant features of diabetes by machine learning techniques

  • Avijit Kumar Chaudhuri
  • Dr. Anirban Das
  • Dr. Deepankar Sinha
  • Dr. Dilip K. Banerjee


Medical science is witnessing high levels of specialization, with doctors specializing in specific areas such as heart disease (HD), diabetes, and nephrology. As a result, patients with simultaneous ailments have to make multiple visits for treatment. Studies show that the causes of different diseases overlap. One such co-existence is observed in diabetic patients who also suffer from HD, and in many cases one condition precedes the other. Hence, it is worth predicting whether a patient with one ailment is likely to develop another. Artificial intelligence and machine learning methods are widely used in healthcare, yet there are few references to data mining work on this problem. HD is a primary cause of death worldwide, and studies show that many diabetic patients also have HD. This paper aims to identify the association and the common risk factors between diabetes and HD; this finding aids in anticipating HD in a diabetic patient. The authors use proven data mining approaches (logistic regression, decision tree, and random forest) to arrive at the most accurate results, and validate them with an unsupervised method, K-means clustering. The initial investigation shows that body mass index (BMI) and age are among the key risk factors for diabetes, while smoking habit, age, male gender, and diabetes (glucose level) lead to HD. Of the diabetic patients studied, 31% had HD.
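The abstract does not give implementation details, so the following is only a minimal, standard-library Python sketch of two ideas it mentions, under stated assumptions: ranking candidate risk factors by the magnitude of standardized logistic-regression coefficients, and checking with K-means (k = 2) how closely unsupervised clusters line up with the HD label. All helper names (`fit_logistic`, `rank_features`, `kmeans`, `cluster_agreement`), the feature list, and the synthetic records are hypothetical. Decision trees and random forests are omitted, and this is not the authors' procedure.

```python
import math
import random

# Hypothetical helpers for illustration only; not the authors' code or data.

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def standardize(rows):
    """Scale each column to zero mean and unit variance so coefficients are comparable."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0 for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Plain batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def rank_features(names, w):
    """Sort features by |standardized coefficient|, largest (most influential) first."""
    return sorted(zip(names, w), key=lambda pair: abs(pair[1]), reverse=True)

def kmeans(X, k=2, iters=50, seed=0):
    """Lloyd's algorithm with random initial centers; returns one cluster index per row."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(X, k)]
    labels = [0] * len(X)
    for _ in range(iters):
        # Assign each row to its nearest center (squared Euclidean distance).
        labels = [min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(x, centers[c])))
                  for x in X]
        # Move each non-empty cluster's center to the mean of its members.
        for c in range(k):
            members = [x for x, lab in zip(X, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_agreement(labels, y):
    """For k=2: share of rows whose cluster matches the 0/1 label, allowing a label swap."""
    same = sum(1 for lab, t in zip(labels, y) if lab == t) / len(y)
    return max(same, 1.0 - same)

if __name__ == "__main__":
    # Synthetic records from an invented risk rule: NOT data or results from the study.
    rng = random.Random(42)
    names = ["age", "bmi", "glucose", "smoker", "male"]
    X, y = [], []
    for _ in range(300):
        row = [rng.uniform(30, 80), rng.uniform(18, 40), rng.uniform(70, 250),
               float(rng.random() < 0.3), float(rng.random() < 0.5)]
        z = 0.05 * (row[0] - 55) + 0.01 * (row[2] - 160) + 1.2 * row[3] + 0.8 * row[4] - 0.5
        X.append(row)
        y.append(1 if rng.random() < sigmoid(z) else 0)
    Xs = standardize(X)
    w, b = fit_logistic(Xs, y)
    for name, coef in rank_features(names, w):
        print(f"{name:8s} {coef:+.3f}")
    print("k-means/label agreement:", round(cluster_agreement(kmeans(Xs, k=2), y), 3))
```

Ranking by coefficient magnitude is only meaningful after standardization, which is why the sketch scales every column first. In practice, scikit-learn's `LogisticRegression`, `DecisionTreeClassifier`, `RandomForestClassifier`, and `KMeans` cover the same steps with far more robust implementations.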

Keywords: Data mining, heart disease, diabetes, decision tree, random forest, logistic regression




How to Cite
Chaudhuri, A. K., Das, A., Sinha, D., & Banerjee, D. K. (2021). Early prediction of heart disease using the most significant features of diabetes by machine learning techniques. Asian Journal For Convergence In Technology (AJCT), 7(1), 168-178. Retrieved from https://asianssr.org/index.php/ajct/article/view/1063