Performance Analysis of Dimensionality Reduction Techniques in Cancer Detection using Microarray Data

  • Swati B. Bhonde
  • Dr. Jayashree R. Prasad
Keywords: Cancer prediction, deep learning, dimensionality reduction, precision medicine, gene expressions


Cancer is one of the major causes of death worldwide. The disease is especially dangerous because it often does not announce itself until it has reached an advanced stage. Still, the mortality rate for cancer can be decreased if the disease is diagnosed and treated at the earliest possible stage. Although there are traditional clinical trials to predict cancer, there is no single test that can correctly identify this disease. In recent years, DNA microarray technology has been used significantly to analyze and predict cancer. Analysis of gene expressions is not only interesting but also challenging, as it is a matter not only of accuracy but also of the life or death of a patient. DNA microarray data is high-dimensional, noisy, and redundant, which complicates the task of classification because of the high computational cost involved. Feature selection and feature reduction therefore become important tasks prior to classification. This paper presents a comparative performance analysis of different dimensionality reduction techniques implemented on the TCGA PANCANCER dataset.
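As a rough illustration of what feature selection prior to classification can look like, the sketch below keeps only the genes whose expression varies most across samples. The abstract does not name the specific techniques compared in the paper, so this variance filter is an assumed stand-in and not the authors' method; the function names and the toy matrix are hypothetical and do not come from the TCGA PANCANCER dataset.

```python
from statistics import pvariance


def top_variance_genes(expression, k):
    """Return the (sorted) column indices of the k highest-variance genes.

    `expression` is a list of samples, each a list of per-gene values.
    """
    n_genes = len(expression[0])
    # Population variance of each gene (column) across all samples.
    variances = [
        pvariance([row[g] for row in expression]) for g in range(n_genes)
    ]
    ranked = sorted(range(n_genes), key=lambda g: variances[g], reverse=True)
    return sorted(ranked[:k])


def reduce_features(expression, keep):
    """Project every sample onto the selected gene indices."""
    return [[row[g] for g in keep] for row in expression]


# Toy matrix: 4 samples x 5 genes (hypothetical values, not real data).
toy = [
    [1.0, 5.0, 0.2, 9.0, 3.0],
    [1.0, 7.0, 0.2, 1.0, 3.1],
    [1.0, 6.0, 0.2, 8.0, 2.9],
    [1.0, 8.0, 0.2, 2.0, 3.0],
]

keep = top_variance_genes(toy, k=2)
reduced = reduce_features(toy, keep)
print(keep)
print(reduced[0])
```

Constant genes (columns 0 and 2 in the toy matrix) carry no information for a classifier and are dropped first. A filter of this kind is cheap but ignores class labels and correlations between genes, which is one reason projection-based reductions are often compared against it.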


How to Cite
Bhonde, S. B., & Prasad, D. J. R. (2021). Performance Analysis of Dimensionality Reduction Techniques in Cancer Detection using Microarray Data. Asian Journal For Convergence In Technology (AJCT) ISSN -2350-1146, 7(1), 53-57.
