Application of Ensemble Learning in Predicting Lung Cancer Based on Patient Data

M. Alfa Rizy; Ahmad Fauzi; Siti Nurhaliza

Authors

M. Alfa Rizy
Universitas Indonesia

Ahmad Faruzi
Universitas Indonesia

Siti Nurhaliza
Universitas Indonesia

Keywords:

Lung cancer prediction, Ensemble learning, Machine learning, Risk factors, Healthcare analytics

Abstract

Lung cancer is one of the leading causes of cancer-related death worldwide, so early detection and accurate risk prediction are essential for improving patient outcomes. This study investigates ensemble learning techniques for predicting lung cancer from patient data, including demographic, lifestyle, and clinical variables such as age, gender, smoking habits, oxygen saturation, and family history. The data were preprocessed by handling missing values, encoding categorical variables, and normalizing numerical features. Three ensemble models (Random Forest, Gradient Boosting, and XGBoost) were trained and evaluated using accuracy, precision, recall, F1-score, and AUC-ROC. XGBoost performed best, with an accuracy of 93% and an AUC-ROC of 0.95, indicating strong discrimination between patients with and without lung cancer on this dataset. Feature importance analysis identified oxygen saturation, smoking habits, family history, and exposure to pollution as the most important predictors of lung cancer risk. These findings highlight the potential of ensemble learning to improve predictive accuracy in healthcare applications. Future work should expand the dataset and integrate these models into clinical decision-support systems to support personalized healthcare delivery.
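For readers less familiar with the evaluation metrics named above, the following is a minimal plain-Python sketch of how they are usually defined for a binary task (1 = lung cancer, 0 = no lung cancer). It is not the authors' code. The toy labels, scores, and the 0.5 decision threshold are hypothetical and serve only as illustration. AUC-ROC is computed here through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
"""Illustrative definitions of accuracy, precision, recall, F1-score, and
AUC-ROC for binary labels (1 = lung cancer, 0 = no lung cancer).
A sketch for explanation only, not the study's evaluation code."""


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Threshold-based metrics computed from hard 0/1 predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc_roc(y_true, scores):
    """Area under the ROC curve via pairwise ranking: the fraction of
    (positive, negative) pairs where the positive scores higher (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy labels and model scores, for illustration only.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3]
    # Assumed 0.5 threshold to turn scores into hard predictions.
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(classification_metrics(y_true, y_pred))
    # → {'accuracy': 0.75, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
    print(round(auc_roc(y_true, scores), 4))  # → 0.9375
```

The toy example shows why both kinds of metric are reported: accuracy, precision, recall, and F1-score depend on the chosen threshold, whereas AUC-ROC summarizes how well the model's scores rank positive cases above negative ones across all thresholds.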
