Application of Ensemble Learning in Predicting Lung Cancer Based on Patient Data
Keywords:
Lung cancer prediction, Ensemble learning, Machine learning, Risk factors, Healthcare analytics
Abstract
Lung cancer is one of the leading causes of cancer-related mortality worldwide, which makes early detection and accurate risk prediction essential for improving patient outcomes. This study investigates the application of ensemble learning techniques to predict lung cancer from patient data, including demographic, lifestyle, and clinical variables such as age, gender, smoking habits, oxygen saturation levels, and family history. The dataset was preprocessed to handle missing values, encode categorical variables, and normalize numerical features. Three ensemble learning models (Random Forest, Gradient Boosting, and XGBoost) were trained and evaluated using accuracy, precision, recall, F1-score, and AUC-ROC. Among these, XGBoost achieved the highest accuracy of 93%, with an AUC-ROC of 95%, demonstrating its robustness and generalizability. Feature importance analysis identified oxygen saturation, smoking habits, family history, and exposure to pollution as the most significant predictors of lung cancer risk. The findings highlight the potential of ensemble learning to improve predictive accuracy in healthcare applications. Future work should focus on expanding the dataset and integrating these models into clinical decision-support systems to support personalized healthcare delivery.
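The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. The labels and scores below are toy values invented for illustration, not data or results from the study, and the function names `classification_metrics` and `auc_roc` are hypothetical helpers rather than the authors' code. AUC-ROC is computed here through its pairwise-ranking interpretation: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc_roc(y_true, y_score):
    """AUC-ROC as the share of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (assumed values): 1 = lung cancer, 0 = no lung cancer.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]  # model probabilities
y_pred = [1 if s >= 0.5 else 0 for s in y_score]     # threshold at 0.5

print(classification_metrics(y_true, y_pred))
print(auc_roc(y_true, y_score))
```

Accuracy, precision, recall, and F1 depend on the chosen decision threshold (0.5 in this sketch), whereas AUC-ROC summarizes ranking quality across all thresholds, which is one reason the two kinds of metric are usually reported together.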
Most read articles by the same author(s)
- M. Alfa Rizy, Application of Ensemble Learning in Predicting Lung Cancer Based on Patient Data, A Journal of Thoracic Oncology: Vol. 2 No. 1 (2025): JOTI JOURNAL OF ONCOLOGY THORACIC INDONESIA