Predict whether a patient is at risk of heart disease using medical attributes and machine learning. The goal is to analyze patient health data and build a model that can assist in early detection of heart disease risk.
The project uses a heart disease dataset based on the UCI Heart Disease dataset.
File used: heart_disease_uci.csv
Features used:
- age – Age of the patient
- sex – Gender of the patient
- cp – Chest pain type
- trestbps – Resting blood pressure
- chol – Serum cholesterol level
- fbs – Fasting blood sugar
- restecg – Resting electrocardiographic results
- thalch – Maximum heart rate achieved
- exang – Exercise induced angina
- oldpeak – ST depression induced by exercise
- slope – Slope of the peak exercise ST segment
- ca – Number of major vessels colored by fluoroscopy
- thal – Thalassemia
Target variable: target
- 0 → No Heart Disease
- 1 → Heart Disease Present
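A minimal sketch of how a multi-level diagnosis column could be collapsed into the 0/1 target. The source column name `num` and its 0 to 4 coding are assumptions for illustration; check the actual column in heart_disease_uci.csv before reusing this.

```python
import pandas as pd

# Toy frame; the column name "num" and its 0-4 coding are assumptions.
df = pd.DataFrame({"num": [0, 1, 2, 0, 4, 3]})

# Any non-zero value is treated as "Heart Disease Present" (1).
df["target"] = (df["num"] > 0).astype(int)

print(df["target"].tolist())
```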
Tools used:
- Python
- Pandas
- Matplotlib
- Seaborn
- Scikit-learn
Workflow:
- Loaded the heart disease dataset using Pandas.
- Explored dataset structure and summary statistics.
- Handled missing values in numerical and categorical columns.
- Encoded categorical variables using LabelEncoder.
- Converted the target column into binary classification.
- Performed Exploratory Data Analysis (EDA) using visualizations.
- Split the dataset into training and testing sets.
- Standardized numerical features using StandardScaler.
- Trained a Decision Tree Classifier.
- Evaluated the model using accuracy, confusion matrix, classification report, and ROC curve.
- Identified important features affecting predictions.
- Visualized the trained decision tree.
- Allowed real-time patient input for prediction.
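The preprocessing steps can be sketched roughly as follows. This is not the code from main.py: the sample rows are made up, and the imputation strategy (median for numeric columns, most frequent value for categorical ones), the 25% test split, and stratification are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Made-up rows standing in for heart_disease_uci.csv (illustrative values only).
df = pd.DataFrame({
    "age": [63, 41, 57, 52, 48, 60, 45, 66],
    "chol": [233.0, np.nan, 250.0, 212.0, 275.0, np.nan, 198.0, 302.0],
    "cp": ["typical angina", "non-anginal", None, "asymptomatic",
           "asymptomatic", "non-anginal", "typical angina", "asymptomatic"],
    "target": [1, 0, 1, 0, 1, 1, 0, 1],
})
num_cols = ["age", "chol"]
cat_cols = ["cp"]

# Assumed strategy: median for numeric gaps, most frequent value for categorical gaps.
for col in num_cols:
    df[col] = df[col].fillna(df[col].median()).astype(float)
for col in cat_cols:
    df[col] = df[col].fillna(df[col].mode()[0])
    df[col] = LabelEncoder().fit_transform(df[col])  # strings -> integer codes

X = df.drop(columns="target")
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler()
X_train = X_train.copy()
X_test = X_test.copy()
X_train[num_cols] = scaler.fit_transform(X_train[num_cols])
X_test[num_cols] = scaler.transform(X_test[num_cols])

print(X_train.shape, X_test.shape)
```

Fitting the scaler on the training split alone keeps test-set statistics out of the model inputs.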
Model used:
- Decision Tree Classifier
Key parameters:
- max_depth = 5
- min_samples_leaf = 3
- class_weight = balanced
- random_state = 42
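For reference, these parameters map onto scikit-learn's `DecisionTreeClassifier` as sketched here. The training data comes from `make_classification` and is purely synthetic, an assumption made so the snippet runs without the CSV file.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with 13 features (not the real patient records).
X, y = make_classification(n_samples=200, n_features=13, random_state=42)

model = DecisionTreeClassifier(
    max_depth=5,              # cap tree depth to limit overfitting
    min_samples_leaf=3,       # every leaf must cover at least 3 samples
    class_weight="balanced",  # reweight classes inversely to their frequency
    random_state=42,          # reproducible tie-breaking between splits
)
model.fit(X, y)

print(model.get_depth())
```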
Results:
- The model classifies each patient as at risk of heart disease or not at risk.
- Evaluation metrics such as accuracy, confusion matrix, and ROC-AUC provide insight into model performance.
- Feature importance analysis highlights the most influential medical attributes.
- Visualization of the decision tree helps understand the model's decision-making process.
Key insights:
- Health indicators such as chest pain type, cholesterol level, and maximum heart rate strongly influence the heart disease prediction.
- Decision Trees provide interpretable results, which is useful in healthcare applications.
- Machine learning can assist doctors by providing early risk prediction.
Project files:
- main.py – Complete Python implementation of the model
- heart_disease_uci.csv – Dataset used for training and testing
- README.md – Project documentation
This output shows the total number of rows and columns in the dataset after loading it into the program.
It helps understand the size and structure of the dataset used for training the machine learning model.
This section displays statistical information about the dataset including:
- Mean
- Standard deviation
- Minimum and maximum values
- Quartiles
It helps understand the distribution and range of medical attributes in the dataset.
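A small sketch of how the dataset size and summary statistics can be obtained with Pandas. The four rows are made up for illustration; in the project the frame would presumably come from `pd.read_csv("heart_disease_uci.csv")`.

```python
import pandas as pd

# Made-up rows; the real project would load the CSV instead.
df = pd.DataFrame({"age": [63, 41, 57, 52], "chol": [233, 204, 250, 212]})

rows, cols = df.shape      # (number of rows, number of columns)
print(f"{rows} rows, {cols} columns")

summary = df.describe()    # count, mean, std, min, quartiles, max per numeric column
print(summary)
```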
This graph shows the distribution of patients with and without heart disease. It helps understand the balance of classes in the dataset.
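The numbers behind such a class-distribution plot can be checked directly with `value_counts`. The labels below are made up; only the technique is the point.

```python
import pandas as pd

# Made-up target labels (0 = No Heart Disease, 1 = Heart Disease Present).
target = pd.Series([1, 0, 1, 1, 0, 1, 0, 1], name="target")

counts = target.value_counts().sort_index()               # absolute counts per class
share = target.value_counts(normalize=True).sort_index()  # fraction per class

print(counts)
print(share)
```

A strongly skewed share is one reason to use an option such as `class_weight="balanced"` when training.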
The heatmap shows correlations between numerical features in the dataset. It helps identify relationships between medical attributes.
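The matrix behind a correlation heatmap can be computed as sketched below, using made-up values for three of the numerical features. Passing the result to Seaborn's `heatmap` function would draw it; the plotting call is left out so the snippet runs without a display.

```python
import pandas as pd

# Made-up values for three numerical features.
df = pd.DataFrame({
    "age": [63, 41, 57, 52, 48, 60],
    "thalch": [150, 172, 141, 160, 168, 132],
    "chol": [233, 204, 250, 212, 275, 198],
})

# Pairwise Pearson correlation between the numeric columns.
corr = df.corr()
print(corr.round(2))

# To plot: import seaborn as sns; sns.heatmap(corr, annot=True)
```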
This section shows the evaluation results of the trained machine learning model including:
- Accuracy Score
- Precision
- Recall
- F1-score
These metrics help measure how well the model predicts heart disease risk.
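These metrics are available in `sklearn.metrics`. The sketch below uses eight made-up labels and predictions rather than the project's actual test set, so the numbers say nothing about the real model.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up ground truth and predictions (1 = Heart Disease Present).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

accuracy = accuracy_score(y_true, y_pred)    # share of correct predictions
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(accuracy, precision, recall, f1)
print(classification_report(
    y_true, y_pred,
    target_names=["No Heart Disease", "Heart Disease Present"],
))
```

In a screening setting, recall on the positive class is often watched most closely, because a missed case (false negative) is usually costlier than a false alarm.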
The confusion matrix breaks the classification results down into four counts:
- True Positives
- True Negatives
- False Positives
- False Negatives
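A minimal sketch of extracting the four counts with scikit-learn, again on made-up labels. For binary labels 0 and 1, `confusion_matrix` returns rows for the true class and columns for the predicted class, so flattening it yields TN, FP, FN, TP in that order.

```python
from sklearn.metrics import confusion_matrix

# Made-up ground truth and predictions (1 = Heart Disease Present).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()  # layout: [[TN, FP], [FN, TP]]

print(cm)
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
```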
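The ROC curve plots the true positive rate against the false positive rate across classification thresholds, illustrating the model's ability to distinguish between classes. An AUC closer to 1 indicates better separation, while 0.5 corresponds to random guessing.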
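The curve and its AUC can be computed as sketched below. The scores are made up; with a fitted scikit-learn classifier they would typically be the positive-class probabilities from `predict_proba`.

```python
from sklearn.metrics import roc_auc_score, roc_curve

# Made-up labels and predicted probabilities for the positive class.
# With a fitted model: y_score = model.predict_proba(X_test)[:, 1]
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# One (fpr, tpr) point per threshold; plotting tpr against fpr gives the ROC curve.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

print(fpr, tpr)
print(auc)
```

Here three of the four positive/negative pairs are ranked correctly, which is why the AUC comes out at 0.75.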
This visualization shows which features have the highest impact on heart disease prediction according to the decision tree model.
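Feature importances can be read from a fitted tree via its `feature_importances_` attribute, as in this sketch. The data is synthetic (`make_classification`), so the feature names are attached only as labels and the resulting ranking carries no medical meaning.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

feature_names = ["age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
                 "thalch", "exang", "oldpeak", "slope", "ca", "thal"]

# Synthetic data: the ranking below is illustrative only.
X, y = make_classification(n_samples=200, n_features=13, random_state=42)
model = DecisionTreeClassifier(
    max_depth=5, min_samples_leaf=3, class_weight="balanced", random_state=42
).fit(X, y)

# Impurity-based importances; they are non-negative and sum to 1.
importances = pd.Series(model.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)

print(importances.head())
```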
This diagram represents the structure of the trained Decision Tree model and how it makes classification decisions.
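A text version of a tree's structure can be produced with `export_text`, sketched here on a shallow tree trained on synthetic data with hypothetical feature names `f0` to `f3`. For a graphical diagram, scikit-learn also provides `plot_tree`, which draws with Matplotlib.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic data and placeholder feature names (f0..f3 are hypothetical).
X, y = make_classification(
    n_samples=200, n_features=4, n_informative=2, n_redundant=0, random_state=42
)
model = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)

# Each line is a split condition or a leaf with its predicted class.
rules = export_text(model, feature_names=["f0", "f1", "f2", "f3"])
print(rules)
```

Reading the rules from the root down traces the sequence of threshold checks that leads to each prediction.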