Credit Card Default Prediction - (App_Link)
A comprehensive machine learning project to predict credit card default payments using various classification algorithms. This project implements an end-to-end pipeline from data ingestion to model deployment using best practices in MLOps.
- Project Overview
- Dataset Information
- Features
- Project Structure
- Installation
- Usage
- Model Performance
- Web Application
- Technologies Used
- Contributing
In an increasingly dynamic financial landscape, XYZ Financial Services faces the critical challenge of accurately predicting credit risk. This project builds a model that estimates the probability that a credit card holder will default on next month's payment, based on the holder's characteristics such as age, gender, education, marital status, credit limit, and payment history.
Enable XYZ Financial Services to:
- Identify high-risk credit clients
- Tailor risk mitigation strategies
- Adjust credit limits appropriately
- Offer targeted financial counseling
- Reduce default rates and improve portfolio health
- Source: UCI Machine Learning Repository (Data Source Link)
- Dataset: UCI_Credit_Card.csv
- Size: 30,000 instances with 25 columns (an ID column, 23 explanatory features, and the target)
- Target Variable: default.payment.next.month (Binary: 1 = default, 0 = no default)
- Time Period: April 2005 to September 2005 (Taiwan)
- Demographic: Age, Gender, Education, Marital Status
- Credit Information: Credit Limit Balance
- Repayment Status: Past 6 months payment status (PAY_0 and PAY_2 to PAY_6; the dataset has no PAY_1 column)
- Bill Statements: Past 6 months bill amounts (BILL_AMT1 to BILL_AMT6)
- Payment Amounts: Past 6 months payment amounts (PAY_AMT1 to PAY_AMT6)
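The column groups above correspond to the header of UCI_Credit_Card.csv. The sketch below separates the 23 explanatory features from the target, assuming the column names used in that file (an ID column, and PAY_0 followed by PAY_2 to PAY_6). The helper split_features_target is hypothetical and not part of this repository:

```python
import pandas as pd

TARGET = "default.payment.next.month"

# Feature groups as described above (column names assumed from UCI_Credit_Card.csv).
DEMOGRAPHIC = ["SEX", "EDUCATION", "MARRIAGE", "AGE"]
CREDIT = ["LIMIT_BAL"]
REPAYMENT = ["PAY_0"] + [f"PAY_{i}" for i in range(2, 7)]  # the file has no PAY_1
BILL_AMOUNTS = [f"BILL_AMT{i}" for i in range(1, 7)]
PAY_AMOUNTS = [f"PAY_AMT{i}" for i in range(1, 7)]
FEATURES = DEMOGRAPHIC + CREDIT + REPAYMENT + BILL_AMOUNTS + PAY_AMOUNTS  # 23 columns


def split_features_target(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.Series]:
    """Hypothetical helper: drop the ID column and return (X, y)."""
    missing = [c for c in FEATURES + [TARGET] if c not in df.columns]
    if missing:
        raise KeyError(f"Missing expected columns: {missing}")
    return df[FEATURES], df[TARGET]
```

With the real file, this would be called as X, y = split_features_target(pd.read_csv("UCI_Credit_Card.csv")); y.mean() then gives the observed default rate.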
- Data Pipeline: Automated data ingestion, transformation, and preprocessing
- Model Training: Multiple algorithm comparison and hyperparameter tuning
- Model Evaluation: ROC AUC, precision, recall, F1 score, and accuracy, with performance visualizations
- Web Application: Interactive Streamlit app for real-time predictions
- Logging: Comprehensive logging system for debugging and monitoring
- Exception Handling: Robust error handling throughout the pipeline
- Modular Architecture: Clean, maintainable, and scalable codebase
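The data pipeline starts with ingestion, whose outputs appear in the artifacts/ folder (credit_data.csv, train.csv, test.csv). A minimal sketch of such a step, assuming a stratified 80/20 split with a fixed seed; the function name ingest and the split parameters are hypothetical, and the repository's data_ingestion.py may load the data from MongoDB and split it differently:

```python
from pathlib import Path

import pandas as pd
from sklearn.model_selection import train_test_split

TARGET = "default.payment.next.month"


def ingest(df: pd.DataFrame, artifacts_dir: Path,
           test_size: float = 0.2, seed: int = 42) -> tuple[Path, Path]:
    """Hypothetical ingestion step: persist the raw data and a stratified train/test split."""
    artifacts_dir.mkdir(parents=True, exist_ok=True)
    df.to_csv(artifacts_dir / "credit_data.csv", index=False)

    # Stratifying on the target keeps the default rate the same in both splits,
    # which matters because defaults are the minority class.
    train_df, test_df = train_test_split(
        df, test_size=test_size, random_state=seed, stratify=df[TARGET]
    )
    train_path = artifacts_dir / "train.csv"
    test_path = artifacts_dir / "test.csv"
    train_df.to_csv(train_path, index=False)
    test_df.to_csv(test_path, index=False)
    return train_path, test_path
```

Writing the splits to disk, rather than passing DataFrames between stages, lets the transformation and training steps be rerun independently from the same files.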
Credit Card Default Prediction/
├── artifacts/ # Artifacts folder stores all the outputs of the ML pipeline
│   ├── credit_data.csv
│   ├── model.pkl
│   ├── preprocessor.pkl
│   ├── test.csv
│   └── train.csv
├── config/ # Configuration files
│   └── model.yaml
├── logs/ # Application logs
├── notebooks/ # Jupyter notebooks for analysis
│   ├── 1-exploratory_data_analysis-EDA.ipynb
│   ├── 2-data_preprocessing.ipynb
│   ├── 3-model_training_and_evaluation.ipynb
│   ├── csv_outputs/ # Model performance results
│   ├── datasets/ # Raw dataset
│   ├── feature_importance_outputs/ # Feature importance plots
│   ├── test_performance_outputs/ # Test performance visualizations
│   └── validation_performance_outputs/ # Validation performance visualizations
├── src/ # Source code
│   ├── components/ # Core components
│   │   ├── data_ingestion.py
│   │   ├── data_transformation.py
│   │   └── model_trainer.py
│   ├── pipeline/ # Training and prediction pipelines
│   │   ├── train_pipeline.py
│   │   └── predict_pipeline.py
│   ├── constant/ # Constants
│   ├── utils/ # Utility functions
│   ├── exception.py # Custom exception handling
│   └── logger.py # Logging configuration
├── streamlit_app.py # Web application
├── upload_data.py # Upload data into MongoDB
├── requirements.txt # Dependencies
├── setup.py # Package setup
└── README.md # Project documentation
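The artifacts/ folder stores the fitted preprocessor.pkl and model.pkl. The sketch below shows one plausible way a prediction pipeline could use them, assuming both objects are scikit-learn estimators serialized with the standard pickle module; the project may use a different serialization library, and the function names save_artifacts and predict_default_proba are hypothetical:

```python
import pickle
from pathlib import Path

import numpy as np
import pandas as pd


def save_artifacts(preprocessor, model, artifacts_dir: Path) -> None:
    """Persist a fitted preprocessor and model as preprocessor.pkl and model.pkl."""
    artifacts_dir.mkdir(parents=True, exist_ok=True)
    with open(artifacts_dir / "preprocessor.pkl", "wb") as f:
        pickle.dump(preprocessor, f)
    with open(artifacts_dir / "model.pkl", "wb") as f:
        pickle.dump(model, f)


def predict_default_proba(features: pd.DataFrame, artifacts_dir: Path) -> np.ndarray:
    """Load both artifacts and return the probability of default for each row."""
    with open(artifacts_dir / "preprocessor.pkl", "rb") as f:
        preprocessor = pickle.load(f)
    with open(artifacts_dir / "model.pkl", "rb") as f:
        model = pickle.load(f)
    # Reuse the transformation fitted on the training data; never refit at prediction time.
    transformed = preprocessor.transform(features)
    # Select the predict_proba column for class 1 (default) explicitly.
    default_col = list(model.classes_).index(1)
    return model.predict_proba(transformed)[:, default_col]
```

In a web app, the form inputs would be collected into a one-row DataFrame with the same columns as the training data and passed as features. Only load pickle files you created yourself, since unpickling can execute arbitrary code.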
- Clone the repository:
git clone https://github.com/vineet416/Credit_Card_Default_Prediction.git
cd Credit_Card_Default_Prediction
- Create a virtual environment:
conda create -p venv python==3.12 -y
conda activate venv/
- Install dependencies and package:
pip install -r requirements.txt
- Run the training pipeline:
from src.pipeline.train_pipeline import TrainPipeline
# Initialize and run training pipeline
pipeline = TrainPipeline()
pipeline.run_pipeline()
- Or run via command line:
python src/pipeline/train_pipeline.py
Launch the Streamlit web application:
streamlit run streamlit_app.py
The web app provides an intuitive interface for:
- Entering credit card holder information
- Real-time default probability prediction
- Interactive feature input with explanations
The project evaluates multiple machine learning algorithms:
| Model | ROC AUC | Precision | Recall | F1 Score | Accuracy |
|---|---|---|---|---|---|
| Random Forest | 0.777 | 0.57 | 0.483 | 0.523 | 0.805 |
| Gradient Boosting | 0.759 | 0.585 | 0.41 | 0.483 | 0.805 |
| XGBoost | 0.758 | 0.504 | 0.515 | 0.509 | 0.781 |
| K-Nearest Neighbors | 0.693 | 0.375 | 0.521 | 0.436 | 0.702 |
Best Model: Random Forest with the highest ROC AUC score of 0.777 and F1 Score of 0.523.
- Random Forest provides the best balance of precision and recall (highest F1 score)
- All models achieve ROC AUC scores between 0.69 and 0.78, above the 0.5 expected from random guessing
- Feature importance analysis reveals payment history as the most predictive factor
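The five columns in the table are standard scikit-learn metrics. A minimal sketch of computing them for any fitted classifier that exposes predict_proba, assuming a 0.5 decision threshold (the threshold behind the reported numbers is not stated here); the helper names evaluate and f1_from are hypothetical. It also checks that a reported F1 score matches the harmonic mean of the reported precision and recall:

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)


def evaluate(model, X_test, y_test, threshold: float = 0.5) -> dict[str, float]:
    """Compute the five metrics reported in the table for a fitted binary classifier."""
    default_col = list(model.classes_).index(1)
    proba = model.predict_proba(X_test)[:, default_col]
    y_pred = (proba >= threshold).astype(int)
    return {
        "roc_auc": roc_auc_score(y_test, proba),  # uses scores, independent of threshold
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
        "accuracy": accuracy_score(y_test, y_pred),
    }


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Sanity check against the Random Forest row: 2 * 0.57 * 0.483 / (0.57 + 0.483)
print(round(f1_from(0.57, 0.483), 3))  # 0.523
```

The same check gives 0.509 for XGBoost and 0.436 for K-Nearest Neighbors. Gradient Boosting gives 0.482 against the reported 0.483, a gap consistent with precision and recall being rounded before the calculation. Because accuracy rewards predicting the majority class, ROC AUC and F1 are the more informative columns for an imbalanced target like default.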
Web Application - (App_Link)
The Streamlit web application includes:
- User-friendly Interface: Intuitive input fields for all features
- Real-time Predictions: Instant default probability calculation
- Feature Explanations: Detailed descriptions of input parameters
- Interactive Visualizations: Dynamic charts and plots
- Responsive Design: Works on desktop and mobile devices
- Basic information input (age, gender, education, marital status)
- Credit limit and payment history tracking
- Bill amounts and payment amounts for past 6 months
- Instant prediction results with probability scores
- Visualization of the top 5 features influencing the prediction
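A top-5 feature view could be built from a tree ensemble's global importances. A minimal sketch, assuming a fitted scikit-learn RandomForestClassifier (the best model above) and its impurity-based feature_importances_; these are global importances rather than per-prediction explanations, and the app may use a different attribution method. The helper name top_k_features is hypothetical:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier


def top_k_features(model: RandomForestClassifier, feature_names: list[str], k: int = 5) -> pd.Series:
    """Return the k largest impurity-based importances, highest first."""
    # feature_importances_ is ordered like the columns the model was fitted on.
    importances = pd.Series(model.feature_importances_, index=feature_names)
    return importances.sort_values(ascending=False).head(k)
```

The returned Series can be drawn directly as a horizontal bar chart, for example with top.plot.barh() when matplotlib is installed.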
- Programming Language: Python 3.12+
- Machine Learning: scikit-learn, XGBoost, imbalanced-learn
- Data Processing: pandas, numpy
- Visualization: matplotlib, seaborn
- Web Framework: Streamlit
- Configuration: PyYAML
- Database: pymongo (MongoDB)
- Development: Jupyter Notebooks
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create a feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Vineet Patel
- Email: vineetpatel468@gmail.com
- GitHub: @vineet416
- LinkedIn: @vineet416
- UCI Machine Learning Repository for providing the dataset
- The open-source community for the amazing tools and libraries
- Streamlit for the web application framework and ease of deployment
If you found this project helpful, please give it a star!