vineet416/Credit_Card_Default_Prediction

Credit Card Default Prediction - (App_Link)

A machine learning project that predicts credit card payment defaults using several classification algorithms. It implements an end-to-end pipeline, from data ingestion to model deployment, following MLOps best practices.

🎯 Project Overview

In an increasingly dynamic financial landscape, XYZ Financial Services faces the critical challenge of accurately predicting credit risk. This project develops a predictive model that estimates the probability of credit default based on credit card owners' characteristics such as age, gender, education, marital status, credit limit, and payment history.

Business Goal

Enable XYZ Financial Services to:

  • Identify high-risk credit clients
  • Tailor risk mitigation strategies
  • Adjust credit limits appropriately
  • Offer targeted financial counseling
  • Reduce default rates and improve portfolio health

📊 Dataset Information

  • Source: UCI Machine Learning Repository (Data Source Link)
  • Dataset: UCI_Credit_Card.csv
  • Size: 30,000 instances with 25 columns (an ID, 23 predictor features, and the target)
  • Target Variable: default.payment.next.month (Binary: 1 = default, 0 = no default)
  • Time Period: April 2005 to September 2005 (Taiwan)

Key Features

  • Demographic: Age, Gender, Education, Marital Status
  • Credit Information: Credit Limit Balance
  • Repayment Status: Past 6 months payment status (PAY_0 to PAY_6)
  • Bill Statements: Past 6 months bill amounts (BILL_AMT1 to BILL_AMT6)
  • Payment Amounts: Past 6 months payment amounts (PAY_AMT1 to PAY_AMT6)

✨ Features

  • Data Pipeline: Automated data ingestion, transformation, and preprocessing
  • Model Training: Multiple algorithm comparison and hyperparameter tuning
  • Model Evaluation: Comprehensive performance metrics and visualization
  • Web Application: Interactive Streamlit app for real-time predictions
  • Logging: Comprehensive logging system for debugging and monitoring
  • Exception Handling: Robust error handling throughout the pipeline
  • Modular Architecture: Clean, maintainable, and scalable codebase

📁 Project Structure

Credit Card Default Prediction/
├── artifacts/                          # Artifacts folder stores all the outputs of the ML pipeline
│   ├── credit_data.csv
│   ├── model.pkl
│   ├── preprocessor.pkl
│   ├── test.csv
│   └── train.csv
├── config/                             # Configuration files
│   └── model.yaml
├── logs/                               # Application logs
├── notebooks/                          # Jupyter notebooks for analysis
│   ├── 1-exploratory_data_analysis-EDA.ipynb
│   ├── 2-data_preprocessing.ipynb
│   ├── 3-model_training_and_evaluation.ipynb
│   ├── csv_outputs/                    # Model performance results
│   ├── datasets/                       # Raw dataset
│   ├── feature_importance_outputs/     # Feature importance plots
│   ├── test_performance_outputs/       # Test performance visualizations
│   └── validation_performance_outputs/ # Validation performance visualizations
├── src/                                # Source code
│   ├── components/                     # Core components
│   │   ├── data_ingestion.py
│   │   ├── data_transformation.py
│   │   └── model_trainer.py
│   ├── pipeline/                       # Training and prediction pipelines
│   │   ├── train_pipeline.py
│   │   └── predict_pipeline.py
│   ├── constant/                       # Constants
│   ├── utils/                          # Utility functions
│   ├── exception.py                    # Custom exception handling
│   └── logger.py                       # Logging configuration
├── streamlit_app.py                    # Web application
├── upload_data.py                      # Upload data into MongoDB
├── requirements.txt                    # Dependencies
├── setup.py                            # Package setup
└── README.md                           # Project documentation

🚀 Installation

  1. Clone the repository:
git clone https://github.com/vineet416/Credit_Card_Default_Prediction.git
cd Credit_Card_Default_Prediction
  2. Create a virtual environment:
conda create -p venv python=3.12 -y
conda activate venv/
  3. Install the dependencies and the package:
pip install -r requirements.txt

💻 Usage

Training the Model

  1. Run the training pipeline:
from src.pipeline.train_pipeline import TrainPipeline

# Initialize and run training pipeline
pipeline = TrainPipeline()
pipeline.run_pipeline()
  2. Or run via command line:
python src/pipeline/train_pipeline.py
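
For readers who want a mental model of what a training pipeline of this shape does, the sketch below chains three stages (ingestion, transformation, training) and passes each stage's output to the next. It is not the repository's `TrainPipeline`: the class `SketchTrainPipeline` and the stub stage functions are hypothetical, and only the artifact paths are taken from the project structure.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


@dataclass
class SketchTrainPipeline:
    """Hypothetical orchestrator: runs stages in order, feeding each the previous result."""

    stages: List[Stage]
    completed: List[str] = field(default_factory=list)

    def run_pipeline(self, initial: Any = None) -> Any:
        result = initial
        for name, stage in self.stages:
            result = stage(result)
            self.completed.append(name)  # record progress for logging/debugging
        return result


# Stub stages that only pass artifact paths along; real stages would do the work.
def ingest(_: Any) -> dict:
    return {"train": "artifacts/train.csv", "test": "artifacts/test.csv"}


def transform(paths: dict) -> dict:
    return {**paths, "preprocessor": "artifacts/preprocessor.pkl"}


def train(artifacts: dict) -> dict:
    return {**artifacts, "model": "artifacts/model.pkl"}


pipeline = SketchTrainPipeline(
    stages=[("ingestion", ingest), ("transformation", transform), ("training", train)]
)
```

Keeping each stage as a plain callable makes it easy to test a stage in isolation or to rerun the pipeline from an intermediate result.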

Web Application

Launch the Streamlit web application:

streamlit run streamlit_app.py

The web app provides an intuitive interface for:

  • Entering credit card holder information
  • Viewing real-time default probability predictions
  • Filling in features interactively, with an explanation for each input

📈 Model Performance

The project evaluates multiple machine learning algorithms:

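| Model               | ROC AUC | Precision | Recall | F1 Score | Accuracy |
|---------------------|---------|-----------|--------|----------|----------|
| Random Forest       | 0.777   | 0.57      | 0.483  | 0.523    | 0.805    |
| Gradient Boosting   | 0.759   | 0.585     | 0.41   | 0.483    | 0.805    |
| XGBoost             | 0.758   | 0.504     | 0.515  | 0.509    | 0.781    |
| K-Nearest Neighbors | 0.693   | 0.375     | 0.521  | 0.436    | 0.702    |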

Best Model: Random Forest with the highest ROC AUC score of 0.777 and F1 Score of 0.523.

Key Insights:

  • Random Forest provides the best balance of precision and recall
  • All models achieve good Areas under the ROC curve around 0.7
  • Feature importance analysis reveals payment history as the most predictive factor

🌐 Web Application - (App_Link)

The Streamlit web application includes:

  • User-friendly Interface: Intuitive input fields for all features
  • Real-time Predictions: Instant default probability calculation
  • Feature Explanations: Detailed descriptions of input parameters
  • Interactive Visualizations: Dynamic charts and plots
  • Responsive Design: Works on desktop and mobile devices

Application Features:

  • Basic information input (age, gender, education, marital status)
  • Credit limit and payment history tracking
  • Bill amounts and payment amounts for past 6 months
  • Instant prediction results with probability scores
  • Visualizations of top 5 features influencing the prediction

🛠️ Technologies Used

  • Programming Language: Python 3.12+
  • Machine Learning: scikit-learn, XGBoost, imbalanced-learn
  • Data Processing: pandas, numpy
  • Visualization: matplotlib, seaborn
  • Web Framework: Streamlit
  • Configuration: PyYAML
  • Database: pymongo (MongoDB)
  • Development: Jupyter Notebooks

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Development Setup:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

👤 Author

Vineet Patel

🙏 Acknowledgments

  • UCI Machine Learning Repository for providing the dataset
  • The open-source community for the amazing tools and libraries
  • Streamlit for the web application framework and ease of deployment

⭐ If you found this project helpful, please give it a star!

About

🎯 End-to-end ML pipeline for credit card default prediction using Random Forest, XGBoost & Gradient Boosting. Features comprehensive EDA, model comparison, and interactive Streamlit web app. Achieves 0.78 ROC AUC with production-ready MLOps architecture.
