This project implements a machine learning pipeline for detecting fraudulent financial transactions. The system analyzes transaction attributes such as transaction amount, customer behavior, and terminal activity to classify each transaction as legitimate or fraudulent.
Fraud detection is a critical application of machine learning in financial systems, helping institutions detect suspicious activities and prevent financial losses.
The goal of this project is to build a machine learning model capable of accurately identifying fraudulent transactions using transaction data.
The system performs:
- Data preprocessing
- Exploratory data analysis
- Feature engineering
- Model training and tuning
- Fraud prediction
The dataset used in this project is a simulated financial transaction dataset containing both legitimate and fraudulent transactions.
Fraud scenarios included in the dataset:
- Transactions above a certain threshold amount are marked as fraudulent.
- Fraudulent transactions originating from compromised payment terminals.
- Abnormal spending on customer accounts whose credentials have been leaked.
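The first scenario amounts to a simple rule on the transaction amount. The sketch below illustrates it under one assumption: the threshold value `AMOUNT_THRESHOLD` is hypothetical, since the value used to generate the dataset is not stated here.

```python
# Hypothetical threshold; the value used to generate the dataset is not stated.
AMOUNT_THRESHOLD = 200.0


def is_threshold_fraud(tx_amount: float, threshold: float = AMOUNT_THRESHOLD) -> bool:
    """Return True when the amount is strictly above the threshold."""
    return tx_amount > threshold


amounts = [12.50, 199.99, 250.00]
flags = [is_threshold_fraud(amount) for amount in amounts]
print(flags)
```

The other two scenarios depend on terminal and customer history, so they cannot be expressed as a rule on a single transaction.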
Key dataset features:
| Column | Description |
|---|---|
| TRANSACTION_ID | Unique transaction identifier |
| TX_DATETIME | Date and time of transaction |
| CUSTOMER_ID | Unique customer identifier |
| TERMINAL_ID | Merchant terminal identifier |
| TX_AMOUNT | Transaction amount |
| TX_FRAUD | Fraud label (0 = Legitimate, 1 = Fraud) |
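As a rough illustration of how these columns might be read and turned into typed values, the sketch below parses two made-up rows. It assumes the data is available as CSV and that TX_DATETIME uses the `YYYY-MM-DD HH:MM:SS` format; neither is confirmed by this README. The derived fields `TX_HOUR` and `TX_IS_WEEKEND` are hypothetical examples of time-based features, not necessarily the ones this project builds.

```python
import csv
import io
from datetime import datetime

# Two made-up rows in the column layout above; not real data.
SAMPLE_CSV = """TRANSACTION_ID,TX_DATETIME,CUSTOMER_ID,TERMINAL_ID,TX_AMOUNT,TX_FRAUD
0,2018-04-01 00:00:31,596,3156,57.16,0
1,2018-04-02 23:02:10,4961,3412,81.51,0
"""


def load_transactions(handle):
    """Parse CSV rows and cast the columns to usable Python types."""
    rows = []
    for raw in csv.DictReader(handle):
        # Assumed timestamp format; adjust to match the actual dataset.
        ts = datetime.strptime(raw["TX_DATETIME"], "%Y-%m-%d %H:%M:%S")
        rows.append({
            "TRANSACTION_ID": raw["TRANSACTION_ID"],
            "TX_DATETIME": ts,
            "CUSTOMER_ID": raw["CUSTOMER_ID"],
            "TERMINAL_ID": raw["TERMINAL_ID"],
            "TX_AMOUNT": float(raw["TX_AMOUNT"]),
            "TX_FRAUD": int(raw["TX_FRAUD"]),
            # Hypothetical derived features for illustration only.
            "TX_HOUR": ts.hour,
            "TX_IS_WEEKEND": ts.weekday() >= 5,  # Saturday=5, Sunday=6
        })
    return rows


transactions = load_transactions(io.StringIO(SAMPLE_CSV))
print(len(transactions), "transactions loaded")
```

To read a real file, pass an open file handle instead of the `io.StringIO` wrapper.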
Due to size limitations, the dataset is not included in this repository.
Several models were trained and evaluated:
- Logistic Regression
- Random Forest
- XGBoost
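Fraud is usually a small minority of all transactions, so plain accuracy can be misleading when comparing classifiers such as these. This README does not state which metrics the project reports; the sketch below simply shows how precision, recall, and F1 for the fraud class (label 1) can be computed from predictions. The function name `fraud_metrics` and the sample labels are made up for illustration.

```python
def fraud_metrics(y_true, y_pred):
    """Compute precision, recall, and F1 for the fraud class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud caught
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fraud missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up labels: 3 fraudulent transactions out of 8.
y_true = [0, 0, 0, 0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 0, 1, 0, 0, 1]
metrics = fraud_metrics(y_true, y_pred)
print(metrics)
```

The same function can be applied to the predictions of each model to compare them on equal terms.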
The best-performing models were saved for deployment. The XGBoost model is included in this repository at models/best_xgboost.pkl.
Because of GitHub's 100 MB file size limit, the Random Forest model is hosted externally.
Download here:
https://drive.google.com/file/d/1_HhKgvmPDDnjKwT2R4RiLN_ZeHjSxnhF/view?usp=sharing
After downloading:
- Extract the ZIP file
- Place the model file inside: models/
Example:

    models/
    ├── best_xgboost.pkl
    └── best_randomforest.pkl
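A quick way to confirm that both files are in place before running anything is a small check like the one below. It is a convenience sketch, not part of the repository; the helper name `missing_models` is hypothetical, and the file names are taken from the layout above.

```python
from pathlib import Path

# File names taken from the models/ layout described in this README.
EXPECTED_MODELS = ("best_xgboost.pkl", "best_randomforest.pkl")


def missing_models(models_dir="models", expected=EXPECTED_MODELS):
    """Return the expected model file names that are absent from models_dir."""
    root = Path(models_dir)
    return [name for name in expected if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_models()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files are in place.")
```

Run it from the repository root so that the relative path `models` resolves correctly.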
Clone the repository:

    git clone https://github.com/shrashtimittal/fraud-transaction-detection.git
    cd fraud-transaction-detection

Install dependencies:

    pip install -r requirements.txt
The training and evaluation scripts live in the src directory; run them from the repository root. For example, to train the models:

    python src/train_models.py

For predictions:

    python src/predict.py
Project structure:

    fraud-transaction-detection
    │
    ├── models
    │   └── best_xgboost.pkl
    │
    ├── src
    │   ├── analyze_results.py
    │   ├── app.py
    │   ├── data_loader.py
    │   ├── eda_plots.py
    │   ├── eda_split.py
    │   ├── evaluate.py
    │   ├── predict.py
    │   ├── save_best_models.py
    │   ├── train_models.py
    │   └── tune_models.py
    │
    ├── requirements.txt
    └── README.md
Key insights:
- Fraudulent transactions often show abnormal transaction amounts.
- Terminal-based fraud patterns can be detected through transaction clustering.
- Customer behavior analysis improves fraud detection accuracy.
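One way to combine the first and last observations is to compare each amount with what the same customer has spent before. The sketch below computes that ratio with a running mean per customer. It is an illustrative feature, not necessarily one this project uses; the function name and the sample data are made up.

```python
from collections import defaultdict


def amount_vs_customer_history(transactions):
    """Compare each amount with the customer's mean over earlier transactions.

    `transactions` is a time-ordered iterable of (customer_id, amount) pairs.
    Returns one ratio per transaction; 1.0 is used when the customer has no
    usable history yet.
    """
    totals = defaultdict(float)  # running sum of past amounts per customer
    counts = defaultdict(int)    # number of past transactions per customer
    ratios = []
    for customer_id, amount in transactions:
        if counts[customer_id] and totals[customer_id] > 0:
            mean = totals[customer_id] / counts[customer_id]
            ratios.append(amount / mean)
        else:
            ratios.append(1.0)
        # Update the history only after scoring, so the current
        # transaction never influences its own ratio.
        totals[customer_id] += amount
        counts[customer_id] += 1
    return ratios


# Made-up history: customer C1 suddenly spends far more than usual.
history = [("C1", 20.0), ("C2", 50.0), ("C1", 30.0), ("C1", 100.0), ("C2", 50.0)]
ratios = amount_vs_customer_history(history)
print(ratios)
```

A ratio well above 1 marks a transaction that is large relative to that customer's past spending, which a classifier can use alongside the raw amount.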
Possible future improvements:
- Real-time fraud detection pipeline
- Integration with streaming transaction data
- Deployment as an API service
- Advanced anomaly detection models
Shrashti Mittal
Machine Learning & AI Enthusiast