This project focuses on proactive fraud detection for financial transactions using machine learning. The objective is to build a predictive model that identifies fraudulent transactions and provides actionable business insights to improve fraud prevention infrastructure.
The dataset contains 6.3+ million transactions simulated over 30 days.
- Detect fraudulent transactions (`isFraud`)
- Improve detection beyond the existing rule-based system
- Reduce financial losses due to fraud
- Provide actionable recommendations for infrastructure upgrade
The dataset includes the following key features:
| Feature | Description |
|---|---|
| `step` | Time unit (1 step = 1 hour) |
| `type` | Transaction type (CASH-IN, CASH-OUT, DEBIT, PAYMENT, TRANSFER) |
| `amount` | Transaction amount |
| `oldbalanceOrg` | Sender balance before transaction |
| `newbalanceOrig` | Sender balance after transaction |
| `oldbalanceDest` | Receiver balance before transaction |
| `newbalanceDest` | Receiver balance after transaction |
| `isFraud` | Target variable (1 = Fraud) |
| `isFlaggedFraud` | Rule-based fraud flag |
Merchant accounts do not have destination balance information. We created a flag: `isMerchant = nameDest.startswith('M')`.

Destination balances for merchants were treated as missing values.
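The merchant handling described above could look roughly like the following in pandas. This is a minimal sketch, not the project's actual code: the toy rows and the use of `NaN` as the missing-value marker are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the real transaction table (values are illustrative).
df = pd.DataFrame({
    "nameDest": ["M1979787155", "C553264065", "M2044282225", "C38997010"],
    "oldbalanceDest": [0.0, 21182.0, 0.0, 0.0],
    "newbalanceDest": [0.0, 0.0, 0.0, 40348.79],
})

# Flag destinations whose identifier starts with 'M' (merchant accounts).
df["isMerchant"] = df["nameDest"].str.startswith("M")

# Merchant balances are not recorded, so mark them as missing instead of zero.
balance_cols = ["oldbalanceDest", "newbalanceDest"]
df.loc[df["isMerchant"], balance_cols] = np.nan
```

Marking these balances as missing keeps a recorded zero from being read as a genuinely empty account.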
Created new features:
- `orig_balance_diff` = `oldbalanceOrg` − `newbalanceOrig`
- `dest_balance_diff` = `newbalanceDest` − `oldbalanceDest`
- `hour` = `step % 24`
- `isMerchant`
These features help detect account-draining patterns.
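As a rough illustration, the engineered features can be derived in a single helper. The function name `add_features` and the sample rows are hypothetical. The formulas follow the definitions given in the text.

```python
import pandas as pd

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the engineered fraud features added."""
    out = df.copy()
    # Amount that left the sender's account.
    out["orig_balance_diff"] = out["oldbalanceOrg"] - out["newbalanceOrig"]
    # Amount that arrived in the receiver's account.
    out["dest_balance_diff"] = out["newbalanceDest"] - out["oldbalanceDest"]
    # Hour of day, since 1 step = 1 hour.
    out["hour"] = out["step"] % 24
    # Merchant destinations are identified by the 'M' prefix.
    out["isMerchant"] = out["nameDest"].str.startswith("M")
    return out

# Illustrative rows only; not taken from the real dataset.
sample = pd.DataFrame({
    "step": [1, 25, 50],
    "oldbalanceOrg": [181.0, 170136.0, 5000.0],
    "newbalanceOrig": [0.0, 160296.36, 0.0],
    "oldbalanceDest": [0.0, 0.0, 100.0],
    "newbalanceDest": [0.0, 0.0, 5100.0],
    "nameDest": ["C553264065", "M1979787155", "C100"],
})
features = add_features(sample)
```

A row where `orig_balance_diff` equals `oldbalanceOrg` corresponds to a fully drained sender account.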
Fraud cases represent a very small percentage of transactions. We used `class_weight='balanced'` in Random Forest to address the imbalance.
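To make the effect of `class_weight='balanced'` concrete, the sketch below computes the weights scikit-learn assigns and fits a small forest. The synthetic data, the 990/10 class split, and the hyperparameters (`n_estimators=50`, `random_state=0`) are assumptions for illustration, not the project's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_class_weight

# Synthetic imbalanced labels: 990 legitimate, 10 fraud (illustrative only).
y = np.array([0] * 990 + [1] * 10)
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
X[y == 1] += 3.0  # shift fraud rows so the toy problem is learnable

# 'balanced' weights are n_samples / (n_classes * count_per_class):
# class 0 -> 1000 / (2 * 990), class 1 -> 1000 / (2 * 10) = 50.0
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)

# The same weighting is applied internally when the forest is fitted.
model = RandomForestClassifier(n_estimators=50, class_weight="balanced", random_state=0)
model.fit(X, y)
```

Under this weighting, an error on the rare fraud class costs far more during training than an error on the majority class.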
Why Random Forest?
- Handles nonlinear patterns
- Robust to outliers
- Handles imbalance when combined with class weighting
- Provides feature importance
- High performance on tabular data
Since the dataset is highly imbalanced, we evaluated using:
- Confusion Matrix
- Precision
- Recall
- F1 Score
- ROC-AUC Score
The primary goal is to maximize recall for the fraud class, because every missed fraud is a direct financial loss.
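The metrics listed above are all available in `sklearn.metrics`. The following sketch uses hypothetical labels and scores for ten transactions. The 0.5 decision threshold is an assumption, and in practice it would be tuned toward the recall target.

```python
from sklearn.metrics import (
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hypothetical ground truth and model scores (1 = fraud).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_score = [0.1, 0.2, 0.05, 0.6, 0.3, 0.15, 0.9, 0.8, 0.4, 0.7]

# Turn scores into hard predictions; 0.5 is an assumed threshold.
y_pred = [int(s >= 0.5) for s in y_score]

# For binary labels, ravel() yields counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = precision_score(y_true, y_pred)  # tp / (tp + fp)
recall = recall_score(y_true, y_pred)        # tp / (tp + fn)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall
auc = roc_auc_score(y_true, y_score)         # threshold-independent ranking quality
```

Lowering the threshold trades precision for recall, which fits the stated priority of not missing fraud.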
Top important features:
- `orig_balance_diff`
- `amount`
- `type_TRANSFER`
- `type_CASH_OUT`
- `oldbalanceOrg`
- `hour`
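A ranking like this can be read from a fitted forest's `feature_importances_` attribute. The sketch below uses synthetic data in which the target depends only on `orig_balance_diff`. The data and hyperparameters are assumptions and do not reproduce the project's results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n = 500

# Synthetic feature matrix reusing three of the feature names from the text.
X = pd.DataFrame({
    "orig_balance_diff": rng.normal(size=n),
    "amount": rng.normal(size=n),
    "hour": rng.integers(0, 24, size=n).astype(float),
})

# Toy target driven only by orig_balance_diff.
y = (X["orig_balance_diff"] > 1.0).astype(int)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Impurity-based importances, labelled by column and sorted high to low.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
```

Impurity-based importances sum to 1 and can favor high-cardinality features, so they are best treated as a rough guide.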
Fraudsters typically:
- Transfer funds
- Drain accounts fully
- Perform CASH-OUT transactions
These patterns align with business logic.
The current rule-based system flags only transfers above 200,000, so fraud carried out through smaller transfers or through CASH-OUT transactions goes unflagged.
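The limitation of the static rule can be illustrated on a handful of made-up transactions. The rows, the `rule_flag` column name, and the `CASH_OUT` spelling of the transaction type are assumptions. Only the 200,000 threshold comes from the text.

```python
import pandas as pd

# Made-up transactions; isFraud marks the ground truth.
tx = pd.DataFrame({
    "type": ["TRANSFER", "TRANSFER", "CASH_OUT", "TRANSFER", "PAYMENT"],
    "amount": [250000.0, 150000.0, 300000.0, 500000.0, 1000.0],
    "isFraud": [1, 1, 1, 0, 0],
})

RULE_THRESHOLD = 200_000  # threshold quoted for the existing rule

# Static rule: flag only TRANSFER transactions above the threshold.
tx["rule_flag"] = (
    (tx["type"] == "TRANSFER") & (tx["amount"] > RULE_THRESHOLD)
).astype(int)

# Share of actual fraud that the rule catches (its recall).
caught = int(((tx["rule_flag"] == 1) & (tx["isFraud"] == 1)).sum())
total_fraud = int(tx["isFraud"].sum())
rule_recall = caught / total_fraud
```

In this toy set the rule misses the smaller fraudulent transfer and the fraudulent cash-out, and it also flags one legitimate large transfer.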
Replace static rules with:
- ML-based fraud probability scoring
- Real-time risk assessment
- Multi-level transaction verification
- Deploy model as real-time scoring API
- Introduce risk-based authentication
- Implement monitoring dashboard
- Retrain model periodically
- Conduct A/B testing before full deployment
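One way to connect probability scoring with multi-level verification is to map the model's fraud probability to an action tier. The sketch below is hypothetical: the function name, the tier labels, and the 0.3 and 0.8 thresholds are illustrative assumptions, not values from this project.

```python
def verification_level(
    fraud_probability: float,
    review_threshold: float = 0.3,  # assumed cut-off for extra authentication
    block_threshold: float = 0.8,   # assumed cut-off for blocking outright
) -> str:
    """Map a fraud probability in [0, 1] to a verification action."""
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("fraud_probability must be between 0 and 1")
    if fraud_probability >= block_threshold:
        return "block"          # hold the transaction for manual review
    if fraud_probability >= review_threshold:
        return "step_up_auth"   # ask for additional authentication
    return "approve"            # low risk: let the transaction through


# Example: three scores as a model's predict_proba might produce them.
decisions = [verification_level(p) for p in (0.05, 0.45, 0.92)]
```

In a real deployment the thresholds would be chosen from validation data and revisited whenever the model is retrained.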
After implementation, success can be measured using:
- Increase in fraud detection rate
- Reduction in financial loss
- Controlled false positive rate
- Improved operational efficiency
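The first and third measures can be computed directly from confusion-matrix counts gathered before and after rollout. The helper name and the counts below are hypothetical and only show the arithmetic.

```python
def success_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute detection rate and false positive rate from raw counts."""
    return {
        # Share of actual fraud cases that were caught.
        "detection_rate": tp / (tp + fn),
        # Share of legitimate transactions that were wrongly flagged.
        "false_positive_rate": fp / (fp + tn),
    }


# Hypothetical counts for the same traffic under each system.
before = success_metrics(tp=20, fp=5, tn=9895, fn=80)
after = success_metrics(tp=80, fp=50, tn=9850, fn=20)

# Improvement in detection rate, in absolute terms.
detection_gain = after["detection_rate"] - before["detection_rate"]
```

Tracking the two rates together shows whether a gain in detection comes with an acceptable rise in false alarms.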
- Python
- Pandas
- NumPy
- Scikit-learn
- Matplotlib
- Seaborn
- Fraud primarily occurs in TRANSFER and CASH-OUT transactions.
- Fraudsters often empty accounts completely.
- ML-based detection significantly outperforms rule-based detection.
- Proactive fraud scoring improves risk management and financial protection.