
Nikhil3107jaiswal/Fraud_Detection

💳 Financial Fraud Detection using Machine Learning

📌 Project Overview

This project focuses on proactive fraud detection for financial transactions using machine learning. The objective is to build a predictive model that identifies fraudulent transactions and provides actionable business insights to improve fraud prevention infrastructure.

The dataset contains 6.3+ million transactions simulated over 30 days.


🎯 Business Objective

  • Detect fraudulent transactions (isFraud)
  • Improve detection beyond the existing rule-based system
  • Reduce financial losses due to fraud
  • Provide actionable recommendations for infrastructure upgrade

📊 Dataset Description

The dataset includes the following key features:

| Feature | Description |
| --- | --- |
| step | Time unit (1 step = 1 hour) |
| type | Transaction type (CASH-IN, CASH-OUT, DEBIT, PAYMENT, TRANSFER) |
| amount | Transaction amount |
| oldbalanceOrg | Sender balance before transaction |
| newbalanceOrig | Sender balance after transaction |
| oldbalanceDest | Receiver balance before transaction |
| newbalanceDest | Receiver balance after transaction |
| isFraud | Target variable (1 = Fraud) |
| isFlaggedFraud | Rule-based fraud flag |

⚠️ Note: Destination balances are not available for merchant accounts (IDs starting with "M").


🧹 Data Cleaning & Preprocessing

✔ Handling Structural Missing Values

Merchant accounts do not have destination balance information. We created a flag:

# df is assumed to be the transactions DataFrame
df['isMerchant'] = df['nameDest'].str.startswith('M')

Destination balances for merchants were treated as missing values.
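A minimal sketch of this step, assuming the data is loaded into a pandas DataFrame. The toy rows and the `dest_cols` name are illustrative assumptions, not taken from the project code.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the real dataset (values are made up for illustration).
df = pd.DataFrame({
    "nameDest": ["C1001", "M2002", "C3003"],
    "oldbalanceDest": [500.0, 0.0, 0.0],
    "newbalanceDest": [700.0, 0.0, 150.0],
})

# Flag merchant recipients: IDs that start with "M".
df["isMerchant"] = df["nameDest"].str.startswith("M")

# Treat merchant destination balances as missing rather than as true zeros.
dest_cols = ["oldbalanceDest", "newbalanceDest"]
df.loc[df["isMerchant"], dest_cols] = np.nan
```

Keeping the `isMerchant` flag alongside the missing values lets a model separate "no balance recorded" from "balance is zero". Note that scikit-learn's Random Forest may need the NaN values imputed before training, depending on the library version.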


✔ Feature Engineering

Created new features:

  • orig_balance_diff = oldbalanceOrg − newbalanceOrig
  • dest_balance_diff = newbalanceDest − oldbalanceDest
  • hour = step % 24
  • isMerchant

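These features help detect account-draining patterns. For example, when orig_balance_diff equals oldbalanceOrg, the transaction has emptied the sender's account.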
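The engineered features listed above could be computed roughly as follows. This is a sketch under assumptions: the `add_features` helper and the sample rows are hypothetical, and only the column names come from the dataset description.

```python
import pandas as pd

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the engineered columns added (hypothetical helper)."""
    out = df.copy()
    # Amount that left the sender's account.
    out["orig_balance_diff"] = out["oldbalanceOrg"] - out["newbalanceOrig"]
    # Amount that arrived in the receiver's account.
    out["dest_balance_diff"] = out["newbalanceDest"] - out["oldbalanceDest"]
    # Hour of day, since 1 step = 1 hour.
    out["hour"] = out["step"] % 24
    # Merchant recipients have IDs starting with "M".
    out["isMerchant"] = out["nameDest"].str.startswith("M")
    return out

# Made-up rows for illustration only.
sample = pd.DataFrame({
    "step": [1, 25, 47],
    "nameDest": ["C1", "M2", "C3"],
    "oldbalanceOrg": [1000.0, 200.0, 5000.0],
    "newbalanceOrig": [0.0, 150.0, 5000.0],
    "oldbalanceDest": [0.0, 0.0, 100.0],
    "newbalanceDest": [1000.0, 0.0, 100.0],
})
features = add_features(sample)
```

In the first sample row, `orig_balance_diff` equals `oldbalanceOrg`, which is the account-draining signature.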


✔ Handling Imbalanced Data

Fraud cases represent a very small percentage of transactions. We used:

class_weight='balanced'

in Random Forest to address imbalance.
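To illustrate what `class_weight='balanced'` does, the sketch below computes the weights scikit-learn derives from class frequencies, `n_samples / (n_classes * count_per_class)`. The label array is made up, and the `n_estimators` and `random_state` values are assumptions rather than the project's actual settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_class_weight

# Made-up labels: 98 legitimate transactions and 2 fraudulent ones.
y = np.array([0] * 98 + [1] * 2)

# "balanced" weights each class by n_samples / (n_classes * class_count).
weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
weight_by_class = dict(zip([0, 1], weights))

# The same weighting is applied internally when the option is passed to the model.
# n_estimators and random_state are illustrative choices.
model = RandomForestClassifier(
    n_estimators=50,
    class_weight="balanced",
    random_state=0,
)
```

With these labels, each fraud example weighs 25.0 and each legitimate example about 0.51, so errors on the rare class cost more during training.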


🤖 Model Used

🔹 Random Forest Classifier

Why Random Forest?

  • Handles nonlinear patterns
  • Robust to outliers
  • Supports class weighting for imbalanced data
  • Provides feature importance
  • High performance on tabular data

📈 Model Evaluation

Since the dataset is highly imbalanced, we evaluated using:

  • Confusion Matrix
  • Precision
  • Recall
  • F1 Score
  • ROC-AUC Score

🎯 Key Priority:

Maximize Recall for Fraud Class (Missing fraud = financial loss)
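The metrics listed above can be computed with scikit-learn as sketched below. The labels and predicted probabilities are made up for illustration, and the two thresholds (0.5 and 0.3) are assumptions chosen to show how lowering the threshold trades precision for recall.

```python
import numpy as np
from sklearn.metrics import (
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Made-up ground truth and predicted fraud probabilities.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_prob = np.array([0.1, 0.2, 0.3, 0.6, 0.1, 0.05, 0.9, 0.8, 0.4, 0.7])

# Default decision threshold.
y_pred = (y_prob >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)

# ROC-AUC uses the probabilities, so it does not depend on the threshold.
auc = roc_auc_score(y_true, y_prob)

# A lower threshold catches more fraud at the cost of more false alarms.
y_pred_low = (y_prob >= 0.3).astype(int)
recall_low = recall_score(y_true, y_pred_low)
precision_low = precision_score(y_true, y_pred_low)
```

On this toy data, moving the threshold from 0.5 to 0.3 raises recall from 0.75 to 1.0 while precision drops from 0.75 to about 0.67.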


🔍 Key Predictors of Fraud

Top important features:

  1. orig_balance_diff
  2. amount
  3. type_TRANSFER
  4. type_CASH_OUT
  5. oldbalanceOrg
  6. hour

Fraudsters typically:

  • Transfer funds
  • Drain accounts fully
  • Perform CASH-OUT transactions

These patterns align with business logic.
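A ranking like the one above can be read from a fitted Random Forest through its `feature_importances_` attribute. The sketch below uses synthetic data in which the label depends only on `orig_balance_diff`, so it demonstrates the mechanism and does not reproduce the project's results. The hyperparameters are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: only orig_balance_diff carries signal (illustration only).
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "orig_balance_diff": rng.random(n),
    "amount": rng.random(n),
    "hour": rng.integers(0, 24, n),
})
y = (X["orig_balance_diff"] > 0.8).astype(int)

model = RandomForestClassifier(n_estimators=50, class_weight="balanced", random_state=0)
model.fit(X, y)

# Impurity-based importances, one per feature, sorted from most to least important.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
```

Impurity-based importances can be biased toward high-cardinality features, so cross-checking with permutation importance is a reasonable follow-up.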


🏢 Business Insights

🔹 Existing Rule-Based System Limitation

The current system flags only transfers above 200,000, so it misses many fraudulent behaviors, such as smaller transfers and cash-outs that drain an account.
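The rule described here can be written down as a baseline to compare against the model. This is a sketch: the `rule_based_flag` helper and the sample rows are hypothetical, and the `CASH_OUT` spelling is an assumption inferred from the `type_CASH_OUT` feature name.

```python
import pandas as pd

RULE_THRESHOLD = 200_000  # threshold stated for the existing rule

def rule_based_flag(df: pd.DataFrame) -> pd.Series:
    """Flag only TRANSFER transactions whose amount exceeds the threshold."""
    return (df["type"] == "TRANSFER") & (df["amount"] > RULE_THRESHOLD)

# Made-up transactions for illustration only.
tx = pd.DataFrame({
    "type": ["TRANSFER", "TRANSFER", "CASH_OUT"],
    "amount": [250_000.0, 50_000.0, 300_000.0],
    "oldbalanceOrg": [400_000.0, 50_000.0, 300_000.0],
    "newbalanceOrig": [150_000.0, 0.0, 0.0],
})
tx["rule_flag"] = rule_based_flag(tx)

# Transactions that empty the sender's account.
tx["drains_account"] = (tx["oldbalanceOrg"] > 0) & (tx["newbalanceOrig"] == 0)

# Account-draining transactions the static rule does not catch.
missed = tx[tx["drains_account"] & ~tx["rule_flag"]]
```

Here the rule flags only the first row, while both account-draining rows (a small transfer and a large cash-out) pass unflagged.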

🔹 Recommended Solution

Replace static rules with:

  • ML-based fraud probability scoring
  • Real-time risk assessment
  • Multi-level transaction verification
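One way to combine probability scoring with multi-level verification is to map the model's fraud probability to an action tier. The sketch below is illustrative only: the `risk_tier` function, the tier names, and the 0.3 and 0.8 cut-offs are assumptions that would need tuning against real false-positive costs.

```python
def risk_tier(fraud_probability: float,
              review_at: float = 0.3,
              block_at: float = 0.8) -> str:
    """Map a fraud probability to an action tier (hypothetical thresholds)."""
    if not 0.0 <= fraud_probability <= 1.0:
        raise ValueError("fraud_probability must be between 0 and 1")
    if fraud_probability >= block_at:
        return "block"                 # hold the transaction for investigation
    if fraud_probability >= review_at:
        return "step_up_verification"  # ask for extra authentication
    return "approve"                   # let the transaction through

# Example scores, e.g. from model.predict_proba(X)[:, 1].
tiers = [risk_tier(p) for p in (0.05, 0.45, 0.92)]
```

Unlike a single static rule, the tiers let low-risk transactions pass without friction while reserving manual effort for the highest scores.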

🚀 Infrastructure Recommendations

  1. Deploy model as real-time scoring API
  2. Introduce risk-based authentication
  3. Implement monitoring dashboard
  4. Retrain model periodically
  5. Conduct A/B testing before full deployment

📊 Measuring Success

After implementation, success can be measured using:

  • Increase in fraud detection rate
  • Reduction in financial loss
  • Controlled false positive rate
  • Improved operational efficiency

🛠 Tech Stack

  • Python
  • Pandas
  • NumPy
  • Scikit-learn
  • Matplotlib
  • Seaborn

📌 Conclusion

  • Fraud primarily occurs in TRANSFER and CASH-OUT transactions.
  • Fraudsters often empty accounts completely.
  • ML-based detection captures fraud patterns that the single-threshold rule-based system misses.
  • Proactive fraud scoring improves risk management and financial protection.

About

End-to-end fraud detection project using ML to identify high-risk financial transactions. Includes data cleaning, feature engineering, imbalance handling, and performance evaluation with business insights.
