Author: Zaki Abiyu Aqilah
Tools: R (openxlsx, C50, reshape2)
Goal: Predict credit risk rating (1–5) based on customer profile
This project builds a C5.0 decision tree model to classify loan applicants into five risk categories (1 = lowest risk, 5 = highest risk). The model helps financial institutions make data-driven credit approval decisions.
Key Results:
- Testing Accuracy: 96%
- Correct predictions: 96 out of 100
- Most important feature: Active KPR (mortgage) status, `kpr_aktif` (100% importance)
The dataset contains 1,000+ loan applications with the following variables:
| Variable | Type | Description |
|---|---|---|
| `pendapatan_setahun_juta` | Numeric | Annual income (million Rupiah) |
| `kpr_aktif` | Factor | Active mortgage status (YA/TIDAK) |
| `durasi_pinjaman_bulan` | Numeric | Loan tenure (months) |
| `jumlah_tanggungan` | Numeric | Number of dependents |
| `risk_rating` | Factor (1-5) | Target variable |
Note: The original Excel file is not included in this repository. See `data/README.md` for data structure details.
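As a rough sketch of how the columns listed above might be prepared in R: the helper name `prepare_credit_data` and the choice of factor levels are assumptions made for illustration, not taken from the project's script. The workbook is only read if it is actually present, and the small `toy` data frame is made up purely to show the resulting column types.

```r
# Hypothetical helper: cast the documented columns to the types the model expects.
prepare_credit_data <- function(df) {
  df$kpr_aktif   <- factor(df$kpr_aktif, levels = c("TIDAK", "YA"))
  df$risk_rating <- factor(df$risk_rating, levels = 1:5)
  numeric_cols <- c("pendapatan_setahun_juta",
                    "durasi_pinjaman_bulan",
                    "jumlah_tanggungan")
  df[numeric_cols] <- lapply(df[numeric_cols], as.numeric)
  df
}

# Read the workbook only if it exists (it is not shipped with the repository).
xlsx_path <- file.path("data", "CreditRisk_R.xlsx")
if (file.exists(xlsx_path) && requireNamespace("openxlsx", quietly = TRUE)) {
  credit <- prepare_credit_data(openxlsx::read.xlsx(xlsx_path))
  str(credit)
}

# Made-up rows (not real data) to demonstrate the conversion.
toy <- data.frame(
  pendapatan_setahun_juta = c("200", "150"),
  kpr_aktif               = c("YA", "TIDAK"),
  durasi_pinjaman_bulan   = c(12, 64),
  jumlah_tanggungan       = c(6, 6),
  risk_rating             = c(4, 2),
  stringsAsFactors        = FALSE
)
toy_clean <- prepare_credit_data(toy)
str(toy_clean)
```

Fixing the factor levels up front keeps training data and later prediction inputs aligned, even when a subset happens to contain only one KPR status.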
| Predicted \ Actual | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 20 | 0 | 0 | 0 | 0 |
| 2 | 4 | 13 | 0 | 0 | 0 |
| 3 | 0 | 0 | 38 | 0 | 0 |
| 4 | 0 | 0 | 0 | 9 | 0 |
| 5 | 0 | 0 | 0 | 0 | 16 |
| Metric | Value |
|---|---|
| Accuracy | 96% |
| Correct Predictions | 96 / 100 |
| Wrong Predictions | 4 / 100 |
| Tree Size | 17 rules |
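The accuracy figure can be re-derived from the confusion matrix above with a few lines of base R. This is only a sketch that re-enters the published counts by hand; the variable names are illustrative.

```r
# Confusion matrix from the table above: rows = predicted, columns = actual.
cm <- matrix(
  c(20,  0,  0, 0,  0,
     4, 13,  0, 0,  0,
     0,  0, 38, 0,  0,
     0,  0,  0, 9,  0,
     0,  0,  0, 0, 16),
  nrow = 5, byrow = TRUE,
  dimnames = list(predicted = as.character(1:5),
                  actual    = as.character(1:5))
)

correct  <- sum(diag(cm))    # predictions on the diagonal are correct
total    <- sum(cm)          # all test cases
accuracy <- correct / total

# Per-class recall: correct predictions divided by the actual count per class.
recall <- diag(cm) / colSums(cm)

print(accuracy)
print(round(recall, 3))
```

Read this way, all four errors are applicants whose actual rating is 1 but who were predicted as 2, so every class except rating 1 is recovered without error on this test set.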
| Variable | Importance (%) |
|---|---|
| `kpr_aktif` (Active KPR) | 100.00 |
| `jumlah_tanggungan` (Dependents) | 75.50 |
| `pendapatan_setahun_juta` (Annual Income) | 68.13 |
| `durasi_pinjaman_bulan` (Loan Tenure) | 57.13 |
Insight: Active KPR status is the most dominant factor in determining credit risk rating.
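If the percentages above come from `C5imp()` with its default `metric = "usage"` (an assumption, since the script is not shown here), a value of 100 means every training case passes through a split on that variable, which is what happens for the variable at the root of the tree. The sketch below fits a C5.0 tree on made-up data and prints the same kind of table; the data, the labelling rule, and the object names are all illustrative and do not reproduce the project's results.

```r
set.seed(42)
n <- 200

# Made-up data for illustration only; not the project's dataset.
synthetic <- data.frame(
  pendapatan_setahun_juta = round(runif(n, 50, 300)),
  kpr_aktif = factor(sample(c("TIDAK", "YA"), n, replace = TRUE),
                     levels = c("TIDAK", "YA")),
  durasi_pinjaman_bulan = sample(c(12, 24, 36, 48), n, replace = TRUE),
  jumlah_tanggungan = sample(0:6, n, replace = TRUE)
)

# Toy labelling rule so the tree has a pattern to learn.
synthetic$risk_rating <- factor(
  ifelse(synthetic$kpr_aktif == "YA",
         ifelse(synthetic$jumlah_tanggungan <= 4, 3, 4),
         ifelse(synthetic$pendapatan_setahun_juta <= 95, 2, 1))
)

# Fit and inspect only when the C50 package is installed.
if (requireNamespace("C50", quietly = TRUE)) {
  fit <- C50::C5.0(risk_rating ~ ., data = synthetic)
  # metric = "usage": share of training cases that reach a split on each predictor
  print(C50::C5imp(fit, metric = "usage"))
}
```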
- Active KPR = YA:
  - Dependents ≤ 4 → Risk Rating 3 (255 cases)
  - Dependents > 4:
    - Income > 248 million → Risk Rating 4
    - Income ≤ 248 million:
      - Loan tenure ≤ 24 months → Risk Rating 4
      - Loan tenure > 24 months → Risk Rating 5
- Active KPR = TIDAK:
  - Income ≤ 95 million → Risk Rating 2
  - Income > 95 million:
    - Loan tenure > 36 months → Risk Rating 2
    - Loan tenure ≤ 36 months:
      - Income > 201 million → Risk Rating 1
      - Income ≤ 201 million → (further splits, see full output)
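The rules above can be transcribed into a plain R function to trace a single application by hand. This is a sketch, not the fitted model: the function name `rate_by_rules` is hypothetical, and it assumes the first group of rules applies when `kpr_aktif` is YA and the second when it is TIDAK, which is consistent with the sample applications later in this README. The branch whose further splits are not listed returns `NA`.

```r
# Hypothetical transcription of the listed rules (income in million Rupiah).
# Assumption: first rule group = kpr_aktif "YA", second group = "TIDAK".
rate_by_rules <- function(income, kpr, tenure, dependents) {
  if (kpr == "YA") {
    if (dependents <= 4) return(3L)
    if (income > 248)    return(4L)
    if (tenure <= 24)    return(4L)
    return(5L)
  }
  # kpr == "TIDAK"
  if (income <= 95)  return(2L)
  if (tenure > 36)   return(2L)
  if (income > 201)  return(1L)
  NA_integer_  # further splits are not listed in this README
}

# App 1 from the sample table: YA, 6 dependents, income 200, tenure 12
rate_by_rules(income = 200, kpr = "YA", tenure = 12, dependents = 6)  # 4
```

For real scoring the fitted C5.0 object should be used, since the listed rules are incomplete for the lowest-income-band branch.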
Download detailed outputs: Confusion Matrix CSV | Variable Importance CSV | Full Model Summary
Three sample applications were tested:
| Application | Income (M) | Active KPR | Tenure (months) | Dependents | Predicted Risk |
|---|---|---|---|---|---|
| App 1 | 200 | YA | 12 | 6 | 4 (High Risk) |
| App 2 | 150 | TIDAK | 64 | 6 | 2 (Low Risk) |
| App 3 | 300 | TIDAK | 24 | 2 | 1 (Lowest Risk) |
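A sketch of how these three applications could be passed to a fitted model: the inputs are copied from the table above, while `risk_model` is a placeholder name for a trained C5.0 object and is not defined in this snippet, so the prediction step is skipped unless such an object exists in the session.

```r
# Inputs of the three sample applications, using the dataset's column names.
new_apps <- data.frame(
  pendapatan_setahun_juta = c(200, 150, 300),
  kpr_aktif = factor(c("YA", "TIDAK", "TIDAK"), levels = c("TIDAK", "YA")),
  durasi_pinjaman_bulan = c(12, 64, 24),
  jumlah_tanggungan = c(6, 6, 2),
  row.names = c("App 1", "App 2", "App 3")
)

# `risk_model` is a placeholder for a fitted C5.0 object (assumption).
if (exists("risk_model") && requireNamespace("C50", quietly = TRUE)) {
  new_apps$predicted_risk <- predict(risk_model, newdata = new_apps,
                                     type = "class")
}

print(new_apps)
```

The factor levels of `kpr_aktif` should match those used at training time; otherwise the prediction step may fail or misread the category.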
- Clone this repository
- Place your Excel file (`CreditRisk_R.xlsx`) in the `data/` folder
- Run `script/model.R` in RStudio or the R console