Methodology
How PD, LGD, and ECL are estimated, and how the stress testing works.
Data
In this lab, I use the Freddie Mac Single-Family Loan-Level dataset (2010–2016 vintages).
There are two main files:
- Origination data: FICO, LTV, DTI, loan amount, interest rate, state, property type, etc.
- Servicing data: monthly loan status, delinquency, balance updates, and loss information.
Each vintage year has around 350,000 loans, so the seven vintages total about 2.4 million loans.
I also add macroeconomic data from the FRED API, such as the unemployment rate, home price index (HPI), and mortgage rates.
Default Definition
A loan is treated as defaulted if:
- It becomes 90+ days past due (delinquency status ≥ 3), or
- It has a zero balance code related to loss (codes 02–09, such as short sale, foreclosure, or REO).
If the zero balance code is 01, that means the borrower prepaid the loan voluntarily.
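The rules above can be sketched as a small classifier for a single monthly servicing record. This is a minimal illustration, not the lab's actual code: the function name and arguments are hypothetical, and treating non-numeric delinquency statuses as "not 90+ days past due" is an assumption made only to keep the example short.

```python
# Zero balance codes treated as loss events, following the 02-09 range stated above.
LOSS_CODES = {f"{i:02d}" for i in range(2, 10)}
PREPAID_CODE = "01"


def classify_loan_month(delinquency_status, zero_balance_code=None):
    """Return 'default', 'prepaid', or 'performing' for one monthly record."""
    try:
        months_past_due = int(delinquency_status)
    except (TypeError, ValueError):
        # Assumption: a non-numeric status is not counted by the 90+ DPD rule here.
        months_past_due = 0

    # Default: 90+ days past due (status >= 3) or a loss-related zero balance code.
    if months_past_due >= 3 or zero_balance_code in LOSS_CODES:
        return "default"
    # Code 01 means a voluntary prepayment, not a default.
    if zero_balance_code == PREPAID_CODE:
        return "prepaid"
    return "performing"
```

For example, `classify_loan_month("3")` and `classify_loan_month("0", "09")` both return `"default"`, while `classify_loan_month("0", "01")` returns `"prepaid"`.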
PD Model
The PD model predicts the probability that a loan will default in the next 12 months.
I observe each loan at 12 months of seasoning.
The target is a binary flag: 1 if the loan defaults in the following 12 months (months 13–24), 0 otherwise.
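One way to build this target from a loan's monthly history is sketched below. The function name and input layout are hypothetical. Two choices are assumptions rather than the lab's stated method: loans that default or exit before the observation point are dropped, and loans with a shorter outcome window (for example, prepaid loans) are labeled 0.

```python
def twelve_month_target(default_flags, observation_month=12, horizon=12):
    """Build the PD target for one loan.

    default_flags[i] is True if the loan meets the default definition
    in loan-age month i + 1. Returns 1 or 0, or None when the loan is
    not eligible at the observation point.
    """
    if len(default_flags) < observation_month:
        return None  # loan left the book before the observation point
    if any(default_flags[:observation_month]):
        return None  # assumption: already-defaulted loans are excluded
    # Outcome window: the `horizon` months after the observation point.
    window = default_flags[observation_month:observation_month + horizon]
    return int(any(window))
```

A loan that first defaults in month 18 gets a target of 1; a loan that first defaults in month 25 gets 0, because the default falls outside the 12-month window.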
I train three models using time-based splits:
- Logistic Regression (baseline and easy to interpret)
- XGBoost (with calibration so probabilities are realistic)
- Random Forest (ensemble-based model)
The data split is:
Train: 2010–2013
Validate: 2014
Test: 2015–2016
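The split can be expressed as a simple mapping from year to partition. The sketch below assumes the split is made on origination vintage year and that each loan record carries a hypothetical `vintage_year` field; neither detail is spelled out above.

```python
def assign_split(vintage_year):
    """Map a year to 'train', 'validate', or 'test'."""
    if 2010 <= vintage_year <= 2013:
        return "train"
    if vintage_year == 2014:
        return "validate"
    if 2015 <= vintage_year <= 2016:
        return "test"
    raise ValueError(f"vintage {vintage_year} is outside the 2010-2016 sample")


def split_loans(loans):
    """Group loan records (dicts with a 'vintage_year' key) by partition."""
    groups = {"train": [], "validate": [], "test": []}
    for loan in loans:
        groups[assign_split(loan["vintage_year"])].append(loan)
    return groups
```

Because every training loan comes from an earlier year than every validation or test loan, no future information leaks into model fitting.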
Important features include FICO, LTV, DTI, interest rate, state, occupancy, loan purpose,
and macro variables like unemployment and HPI changes.
LGD Model
The LGD model estimates how much money is lost when a loan defaults.
LGD is calculated as:
(Exposure − Recoveries + Expenses) / Exposure
The value is clipped to the range [0, 1].
I only use loans that were actually liquidated and had positive exposure.
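As a small worked sketch of the formula and the clipping step (the function and argument names are illustrative, not taken from the lab's code):

```python
def realized_lgd(exposure, recoveries, expenses):
    """Realized LGD for one liquidated loan, clipped to [0, 1].

    Returns None when exposure is not positive, mirroring the filter
    that keeps only loans with positive exposure.
    """
    if exposure <= 0:
        return None
    raw = (exposure - recoveries + expenses) / exposure
    return min(1.0, max(0.0, raw))
```

With an exposure of 200,000, recoveries of 150,000, and expenses of 10,000, the raw ratio is 60,000 / 200,000 = 0.30. If recoveries exceed exposure plus expenses, the raw ratio is negative and is clipped to 0.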
An XGBoost regression model is trained to predict LGD.
I evaluate it using RMSE, MAE, and R², and check calibration by grouping predictions into deciles.
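The decile check can be sketched in plain Python as follows. This is an illustrative version with hypothetical function names: it sorts loans by predicted LGD, cuts them into ten near-equal groups, and compares the mean prediction with the mean realized LGD in each group.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between realized and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def decile_calibration(actual, predicted, n_bins=10):
    """Return a list of (mean_predicted, mean_actual), one pair per bin.

    Bins are formed by sorting on the prediction and slicing into
    n_bins groups of near-equal size; empty bins are skipped.
    """
    pairs = sorted(zip(predicted, actual))
    n = len(pairs)
    table = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        mean_actual = sum(a for _, a in chunk) / len(chunk)
        table.append((mean_pred, mean_actual))
    return table
```

A well-calibrated model shows mean predicted and mean realized LGD close to each other in every decile, not just on average.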
ECL Engine
Expected Credit Loss (ECL) is calculated as:
ECL = PD × LGD × EAD
I follow IFRS 9 staging rules:
- Stage 1: No significant increase in credit risk since origination → 12-month ECL
- Stage 2: Significant increase in credit risk since origination → Lifetime ECL
- Stage 3: Defaulted/credit-impaired → Lifetime ECL
EAD is the current outstanding loan balance.
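A minimal sketch of how the stage selects the PD horizon is shown below. It is a simplification under stated assumptions: lifetime ECL is collapsed into a single lifetime PD with no discounting, the caller supplies both PDs, and the function and argument names are hypothetical.

```python
def expected_credit_loss(stage, pd_12m, pd_lifetime, lgd, ead):
    """ECL = PD x LGD x EAD, with the PD horizon chosen by IFRS 9 stage."""
    if stage == 1:
        pd_used = pd_12m  # Stage 1: 12-month ECL
    elif stage in (2, 3):
        # Stage 2 and 3: lifetime ECL. Assumption: for Stage 3 the
        # caller passes a lifetime PD of 1.0 for defaulted loans.
        pd_used = pd_lifetime
    else:
        raise ValueError(f"unknown stage: {stage}")
    return pd_used * lgd * ead
```

For a Stage 1 loan with a 12-month PD of 2%, an LGD of 30%, and an outstanding balance of 200,000, the ECL is 0.02 × 0.30 × 200,000 = 1,200. If the same loan moves to Stage 2 with a lifetime PD of 10%, the ECL rises to 6,000.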
Stress Scenarios
To test how the portfolio behaves under stress, I apply macroeconomic shocks:
- Increase unemployment rate: add N percentage points to the baseline unemployment rate
- Decrease home prices (HPI): add N percentage points to the baseline HPI year-over-year change (a negative N means a price decline)
- Increase mortgage rates: add N percentage points to the 30-year fixed rate
I define four scenarios:
- Baseline (no change)
- Mild stress
- Severe stress
- GFC-like stress
These shocks are applied to the macro variables before running predictions.
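The mechanics can be sketched as an additive shift on each macro variable. The shock sizes and field names below are hypothetical placeholders chosen only to make the example concrete; they are not the values of N used in this lab.

```python
# Hypothetical shock sizes, in percentage points, for illustration only.
SCENARIOS = {
    "baseline": {"unemployment_rate": 0.0, "hpi_yoy_change": 0.0, "mortgage_rate_30y": 0.0},
    "mild": {"unemployment_rate": 1.0, "hpi_yoy_change": -5.0, "mortgage_rate_30y": 0.5},
    "severe": {"unemployment_rate": 3.0, "hpi_yoy_change": -15.0, "mortgage_rate_30y": 1.5},
    "gfc_like": {"unemployment_rate": 5.0, "hpi_yoy_change": -25.0, "mortgage_rate_30y": 2.0},
}


def apply_scenario(macro, scenario_name):
    """Return a shocked copy of the macro variables for one scenario.

    `macro` maps variable names to baseline values. Variables without
    a shock in the scenario are passed through unchanged.
    """
    shocks = SCENARIOS[scenario_name]
    return {name: value + shocks.get(name, 0.0) for name, value in macro.items()}
```

The shocked dictionary then replaces the baseline macro features in the model input, and the fitted models are scored again without retraining, so differences in predicted PD and ECL across scenarios come only from the macro shift.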
Validation
I use time-based splitting instead of random splits to avoid look-ahead bias.
Train: 2010–2013
Validate: 2014
Test: 2015–2016
For PD, I check AUC, KS, Brier score, and calibration plots.
For LGD, I evaluate RMSE, MAE, R², and decile calibration.
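To make the three scalar PD metrics concrete, here is a dependency-free sketch. The function names are illustrative; in practice a library such as scikit-learn would normally be used. AUC is computed from average ranks, KS as the largest gap between the cumulative score distributions of defaulted and non-defaulted loans, and the Brier score as the mean squared error of the predicted probabilities.

```python
def auc(y_true, y_score):
    """Area under the ROC curve via the rank-sum formula (ties get average ranks)."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i])
    ranks = [0.0] * len(y_score)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def ks_statistic(y_true, y_score):
    """Maximum gap between the score CDFs of defaults and non-defaults."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    pairs = sorted(zip(y_score, y_true))
    cum_pos = cum_neg = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume all records sharing this score before comparing the CDFs.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_pos += 1
            else:
                cum_neg += 1
            j += 1
        best = max(best, abs(cum_pos / n_pos - cum_neg / n_neg))
        i = j
    return best


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)
```

AUC and KS measure how well the model ranks defaults above non-defaults, while the Brier score also penalizes probabilities that are poorly calibrated, which is why all three are reported together. Both `auc` and `ks_statistic` assume the sample contains at least one default and one non-default.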