LendingClub Credit Risk Analysis on Loan Defaults

Dec 2025

Python / Risk Analytics / EDA / Feature Selection / Clustering / Outlier Detection / ML Classification

About Project

Type: Personal Data Mining / Risk Analytics Project
Domain: Credit Risk / Lending / Default Prediction
Scope: Large-scale borrower dataset (1M+ records) with 150+ engineered and curated features for default risk analysis
Deliverables: End-to-end notebooks + final report + reusable feature sets (raw, scaled+PCA) for clustering and classification

Objective

Analyze loan default risk by segmenting borrowers, detecting anomalous risk profiles, and building predictive models that identify high-risk borrowers, in support of credit decisioning and risk monitoring.

Tools & Technologies

Python, Pandas, NumPy, Scikit-learn, XGBoost, scikit-learn-extra (K-Medoids), PCA, t-SNE, Git

Key Work & Impact

Built an end-to-end analytics workflow covering cleaning, preprocessing, feature selection, modeling, and evaluation, structured as reproducible notebooks that read and write the project's intermediate datasets.

Performed feature selection using multiple methods (RFE, SelectKBest, Fisher Score, Mutual Information) to reduce noise and improve model signal quality across 150+ variables.
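As a rough illustration of the feature selection step above, the sketch below combines one filter method (mutual information via SelectKBest) with one wrapper method (RFE) and keeps the features both agree on. The synthetic data, the choice of 10 features, and the consensus rule are assumptions made for this example, not the project's actual configuration.

```python
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the borrower feature matrix (hypothetical data).
X, y = make_classification(
    n_samples=2000, n_features=30, n_informative=6, n_redundant=4,
    weights=[0.8, 0.2], random_state=0,
)
X = StandardScaler().fit_transform(X)

# Filter method: keep the 10 features with the highest mutual information.
mi_score = partial(mutual_info_classif, random_state=0)
mi_selector = SelectKBest(score_func=mi_score, k=10).fit(X, y)
mi_features = set(np.flatnonzero(mi_selector.get_support()).tolist())

# Wrapper method: recursive feature elimination around a logistic regression.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10).fit(X, y)
rfe_features = set(np.flatnonzero(rfe.support_).tolist())

# Features chosen by both methods are the most defensible to keep.
consensus = sorted(mi_features & rfe_features)
print("consensus feature indices:", consensus)
```

Intersecting the outputs of several selectors is one simple way to reduce dependence on any single scoring criterion; a voting or ranking scheme would work as well.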

Segmented borrower profiles using clustering (K-Means, K-Medoids, Agglomerative) to uncover distinct risk archetypes and provide interpretable group-level insights beyond a single risk score.
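A minimal sketch of how such a clustering comparison might look, using silhouette score to compare K-Means and Agglomerative clustering on the same scaled features. The blob data and the choice of three clusters are assumptions for illustration; K-Medoids is left out here only to keep the example dependent on scikit-learn alone.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for scaled borrower features (hypothetical data).
X, _ = make_blobs(n_samples=600, centers=3, n_features=5, random_state=0)
X = StandardScaler().fit_transform(X)

models = {
    "kmeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "agglomerative": AgglomerativeClustering(n_clusters=3, linkage="ward"),
}

# Fit each algorithm and score how well separated its clusters are.
scores = {}
for name, model in models.items():
    labels = model.fit_predict(X)
    scores[name] = silhouette_score(X, labels)
    print(f"{name}: silhouette = {scores[name]:.3f}")
```

Per-cluster summaries of the original features (for example, default rate or median income per cluster) would then turn the labels into interpretable segments.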

Detected anomalous borrower behavior using outlier detection (Local Outlier Factor + distance-based approaches) to flag edge-case risk patterns that traditional averages can hide.
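The Local Outlier Factor part of this step could be sketched as follows. The injected outliers, the neighbor count, and the 2% contamination rate are assumptions chosen for the example rather than values from the project.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data: 500 typical points plus 10 injected extreme points
# (hypothetical stand-in for scaled borrower features).
rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(500, 4))
outliers = rng.uniform(8.0, 12.0, size=(10, 4))
X = np.vstack([inliers, outliers])

# LOF compares each point's local density with that of its neighbors.
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.02)
labels = lof.fit_predict(X)              # -1 = outlier, 1 = inlier
scores = -lof.negative_outlier_factor_   # higher = more anomalous

flagged = np.flatnonzero(labels == -1)
print("number of flagged rows:", flagged.size)
```

Because LOF is density-based, it can flag points that look unremarkable on any single feature but sit in a sparse region of the joint feature space.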

Trained and tuned classification models (Logistic Regression, Random Forest, XGBoost) and evaluated performance using cross-validation and ROC analysis.
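One way the cross-validated ROC comparison might be set up is shown below. The synthetic data, the hyperparameters, and the use of balanced class weights are illustrative assumptions; XGBoost is omitted only so the sketch needs nothing beyond scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Imbalanced synthetic data; class 1 plays the role of "charged off".
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=6,
    weights=[0.8, 0.2], random_state=0,
)

models = {
    "logreg": make_pipeline(
        StandardScaler(),
        LogisticRegression(max_iter=1000, class_weight="balanced"),
    ),
    "random_forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0
    ),
}

# Stratified folds keep the default rate similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc = {
    name: cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
    for name, model in models.items()
}
for name, value in auc.items():
    print(f"{name}: mean ROC AUC = {value:.3f}")
```

Putting the scaler inside the pipeline means it is refit on each training fold, which avoids leaking validation statistics into training.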

Optimized for “risk capture” by targeting recall on the charged-off class, achieving 67% recall and a 0.43 F1-score (which implies precision of roughly 0.32), so that more high-risk borrowers are flagged for review or policy tightening.
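Targeting recall usually comes down to choosing a decision threshold below the default 0.5. The sketch below picks the highest threshold that still reaches a target recall on a validation split. The data, the model, and the variable names are assumptions for illustration, and the target of 0.67 simply mirrors the figure reported above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_recall_curve, recall_score
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data; class 1 plays the role of "charged off".
X, y = make_classification(
    n_samples=4000, n_features=20, n_informative=6,
    weights=[0.85, 0.15], flip_y=0.05, random_state=0,
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_val)[:, 1]

# recall[i] is the recall when predicting positive for proba >= thresholds[i];
# it falls as the threshold rises, so take the last index that meets the target.
target_recall = 0.67
precision, recall, thresholds = precision_recall_curve(y_val, proba)
eligible = np.flatnonzero(recall[:-1] >= target_recall)
threshold = thresholds[eligible[-1]]

y_pred = (proba >= threshold).astype(int)
achieved_recall = recall_score(y_val, y_pred)
achieved_f1 = f1_score(y_val, y_pred)
print(f"threshold={threshold:.3f} recall={achieved_recall:.3f} f1={achieved_f1:.3f}")
```

A threshold chosen this way should be confirmed on a separate held-out set, since recall measured on the same split used to pick the threshold is optimistic.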

Created PCA/t-SNE visual diagnostics and correlation analysis to validate structure in the data and communicate findings clearly to non-technical stakeholders.
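The diagnostics described above could be produced along these lines: list strongly correlated feature pairs, check how many principal components cover 95% of the variance, and compute a 2-D t-SNE embedding for plotting. The synthetic data, the 0.8 correlation cutoff, the 95% variance target, and the perplexity value are all assumptions for this sketch.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Small synthetic sample standing in for the borrower features (hypothetical).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=5, n_redundant=3, random_state=0
)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])

# Correlation analysis: list feature pairs with |r| above 0.8.
corr = df.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack()
high_pairs = pairs[pairs > 0.8]
print("highly correlated pairs:", len(high_pairs))

# PCA: number of components needed to explain 95% of the variance.
X_scaled = StandardScaler().fit_transform(df)
pca = PCA(n_components=0.95, svd_solver="full").fit(X_scaled)
n_components = pca.n_components_
print("components for 95% variance:", n_components)

# t-SNE: non-linear 2-D embedding, intended for visualization only.
embedding = TSNE(
    n_components=2, perplexity=30, init="pca", random_state=0
).fit_transform(X_scaled)
```

The `embedding` array can be passed to any scatter-plot routine and colored by class label or cluster assignment; t-SNE distances are not meaningful as model inputs.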

External Links