LendingClub Credit Risk Analysis on Loan Defaults

About Project
Objective
Analyze loan default risk by segmenting borrowers, detecting anomalous risk profiles, and building predictive models that identify high-risk borrowers—supporting credit decisioning and risk monitoring.
Tools & Technologies
Python, pandas, NumPy, scikit-learn, XGBoost, scikit-learn-extra (K-Medoids), PCA, t-SNE, Git
Key Work & Impact
Built an end-to-end analytics workflow covering cleaning, preprocessing, feature selection, modeling, and evaluation—structured as reproducible notebooks with project-generated datasets.
Performed feature selection across 150+ variables using multiple methods (RFE, SelectKBest, Fisher Score, Mutual Information) to reduce noise and improve model signal quality.
Segmented borrower profiles using clustering (K-Means, K-Medoids, Agglomerative) to uncover distinct risk archetypes and provide interpretable group-level insights beyond a single risk score.
Detected anomalous borrower behavior using outlier detection (Local Outlier Factor and distance-based approaches) to flag edge-case risk patterns that aggregate averages can hide.
Trained and tuned classification models (Logistic Regression, Random Forest, XGBoost) and evaluated performance using cross-validation and ROC analysis.
Optimized for “risk capture” by targeting recall on the charged-off class, achieving 67% recall and an F1-score of 0.43 on that class to better identify high-risk borrowers for review or policy tightening.
Created PCA/t-SNE visual diagnostics and correlation analysis to validate structure in the data and communicate findings clearly to non-technical stakeholders.
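
Illustrative sketch
The feature selection, classification, and recall-focused evaluation steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the data is synthetic (generated with make_classification as a stand-in for the loan data), and the pipeline layout, k=15, class_weight="balanced", and the candidate thresholds are all hypothetical choices made for the example.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, recall_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for the loan data. Label 1 (about 20% of rows)
# plays the role of the charged-off class.
X, y = make_classification(
    n_samples=2000, n_features=40, n_informative=8,
    weights=[0.8, 0.2], random_state=0,
)

# Feature selection lives inside the pipeline so it is refit on each
# training fold and never sees the held-out fold.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(partial(mutual_info_classif, random_state=0), k=15)),
    # class_weight="balanced" up-weights the minority class, which tends to
    # raise recall on it at some cost in precision.
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Out-of-fold predicted probabilities for the positive (charged-off) class.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
proba = cross_val_predict(pipeline, X, y, cv=cv, method="predict_proba")[:, 1]
auc = roc_auc_score(y, proba)


def evaluate(threshold):
    """Return (recall, F1) on the positive class at a decision threshold."""
    pred = (proba >= threshold).astype(int)
    return recall_score(y, pred), f1_score(y, pred)


print(f"ROC AUC: {auc:.3f}")
for threshold in (0.5, 0.4, 0.3):  # hypothetical candidate thresholds
    rec, f1 = evaluate(threshold)
    print(f"threshold={threshold:.1f}  recall={rec:.2f}  f1={f1:.2f}")
```

Lowering the decision threshold can only keep or increase recall on the charged-off class, usually at the expense of precision, so the threshold is a policy choice about how many borrowers to send for review rather than a purely statistical one.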