Predictive Insights with XGBoost and SHAP

In this project, I assisted in building a complete machine learning and visualization workflow to model and explain how various urban and environmental indicators relate to Land Surface Temperature (LST).

Technical Workflow:

  • Data Handling:
    Loaded and cleaned the dataset in Python using pandas, filtered out invalid or zero entries, and prepared about 13,000 valid observations.
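
A minimal sketch of this kind of filtering step, not the project's actual code. The column names (`LST`, `NDVI`, `NDBI`) and the rule that a zero in any listed column marks a row as invalid are assumptions made for illustration; the tiny frame is synthetic.

```python
import numpy as np
import pandas as pd


def clean_observations(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Keep only rows whose listed columns are finite, non-missing, and non-zero."""
    # Coerce to numeric so stray strings become NaN instead of raising.
    subset = df[columns].apply(pd.to_numeric, errors="coerce")
    # Treat +/- infinity the same way as missing values.
    subset = subset.replace([np.inf, -np.inf], np.nan)
    valid = subset.notna().all(axis=1) & (subset != 0).all(axis=1)
    return df.loc[valid].reset_index(drop=True)


# Synthetic example data; the column names are hypothetical.
raw = pd.DataFrame({
    "LST": [31.2, 0.0, 29.8, np.nan, 33.1],
    "NDVI": [0.42, 0.10, 0.0, 0.55, 0.31],
    "NDBI": [0.12, 0.30, 0.25, 0.08, np.inf],
})

clean = clean_observations(raw, ["LST", "NDVI", "NDBI"])
print(f"{len(raw)} raw rows, {len(clean)} valid rows")
```

Whether a zero is really invalid depends on the variable (a zero index value can be legitimate), so the column list passed to the helper would need to be chosen per dataset.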

  • Statistical Analysis:
    Computed a Pearson correlation matrix with p-values to evaluate pairwise relationships between variables.
    Visualized the results as Seaborn heatmaps using the plasma color palette and clear significance markers.
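
One way such a correlation heatmap could be put together is sketched below. It assumes `scipy.stats.pearsonr` for the pairwise tests and the common 0.05 / 0.01 / 0.001 star thresholds; the variable names and the data are synthetic placeholders, not the project's dataset.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from scipy.stats import pearsonr


def correlation_with_pvalues(df):
    """Return (r, p) DataFrames of pairwise Pearson coefficients and p-values."""
    cols = list(df.columns)
    r = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
    p = pd.DataFrame(np.zeros((len(cols), len(cols))), index=cols, columns=cols)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r_ab, p_ab = pearsonr(df[a], df[b])
            r.loc[a, b] = r.loc[b, a] = r_ab
            p.loc[a, b] = p.loc[b, a] = p_ab
    return r, p


def stars(p_value):
    """Map a p-value to a significance marker (thresholds are an assumption)."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


# Synthetic data: LST falls as NDVI rises; NDBI is independent noise.
rng = np.random.default_rng(0)
ndvi = rng.normal(0.4, 0.2, 200)
data = pd.DataFrame({
    "LST": 35 - 8 * ndvi + rng.normal(0, 1, 200),
    "NDVI": ndvi,
    "NDBI": rng.normal(0.1, 0.1, 200),
})

r, p = correlation_with_pvalues(data)
n = len(r)
# Each cell shows the coefficient plus stars; the diagonal gets no marker.
annot = np.array([
    [f"{r.iat[i, j]:.2f}" + ("" if i == j else stars(p.iat[i, j])) for j in range(n)]
    for i in range(n)
])

fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(r, annot=annot, fmt="", cmap="plasma", vmin=-1, vmax=1, square=True, ax=ax)
ax.set_title("Pearson correlation (* p<0.05, ** p<0.01, *** p<0.001)")
fig.tight_layout()
```

Passing a pre-formatted string array to `annot` with `fmt=""` keeps the coefficient and its marker in the same cell.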

  • Model Development:
    Implemented an XGBoost regressor to predict LST from environmental and built-up features.
    Tuned the maximum tree depth, learning rate, and number of estimators.
    Evaluated the model with R² and RMSE on both the training and test sets.

  • Model Explainability:
    Applied SHAP (TreeExplainer) to interpret model predictions and feature contributions.
    Generated:

    • SHAP bar plots for global feature importance

    • Beeswarm plots for distribution and direction of impact

    • Custom combined plot with dual axes for interpretability

  • Visualization Styling:
    Used Matplotlib and Seaborn for consistent, publication-ready visuals set in the “Times New Roman” font with custom color themes.
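
A shared style helper is one way to keep figures consistent. The sketch below is an assumption about how that could be done: the function name `apply_publication_style`, the font sizes, and the DPI value are illustrative, not taken from the project.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns


def apply_publication_style():
    """Set a serif, print-oriented look for all subsequent figures."""
    # Call set_theme first: it resets font settings, so rcParams are updated afterwards.
    sns.set_theme(style="whitegrid", palette="plasma")
    mpl.rcParams.update({
        "font.family": "serif",
        # Matplotlib falls back to the next entry if Times New Roman is not installed.
        "font.serif": ["Times New Roman", "DejaVu Serif"],
        "font.size": 11,
        "axes.titlesize": 13,
        "axes.labelsize": 12,
        "savefig.dpi": 300,
    })


apply_publication_style()

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([0, 1, 2], [30.5, 31.2, 32.8], marker="o")
ax.set_xlabel("Sample index")
ax.set_ylabel("LST (°C)")
fig.tight_layout()
```

Calling the helper once at the top of a script applies the same fonts and palette to every figure created afterwards.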

Tech Stack

Python, pandas, NumPy, XGBoost, SHAP, Seaborn, Matplotlib, SciPy
