Original dataset: https://www.consumerfinance.gov/data-research/hmda/historic-data/?geo=ny&records=all-records&field_descriptions=labels
Preprocessing applied so far to the original dataset:¶
- Selected "no co-applicant" records from the column "co_applicant_ethnicity_name"
- Filtered using the columns:
- action_taken
- denial_reason_name_1
- Created the column "action_taken":
- df['action_taken'] = df['action_taken_name'].replace({
})
- First feature selection:
- df = df[
['loan_type_name',
'property_type_name',
'loan_purpose_name',
'loan_amount_000s',
'action_taken',
'msamd_name',
'applicant_ethnicity_name',
'applicant_race_name_1',
'applicant_sex_name',
'applicant_income_000s',
'denial_reason_name_1',
'denial_reason_name_2',
'denial_reason_name_3',
'rate_spread',
'lien_status_name',
'minority_population',
'hud_median_family_income',
'tract_to_msamd_income']
]
- Excluded "Credit application incomplete" records from the column "denial_reason_name_1"
- New column "ethnicity_race_sex"
- Miscellaneous preprocessing:
- Dropped 'msamd_name'
- Dropped the 20 records where property_type_name is 'One-to-four family dwelling (other than manufactured housing)' and tract_to_msamd_income is null
- Created mirror (missing-value indicator) columns for 'tract_to_msamd_income', 'minority_population', 'hud_median_family_income', and
- Filled those nulls with the median of each group in the column "ethnicity_race_sex" (sketched just after this list)
- New column "loan_to_income_ratio"
- Filled missing values with
- "0" for rate_spread
- "unknown" for denial_reason_name_1, denial_reason_name_2, denial_reason_name_3
- Addressed outliers in applicant_income_000s and loan_amount_000s
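The "mirror column + group-median fill" step above can be sketched as follows. This is a minimal illustration of the idea, not the original preprocessing code; it assumes a raw HMDA frame named raw (hypothetical name) that still contains the columns listed above.

for col in ['tract_to_msamd_income', 'minority_population', 'hud_median_family_income']:
    # Mirror (indicator) column flagging rows that were originally missing
    raw[col + '_missing'] = raw[col].isnull().astype(int)
    # Fill nulls with the median of the row's ethnicity_race_sex group
    raw[col] = raw[col].fillna(raw.groupby('ethnicity_race_sex')[col].transform('median'))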
In [1]:
!pip install shap
Successfully installed shap-0.46.0 slicer-0.0.8
In [2]:
!pip install aif360
Successfully installed aif360-0.6.1
In [3]:
pip install 'aif360[Reductions]'
Successfully installed fairlearn-0.10.0
In [4]:
pip install 'aif360[inFairness]'
Successfully installed POT-0.9.4 inFairness-0.2.3 skorch-1.0.0
In [5]:
import pandas as pd
import numpy as np
import sklearn
import matplotlib.pyplot as plt
import seaborn as sns
import shap
import lightgbm as lgbm
from lightgbm import LGBMClassifier
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV, StratifiedKFold
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, classification_report, roc_auc_score, log_loss
from scipy.stats import skew
from imblearn.over_sampling import SMOTE
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.datasets import BinaryLabelDataset
from aif360.algorithms.preprocessing import Reweighing
import joblib
/usr/local/lib/python3.10/dist-packages/dask/dataframe/__init__.py:42: FutureWarning: Dask dataframe query planning is disabled because dask-expr is not installed. You can install it with `pip install dask[dataframe]` or `conda install dask`. This will raise in a future version. warnings.warn(msg, FutureWarning) /usr/local/lib/python3.10/dist-packages/inFairness/utils/ndcg.py:37: FutureWarning: We've integrated functorch into PyTorch. As the final step of the integration, `functorch.vmap` is deprecated as of PyTorch 2.0 and will be deleted in a future version of PyTorch >= 2.3. Please use `torch.vmap` instead; see the PyTorch 2.0 release notes and/or the `torch.func` migration guide for more details https://pytorch.org/docs/main/func.migrating.html vect_normalized_discounted_cumulative_gain = vmap( /usr/local/lib/python3.10/dist-packages/inFairness/utils/ndcg.py:48: FutureWarning: We've integrated functorch into PyTorch. As the final step of the integration, `functorch.vmap` is deprecated as of PyTorch 2.0 and will be deleted in a future version of PyTorch >= 2.3. Please use `torch.vmap` instead; see the PyTorch 2.0 release notes and/or the `torch.func` migration guide for more details https://pytorch.org/docs/main/func.migrating.html monte_carlo_vect_ndcg = vmap(vect_normalized_discounted_cumulative_gain, in_dims=(0,))
In [6]:
# Load the Drive helper and mount
from google.colab import drive
# This will prompt for authorization.
drive.mount('/content/drive')
%ls "/content/drive/My Drive/Colab Notebooks/hmdaNY_02092024_1603_Ready_2.csv"
Mounted at /content/drive '/content/drive/My Drive/Colab Notebooks/hmdaNY_02092024_1603_Ready_2.csv'
In [7]:
df = pd.read_csv('/content/drive/My Drive/Colab Notebooks/hmdaNY_02092024_1603_Ready_2.csv')
print(df.shape,'\n')
print(df.dtypes)
(146718, 15)

action_taken                          object
applicant_income_000s                float64
ethnicity_race_sex                    object
hud_median_family_income             float64
hud_median_family_income_missing       int64
lien_status_name                      object
loan_amount_000s                       int64
loan_purpose_name                     object
loan_type_name                        object
loan_to_income_ratio                 float64
minority_population                  float64
minority_population_missing            int64
property_type_name                    object
tract_to_msamd_income                float64
tract_to_msamd_income_missing          int64
dtype: object
A) Creation of Binary target¶
In [8]:
# Creation of a binary column for approvals / denials (1 = denied, 0 = approved)
df['action_taken_binary'] = df['action_taken'].map({'denied': 1,'approved': 0})
# Drop the original 'action_taken' column
df = df.drop(columns=['action_taken'])
print(df['action_taken_binary'].value_counts())
action_taken_binary
0    116408
1     30310
Name: count, dtype: int64
B) Splitting data¶
- We use a 70/30 train/test split.
In [9]:
# Split the data
train_df, test_df = train_test_split(df, test_size=0.3, random_state=42)
train_df.to_csv('/content/drive/My Drive/Colab Notebooks/hmdaNY_03092024_1603_train_df.csv', index=False)
test_df.to_csv('/content/drive/My Drive/Colab Notebooks/hmdaNY_03092024_1603_test_df.csv', index=False)
In [10]:
print(train_df.shape)
print(test_df.shape)
(102702, 15)
(44016, 15)
C) Feature Engineering¶
In [11]:
# Log transformation
# Scaling
# 'loan_to_income_ratio'
plt.hist(train_df['loan_to_income_ratio'], bins=100, edgecolor='black')
plt.title('Distribution of loan_to_income_ratio')
plt.xlabel('loan_to_income_ratio')
plt.ylabel('Frequency')
plt.show()
# Calculating the skewness of loan_to_income_ratio
original_skewness = skew(train_df['loan_to_income_ratio'])
print(f"Skewness: {original_skewness}")
Skewness: 4.0683291922765275
In [12]:
# Log transformation: yes
# Scaling: yes
# applicant_income_000s
plt.hist(train_df['applicant_income_000s'], bins=30, edgecolor='black')
plt.title('Distribution of Applicant Income (in 000s)')
plt.xlabel('Applicant Income (000s)')
plt.ylabel('Frequency')
plt.show()
# Calculating the skewness of applicant_income_000s
original_skewness = skew(train_df['applicant_income_000s'])
print(f"Skewness: {original_skewness}")
Skewness: 3.39818358160144
In [13]:
# Log transformation: No
# Scaling: yes
# hud_median_family_income
plt.hist(train_df['hud_median_family_income'], bins=30, edgecolor='black')
plt.title('Distribution of hud_median_family_income')
plt.xlabel('hud_median_family_income')
plt.ylabel('Frequency')
plt.show()
# Calculating the skewness of hud_median_family_income
original_skewness = skew(train_df['hud_median_family_income'])
print(f"Skewness: {original_skewness}")
Skewness: 1.2479818056447094
In [14]:
# Log transformation: yes
# scaling: yes
# loan_amount_000s
plt.hist(train_df['loan_amount_000s'], bins=30, edgecolor='black')
plt.title('Distribution of loan_amount_000s')
plt.xlabel('loan_amount_000s')
plt.ylabel('Frequency')
plt.show()
# Calculating the skewness of loan_amount_000s
original_skewness = skew(train_df['loan_amount_000s'])
print(f"Skewness: {original_skewness}")
Skewness: 1.824575061311811
In [15]:
# ONLY scaling.
# minority_population
plt.hist(train_df['minority_population'], bins=30, edgecolor='black')
plt.title('Distribution of minority_population')
plt.xlabel('minority_population')
plt.ylabel('Frequency')
plt.show()
# Calculating the skewness of minority_population
original_skewness = skew(train_df['minority_population'])
print(f"Skewness: {original_skewness}")
Skewness: 1.1907000593514665
In [16]:
# Log transformation: optional
# Scaling: yes
# tract_to_msamd_income
plt.hist(train_df['tract_to_msamd_income'], bins=30, edgecolor='black')
plt.title('Distribution of tract_to_msamd_income')
plt.xlabel('tract_to_msamd_income')
plt.ylabel('Frequency')
plt.show()
# Calculating the skewness of tract_to_msamd_income
original_skewness = skew(train_df['tract_to_msamd_income'])
print(f"Skewness: {original_skewness}")
Skewness: 1.9468038744343557
C.2) Log Transformation¶
'loan_to_income_ratio',
'tract_to_msamd_income',
'loan_amount_000s',
'applicant_income_000s'
(a short note on log1p follows below)
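These columns are transformed with np.log1p in the next cell. np.log1p maps x to ln(1 + x), which compresses the long right tails seen in the histograms above while staying defined at x = 0; a quick illustration (not part of the original notebook):

import numpy as np
print(np.log1p([0, 9, 99]))   # [0.         2.30258509 4.60517019]  i.e. ln(1), ln(10), ln(100)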
In [17]:
columns_to_log = ['loan_to_income_ratio', 'tract_to_msamd_income', 'loan_amount_000s', 'applicant_income_000s']
train_df[columns_to_log] = train_df[columns_to_log].apply(np.log1p)
test_df[columns_to_log] = test_df[columns_to_log].apply(np.log1p)
C.3) Scaling¶
Standardization (Z-score normalization)¶
- We want to avoid any leakage from the test data into the preprocessing, hence we fit the scaler and the one-hot encoder on train_df only and apply them to both train_df and test_df (a toy illustration follows this list)
- We first fit the preprocessing steps on the training data only.
- Then we apply the preprocessing to both sets: we use the parameters learned from the training data to transform both the training and test sets.
This approach ensures that:
- The test set remains truly unseen during the training process.
- Both training and test data are in the same format for model training and evaluation.
- No information from the test set leaks into the preprocessing steps.
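Concretely, standardization maps each value to z = (x - mean_train) / std_train, with both statistics estimated from the training rows only; the fitted scaler then applies exactly these training statistics to the test rows. A toy illustration with made-up numbers (not part of the original notebook):

import numpy as np
from sklearn.preprocessing import StandardScaler
toy = StandardScaler().fit(np.array([[1.0], [2.0], [3.0]]))   # "training" data: mean 2.0, std ~0.816
print(toy.transform(np.array([[4.0]])))                        # "test" value scaled with the training stats -> ~2.449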
In [18]:
# Features to Scale
columns_to_scaling = ['loan_to_income_ratio',
'applicant_income_000s',
'hud_median_family_income',
'loan_amount_000s',
'minority_population',
'tract_to_msamd_income'
]
# Fitting preprocessing on training data
scaler = StandardScaler()
scaler.fit(train_df[columns_to_scaling])
## Applying preprocessing to both sets
train_df[columns_to_scaling] = scaler.transform(train_df[columns_to_scaling])
test_df[columns_to_scaling] = scaler.transform(test_df[columns_to_scaling])
In [19]:
# Check the mean and standard deviation
print("Means of scaled features:")
print(train_df[columns_to_scaling].mean())
print("\nStandard deviations of scaled features:")
print(train_df[columns_to_scaling].std())
Means of scaled features:
loan_to_income_ratio         2.184859e-16
applicant_income_000s       -2.292788e-16
hud_median_family_income    -2.480970e-16
loan_amount_000s            -1.053202e-15
minority_population         -7.131233e-17
tract_to_msamd_income       -2.276737e-15
dtype: float64

Standard deviations of scaled features:
loan_to_income_ratio        1.000005
applicant_income_000s       1.000005
hud_median_family_income    1.000005
loan_amount_000s            1.000005
minority_population         1.000005
tract_to_msamd_income       1.000005
dtype: float64
In [20]:
# Checking the range
print("\nMin values:")
print(train_df[columns_to_scaling].min())
print("\nMax values:")
print(train_df[columns_to_scaling].max())
Min values:
loan_to_income_ratio        -2.212567
applicant_income_000s       -2.513814
hud_median_family_income    -1.002217
loan_amount_000s            -3.041207
minority_population         -1.023697
tract_to_msamd_income      -11.541474
dtype: float64

Max values:
loan_to_income_ratio        5.389856
applicant_income_000s       3.714564
hud_median_family_income    2.033801
loan_amount_000s            2.021953
minority_population         2.410389
tract_to_msamd_income       3.007542
dtype: float64
C.4) One-hot encoding¶
We follow the same approach as above:
- We fit the encoder on the training data only.
- We then apply it to both sets, using the categories learned from the training data.
This ensures that the test set remains truly unseen, that both sets end up with the same columns, and that no information from the test set leaks into the preprocessing steps.
OneHotEncoder vs get_dummies:¶
OneHotEncoder and get_dummies are both used for one-hot encoding, but they have some differences:
- OneHotEncoder (from scikit-learn):
- Can be fit on training data and applied to test data separately
- Handles unseen categories in test data (with 'handle_unknown' parameter)
- Part of scikit-learn's preprocessing module, integrates well with pipelines
- Can handle non-string categorical data easily
- get_dummies (from pandas):
- Simpler to use for quick encoding of pandas DataFrames
- Applies encoding immediately to the entire dataset
- Doesn't handle unseen categories in new data by default
- Works directly on pandas DataFrames and returns a DataFrame
- The main difference (and the reason we use OneHotEncoder here):
- OneHotEncoder is better at preventing data leakage and at handling unseen categories in test data (see the short sketch below)
- get_dummies is more convenient for quick, one-time encoding of all the data
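A minimal sketch of that difference (illustrative only; the toy column and categories below are made up):

import pandas as pd
from sklearn.preprocessing import OneHotEncoder
train_toy = pd.DataFrame({'loan_type': ['Conventional', 'FHA-insured']})
test_toy = pd.DataFrame({'loan_type': ['VA-guaranteed']})   # category never seen in training
enc = OneHotEncoder(sparse_output=False, handle_unknown='ignore').fit(train_toy)
print(enc.transform(test_toy))    # [[0. 0.]] -> unknown category encoded as all zeros, columns unchanged
print(pd.get_dummies(test_toy))   # creates a 'loan_type_VA-guaranteed' column the training frame never had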
In [21]:
# Features to encode
categorical_columns = ['loan_type_name',
'loan_purpose_name',
'property_type_name',
'lien_status_name',
'ethnicity_race_sex']
# Fitting preprocessing on training data only
encoder = OneHotEncoder(sparse_output=False, handle_unknown='ignore')
encoder.fit(train_df[categorical_columns])
# Applying preprocessing on both sets
train_df_LogEnco = encoder.transform(train_df[categorical_columns])
test_df_LogEnco = encoder.transform(test_df[categorical_columns])
# Creating new column names for the encoded features
new_column_names = encoder.get_feature_names_out(categorical_columns)
# Converting the encoded arrays to DataFrames
train_df_LogEnco = pd.DataFrame(train_df_LogEnco, columns=new_column_names, index=train_df.index)
test_df_LogEnco = pd.DataFrame(test_df_LogEnco, columns=new_column_names, index=test_df.index)
# Dropping the original categorical columns and add the encoded columns
train_df = train_df.drop(columns=categorical_columns).join(train_df_LogEnco)
test_df = test_df.drop(columns=categorical_columns).join(test_df_LogEnco)
#https://medium.com/@vinodkumargr/11-column-transformer-in-ml-sklearn-column-transformer-in-machine-learning-48479f8cb48f#:~:text=Scikit%2DLearn's%20Column%20Transformer%20is,transformer%20should%20be%20applied%20to.
#https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
In [22]:
print(train_df)
applicant_income_000s hud_median_family_income \ 93790 -0.560030 -0.621935 3731 0.380770 -0.279059 58545 2.404752 -0.279059 142437 -0.189761 2.033801 21819 0.030613 -0.279059 ... ... ... 110268 -1.733987 -0.603233 119879 -1.167283 -0.603233 103694 -0.716095 -0.621935 131932 0.091232 2.033801 121958 0.009887 0.020179 hud_median_family_income_missing loan_amount_000s \ 93790 0 -0.235946 3731 0 0.774233 58545 0 1.897857 142437 0 0.573865 21819 0 0.066945 ... ... ... 110268 0 -2.694126 119879 0 -0.268873 103694 0 -0.311898 131932 0 0.679899 121958 0 -1.136912 loan_to_income_ratio minority_population \ 93790 -0.117323 1.438200 3731 0.707355 -0.335849 58545 0.781324 -0.119502 142437 0.898603 1.477005 21819 -0.136738 0.080362 ... ... ... 110268 -1.862718 1.527486 119879 0.354152 -0.740041 103694 -0.104487 2.104069 131932 0.817267 -0.266481 121958 -1.473303 -0.368130 minority_population_missing tract_to_msamd_income \ 93790 0 -1.883378 3731 0 0.883041 58545 0 0.680940 142437 0 -1.704510 21819 0 0.811845 ... ... ... 110268 0 -1.092325 119879 0 -0.024931 103694 0 -1.151373 131932 0 -0.389044 121958 0 0.302802 tract_to_msamd_income_missing action_taken_binary ... \ 93790 0 0 ... 3731 0 0 ... 58545 0 0 ... 142437 0 0 ... 21819 0 0 ... ... ... ... ... 110268 0 1 ... 119879 0 0 ... 103694 0 0 ... 131932 0 1 ... 121958 0 0 ... ethnicity_race_sex_not hispanic or latino_american indian or alaska native_female \ 93790 0.0 3731 0.0 58545 0.0 142437 0.0 21819 0.0 ... ... 110268 0.0 119879 0.0 103694 0.0 131932 0.0 121958 0.0 ethnicity_race_sex_not hispanic or latino_american indian or alaska native_male \ 93790 0.0 3731 0.0 58545 0.0 142437 0.0 21819 0.0 ... ... 110268 0.0 119879 0.0 103694 0.0 131932 0.0 121958 0.0 ethnicity_race_sex_not hispanic or latino_asian_female \ 93790 0.0 3731 1.0 58545 0.0 142437 0.0 21819 0.0 ... ... 110268 0.0 119879 0.0 103694 0.0 131932 0.0 121958 0.0 ethnicity_race_sex_not hispanic or latino_asian_male \ 93790 0.0 3731 0.0 58545 0.0 142437 0.0 21819 1.0 ... ... 110268 0.0 119879 0.0 103694 0.0 131932 0.0 121958 0.0 ethnicity_race_sex_not hispanic or latino_black or african american_female \ 93790 0.0 3731 0.0 58545 0.0 142437 0.0 21819 0.0 ... ... 110268 0.0 119879 0.0 103694 1.0 131932 0.0 121958 0.0 ethnicity_race_sex_not hispanic or latino_black or african american_male \ 93790 0.0 3731 0.0 58545 0.0 142437 0.0 21819 0.0 ... ... 110268 1.0 119879 0.0 103694 0.0 131932 0.0 121958 0.0 ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_female \ 93790 0.0 3731 0.0 58545 0.0 142437 0.0 21819 0.0 ... ... 110268 0.0 119879 0.0 103694 0.0 131932 0.0 121958 0.0 ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_male \ 93790 0.0 3731 0.0 58545 0.0 142437 0.0 21819 0.0 ... ... 110268 0.0 119879 0.0 103694 0.0 131932 0.0 121958 0.0 ethnicity_race_sex_not hispanic or latino_white_female \ 93790 0.0 3731 0.0 58545 0.0 142437 1.0 21819 0.0 ... ... 110268 0.0 119879 0.0 103694 0.0 131932 0.0 121958 0.0 ethnicity_race_sex_not hispanic or latino_white_male 93790 1.0 3731 0.0 58545 1.0 142437 0.0 21819 0.0 ... ... 110268 0.0 119879 1.0 103694 0.0 131932 0.0 121958 1.0 [102702 rows x 43 columns]
In [23]:
print(train_df.isnull().sum())
applicant_income_000s 0 hud_median_family_income 0 hud_median_family_income_missing 0 loan_amount_000s 0 loan_to_income_ratio 0 minority_population 0 minority_population_missing 0 tract_to_msamd_income 0 tract_to_msamd_income_missing 0 action_taken_binary 0 loan_type_name_Conventional 0 loan_type_name_FHA-insured 0 loan_type_name_FSA/RHS-guaranteed 0 loan_type_name_VA-guaranteed 0 loan_purpose_name_Home improvement 0 loan_purpose_name_Home purchase 0 loan_purpose_name_Refinancing 0 property_type_name_Manufactured housing 0 property_type_name_Multifamily dwelling 0 property_type_name_One-to-four family dwelling (other than manufactured housing) 0 lien_status_name_Not secured by a lien 0 lien_status_name_Secured by a first lien 0 lien_status_name_Secured by a subordinate lien 0 ethnicity_race_sex_hispanic or latino_american indian or alaska native_female 0 ethnicity_race_sex_hispanic or latino_american indian or alaska native_male 0 ethnicity_race_sex_hispanic or latino_asian_female 0 ethnicity_race_sex_hispanic or latino_asian_male 0 ethnicity_race_sex_hispanic or latino_black or african american_female 0 ethnicity_race_sex_hispanic or latino_black or african american_male 0 ethnicity_race_sex_hispanic or latino_native hawaiian or other pacific islander_female 0 ethnicity_race_sex_hispanic or latino_native hawaiian or other pacific islander_male 0 ethnicity_race_sex_hispanic or latino_white_female 0 ethnicity_race_sex_hispanic or latino_white_male 0 ethnicity_race_sex_not hispanic or latino_american indian or alaska native_female 0 ethnicity_race_sex_not hispanic or latino_american indian or alaska native_male 0 ethnicity_race_sex_not hispanic or latino_asian_female 0 ethnicity_race_sex_not hispanic or latino_asian_male 0 ethnicity_race_sex_not hispanic or latino_black or african american_female 0 ethnicity_race_sex_not hispanic or latino_black or african american_male 0 ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_female 0 ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_male 0 ethnicity_race_sex_not hispanic or latino_white_female 0 ethnicity_race_sex_not hispanic or latino_white_male 0 dtype: int64
D) Define features and target variable:¶
We separate the features (X) and target variable (y) for both train and test sets.
In [24]:
features =[ 'action_taken_binary',
# 'applicant_income_000s',
'hud_median_family_income',
'hud_median_family_income_missing',
# 'loan_amount_000s',
'loan_to_income_ratio',
'minority_population',
'minority_population_missing',
'tract_to_msamd_income',
'tract_to_msamd_income_missing',
'loan_type_name_Conventional',
'loan_type_name_FHA-insured',
'loan_type_name_FSA/RHS-guaranteed',
'loan_type_name_VA-guaranteed',
'loan_purpose_name_Home improvement',
'loan_purpose_name_Home purchase',
'loan_purpose_name_Refinancing',
'property_type_name_Manufactured housing',
'property_type_name_Multifamily dwelling',
'property_type_name_One-to-four family dwelling (other than manufactured housing)',
'lien_status_name_Not secured by a lien',
'lien_status_name_Secured by a first lien',
'lien_status_name_Secured by a subordinate lien',
"ethnicity_race_sex_hispanic or latino_american indian or alaska native_female",
"ethnicity_race_sex_hispanic or latino_american indian or alaska native_male",
"ethnicity_race_sex_hispanic or latino_asian_female",
"ethnicity_race_sex_hispanic or latino_asian_male",
"ethnicity_race_sex_hispanic or latino_black or african american_female",
"ethnicity_race_sex_hispanic or latino_black or african american_male",
"ethnicity_race_sex_hispanic or latino_native hawaiian or other pacific islander_female",
"ethnicity_race_sex_hispanic or latino_native hawaiian or other pacific islander_male",
"ethnicity_race_sex_hispanic or latino_white_female",
"ethnicity_race_sex_hispanic or latino_white_male",
"ethnicity_race_sex_not hispanic or latino_american indian or alaska native_female",
"ethnicity_race_sex_not hispanic or latino_american indian or alaska native_male",
"ethnicity_race_sex_not hispanic or latino_asian_female",
"ethnicity_race_sex_not hispanic or latino_asian_male",
"ethnicity_race_sex_not hispanic or latino_black or african american_female",
"ethnicity_race_sex_not hispanic or latino_black or african american_male",
"ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_female",
"ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_male",
"ethnicity_race_sex_not hispanic or latino_white_female",
'ethnicity_race_sex_not hispanic or latino_white_male'
]
train_df = train_df[features]
test_df = test_df[features]
In [25]:
# Train set
X_train = train_df.drop('action_taken_binary', axis=1)
y_train = train_df['action_taken_binary']
# Test Set
X_test = test_df.drop('action_taken_binary', axis=1)
y_test = test_df['action_taken_binary']
In [26]:
# Checking the shapes of the resulting splits
print(f'X_train shape: {X_train.shape}')
print(f'y_train shape: {y_train.shape}')
print(f'X_test shape: {X_test.shape}')
print(f'y_test shape: {y_test.shape}')
X_train shape: (102702, 40)
y_train shape: (102702,)
X_test shape: (44016, 40)
y_test shape: (44016,)
E) Bias measurement:¶
We measure biases BEFORE SMOTE with AIF360, using Disparate Impact as the measure.¶
- First, we'll use the 20 one-hot encoded ethnicity-race-sex columns as our protected attributes.
- We'll use the BinaryLabelDataset from AIF360, specifying 'action_taken_binary' as the target and our new categorical column as the protected attribute.
- We'll use BinaryLabelDatasetMetric to compute metrics for each group, focusing on disparate impact and statistical parity difference.
- We'll examine the metrics for each ethnicity-race-sex group and identify any groups facing significantly higher rejection rates.
Privileged class:¶
Since we are interested in disparities within our dataset, we will use the most populous class (the privileged class) as the benchmark.
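To make these metrics concrete: Disparate Impact is the approval rate of the unprivileged groups divided by the approval rate of the privileged group, and Statistical Parity Difference is the same comparison expressed as a difference. A small pandas sketch of that definition on the training set, which should closely mirror the aggregate AIF360 numbers computed below (this cross-check is not part of the original notebook):

priv_col = 'ethnicity_race_sex_not hispanic or latino_white_male'
approved = (train_df['action_taken_binary'] == 0)   # favorable outcome
is_priv = (train_df[priv_col] == 1)
p_priv = approved[is_priv].mean()                   # P(approved | privileged group)
p_unpriv = approved[~is_priv].mean()                # P(approved | any unprivileged group)
print('Disparate Impact:', p_unpriv / p_priv)
print('Statistical Parity Difference:', p_unpriv - p_priv)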
In [27]:
print(train_df['ethnicity_race_sex_not hispanic or latino_white_male'])
93790 1.0 3731 0.0 58545 1.0 142437 0.0 21819 0.0 ... 110268 0.0 119879 1.0 103694 0.0 131932 0.0 121958 1.0 Name: ethnicity_race_sex_not hispanic or latino_white_male, Length: 102702, dtype: float64
E.1) Dataset creation¶
In [28]:
# Define variables:
protected_attribute_names=[
"ethnicity_race_sex_hispanic or latino_american indian or alaska native_female",
"ethnicity_race_sex_hispanic or latino_american indian or alaska native_male",
"ethnicity_race_sex_hispanic or latino_asian_female",
"ethnicity_race_sex_hispanic or latino_asian_male",
"ethnicity_race_sex_hispanic or latino_black or african american_female",
"ethnicity_race_sex_hispanic or latino_black or african american_male",
"ethnicity_race_sex_hispanic or latino_native hawaiian or other pacific islander_female",
"ethnicity_race_sex_hispanic or latino_native hawaiian or other pacific islander_male",
"ethnicity_race_sex_hispanic or latino_white_female",
"ethnicity_race_sex_hispanic or latino_white_male",
"ethnicity_race_sex_not hispanic or latino_american indian or alaska native_female",
"ethnicity_race_sex_not hispanic or latino_american indian or alaska native_male",
"ethnicity_race_sex_not hispanic or latino_asian_female",
"ethnicity_race_sex_not hispanic or latino_asian_male",
"ethnicity_race_sex_not hispanic or latino_black or african american_female",
"ethnicity_race_sex_not hispanic or latino_black or african american_male",
"ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_female",
"ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_male",
"ethnicity_race_sex_not hispanic or latino_white_female",
'ethnicity_race_sex_not hispanic or latino_white_male' # notice we include the privileged group
]
favorable_label = 0 # loan approved
unfavorable_label = 1 # loan denied
# First, we create the dataset
aif_dataset = BinaryLabelDataset(
df=train_df,
label_names=['action_taken_binary'],
protected_attribute_names=protected_attribute_names,
favorable_label = favorable_label, # loan approved
unfavorable_label = unfavorable_label # loan denied
)
#https://www.rdocumentation.org/packages/aif360/versions/0.1.0/topics/aif_dataset
#https://aif360.readthedocs.io/en/latest/
#
E.2) Defining groups¶
In [29]:
# Defining privileged group directly
privileged_groups = [{'ethnicity_race_sex_not hispanic or latino_white_male': 1}]
# Defining unprivileged groups using a loop
unprivileged_groups = []
for attribute in protected_attribute_names:
    if attribute != 'ethnicity_race_sex_not hispanic or latino_white_male':
        unprivileged_groups.append({attribute: 1})
# Checking the groups
print("Privileged group:", privileged_groups)
print("Number of unprivileged groups:", len(unprivileged_groups))
print("First few unprivileged groups:", unprivileged_groups[:3])
Privileged group: [{'ethnicity_race_sex_not hispanic or latino_white_male': 1}]
Number of unprivileged groups: 19
First few unprivileged groups: [{'ethnicity_race_sex_hispanic or latino_american indian or alaska native_female': 1}, {'ethnicity_race_sex_hispanic or latino_american indian or alaska native_male': 1}, {'ethnicity_race_sex_hispanic or latino_asian_female': 1}]
E.3) Metrics¶
In [30]:
# Calculating metrics
metric = BinaryLabelDatasetMetric(aif_dataset,
unprivileged_groups=unprivileged_groups,
privileged_groups=privileged_groups)
In [31]:
# Printing metrics
print(f"Disparate Impact: {metric.disparate_impact():.4f}")
print(f"Statistical Parity Difference: {metric.statistical_parity_difference():.4f}")
# We calculate and print the mean difference in label predictions
print(f"Mean difference in label predictions: {metric.mean_difference():.4f}")
# Calculate group-specific metrics
for group in unprivileged_groups:
    group_metric = BinaryLabelDatasetMetric(aif_dataset,
                                            unprivileged_groups=[group],
                                            privileged_groups=privileged_groups)
    group_name = list(group.keys())[0]
    print(f"\nGroup: {group_name}")
    print(f"Disparate Impact: {group_metric.disparate_impact():.4f}")
    print(f"Statistical Parity Difference: {group_metric.statistical_parity_difference():.4f}")
Disparate Impact: 0.9534
Statistical Parity Difference: -0.0380
Mean difference in label predictions: -0.0380

Group: ethnicity_race_sex_hispanic or latino_american indian or alaska native_female | Disparate Impact: 0.6647 | Statistical Parity Difference: -0.2732
Group: ethnicity_race_sex_hispanic or latino_american indian or alaska native_male | Disparate Impact: 0.5467 | Statistical Parity Difference: -0.3694
Group: ethnicity_race_sex_hispanic or latino_asian_female | Disparate Impact: 0.8863 | Statistical Parity Difference: -0.0927
Group: ethnicity_race_sex_hispanic or latino_asian_male | Disparate Impact: 0.7809 | Statistical Parity Difference: -0.1785
Group: ethnicity_race_sex_hispanic or latino_black or african american_female | Disparate Impact: 0.7363 | Statistical Parity Difference: -0.2149
Group: ethnicity_race_sex_hispanic or latino_black or african american_male | Disparate Impact: 0.7181 | Statistical Parity Difference: -0.2297
Group: ethnicity_race_sex_hispanic or latino_native hawaiian or other pacific islander_female | Disparate Impact: 0.5137 | Statistical Parity Difference: -0.3963
Group: ethnicity_race_sex_hispanic or latino_native hawaiian or other pacific islander_male | Disparate Impact: 0.5764 | Statistical Parity Difference: -0.3452
Group: ethnicity_race_sex_hispanic or latino_white_female | Disparate Impact: 0.9077 | Statistical Parity Difference: -0.0753
Group: ethnicity_race_sex_hispanic or latino_white_male | Disparate Impact: 0.9181 | Statistical Parity Difference: -0.0667
Group: ethnicity_race_sex_not hispanic or latino_american indian or alaska native_female | Disparate Impact: 0.8057 | Statistical Parity Difference: -0.1584
Group: ethnicity_race_sex_not hispanic or latino_american indian or alaska native_male | Disparate Impact: 0.7504 | Statistical Parity Difference: -0.2034
Group: ethnicity_race_sex_not hispanic or latino_asian_female | Disparate Impact: 1.0054 | Statistical Parity Difference: 0.0044
Group: ethnicity_race_sex_not hispanic or latino_asian_male | Disparate Impact: 0.9994 | Statistical Parity Difference: -0.0005
Group: ethnicity_race_sex_not hispanic or latino_black or african american_female | Disparate Impact: 0.8262 | Statistical Parity Difference: -0.1416
Group: ethnicity_race_sex_not hispanic or latino_black or african american_male | Disparate Impact: 0.8220 | Statistical Parity Difference: -0.1451
Group: ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_female | Disparate Impact: 0.8331 | Statistical Parity Difference: -0.1360
Group: ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_male | Disparate Impact: 0.8433 | Statistical Parity Difference: -0.1277
Group: ethnicity_race_sex_not hispanic or latino_white_female | Disparate Impact: 0.9999 | Statistical Parity Difference: -0.0001
F) SMOTE¶
We apply SMOTE to our imbalanced training data. SMOTE synthesizes new minority-class (denied) examples by interpolating between existing minority examples and their nearest neighbors; it is applied to the training set only, so the test distribution stays untouched.
In [32]:
#Initializing SMOTE
smote = SMOTE(random_state=42)
# Application of SMOTE only on the training set to balance the classes
X_train_smote, y_train_smote = smote.fit_resample(X_train, y_train)
# Checking the distribution of the target variable after SMOTE
print("Before SMOTE:", y_train.value_counts())
print("\n After SMOTE:", y_train_smote.value_counts())
Before SMOTE: action_taken_binary
0    81489
1    21213
Name: count, dtype: int64

After SMOTE: action_taken_binary
0    81489
1    81489
Name: count, dtype: int64
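One caveat worth checking (not part of the original notebook): because SMOTE builds synthetic rows by interpolation, the one-hot ethnicity_race_sex indicators are not guaranteed to stay exactly 0 or 1 on the synthetic rows, which matters because the next step defines groups by those columns equalling 1. A quick sanity check:

ohe_cols = [c for c in X_train_smote.columns if c.startswith('ethnicity_race_sex_')]
# True means every value is still exactly 0 or 1; False means some synthetic rows hold fractional values
print(X_train_smote[ohe_cols].isin([0.0, 1.0]).all().all())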
F.1) Bias AFTER SMOTE¶
We measure biases after applying SMOTE.
F.1.1) First¶
We reassemble the resampled arrays into a single DataFrame.
In [33]:
# Converting the resampled X_train and y_train into a DataFrame
X_train_s = pd.DataFrame(X_train_smote, columns=X_train.columns) # We retained original column names
y_train_s = pd.DataFrame(y_train_smote, columns=['action_taken_binary'])
# Combining X and y into a single DataFrame
train_df_smote = pd.concat([X_train_s, y_train_s], axis=1)
F.1.2) Second¶
We use the resampled dataset to apply the same process used earlier and assess any differences.
In [34]:
# Defining variables:
protected_attribute_names=[
"ethnicity_race_sex_hispanic or latino_american indian or alaska native_female",
"ethnicity_race_sex_hispanic or latino_american indian or alaska native_male",
"ethnicity_race_sex_hispanic or latino_asian_female",
"ethnicity_race_sex_hispanic or latino_asian_male",
"ethnicity_race_sex_hispanic or latino_black or african american_female",
"ethnicity_race_sex_hispanic or latino_black or african american_male",
"ethnicity_race_sex_hispanic or latino_native hawaiian or other pacific islander_female",
"ethnicity_race_sex_hispanic or latino_native hawaiian or other pacific islander_male",
"ethnicity_race_sex_hispanic or latino_white_female",
"ethnicity_race_sex_hispanic or latino_white_male",
"ethnicity_race_sex_not hispanic or latino_american indian or alaska native_female",
"ethnicity_race_sex_not hispanic or latino_american indian or alaska native_male",
"ethnicity_race_sex_not hispanic or latino_asian_female",
"ethnicity_race_sex_not hispanic or latino_asian_male",
"ethnicity_race_sex_not hispanic or latino_black or african american_female",
"ethnicity_race_sex_not hispanic or latino_black or african american_male",
"ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_female",
"ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_male",
"ethnicity_race_sex_not hispanic or latino_white_female",
'ethnicity_race_sex_not hispanic or latino_white_male' # notice we include the privileged group
]
favorable_label = 0 # loan approved
unfavorable_label = 1 # loan denied
# Creating the dataset
aif_dataset = BinaryLabelDataset(
df=train_df_smote,
label_names=['action_taken_binary'],
protected_attribute_names=protected_attribute_names,
favorable_label = favorable_label, # loan approved
unfavorable_label = unfavorable_label # loan denied
)
In [35]:
# Defining the privileged group directly
privileged_groups = [{'ethnicity_race_sex_not hispanic or latino_white_male': 1}]
# Defining unprivileged groups using a loop
unprivileged_groups = []
for attribute in protected_attribute_names:
    if attribute != 'ethnicity_race_sex_not hispanic or latino_white_male':
        unprivileged_groups.append({attribute: 1})
# Checking the groups for verification
print("Privileged group:", privileged_groups)
print("Number of unprivileged groups:", len(unprivileged_groups))
print("First few unprivileged groups:", unprivileged_groups[:3])
Privileged group: [{'ethnicity_race_sex_not hispanic or latino_white_male': 1}]
Number of unprivileged groups: 19
First few unprivileged groups: [{'ethnicity_race_sex_hispanic or latino_american indian or alaska native_female': 1}, {'ethnicity_race_sex_hispanic or latino_american indian or alaska native_male': 1}, {'ethnicity_race_sex_hispanic or latino_asian_female': 1}]
In [36]:
# Calculating metrics
metric = BinaryLabelDatasetMetric(aif_dataset,
unprivileged_groups=unprivileged_groups,
privileged_groups=privileged_groups)
In [37]:
# Printing metrics
print(f"Disparate Impact: {metric.disparate_impact():.4f}")
print(f"Statistical Parity Difference: {metric.statistical_parity_difference():.4f}")
# We calculate and print the mean difference in label predictions
print(f"Mean difference in label predictions: {metric.mean_difference():.4f}")
# Calculating group-specific metrics
for group in unprivileged_groups:
    group_metric = BinaryLabelDatasetMetric(aif_dataset,
                                            unprivileged_groups=[group],
                                            privileged_groups=privileged_groups)
    group_name = list(group.keys())[0]
    print(f"\nGroup: {group_name}")
    print(f"Disparate Impact: {group_metric.disparate_impact():.4f}")
    print(f"Statistical Parity Difference: {group_metric.statistical_parity_difference():.4f}")
Disparate Impact: 0.9054
Statistical Parity Difference: -0.0506
Mean difference in label predictions: -0.0506

Group: ethnicity_race_sex_hispanic or latino_american indian or alaska native_female | Disparate Impact: 0.9000 | Statistical Parity Difference: -0.0535
Group: ethnicity_race_sex_hispanic or latino_american indian or alaska native_male | Disparate Impact: 0.4300 | Statistical Parity Difference: -0.3049
Group: ethnicity_race_sex_hispanic or latino_asian_female | Disparate Impact: 1.3501 | Statistical Parity Difference: 0.1873
Group: ethnicity_race_sex_hispanic or latino_asian_male | Disparate Impact: 1.1136 | Statistical Parity Difference: 0.0608
Group: ethnicity_race_sex_hispanic or latino_black or african american_female | Disparate Impact: 0.6992 | Statistical Parity Difference: -0.1609
Group: ethnicity_race_sex_hispanic or latino_black or african american_male | Disparate Impact: 0.6863 | Statistical Parity Difference: -0.1678
Group: ethnicity_race_sex_hispanic or latino_native hawaiian or other pacific islander_female | Disparate Impact: 0.4948 | Statistical Parity Difference: -0.2702
Group: ethnicity_race_sex_hispanic or latino_native hawaiian or other pacific islander_male | Disparate Impact: 0.6231 | Statistical Parity Difference: -0.2016
Group: ethnicity_race_sex_hispanic or latino_white_female | Disparate Impact: 0.8179 | Statistical Parity Difference: -0.0974
Group: ethnicity_race_sex_hispanic or latino_white_male | Disparate Impact: 0.8335 | Statistical Parity Difference: -0.0891
Group: ethnicity_race_sex_not hispanic or latino_american indian or alaska native_female | Disparate Impact: 0.8476 | Statistical Parity Difference: -0.0815
Group: ethnicity_race_sex_not hispanic or latino_american indian or alaska native_male | Disparate Impact: 0.7295 | Statistical Parity Difference: -0.1447
Group: ethnicity_race_sex_not hispanic or latino_asian_female | Disparate Impact: 1.0254 | Statistical Parity Difference: 0.0136
Group: ethnicity_race_sex_not hispanic or latino_asian_male | Disparate Impact: 1.0085 | Statistical Parity Difference: 0.0045
Group: ethnicity_race_sex_not hispanic or latino_black or african american_female | Disparate Impact: 0.6568 | Statistical Parity Difference: -0.1836
Group: ethnicity_race_sex_not hispanic or latino_black or african american_male | Disparate Impact: 0.6555 | Statistical Parity Difference: -0.1843
Group: ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_female | Disparate Impact: 1.0247 | Statistical Parity Difference: 0.0132
Group: ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_male | Disparate Impact: 0.9997 | Statistical Parity Difference: -0.0002
Group: ethnicity_race_sex_not hispanic or latino_white_female | Disparate Impact: 0.9981 | Statistical Parity Difference: -0.0010
G) Reweighting¶
In [38]:
# AIF360 dataset
aif_dataset = BinaryLabelDataset(
df=train_df_smote, # Using the SMOTE Dataset
label_names=['action_taken_binary'],
protected_attribute_names=protected_attribute_names,
favorable_label=favorable_label, # loan approved
unfavorable_label=unfavorable_label # loan denied
)
# Defining the privileged and unprivileged groups
privileged_groups = [{'ethnicity_race_sex_not hispanic or latino_white_male': 1}]
unprivileged_groups = []
for attribute in protected_attribute_names:
    if attribute != 'ethnicity_race_sex_not hispanic or latino_white_male':
        unprivileged_groups.append({attribute: 1})
# Applying the Reweighing algorithm
RW = Reweighing(unprivileged_groups=unprivileged_groups,
privileged_groups=privileged_groups)
# Fitting the "reweighing model". Transforming the dataset
reweighted_dataset = RW.fit_transform(aif_dataset)
# Turning the reweighted data back to pandas to use with our logistic regression model
reweighted_df = reweighted_dataset.convert_to_dataframe()[0]
# Re-defining variables
X_train_reweighted = reweighted_df.drop(columns=['action_taken_binary'])
y_train_reweighted = reweighted_df['action_taken_binary']
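One caveat worth making explicit: Reweighing does not alter the feature values; its output is a per-row instance weight stored on the transformed dataset, and convert_to_dataframe() above does not carry those weights into X_train_reweighted / y_train_reweighted. A hedged sketch of how they could be retrieved and forwarded to the LightGBM fit in the next section (this is not what the original notebook does):

# Per-row weights produced by Reweighing (one weight per training row)
sample_weights = reweighted_dataset.instance_weights
print(sample_weights.shape, sample_weights.min(), sample_weights.max())
# They could then be passed along during model fitting, e.g.:
# random_search.fit(X_train_reweighted, y_train_reweighted, sample_weight=sample_weights)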
H) LightGBM¶
In [39]:
# Defining the parameter grid for tuning
param_grid = {
'n_estimators': [100, 200, 300, 500],
'learning_rate': [0.01, 0.05, 0.1],
'num_leaves': [31, 62, 127],
'max_depth': [-1, 5, 10, 15],
'min_child_samples': [20, 50, 100],
'subsample': [0.6, 0.8, 1.0],
'colsample_bytree': [0.6, 0.8, 1.0],
'reg_alpha': [0, 0.1, 0.5],
'reg_lambda': [0, 0.1, 0.5]
}
# Setting the lgbm_model and randomizedSearch
lgbm_model = lgbm.LGBMClassifier(random_state=42)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# We use joblib's threading backend because a previous run got stuck with the default backend
with joblib.parallel_backend('threading'):  # added to avoid JAX threading issues
    random_search = RandomizedSearchCV(
        estimator=lgbm_model,
        param_distributions=param_grid,
        n_iter=50,
        scoring='roc_auc',
        cv=skf,
        verbose=2,
        random_state=42,
        n_jobs=-1
    )
    # Fitting the randomized search
    random_search.fit(X_train_reweighted, y_train_reweighted)
# Getting the best lgbm_model and parameters
best_model = random_search.best_estimator_
best_params = random_search.best_params_
print("Best parameters:", best_params)
# Making predictions
y_pred = best_model.predict(X_test)
y_pred_proba = best_model.predict_proba(X_test)[:, 1]
Fitting 5 folds for each of 50 candidates, totalling 250 fits
reg_lambda=0.1, subsample=0.6; total time= 10.8s [CV] END colsample_bytree=1.0, learning_rate=0.05, max_depth=-1, min_child_samples=20, n_estimators=500, num_leaves=31, reg_alpha=0.5, reg_lambda=0.1, subsample=0.6; total time= 9.6s [CV] END colsample_bytree=1.0, learning_rate=0.05, max_depth=-1, min_child_samples=20, n_estimators=500, num_leaves=31, reg_alpha=0.5, reg_lambda=0.1, subsample=0.6; total time= 11.3s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=10, min_child_samples=50, n_estimators=100, num_leaves=127, reg_alpha=0, reg_lambda=0.5, subsample=1.0; total time= 5.5s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=10, min_child_samples=50, n_estimators=100, num_leaves=127, reg_alpha=0, reg_lambda=0.5, subsample=1.0; total time= 4.0s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=10, min_child_samples=50, n_estimators=100, num_leaves=127, reg_alpha=0, reg_lambda=0.5, subsample=1.0; total time= 4.1s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=10, min_child_samples=50, n_estimators=100, num_leaves=127, reg_alpha=0, reg_lambda=0.5, subsample=1.0; total time= 4.5s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=10, min_child_samples=50, n_estimators=100, num_leaves=127, reg_alpha=0, reg_lambda=0.5, subsample=1.0; total time= 4.2s [CV] END colsample_bytree=0.8, learning_rate=0.05, max_depth=15, min_child_samples=50, n_estimators=200, num_leaves=127, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 9.4s [CV] END colsample_bytree=0.8, learning_rate=0.05, max_depth=15, min_child_samples=50, n_estimators=200, num_leaves=127, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 9.5s [CV] END colsample_bytree=0.8, learning_rate=0.05, max_depth=15, min_child_samples=50, n_estimators=200, num_leaves=127, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 7.1s [CV] END colsample_bytree=0.8, learning_rate=0.05, max_depth=15, min_child_samples=50, n_estimators=200, num_leaves=127, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 7.7s [CV] END colsample_bytree=0.8, learning_rate=0.05, max_depth=15, min_child_samples=50, n_estimators=200, num_leaves=127, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 9.4s [CV] END colsample_bytree=0.8, learning_rate=0.05, max_depth=10, min_child_samples=100, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 13.4s [CV] END colsample_bytree=0.8, learning_rate=0.05, max_depth=10, min_child_samples=100, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 14.3s [CV] END colsample_bytree=0.8, learning_rate=0.05, max_depth=10, min_child_samples=100, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 14.4s [CV] END colsample_bytree=0.8, learning_rate=0.05, max_depth=10, min_child_samples=100, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 14.5s [CV] END colsample_bytree=0.8, learning_rate=0.05, max_depth=10, min_child_samples=100, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 14.3s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=10, min_child_samples=100, n_estimators=200, num_leaves=127, reg_alpha=0.5, reg_lambda=0, subsample=1.0; total time= 8.2s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=10, min_child_samples=100, n_estimators=200, num_leaves=127, reg_alpha=0.5, reg_lambda=0, subsample=1.0; total time= 10.0s [CV] END 
colsample_bytree=0.6, learning_rate=0.01, max_depth=10, min_child_samples=100, n_estimators=200, num_leaves=127, reg_alpha=0.5, reg_lambda=0, subsample=1.0; total time= 9.7s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=10, min_child_samples=100, n_estimators=200, num_leaves=127, reg_alpha=0.5, reg_lambda=0, subsample=1.0; total time= 8.0s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=10, min_child_samples=100, n_estimators=200, num_leaves=127, reg_alpha=0.5, reg_lambda=0, subsample=1.0; total time= 9.2s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=-1, min_child_samples=20, n_estimators=300, num_leaves=62, reg_alpha=0, reg_lambda=0, subsample=0.6; total time= 12.7s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=-1, min_child_samples=20, n_estimators=300, num_leaves=62, reg_alpha=0, reg_lambda=0, subsample=0.6; total time= 11.3s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=-1, min_child_samples=20, n_estimators=300, num_leaves=62, reg_alpha=0, reg_lambda=0, subsample=0.6; total time= 12.5s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=-1, min_child_samples=20, n_estimators=300, num_leaves=62, reg_alpha=0, reg_lambda=0, subsample=0.6; total time= 12.6s [CV] END colsample_bytree=0.8, learning_rate=0.05, max_depth=15, min_child_samples=100, n_estimators=100, num_leaves=31, reg_alpha=0, reg_lambda=0, subsample=1.0; total time= 2.8s [CV] END colsample_bytree=0.8, learning_rate=0.05, max_depth=15, min_child_samples=100, n_estimators=100, num_leaves=31, reg_alpha=0, reg_lambda=0, subsample=1.0; total time= 3.2s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=-1, min_child_samples=20, n_estimators=300, num_leaves=62, reg_alpha=0, reg_lambda=0, subsample=0.6; total time= 11.8s [CV] END colsample_bytree=0.8, learning_rate=0.05, max_depth=15, min_child_samples=100, n_estimators=100, num_leaves=31, reg_alpha=0, reg_lambda=0, subsample=1.0; total time= 4.6s [CV] END colsample_bytree=0.8, learning_rate=0.05, max_depth=15, min_child_samples=100, n_estimators=100, num_leaves=31, reg_alpha=0, reg_lambda=0, subsample=1.0; total time= 3.4s [CV] END colsample_bytree=0.8, learning_rate=0.05, max_depth=15, min_child_samples=100, n_estimators=100, num_leaves=31, reg_alpha=0, reg_lambda=0, subsample=1.0; total time= 2.8s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=5, min_child_samples=100, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0, subsample=0.6; total time= 14.3s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=5, min_child_samples=100, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0, subsample=0.6; total time= 14.4s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=5, min_child_samples=100, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0, subsample=0.6; total time= 13.7s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=5, min_child_samples=100, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0, subsample=0.6; total time= 14.3s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=5, min_child_samples=100, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0, subsample=0.6; total time= 13.3s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=5, min_child_samples=20, n_estimators=500, num_leaves=62, reg_alpha=0.5, reg_lambda=0.5, subsample=0.8; total time= 13.0s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=5, min_child_samples=20, n_estimators=500, num_leaves=62, 
reg_alpha=0.5, reg_lambda=0.5, subsample=0.8; total time= 14.4s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=5, min_child_samples=20, n_estimators=500, num_leaves=62, reg_alpha=0.5, reg_lambda=0.5, subsample=0.8; total time= 13.7s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=5, min_child_samples=20, n_estimators=500, num_leaves=62, reg_alpha=0.5, reg_lambda=0.5, subsample=0.8; total time= 14.9s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=5, min_child_samples=20, n_estimators=500, num_leaves=62, reg_alpha=0.5, reg_lambda=0.5, subsample=0.8; total time= 14.5s [CV] END colsample_bytree=1.0, learning_rate=0.1, max_depth=-1, min_child_samples=50, n_estimators=200, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 6.8s [CV] END colsample_bytree=1.0, learning_rate=0.1, max_depth=-1, min_child_samples=50, n_estimators=200, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 7.2s [CV] END colsample_bytree=1.0, learning_rate=0.1, max_depth=-1, min_child_samples=50, n_estimators=200, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 4.6s [CV] END colsample_bytree=1.0, learning_rate=0.1, max_depth=-1, min_child_samples=50, n_estimators=200, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 5.1s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=15, min_child_samples=100, n_estimators=100, num_leaves=127, reg_alpha=0, reg_lambda=0, subsample=0.6; total time= 4.0s [CV] END colsample_bytree=1.0, learning_rate=0.1, max_depth=-1, min_child_samples=50, n_estimators=200, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 4.9s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=15, min_child_samples=100, n_estimators=100, num_leaves=127, reg_alpha=0, reg_lambda=0, subsample=0.6; total time= 6.5s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=15, min_child_samples=100, n_estimators=100, num_leaves=127, reg_alpha=0, reg_lambda=0, subsample=0.6; total time= 6.0s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=15, min_child_samples=100, n_estimators=100, num_leaves=127, reg_alpha=0, reg_lambda=0, subsample=0.6; total time= 4.0s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=15, min_child_samples=100, n_estimators=100, num_leaves=127, reg_alpha=0, reg_lambda=0, subsample=0.6; total time= 4.0s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=15, min_child_samples=50, n_estimators=500, num_leaves=127, reg_alpha=0.5, reg_lambda=0, subsample=0.8; total time= 26.0s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=15, min_child_samples=50, n_estimators=500, num_leaves=127, reg_alpha=0.5, reg_lambda=0, subsample=0.8; total time= 26.1s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=15, min_child_samples=50, n_estimators=500, num_leaves=127, reg_alpha=0.5, reg_lambda=0, subsample=0.8; total time= 26.9s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=15, min_child_samples=50, n_estimators=500, num_leaves=127, reg_alpha=0.5, reg_lambda=0, subsample=0.8; total time= 27.5s [CV] END colsample_bytree=0.6, learning_rate=0.05, max_depth=-1, min_child_samples=20, n_estimators=100, num_leaves=31, reg_alpha=0, reg_lambda=0, subsample=1.0; total time= 5.0s [CV] END colsample_bytree=0.6, learning_rate=0.05, max_depth=-1, min_child_samples=20, n_estimators=100, num_leaves=31, reg_alpha=0, reg_lambda=0, subsample=1.0; total time= 3.2s [CV] END colsample_bytree=0.6, 
learning_rate=0.05, max_depth=-1, min_child_samples=20, n_estimators=100, num_leaves=31, reg_alpha=0, reg_lambda=0, subsample=1.0; total time= 3.2s [CV] END colsample_bytree=0.6, learning_rate=0.05, max_depth=-1, min_child_samples=20, n_estimators=100, num_leaves=31, reg_alpha=0, reg_lambda=0, subsample=1.0; total time= 3.2s [CV] END colsample_bytree=0.6, learning_rate=0.05, max_depth=-1, min_child_samples=20, n_estimators=100, num_leaves=31, reg_alpha=0, reg_lambda=0, subsample=1.0; total time= 5.4s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=15, min_child_samples=50, n_estimators=500, num_leaves=127, reg_alpha=0.5, reg_lambda=0, subsample=0.8; total time= 25.9s [CV] END colsample_bytree=1.0, learning_rate=0.05, max_depth=15, min_child_samples=50, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.1, subsample=1.0; total time= 11.0s [CV] END colsample_bytree=1.0, learning_rate=0.05, max_depth=15, min_child_samples=50, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.1, subsample=1.0; total time= 12.9s [CV] END colsample_bytree=1.0, learning_rate=0.05, max_depth=15, min_child_samples=50, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.1, subsample=1.0; total time= 12.9s [CV] END colsample_bytree=1.0, learning_rate=0.05, max_depth=15, min_child_samples=50, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.1, subsample=1.0; total time= 12.1s [CV] END colsample_bytree=1.0, learning_rate=0.05, max_depth=15, min_child_samples=50, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.1, subsample=1.0; total time= 13.1s [CV] END colsample_bytree=0.8, learning_rate=0.01, max_depth=10, min_child_samples=100, n_estimators=500, num_leaves=31, reg_alpha=0, reg_lambda=0.1, subsample=0.6; total time= 14.6s [CV] END colsample_bytree=0.8, learning_rate=0.01, max_depth=10, min_child_samples=100, n_estimators=500, num_leaves=31, reg_alpha=0, reg_lambda=0.1, subsample=0.6; total time= 14.2s [CV] END colsample_bytree=0.8, learning_rate=0.01, max_depth=10, min_child_samples=100, n_estimators=500, num_leaves=31, reg_alpha=0, reg_lambda=0.1, subsample=0.6; total time= 13.3s [CV] END colsample_bytree=0.8, learning_rate=0.01, max_depth=10, min_child_samples=100, n_estimators=500, num_leaves=31, reg_alpha=0, reg_lambda=0.1, subsample=0.6; total time= 14.0s [CV] END colsample_bytree=0.8, learning_rate=0.1, max_depth=10, min_child_samples=50, n_estimators=100, num_leaves=127, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 3.4s [CV] END colsample_bytree=0.8, learning_rate=0.1, max_depth=10, min_child_samples=50, n_estimators=100, num_leaves=127, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 3.4s [CV] END colsample_bytree=0.8, learning_rate=0.01, max_depth=10, min_child_samples=100, n_estimators=500, num_leaves=31, reg_alpha=0, reg_lambda=0.1, subsample=0.6; total time= 14.2s [CV] END colsample_bytree=0.8, learning_rate=0.1, max_depth=10, min_child_samples=50, n_estimators=100, num_leaves=127, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 4.2s [CV] END colsample_bytree=0.8, learning_rate=0.1, max_depth=10, min_child_samples=50, n_estimators=100, num_leaves=127, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 4.8s [CV] END colsample_bytree=0.8, learning_rate=0.1, max_depth=10, min_child_samples=50, n_estimators=100, num_leaves=127, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 5.0s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=10, min_child_samples=100, n_estimators=500, 
num_leaves=31, reg_alpha=0.1, reg_lambda=0.5, subsample=0.8; total time= 12.5s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=10, min_child_samples=100, n_estimators=500, num_leaves=31, reg_alpha=0.1, reg_lambda=0.5, subsample=0.8; total time= 11.9s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=10, min_child_samples=100, n_estimators=500, num_leaves=31, reg_alpha=0.1, reg_lambda=0.5, subsample=0.8; total time= 14.0s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=10, min_child_samples=100, n_estimators=500, num_leaves=31, reg_alpha=0.1, reg_lambda=0.5, subsample=0.8; total time= 12.3s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=5, min_child_samples=20, n_estimators=100, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 4.1s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=5, min_child_samples=20, n_estimators=100, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 2.9s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=5, min_child_samples=20, n_estimators=100, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 2.5s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=10, min_child_samples=100, n_estimators=500, num_leaves=31, reg_alpha=0.1, reg_lambda=0.5, subsample=0.8; total time= 13.5s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=5, min_child_samples=20, n_estimators=100, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 2.5s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=5, min_child_samples=20, n_estimators=100, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 2.6s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=-1, min_child_samples=100, n_estimators=100, num_leaves=31, reg_alpha=0.1, reg_lambda=0, subsample=0.6; total time= 3.1s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=-1, min_child_samples=100, n_estimators=100, num_leaves=31, reg_alpha=0.1, reg_lambda=0, subsample=0.6; total time= 4.3s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=-1, min_child_samples=100, n_estimators=100, num_leaves=31, reg_alpha=0.1, reg_lambda=0, subsample=0.6; total time= 5.4s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=-1, min_child_samples=100, n_estimators=100, num_leaves=31, reg_alpha=0.1, reg_lambda=0, subsample=0.6; total time= 3.9s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=-1, min_child_samples=100, n_estimators=100, num_leaves=31, reg_alpha=0.1, reg_lambda=0, subsample=0.6; total time= 3.1s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=15, min_child_samples=50, n_estimators=100, num_leaves=31, reg_alpha=0, reg_lambda=0, subsample=0.6; total time= 3.0s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=15, min_child_samples=50, n_estimators=100, num_leaves=31, reg_alpha=0, reg_lambda=0, subsample=0.6; total time= 2.9s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=15, min_child_samples=50, n_estimators=100, num_leaves=31, reg_alpha=0, reg_lambda=0, subsample=0.6; total time= 3.0s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=15, min_child_samples=50, n_estimators=100, num_leaves=31, reg_alpha=0, reg_lambda=0, subsample=0.6; total time= 2.9s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=15, min_child_samples=50, n_estimators=100, num_leaves=31, reg_alpha=0, reg_lambda=0, subsample=0.6; total time= 3.1s [CV] END colsample_bytree=1.0, 
learning_rate=0.01, max_depth=-1, min_child_samples=50, n_estimators=500, num_leaves=31, reg_alpha=0, reg_lambda=0, subsample=1.0; total time= 14.0s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=-1, min_child_samples=50, n_estimators=500, num_leaves=31, reg_alpha=0, reg_lambda=0, subsample=1.0; total time= 13.7s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=-1, min_child_samples=50, n_estimators=500, num_leaves=31, reg_alpha=0, reg_lambda=0, subsample=1.0; total time= 14.3s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=-1, min_child_samples=50, n_estimators=500, num_leaves=31, reg_alpha=0, reg_lambda=0, subsample=1.0; total time= 14.3s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=10, min_child_samples=100, n_estimators=100, num_leaves=31, reg_alpha=0.5, reg_lambda=0, subsample=1.0; total time= 3.9s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=10, min_child_samples=100, n_estimators=100, num_leaves=31, reg_alpha=0.5, reg_lambda=0, subsample=1.0; total time= 4.6s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=10, min_child_samples=100, n_estimators=100, num_leaves=31, reg_alpha=0.5, reg_lambda=0, subsample=1.0; total time= 3.2s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=-1, min_child_samples=50, n_estimators=500, num_leaves=31, reg_alpha=0, reg_lambda=0, subsample=1.0; total time= 14.5s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=10, min_child_samples=100, n_estimators=100, num_leaves=31, reg_alpha=0.5, reg_lambda=0, subsample=1.0; total time= 3.2s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=10, min_child_samples=100, n_estimators=100, num_leaves=31, reg_alpha=0.5, reg_lambda=0, subsample=1.0; total time= 3.1s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=-1, min_child_samples=20, n_estimators=200, num_leaves=127, reg_alpha=0.5, reg_lambda=0, subsample=1.0; total time= 11.0s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=-1, min_child_samples=20, n_estimators=200, num_leaves=127, reg_alpha=0.5, reg_lambda=0, subsample=1.0; total time= 11.0s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=-1, min_child_samples=20, n_estimators=200, num_leaves=127, reg_alpha=0.5, reg_lambda=0, subsample=1.0; total time= 8.6s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=-1, min_child_samples=20, n_estimators=200, num_leaves=127, reg_alpha=0.5, reg_lambda=0, subsample=1.0; total time= 9.2s [CV] END colsample_bytree=0.8, learning_rate=0.05, max_depth=5, min_child_samples=50, n_estimators=500, num_leaves=127, reg_alpha=0.1, reg_lambda=0.1, subsample=1.0; total time= 10.4s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=-1, min_child_samples=20, n_estimators=200, num_leaves=127, reg_alpha=0.5, reg_lambda=0, subsample=1.0; total time= 12.0s [CV] END colsample_bytree=0.8, learning_rate=0.05, max_depth=5, min_child_samples=50, n_estimators=500, num_leaves=127, reg_alpha=0.1, reg_lambda=0.1, subsample=1.0; total time= 11.9s [CV] END colsample_bytree=0.8, learning_rate=0.05, max_depth=5, min_child_samples=50, n_estimators=500, num_leaves=127, reg_alpha=0.1, reg_lambda=0.1, subsample=1.0; total time= 11.9s [CV] END colsample_bytree=0.8, learning_rate=0.05, max_depth=5, min_child_samples=50, n_estimators=500, num_leaves=127, reg_alpha=0.1, reg_lambda=0.1, subsample=1.0; total time= 9.7s [CV] END colsample_bytree=0.8, learning_rate=0.05, max_depth=5, min_child_samples=50, n_estimators=500, num_leaves=127, 
reg_alpha=0.1, reg_lambda=0.1, subsample=1.0; total time= 10.6s [CV] END colsample_bytree=0.8, learning_rate=0.01, max_depth=10, min_child_samples=100, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 17.6s [CV] END colsample_bytree=0.8, learning_rate=0.01, max_depth=10, min_child_samples=100, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 18.0s [CV] END colsample_bytree=0.8, learning_rate=0.01, max_depth=10, min_child_samples=100, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 17.5s [CV] END colsample_bytree=0.8, learning_rate=0.01, max_depth=10, min_child_samples=100, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 19.3s [CV] END colsample_bytree=0.8, learning_rate=0.01, max_depth=10, min_child_samples=100, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 18.8s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=15, min_child_samples=20, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 20.1s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=15, min_child_samples=20, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 18.5s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=15, min_child_samples=20, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 17.9s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=15, min_child_samples=20, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 18.5s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=15, min_child_samples=20, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 19.6s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=5, min_child_samples=50, n_estimators=500, num_leaves=62, reg_alpha=0.5, reg_lambda=0, subsample=0.8; total time= 14.0s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=5, min_child_samples=50, n_estimators=500, num_leaves=62, reg_alpha=0.5, reg_lambda=0, subsample=0.8; total time= 13.9s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=5, min_child_samples=50, n_estimators=500, num_leaves=62, reg_alpha=0.5, reg_lambda=0, subsample=0.8; total time= 14.8s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=5, min_child_samples=50, n_estimators=500, num_leaves=62, reg_alpha=0.5, reg_lambda=0, subsample=0.8; total time= 13.6s [CV] END colsample_bytree=0.6, learning_rate=0.05, max_depth=-1, min_child_samples=20, n_estimators=100, num_leaves=127, reg_alpha=0, reg_lambda=0, subsample=0.8; total time= 6.4s [CV] END colsample_bytree=0.6, learning_rate=0.05, max_depth=-1, min_child_samples=20, n_estimators=100, num_leaves=127, reg_alpha=0, reg_lambda=0, subsample=0.8; total time= 4.2s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=5, min_child_samples=50, n_estimators=500, num_leaves=62, reg_alpha=0.5, reg_lambda=0, subsample=0.8; total time= 14.2s [CV] END colsample_bytree=0.6, learning_rate=0.05, max_depth=-1, min_child_samples=20, n_estimators=100, num_leaves=127, reg_alpha=0, reg_lambda=0, subsample=0.8; total time= 4.3s [CV] END colsample_bytree=0.6, learning_rate=0.05, max_depth=-1, min_child_samples=20, n_estimators=100, num_leaves=127, reg_alpha=0, reg_lambda=0, subsample=0.8; total time= 4.9s [CV] END 
colsample_bytree=0.6, learning_rate=0.05, max_depth=-1, min_child_samples=20, n_estimators=100, num_leaves=127, reg_alpha=0, reg_lambda=0, subsample=0.8; total time= 6.4s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=-1, min_child_samples=50, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 19.1s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=-1, min_child_samples=50, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 18.4s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=-1, min_child_samples=50, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 18.1s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=-1, min_child_samples=50, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 17.7s [CV] END colsample_bytree=0.8, learning_rate=0.1, max_depth=-1, min_child_samples=20, n_estimators=100, num_leaves=31, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 2.6s [CV] END colsample_bytree=0.8, learning_rate=0.1, max_depth=-1, min_child_samples=20, n_estimators=100, num_leaves=31, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 2.6s [CV] END colsample_bytree=0.8, learning_rate=0.1, max_depth=-1, min_child_samples=20, n_estimators=100, num_leaves=31, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 3.9s [CV] END colsample_bytree=0.8, learning_rate=0.1, max_depth=-1, min_child_samples=20, n_estimators=100, num_leaves=31, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 3.3s [CV] END colsample_bytree=0.8, learning_rate=0.1, max_depth=-1, min_child_samples=20, n_estimators=100, num_leaves=31, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 2.6s [CV] END colsample_bytree=1.0, learning_rate=0.01, max_depth=-1, min_child_samples=50, n_estimators=500, num_leaves=62, reg_alpha=0.1, reg_lambda=0.5, subsample=1.0; total time= 17.9s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=5, min_child_samples=20, n_estimators=100, num_leaves=62, reg_alpha=0.5, reg_lambda=0.1, subsample=0.8; total time= 3.0s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=5, min_child_samples=20, n_estimators=100, num_leaves=62, reg_alpha=0.5, reg_lambda=0.1, subsample=0.8; total time= 3.0s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=5, min_child_samples=20, n_estimators=100, num_leaves=62, reg_alpha=0.5, reg_lambda=0.1, subsample=0.8; total time= 2.8s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=5, min_child_samples=20, n_estimators=100, num_leaves=62, reg_alpha=0.5, reg_lambda=0.1, subsample=0.8; total time= 3.1s [CV] END colsample_bytree=0.6, learning_rate=0.01, max_depth=5, min_child_samples=20, n_estimators=100, num_leaves=62, reg_alpha=0.5, reg_lambda=0.1, subsample=0.8; total time= 4.4s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=15, min_child_samples=100, n_estimators=500, num_leaves=127, reg_alpha=0.1, reg_lambda=0.5, subsample=0.8; total time= 23.6s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=15, min_child_samples=100, n_estimators=500, num_leaves=127, reg_alpha=0.1, reg_lambda=0.5, subsample=0.8; total time= 21.5s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=15, min_child_samples=100, n_estimators=500, num_leaves=127, reg_alpha=0.1, reg_lambda=0.5, subsample=0.8; total time= 21.3s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=15, min_child_samples=100, 
n_estimators=500, num_leaves=127, reg_alpha=0.1, reg_lambda=0.5, subsample=0.8; total time= 21.0s [CV] END colsample_bytree=0.8, learning_rate=0.01, max_depth=5, min_child_samples=100, n_estimators=200, num_leaves=62, reg_alpha=0.5, reg_lambda=0.5, subsample=0.8; total time= 6.7s [CV] END colsample_bytree=0.8, learning_rate=0.01, max_depth=5, min_child_samples=100, n_estimators=200, num_leaves=62, reg_alpha=0.5, reg_lambda=0.5, subsample=0.8; total time= 4.7s [CV] END colsample_bytree=0.8, learning_rate=0.01, max_depth=5, min_child_samples=100, n_estimators=200, num_leaves=62, reg_alpha=0.5, reg_lambda=0.5, subsample=0.8; total time= 4.7s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=15, min_child_samples=100, n_estimators=500, num_leaves=127, reg_alpha=0.1, reg_lambda=0.5, subsample=0.8; total time= 24.0s [CV] END colsample_bytree=0.8, learning_rate=0.01, max_depth=5, min_child_samples=100, n_estimators=200, num_leaves=62, reg_alpha=0.5, reg_lambda=0.5, subsample=0.8; total time= 7.0s [CV] END colsample_bytree=0.8, learning_rate=0.01, max_depth=5, min_child_samples=100, n_estimators=200, num_leaves=62, reg_alpha=0.5, reg_lambda=0.5, subsample=0.8; total time= 4.5s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=5, min_child_samples=100, n_estimators=500, num_leaves=62, reg_alpha=0.5, reg_lambda=0.5, subsample=1.0; total time= 10.7s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=5, min_child_samples=100, n_estimators=500, num_leaves=62, reg_alpha=0.5, reg_lambda=0.5, subsample=1.0; total time= 12.1s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=5, min_child_samples=100, n_estimators=500, num_leaves=62, reg_alpha=0.5, reg_lambda=0.5, subsample=1.0; total time= 12.4s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=5, min_child_samples=100, n_estimators=500, num_leaves=62, reg_alpha=0.5, reg_lambda=0.5, subsample=1.0; total time= 10.7s [CV] END colsample_bytree=0.6, learning_rate=0.05, max_depth=15, min_child_samples=20, n_estimators=100, num_leaves=127, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 6.0s [CV] END colsample_bytree=0.6, learning_rate=0.1, max_depth=5, min_child_samples=100, n_estimators=500, num_leaves=62, reg_alpha=0.5, reg_lambda=0.5, subsample=1.0; total time= 12.3s [CV] END colsample_bytree=0.6, learning_rate=0.05, max_depth=15, min_child_samples=20, n_estimators=100, num_leaves=127, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 4.3s [CV] END colsample_bytree=0.6, learning_rate=0.05, max_depth=15, min_child_samples=20, n_estimators=100, num_leaves=127, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 4.3s [CV] END colsample_bytree=0.6, learning_rate=0.05, max_depth=15, min_child_samples=20, n_estimators=100, num_leaves=127, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 4.4s [CV] END colsample_bytree=0.6, learning_rate=0.05, max_depth=15, min_child_samples=20, n_estimators=100, num_leaves=127, reg_alpha=0.1, reg_lambda=0.5, subsample=0.6; total time= 4.4s [LightGBM] [Warning] Found whitespace in feature_names, replace with underlines [LightGBM] [Info] Number of positive: 81489, number of negative: 81489 [LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.064936 seconds. You can set `force_row_wise=true` to remove the overhead. And if memory is not enough, you can set `force_col_wise=true`. 
[LightGBM] [Info] Total Bins 3976 [LightGBM] [Info] Number of data points in the train set: 162978, number of used features: 38 [LightGBM] [Info] [binary:BoostFromScore]: pavg=0.500000 -> initscore=0.000000 [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with 
positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: 
-inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No 
further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive 
gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf [LightGBM] [Warning] No further splits with positive gain, best gain: -inf Best parameters: {'subsample': 0.8, 'reg_lambda': 0.5, 'reg_alpha': 0.1, 'num_leaves': 127, 'n_estimators': 500, 'min_child_samples': 100, 'max_depth': 15, 'learning_rate': 0.1, 'colsample_bytree': 0.6}
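For reference, here is a minimal sketch (not part of the original notebook) of a randomized search that matches the logged run: 50 candidates × 5 folds over the parameter values visible in the [CV] lines above. The scoring metric, random_state, and the X_train / y_train names are assumptions, and y_pred / y_pred_proba used in the evaluation cells below would have been produced along these lines.
from lightgbm import LGBMClassifier
from sklearn.model_selection import RandomizedSearchCV

# Parameter values read off the [CV] log above
param_distributions = {
    'n_estimators': [100, 200, 300, 500],
    'num_leaves': [31, 62, 127],
    'max_depth': [-1, 5, 10, 15],
    'learning_rate': [0.01, 0.05, 0.1],
    'min_child_samples': [20, 50, 100],
    'subsample': [0.6, 0.8, 1.0],
    'colsample_bytree': [0.6, 0.8, 1.0],
    'reg_alpha': [0, 0.1, 0.5],
    'reg_lambda': [0, 0.1, 0.5],
}

random_search = RandomizedSearchCV(
    estimator=LGBMClassifier(random_state=42),  # random_state assumed
    param_distributions=param_distributions,
    n_iter=50,          # 50 candidates, as logged
    cv=5,               # 5 folds, as logged
    scoring='roc_auc',  # assumed scoring metric
    n_jobs=-1,
    verbose=2,
    random_state=42,
)
random_search.fit(X_train, y_train)  # training split from earlier cells (assumed names)

print("Best parameters:", random_search.best_params_)
best_model = random_search.best_estimator_              # refit on the full training set
y_pred = best_model.predict(X_test)                     # hard predictions used below
y_pred_proba = best_model.predict_proba(X_test)[:, 1]   # probability of class 1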
In [40]:
# Model performance evaluation:
# Accuracy
accuracy = accuracy_score(y_test, y_pred)
# Precision
precision = precision_score(y_test, y_pred)
# Recall
recall = recall_score(y_test, y_pred)
# F1 Score
f1 = f1_score(y_test, y_pred)
# Calculate ROC-AUC
roc_auc = roc_auc_score(y_test, y_pred_proba)
# Log loss (negative log-likelihood)
neg_log_loss = log_loss(y_test, y_pred_proba)
# Results
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")
print(f"ROC-AUC: {roc_auc:.4f}")
print(f"Negative Log Loss: {neg_log_loss:.4f}")
Accuracy: 0.7421
Precision: 0.3914
Recall: 0.4465
F1 Score: 0.4172
ROC-AUC: 0.7008
Negative Log Loss: 0.5332
In [41]:
# For each intersectional group we compute and print the evaluation metrics
# Adding the predicted values to a new test set for evaluation
X_test_with_pred = X_test.copy()
X_test_with_pred['y_test'] = y_test # Columns for y_test and y_pred.
X_test_with_pred['y_pred'] = y_pred
# Predicted probabilities for class 1 (loan denied)
X_test_with_pred['y_pred_proba'] = best_model.predict_proba(X_test)[:, 1]
# List of intersectional group columns
ethnicity_race_sex_cols = [col for col in X_test_with_pred.columns if col.startswith('ethnicity_race_sex')]
# Creation of a column where each row represents an intersectional group
X_test_with_pred['intersectional_group'] = X_test_with_pred[ethnicity_race_sex_cols].idxmax(axis=1)
# Group by the intersectional group to evaluate model performance for each group
grouped = X_test_with_pred.groupby('intersectional_group')
# Evaluation of the metrics for each group
for group_name, group_data in grouped:
accuracy = accuracy_score(group_data['y_test'], group_data['y_pred'])
precision = precision_score(group_data['y_test'], group_data['y_pred'], zero_division=0)
recall = recall_score(group_data['y_test'], group_data['y_pred'], zero_division=0)
f1 = f1_score(group_data['y_test'], group_data['y_pred'])
roc_auc = roc_auc_score(group_data['y_test'], group_data['y_pred_proba'])
neg_log_loss = log_loss(group_data['y_test'], group_data['y_pred_proba'])
# Showing results
print(f"# Group: {group_name}")
print(f" Accuracy: {accuracy:.4f}")
print(f" Precision: {precision:.4f}")
print(f" Recall: {recall:.4f}")
print(f" F1 Score: {f1:.4f}")
print(f" ROC-AUC: {roc_auc:.4f}")
print(f" Negative Log Loss: {neg_log_loss:.4f}\n")
| Intersectional group (ethnicity_race_sex_…) | Accuracy | Precision | Recall | F1 Score | ROC-AUC | Negative Log Loss |
|---|---|---|---|---|---|---|
| hispanic or latino_american indian or alaska native_female | 0.6897 | 0.6875 | 0.7333 | 0.7097 | 0.8190 | 0.5684 |
| hispanic or latino_american indian or alaska native_male | 0.6585 | 0.5926 | 0.8421 | 0.6957 | 0.8038 | 0.6024 |
| hispanic or latino_asian_female | 0.8235 | 0.6667 | 0.8000 | 0.7273 | 0.8000 | 0.5010 |
| hispanic or latino_asian_male | 0.4815 | 0.2222 | 0.2222 | 0.2222 | 0.4444 | 0.7215 |
| hispanic or latino_black or african american_female | 0.6167 | 0.4821 | 0.6136 | 0.5400 | 0.6932 | 0.7674 |
| hispanic or latino_black or african american_male | 0.7228 | 0.6271 | 0.8605 | 0.7255 | 0.8360 | 0.5922 |
| hispanic or latino_native hawaiian or other pacific islander_female | 0.8667 | 0.7143 | 1.0000 | 0.8333 | 0.8600 | 0.5357 |
| hispanic or latino_native hawaiian or other pacific islander_male | 0.5625 | 0.6000 | 0.7895 | 0.6818 | 0.6518 | 0.7241 |
| hispanic or latino_white_female | 0.6888 | 0.4000 | 0.5641 | 0.4681 | 0.6906 | 0.6104 |
| hispanic or latino_white_male | 0.6948 | 0.3974 | 0.5578 | 0.4642 | 0.7020 | 0.6046 |
| not hispanic or latino_american indian or alaska native_female | 0.7315 | 0.6889 | 0.6739 | 0.6813 | 0.7763 | 0.5611 |
| not hispanic or latino_american indian or alaska native_male | 0.6525 | 0.4571 | 0.4211 | 0.4384 | 0.6322 | 0.6571 |
| not hispanic or latino_asian_female | 0.7576 | 0.3316 | 0.4235 | 0.3720 | 0.6509 | 0.5311 |
| not hispanic or latino_asian_male | 0.7572 | 0.3837 | 0.4219 | 0.4019 | 0.6914 | 0.5279 |
| not hispanic or latino_black or african american_female | 0.6140 | 0.4294 | 0.6771 | 0.5255 | 0.6788 | 0.6956 |
| not hispanic or latino_black or african american_male | 0.6205 | 0.4610 | 0.6549 | 0.5411 | 0.6841 | 0.6689 |
| not hispanic or latino_native hawaiian or other pacific islander_female | 0.6739 | 0.3077 | 0.4000 | 0.3478 | 0.6028 | 0.5880 |
| not hispanic or latino_native hawaiian or other pacific islander_male | 0.6667 | 0.6522 | 0.5769 | 0.6122 | 0.6663 | 0.7676 |
| not hispanic or latino_white_female | 0.7672 | 0.3787 | 0.3790 | 0.3789 | 0.6992 | 0.4978 |
| not hispanic or latino_white_male | 0.7597 | 0.3607 | 0.3736 | 0.3670 | 0.6889 | 0.5089 |
I) Bias measurement:¶
We use Disparate Impact (computed with AIF360) to measure bias across the intersectional groups.
To do so, we re-join X_test with the model's predictions y_pred.
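For reference, Disparate Impact is the ratio of favorable-outcome rates, P(approved | unprivileged) / P(approved | privileged); values below 1 mean the unprivileged group is approved less often than the privileged one. A minimal pandas sketch of that ratio (the helper name is ours; it assumes one-hot group columns and the predicted action_taken_binary column built in the next cell, with 0 = approved, and relies on the pandas import already loaded above):
def disparate_impact(df, group_col, privileged_col,
                     label_col='action_taken_binary', favorable=0):
    # Ratio of favorable-outcome rates: unprivileged group / privileged group
    unpriv_rate = (df.loc[df[group_col] == 1, label_col] == favorable).mean()
    priv_rate = (df.loc[df[privileged_col] == 1, label_col] == favorable).mean()
    return unpriv_rate / priv_rate
AIF360's BinaryLabelDatasetMetric.disparate_impact() below returns the same kind of ratio, pooling all unprivileged groups against the privileged one.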
In [42]:
# We got many empty rows due to index misalignment, so we reset the index of both DataFrames:
# X_test and y_pred
X_test_reset = X_test.reset_index(drop=True)
y_pred_reset = pd.DataFrame(y_pred, columns=['action_taken_binary']).reset_index(drop=True)
# Concatenating the two DataFrames
lgbm_trained_df = pd.concat([X_test_reset, y_pred_reset], axis=1)
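As a quick guard against the index-misalignment issue mentioned above, a small sanity check (ours, not part of the original notebook) confirms that the concatenation introduced no padded NaN rows:
# Sanity check (hypothetical): lengths match and concat did not pad rows with NaNs
assert len(lgbm_trained_df) == len(X_test_reset) == len(y_pred_reset)
assert not lgbm_trained_df['action_taken_binary'].isna().any()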
In [43]:
# Defining variables:
protected_attribute_names=[
"ethnicity_race_sex_hispanic or latino_american indian or alaska native_female",
"ethnicity_race_sex_hispanic or latino_american indian or alaska native_male",
"ethnicity_race_sex_hispanic or latino_asian_female",
"ethnicity_race_sex_hispanic or latino_asian_male",
"ethnicity_race_sex_hispanic or latino_black or african american_female",
"ethnicity_race_sex_hispanic or latino_black or african american_male",
"ethnicity_race_sex_hispanic or latino_native hawaiian or other pacific islander_female",
"ethnicity_race_sex_hispanic or latino_native hawaiian or other pacific islander_male",
"ethnicity_race_sex_hispanic or latino_white_female",
"ethnicity_race_sex_hispanic or latino_white_male",
"ethnicity_race_sex_not hispanic or latino_american indian or alaska native_female",
"ethnicity_race_sex_not hispanic or latino_american indian or alaska native_male",
"ethnicity_race_sex_not hispanic or latino_asian_female",
"ethnicity_race_sex_not hispanic or latino_asian_male",
"ethnicity_race_sex_not hispanic or latino_black or african american_female",
"ethnicity_race_sex_not hispanic or latino_black or african american_male",
"ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_female",
"ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_male",
"ethnicity_race_sex_not hispanic or latino_white_female",
'ethnicity_race_sex_not hispanic or latino_white_male'  # note that we include the privileged group
]
favorable_label = 0 # loan approved
unfavorable_label = 1 # loan denied
# Creating the dataset
aif_dataset = BinaryLabelDataset(
df=lgbm_trained_df,
label_names=['action_taken_binary'],
protected_attribute_names=protected_attribute_names,
favorable_label = favorable_label, # loan approved
unfavorable_label = unfavorable_label # loan denied
)
In [44]:
# Defining the privileged group directly
privileged_groups = [{'ethnicity_race_sex_not hispanic or latino_white_male': 1}]
# Defining unprivileged groups using a loop
unprivileged_groups = []
for attribute in protected_attribute_names:
if attribute != 'ethnicity_race_sex_not hispanic or latino_white_male':
unprivileged_groups.append({attribute: 1})
# Checking groups
print("Privileged group:", privileged_groups)
print("Number of unprivileged groups:", len(unprivileged_groups))
print("First few unprivileged groups:", unprivileged_groups[:3])
Privileged group: [{'ethnicity_race_sex_not hispanic or latino_white_male': 1}]
Number of unprivileged groups: 19
First few unprivileged groups: [{'ethnicity_race_sex_hispanic or latino_american indian or alaska native_female': 1}, {'ethnicity_race_sex_hispanic or latino_american indian or alaska native_male': 1}, {'ethnicity_race_sex_hispanic or latino_asian_female': 1}]
In [45]:
# Calculating metrics
metric = BinaryLabelDatasetMetric(aif_dataset,
unprivileged_groups=unprivileged_groups,
privileged_groups=privileged_groups)
In [46]:
# Printing metrics
print(f"Disparate Impact: {metric.disparate_impact():.6f}")
print(f"Statistical Parity Difference: {metric.statistical_parity_difference():.6f}")
# We calculate and print the mean difference in label predictions
print(f"Mean difference in label predictions: {metric.mean_difference():.6f}")
# Calculating group-specific metrics
for group in unprivileged_groups:
group_metric = BinaryLabelDatasetMetric(aif_dataset,
unprivileged_groups=[group],
privileged_groups=privileged_groups)
group_name = list(group.keys())[0]
print(f"\nGroup: {group_name}")
print(f"Disparate Impact: {group_metric.disparate_impact():.4f}")
print(f"Statistical Parity Difference: {group_metric.statistical_parity_difference():.4f}")
Disparate Impact: 0.905503
Statistical Parity Difference: -0.076246
Mean difference in label predictions: -0.076246

Per-group metrics (each group vs. the privileged group):

| Group (`ethnicity_race_sex_` prefix omitted) | Disparate Impact | Statistical Parity Difference |
|---|---|---|
| hispanic or latino_american indian or alaska native_female | 0.5556 | -0.3586 |
| hispanic or latino_american indian or alaska native_male | 0.4232 | -0.4654 |
| hispanic or latino_asian_female | 0.8020 | -0.1598 |
| hispanic or latino_asian_male | 0.8263 | -0.1402 |
| hispanic or latino_black or african american_female | 0.6610 | -0.2735 |
| hispanic or latino_black or african american_male | 0.5154 | -0.3910 |
| hispanic or latino_native hawaiian or other pacific islander_female | 0.6610 | -0.2735 |
| hispanic or latino_native hawaiian or other pacific islander_male | 0.2711 | -0.5881 |
| hispanic or latino_white_female | 0.8151 | -0.1492 |
| hispanic or latino_white_male | 0.8271 | -0.1395 |
| not hispanic or latino_american indian or alaska native_female | 0.7230 | -0.2235 |
| not hispanic or latino_american indian or alaska native_male | 0.8718 | -0.1035 |
| not hispanic or latino_asian_female | 0.9711 | -0.0233 |
| not hispanic or latino_asian_male | 0.9759 | -0.0195 |
| not hispanic or latino_black or african american_female | 0.6223 | -0.3047 |
| not hispanic or latino_black or african american_male | 0.6380 | -0.2921 |
| not hispanic or latino_native hawaiian or other pacific islander_female | 0.8891 | -0.0895 |
| not hispanic or latino_native hawaiian or other pacific islander_male | 0.7393 | -0.2104 |
| not hispanic or latino_white_female | 1.0071 | 0.0057 |
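A common reading of these ratios is the four-fifths rule: a Disparate Impact below 0.8 is typically flagged as potentially adverse, which applies to several of the groups above. A short sketch that reuses the objects already defined to list the flagged groups (the di_df name and the hard 0.8 cut-off are our choices):
# Collect per-group Disparate Impact and flag groups under the 0.8 (four-fifths) threshold
di_rows = []
for group in unprivileged_groups:
    m = BinaryLabelDatasetMetric(aif_dataset,
                                 unprivileged_groups=[group],
                                 privileged_groups=privileged_groups)
    di_rows.append({'group': list(group.keys())[0],
                    'disparate_impact': m.disparate_impact()})
di_df = pd.DataFrame(di_rows).sort_values('disparate_impact')
print(di_df[di_df['disparate_impact'] < 0.8])  # groups below the rule-of-thumb threshold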
J) SHAP¶
J.1) Overall model assessment¶
In [47]:
# lgbm_model.fit(X_train_smote, y_train_smote)
# lgbm_model = lgb.LGBMClassifier(random_state=42)
# Creating SHAP explainer for LightGBM
explainer = shap.TreeExplainer(best_model, X_train_reweighted)
# Calculating SHAP values for the test set
shap_values = explainer.shap_values(X_test)
# Calculating mean absolute SHAP values for each feature
mean_abs_shap = np.abs(shap_values).mean(axis=0)
# Creating a DataFrame with feature names and mean absolute SHAP values
shap_importance = pd.DataFrame({
'feature': X_test.columns,
'importance': mean_abs_shap
})
# Sorting by importance
shap_importance_sorted = shap_importance.sort_values(by='importance', ascending=False)
100%|===================| 43998/44016 [26:42<00:00]
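Passing all of X_train_reweighted as background data makes the interventional TreeExplainer expensive (the run above took roughly 26 minutes for about 44,000 test rows). If runtime becomes an issue, a smaller random background usually gives very similar attributions; a possible variant (the 1,000-row sample size is an arbitrary choice of ours):
# Hypothetical speed-up: use a random subsample of the training data as the SHAP background
background = shap.sample(X_train_reweighted, 1000, random_state=42)
fast_explainer = shap.TreeExplainer(best_model, background)
fast_shap_values = fast_explainer.shap_values(X_test)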
In [48]:
# Bar Plot for Top 20 Feature Importances
plt.figure(figsize=(12, 8))
sns.barplot(x='importance', y='feature', data=shap_importance_sorted.head(20))
plt.title('Top 20 Feature Importances (based on absolute SHAP values)')
plt.tight_layout()
plt.show()
# Summary Plot
plt.figure(figsize=(12, 8))
shap.summary_plot(shap_values, X_test, plot_type="bar")
plt.show()
# Detailed Summary Plot
plt.figure(figsize=(12, 8))
shap.summary_plot(shap_values, X_test)
plt.show()
/usr/local/lib/python3.10/dist-packages/shap/plots/_beeswarm.py:950: UserWarning: Tight layout not applied. The left and right margins cannot be made large enough to accommodate all axes decorations. pl.tight_layout()
J.2) Waterfall plot per group¶
In [49]:
# List of the intersectional group columns
ethnicity_race_sex_cols = [col for col in X_test_with_pred.columns if col.startswith('ethnicity_race_sex')]
# Creating a column that represents each intersectional group
X_test_with_pred['intersectional_group'] = X_test_with_pred[ethnicity_race_sex_cols].idxmax(axis=1)
# Resetting the index of X_test_with_pred and X_test so that they match shap_values (we had issues earlier without resetting)
X_test_with_pred = X_test_with_pred.reset_index(drop=True)
X_test = X_test.reset_index(drop=True)
# Grouping by intersectional group to evaluate SHAP values for each group
grouped = X_test_with_pred.groupby('intersectional_group')
# Loop through each intersectional group
for group_name, group_data in grouped:
print(f"Generating SHAP Waterfall plot for group: {group_name}")
# Getting the subset of the data for this intersectional group
X_group = X_test.loc[group_data.index]
# Creation of a subset for the SHAP values for this group
    shap_values_group = shap_values[group_data.index]  # This ensures the SHAP values match X_group
# Picking a specific row to explain for the waterfall plot
row_to_explain = group_data.index[0]
# We generate the SHAP waterfall plot for a single prediction in this group
shap.waterfall_plot(
shap.Explanation(
values=shap_values[row_to_explain], # SHAP values for the specific row
base_values=explainer.expected_value, # Base value for the SHAP model
data=X_test.iloc[row_to_explain, :], # Input data for the specific row
feature_names=X_test.columns # Feature names
)
)
Generating SHAP Waterfall plot for group: ethnicity_race_sex_hispanic or latino_american indian or alaska native_female
Generating SHAP Waterfall plot for group: ethnicity_race_sex_hispanic or latino_american indian or alaska native_male
Generating SHAP Waterfall plot for group: ethnicity_race_sex_hispanic or latino_asian_female
Generating SHAP Waterfall plot for group: ethnicity_race_sex_hispanic or latino_asian_male
Generating SHAP Waterfall plot for group: ethnicity_race_sex_hispanic or latino_black or african american_female
Generating SHAP Waterfall plot for group: ethnicity_race_sex_hispanic or latino_black or african american_male
Generating SHAP Waterfall plot for group: ethnicity_race_sex_hispanic or latino_native hawaiian or other pacific islander_female
Generating SHAP Waterfall plot for group: ethnicity_race_sex_hispanic or latino_native hawaiian or other pacific islander_male
Generating SHAP Waterfall plot for group: ethnicity_race_sex_hispanic or latino_white_female
Generating SHAP Waterfall plot for group: ethnicity_race_sex_hispanic or latino_white_male
Generating SHAP Waterfall plot for group: ethnicity_race_sex_not hispanic or latino_american indian or alaska native_female
Generating SHAP Waterfall plot for group: ethnicity_race_sex_not hispanic or latino_american indian or alaska native_male
Generating SHAP Waterfall plot for group: ethnicity_race_sex_not hispanic or latino_asian_female
Generating SHAP Waterfall plot for group: ethnicity_race_sex_not hispanic or latino_asian_male
Generating SHAP Waterfall plot for group: ethnicity_race_sex_not hispanic or latino_black or african american_female
Generating SHAP Waterfall plot for group: ethnicity_race_sex_not hispanic or latino_black or african american_male
Generating SHAP Waterfall plot for group: ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_female
Generating SHAP Waterfall plot for group: ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_male
Generating SHAP Waterfall plot for group: ethnicity_race_sex_not hispanic or latino_white_female
Generating SHAP Waterfall plot for group: ethnicity_race_sex_not hispanic or latino_white_male
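Each waterfall above explains a single row, so it is worth checking how many test observations stand behind each group before generalising from one example; a quick check (ours, not part of the original notebook):
# Test-set size of each intersectional group; single-row waterfalls from tiny groups are anecdotal
print(X_test_with_pred['intersectional_group'].value_counts())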
J.3) Summary per group¶
In [50]:
for group_name, group_data in grouped:
print(f"Generating SHAP Summary plot for group: {group_name}")
    # Getting the subset of the data for this intersectional group
X_group = X_test.loc[group_data.index]
# Subsetting the SHAP values for this group
shap_values_group = shap_values[group_data.index]
# Generating the SHAP summary plot for the entire group
shap.summary_plot(shap_values_group, X_group, feature_names=X_test.columns)
Generating SHAP Summary plot for group: ethnicity_race_sex_hispanic or latino_american indian or alaska native_female
Generating SHAP Summary plot for group: ethnicity_race_sex_hispanic or latino_american indian or alaska native_male
Generating SHAP Summary plot for group: ethnicity_race_sex_hispanic or latino_asian_female
Generating SHAP Summary plot for group: ethnicity_race_sex_hispanic or latino_asian_male
Generating SHAP Summary plot for group: ethnicity_race_sex_hispanic or latino_black or african american_female
Generating SHAP Summary plot for group: ethnicity_race_sex_hispanic or latino_black or african american_male
Generating SHAP Summary plot for group: ethnicity_race_sex_hispanic or latino_native hawaiian or other pacific islander_female
Generating SHAP Summary plot for group: ethnicity_race_sex_hispanic or latino_native hawaiian or other pacific islander_male
Generating SHAP Summary plot for group: ethnicity_race_sex_hispanic or latino_white_female
Generating SHAP Summary plot for group: ethnicity_race_sex_hispanic or latino_white_male
Generating SHAP Summary plot for group: ethnicity_race_sex_not hispanic or latino_american indian or alaska native_female
Generating SHAP Summary plot for group: ethnicity_race_sex_not hispanic or latino_american indian or alaska native_male
Generating SHAP Summary plot for group: ethnicity_race_sex_not hispanic or latino_asian_female
Generating SHAP Summary plot for group: ethnicity_race_sex_not hispanic or latino_asian_male
Generating SHAP Summary plot for group: ethnicity_race_sex_not hispanic or latino_black or african american_female
Generating SHAP Summary plot for group: ethnicity_race_sex_not hispanic or latino_black or african american_male
Generating SHAP Summary plot for group: ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_female
Generating SHAP Summary plot for group: ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_male
Generating SHAP Summary plot for group: ethnicity_race_sex_not hispanic or latino_white_female
Generating SHAP Summary plot for group: ethnicity_race_sex_not hispanic or latino_white_male
In [51]:
# Alternative to the per-group summary plots above, since some of those charts rendered with a distorted layout.
for group_name, group_data in grouped:
print(f"Generating SHAP Custom Summary plot for group: {group_name}")
# Getting the subset of the data for this intersectional group
X_group = X_test.loc[group_data.index]
# Subsetting the SHAP values for this group
shap_values_group = shap_values[group_data.index]
# Calculating mean absolute SHAP values for the group
mean_abs_shap = np.abs(shap_values_group).mean(axis=0)
# Creating a DataFrame with feature names and mean absolute SHAP values
shap_importance = pd.DataFrame({
'feature': X_test.columns,
'importance': mean_abs_shap
})
# Sorting df by importance
shap_importance_sorted = shap_importance.sort_values(by='importance', ascending=False)
# Plotting the top 20 features
plt.figure(figsize=(12, 8))
sns.barplot(x='importance', y='feature', data=shap_importance_sorted.head(20))
plt.title(f'Top 20 Feature Importances for Group: {group_name}')
plt.tight_layout()
plt.show()
Generating SHAP Custom Summary plot for group: ethnicity_race_sex_hispanic or latino_american indian or alaska native_female
Generating SHAP Custom Summary plot for group: ethnicity_race_sex_hispanic or latino_american indian or alaska native_male
Generating SHAP Custom Summary plot for group: ethnicity_race_sex_hispanic or latino_asian_female
Generating SHAP Custom Summary plot for group: ethnicity_race_sex_hispanic or latino_asian_male
Generating SHAP Custom Summary plot for group: ethnicity_race_sex_hispanic or latino_black or african american_female
Generating SHAP Custom Summary plot for group: ethnicity_race_sex_hispanic or latino_black or african american_male
Generating SHAP Custom Summary plot for group: ethnicity_race_sex_hispanic or latino_native hawaiian or other pacific islander_female
Generating SHAP Custom Summary plot for group: ethnicity_race_sex_hispanic or latino_native hawaiian or other pacific islander_male
Generating SHAP Custom Summary plot for group: ethnicity_race_sex_hispanic or latino_white_female
Generating SHAP Custom Summary plot for group: ethnicity_race_sex_hispanic or latino_white_male
Generating SHAP Custom Summary plot for group: ethnicity_race_sex_not hispanic or latino_american indian or alaska native_female
Generating SHAP Custom Summary plot for group: ethnicity_race_sex_not hispanic or latino_american indian or alaska native_male
Generating SHAP Custom Summary plot for group: ethnicity_race_sex_not hispanic or latino_asian_female
Generating SHAP Custom Summary plot for group: ethnicity_race_sex_not hispanic or latino_asian_male
Generating SHAP Custom Summary plot for group: ethnicity_race_sex_not hispanic or latino_black or african american_female
Generating SHAP Custom Summary plot for group: ethnicity_race_sex_not hispanic or latino_black or african american_male
Generating SHAP Custom Summary plot for group: ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_female
Generating SHAP Custom Summary plot for group: ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_male
Generating SHAP Custom Summary plot for group: ethnicity_race_sex_not hispanic or latino_white_female
Generating SHAP Custom Summary plot for group: ethnicity_race_sex_not hispanic or latino_white_male
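The per-group bar charts are easier to compare when the mean absolute SHAP values are gathered into one table; a sketch of that aggregation (group_importance and the top-10 selection are our additions, reusing the numpy/pandas imports already loaded):
# Mean |SHAP| per feature and per intersectional group, collected into a single table
rows = []
for group_name, group_data in grouped:
    group_mean = np.abs(shap_values[group_data.index]).mean(axis=0)
    rows.append(pd.Series(group_mean, index=X_test.columns, name=group_name))
group_importance = pd.DataFrame(rows)              # groups x features
top_features = group_importance.mean().nlargest(10).index
print(group_importance[top_features].round(3))     # top 10 features compared across groups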
J.4) Summary per group (average)¶
In [52]:
for group_name, group_data in grouped:
print(f"Generating SHAP Waterfall plot for group: {group_name}")
# Getting the subset of the data for this intersectional group
X_group = X_test.loc[group_data.index]
# Subsetting the SHAP values for this group
    shap_values_group = shap_values[group_data.index]  # This ensures the SHAP values match X_group
# Calculating the average SHAP values for the group
avg_shap_values = shap_values_group.mean(axis=0)
    # Generating the SHAP waterfall plot for the group's AVERAGE prediction (not a single applicant)
shap.waterfall_plot(
shap.Explanation(
values=avg_shap_values, # Average SHAP values for the group
base_values=explainer.expected_value, # Base value for the SHAP model
data=X_group.mean(axis=0), # Average input data for the group
feature_names=X_test.columns # Feature names
)
)
Generating SHAP Waterfall plot for group: ethnicity_race_sex_hispanic or latino_american indian or alaska native_female
Generating SHAP Waterfall plot for group: ethnicity_race_sex_hispanic or latino_american indian or alaska native_male
Generating SHAP Waterfall plot for group: ethnicity_race_sex_hispanic or latino_asian_female
Generating SHAP Waterfall plot for group: ethnicity_race_sex_hispanic or latino_asian_male
Generating SHAP Waterfall plot for group: ethnicity_race_sex_hispanic or latino_black or african american_female
Generating SHAP Waterfall plot for group: ethnicity_race_sex_hispanic or latino_black or african american_male
Generating SHAP Waterfall plot for group: ethnicity_race_sex_hispanic or latino_native hawaiian or other pacific islander_female
Generating SHAP Waterfall plot for group: ethnicity_race_sex_hispanic or latino_native hawaiian or other pacific islander_male
Generating SHAP Waterfall plot for group: ethnicity_race_sex_hispanic or latino_white_female
Generating SHAP Waterfall plot for group: ethnicity_race_sex_hispanic or latino_white_male
Generating SHAP Waterfall plot for group: ethnicity_race_sex_not hispanic or latino_american indian or alaska native_female
Generating SHAP Waterfall plot for group: ethnicity_race_sex_not hispanic or latino_american indian or alaska native_male
Generating SHAP Waterfall plot for group: ethnicity_race_sex_not hispanic or latino_asian_female
Generating SHAP Waterfall plot for group: ethnicity_race_sex_not hispanic or latino_asian_male
Generating SHAP Waterfall plot for group: ethnicity_race_sex_not hispanic or latino_black or african american_female
Generating SHAP Waterfall plot for group: ethnicity_race_sex_not hispanic or latino_black or african american_male
Generating SHAP Waterfall plot for group: ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_female
Generating SHAP Waterfall plot for group: ethnicity_race_sex_not hispanic or latino_native hawaiian or other pacific islander_male
Generating SHAP Waterfall plot for group: ethnicity_race_sex_not hispanic or latino_white_female
Generating SHAP Waterfall plot for group: ethnicity_race_sex_not hispanic or latino_white_male
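As a cross-check on these averaged waterfalls, SHAP's additivity property implies that the base value plus the sum of a group's average SHAP values should approximately equal the group's average raw model output (the log-odds margin for LightGBM). A sketch of that check for one group, assuming best_model is the fitted LGBMClassifier used above (variable names are ours):
# Additivity check (sketch): expected_value + mean row-wise SHAP sum ~= group's mean raw margin
group_name, group_data = next(iter(grouped))
idx = group_data.index
reconstructed = explainer.expected_value + shap_values[idx].sum(axis=1).mean()
raw_margin = best_model.predict(X_test.loc[idx], raw_score=True).mean()
print(group_name, round(float(reconstructed), 4), round(float(raw_margin), 4))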