Skip to content

Used to perform weights of evidence transformation to variables and create plots to assess distributions.

Notifications You must be signed in to change notification settings

VassMorozov/Weights-of-Evidence

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 

Repository files navigation

Weights of Evidence Transformation + Auto Grouping for Classification

The purpose of this repository is to allow for a weights of evidence transformation of variables in a Pandas dataframe when performing a single class classification task. This type of transformation is often used in credit scoring models however can be used in a variety of applications. A weights of evidence transformation is typically performed on discrete or categorical variables, however this repository is able to handle numeric variables by providing an auto-grouping functionality.

How to Use:

When working with only categorical variables, the following can be done:

import WeightsOfEvidence
import pandas as pd

df = pd.read_csv('...')
target = '...'
columns = [col for col in df.columns if col != target]

woe = WeightsOfEvidence(df, columns, target)

The transformed columns will then be available in woe.df and have a _woe suffix.

When numeric variables are available, grouping is performed. An example can be seen below:

woe = WeightsOfEvidence(df, columns, target, grouping_method='equal_bins')

This will then create transformed columns as previously mentioned but also produce numeric variables with a _grouped suffix in woe.df. Optimal splitting through decision tree classification can be performed using grouping_method='decision_tree'

Plotting Weights of Evidence Distributions

The weights of evidence for all columns can be plotted using woe.plot_woe() to produce something like the following: image

This example shows the plot for a categorical and numeric variable, where the numeric variable was grouped as previously described.

Notes

  • No missing values should be present in either the columns to transform or the target variable. This class expects missings to be treated prior to calling.
  • Future releases may incorporate additional grouping mechanisms.

About

Used to perform weights of evidence transformation to variables and create plots to assess distributions.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages