Classification Models

Classification Model Overview

Classification models are those which predict a categorical dependent variable. We can further subdivide these algorithms into "two-class" and "multiclass" versions. Two-class algorithms apply when the dependent variable takes only two values, while multiclass algorithms handle three or more. For example, in our Bike Buyers example, predicting "Yes" or "No" (indicating whether or not a customer purchased a bike) requires a two-class algorithm, whereas in our Lending Tree example, predicting a loan status of "Fully Paid," "Current," "Grace Period," ... "Charged Off" requires a multiclass algorithm. We will review the following algorithms below:

  • Two-class logistic regression: fast to train, assumes linear model.

  • Two-class averaged perceptron: fast to train, assumes linear model.

  • Two-class Bayes point machine: fast to train, assumes linear model.

  • Two-class decision forest: accurate and fast to train.

  • Two-class decision jungle: accurate and small memory footprint.

  • Two-class boosted decision tree: accurate and fast to train with large memory footprint.

  • Two-class support vector machine: best with fewer than 100 independent variables, assumes linear model.

  • Two-class locally deep support vector machine: must have fewer than 100 independent variables.

  • Two-class neural network: very accurate, long training times.

  • Multiclass logistic regression: fast to train, assumes linear model.

  • Multiclass neural network: very accurate, long training times.

  • Multiclass decision forest: accurate, fast training times.

  • Multiclass decision jungle: accurate, small memory footprint.

  • One-v-all multiclass: speed and accuracy depend on the chosen two-class classifier.
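To make the linear-model family above concrete, here is a minimal sketch of one of them, the two-class averaged perceptron: a linear model trained by correcting mistakes, with the weights averaged across all training steps for stability. The toy data and parameter choices are illustrative assumptions, not from the examples above.

```python
def train_averaged_perceptron(data, epochs=10):
    """data: list of (feature_vector, label) pairs with labels +1 / -1.
    Returns the averaged weight vector and bias."""
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0                    # current model
    w_sum, b_sum, steps = [0.0] * n, 0.0, 0  # running totals for averaging
    for _ in range(epochs):
        for x, y in data:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:               # misclassified: nudge toward y
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
            w_sum = [ws + wi for ws, wi in zip(w_sum, w)]
            b_sum += b
            steps += 1
    return [ws / steps for ws in w_sum], b_sum / steps

def predict(w, b, x):
    """Sign of the linear score: +1 or -1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy, linearly separable data (illustrative only).
data = [([2, 1], 1), ([3, 2], 1), ([-1, -1], -1), ([-2, -1], -1)]
w, b = train_averaged_perceptron(data)
print(predict(w, b, [2, 2]))    # 1
print(predict(w, b, [-2, -2]))  # -1
```

Averaging is what makes this variant fast yet reasonably stable: instead of keeping only the final weights, it returns the mean of the weights seen at every step, which damps the noise from late mistake-driven updates.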

Two-Class Logistic Regression

Two-Class Averaged Perceptron

Two-Class Bayes Point Machine

Two-Class Decision Forest

Two-Class Decision Jungle

Two-Class Boosted Decision Tree

Two-Class Support Vector Machine

Two-Class Locally Deep Support Vector Machine

Two-Class Neural Network

Multiclass Models

Multiclass Logistic Regression

This works just the same as the two-class version.

Multiclass Neural Network

This works just the same as the two-class version.

Multiclass Decision Forest

This works just the same as the two-class version.

Multiclass Decision Jungle

This works just the same as the two-class version.

One-v-All Multiclass

One-v-all multiclass is an interesting ensemble method that uses a two-class algorithm of your choice to reduce a multiclass problem to a series of binary classifications. For example, if you want to evaluate five levels of education, this technique will train one classifier comparing level 1 to all other levels, another comparing level 2 to all other levels, and so forth. Then it will combine all of those results, choosing the class whose classifier reports the highest score. Watch below:
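The reduction can be sketched in a few lines. Here a simple perceptron stands in for "the two-class algorithm of your choice"; the toy data, class names, and epoch count are illustrative assumptions, not part of the original examples.

```python
def train_perceptron(data, epochs=500):
    """Minimal two-class learner: data is (features, +1/-1) pairs."""
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in data:
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def train_one_vs_all(data):
    """For each class, train a two-class model that treats that class
    as +1 and every other class combined as -1."""
    return {c: train_perceptron([(x, 1 if y == c else -1) for x, y in data])
            for c in {y for _, y in data}}

def predict_one_vs_all(models, x):
    """Combine the binary results: pick the class whose model scores highest."""
    def score(c):
        w, b = models[c]
        return sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(models, key=score)

# Three toy classes; each one-vs-rest split is linearly separable.
data = [([0, 0], "A"), ([1, 0], "A"),
        ([10, 0], "B"), ([11, 0], "B"),
        ([0, 10], "C"), ([1, 10], "C")]
models = train_one_vs_all(data)
print(predict_one_vs_all(models, [0, 0]))   # A
print(predict_one_vs_all(models, [11, 0]))  # B
```

Note that for five classes this trains five binary models, one per class, rather than the ten pairwise models a one-vs-one scheme would need, which is why the cost of this method tracks the cost of the underlying two-class classifier.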