Regression Models

Regression Model Overview

Now that you've had a taste of how predictive modeling works in Azure ML Studio, let's dive into greater detail on each of your options for predicting numeric variables. "Regression" models are those that predict a numeric dependent variable. Below, we cover brief videos on a selection of the algorithms offered by Azure ML Studio:

  • Ordinal Regression: data in rank-ordered categories

  • Poisson Regression: predicts event counts

  • Fast Forest Quantile Regression: predicts a distribution

  • Linear Regression: fast to train, assumes a linear model

  • Bayesian Linear Regression: good for small data sets, assumes a linear model

  • Neural Network Regression: usually more accurate than others, slow to train

  • Decision Forest Regression: great combination of accuracy and fast training

  • Boosted Decision Tree Regression: still fast and accurate, but requires a large memory footprint

Ordinal Regression

Ordinal regression is optimal for dependent variables that represent a rank order, which implies that the distance between each ranking is not necessarily equal. For example, predicting the finishing order of an election would be ideal for an ordinal regression because the distance between the first- and second-place candidates and the distance between the second- and third-place candidates are not equal in terms of votes received.
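
If you'd like to see the idea in code, below is a minimal sketch using the open-source statsmodels library as a stand-in for Azure ML Studio's Ordinal Regression module. The election-style features and the finishing-place labels are entirely synthetic and hypothetical.

```python
# A minimal sketch of ordinal regression using statsmodels' OrderedModel,
# an open-source stand-in for Azure ML Studio's Ordinal Regression module.
# The features (polls, spend) and finishing places are synthetic.
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(42)
n = 200
polls = rng.normal(50, 10, n)    # hypothetical polling average
spend = rng.normal(1.0, 0.3, n)  # hypothetical campaign spending

# Derive a rank-ordered outcome: third < second < first
latent = 0.08 * polls + 0.5 * spend + rng.normal(0, 1, n)
place = pd.Series(pd.cut(latent, bins=3, labels=["third", "second", "first"]))

X = pd.DataFrame({"polls": polls, "spend": spend})
model = OrderedModel(place, X, distr="logit")  # no constant; thresholds absorb it
result = model.fit(method="bfgs", disp=False)
print(result.summary())
```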

Poisson Regression

Poisson regression is optimal when predicting "counts." For example, you would use a Poisson regression to predict the vote counts received by each election candidate. As a result, the dependent variable is never negative and always takes whole-number (i.e., integer) values.
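
Here's a minimal sketch of the same idea using scikit-learn's PoissonRegressor as an open-source stand-in for the Azure ML Studio module; the feature names and vote-count data are synthetic.

```python
# A minimal sketch of Poisson regression with scikit-learn's PoissonRegressor,
# a stand-in for Azure ML Studio's Poisson Regression module. Data is synthetic.
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)
n = 500
ad_spend = rng.uniform(0, 10, n)   # hypothetical ad spend per candidate
rallies = rng.integers(0, 20, n)   # hypothetical number of rallies held

# Simulate vote counts whose expected rate grows with both features
rate = np.exp(1.0 + 0.15 * ad_spend + 0.05 * rallies)
votes = rng.poisson(rate)

X = np.column_stack([ad_spend, rallies])
model = PoissonRegressor(alpha=1e-4).fit(X, votes)
print(model.predict(X[:5]))  # predicted rates are always non-negative
```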

Fast Forest Quantile Regression

We'll skip Fast Forest Quantile Regression for now. But you can find a description of it here.
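
Even though we're skipping the details, the sketch below shows the core idea (predicting quantiles of a distribution rather than a single mean). It uses scikit-learn's GradientBoostingRegressor with a quantile loss as a rough stand-in for the Fast Forest Quantile Regression module, on synthetic data.

```python
# A rough sketch of quantile prediction with gradient-boosted trees,
# standing in for Azure ML Studio's Fast Forest Quantile Regression.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(500, 1))
y = 2.0 * X.ravel() + rng.normal(0, X.ravel() + 0.1)  # noise grows with X

# Fit one model per quantile to describe the distribution, not just the mean
models = {}
for q in (0.1, 0.5, 0.9):
    m = GradientBoostingRegressor(loss="quantile", alpha=q, n_estimators=100)
    models[q] = m.fit(X, y)

x_new = np.array([[5.0]])
for q, m in models.items():
    print(f"quantile {q}: {m.predict(x_new)[0]:.2f}")
```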

Linear Regression

We'll skip Linear Regression as well because we have been using linear regression all along and you should be familiar with it already. The Microsoft documentation for it is right here.

Bayesian Linear Regression

Bayesian Linear Regression can be more accurate than regular linear regression, particularly on small data sets, but it still depends on the assumption of a linear relationship between the dependent variable and each independent variable. From Microsoft's website (https://msdn.microsoft.com/en-us/library/azure/dn906022.aspx):

In statistics, the Bayesian approach to regression is often contrasted with the frequentist approach.

The Bayesian approach uses linear regression supplemented by additional information in the form of a prior probability distribution. Prior information about the parameters is combined with a likelihood function to generate estimates for the parameters.

In contrast, the frequentist approach, represented by standard least-square linear regression, assumes that the data contains sufficient measurements to create a meaningful model.
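
As a concrete (if simplified) illustration, scikit-learn's BayesianRidge implements this prior-plus-likelihood approach and can serve as an open-source stand-in for the Azure ML Studio module; the small synthetic data set below mimics the "small data" use case.

```python
# A minimal sketch of Bayesian linear regression with scikit-learn's
# BayesianRidge, a stand-in for Azure ML Studio's Bayesian Linear Regression.
# Priors over the coefficients and noise are combined with the likelihood,
# which is why the method behaves well on small data sets.
import numpy as np
from sklearn.linear_model import BayesianRidge

rng = np.random.default_rng(7)
X = rng.normal(size=(30, 2))  # deliberately small synthetic data set
y = 3.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 0.5, 30)

model = BayesianRidge().fit(X, y)
mean, std = model.predict(X[:3], return_std=True)  # predictive uncertainty
print(mean, std)
```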

Neural Network Regression

Neural Network Regression is often referred to as "deep learning" and is often used for complex problems such as image classification. Any statistical algorithm can be termed a "neural network" if it uses adaptive coefficient weights and can approximate non-linear functions of its inputs. That's right: neural networks relax the assumption of a linear relationship between the dependent and independent variables. Neural network algorithms work by modeling one or more "hidden" layers of weighted combinations of the original inputs; these derived features often have greater predictive power than the raw inputs themselves. Both the number of hidden layers and the number of hidden variables per layer can be configured in most neural network algorithms. Neural networks often achieve greater accuracy than most other algorithms, but at the cost of increased processing time.
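
A minimal sketch, using scikit-learn's MLPRegressor as a stand-in for the Azure ML Studio module, shows how the hidden layers are configured; the non-linear target function here is synthetic.

```python
# A minimal sketch of neural network regression with scikit-learn's
# MLPRegressor, a stand-in for Azure ML Studio's Neural Network Regression.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.uniform(-3, 3, size=(1000, 2))
y = np.sin(X[:, 0]) * np.cos(X[:, 1])  # a deliberately non-linear relationship

# hidden_layer_sizes sets both the number of hidden layers and the
# number of hidden units in each one
model = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
model.fit(X, y)
print(model.predict(X[:5]))
```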

Decision Forest Regression

Decision Forest Regression is an ensemble method based on the decision tree algorithm. From Microsoft's website (https://msdn.microsoft.com/en-us/library/azure/dn905862.aspx): "Decision trees are non-parametric models that perform a sequence of simple tests for each instance, traversing a binary tree data structure until a leaf node (decision) is reached."

The advantages of decision tree-based algorithms are that they are fast and efficient to compute, they can handle non-linear relationships, and they cope relatively well with non-normally distributed variables. Rather than estimating traditional regression coefficients, decision trees build tree-like branch structures of binary decision nodes.
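
For a hands-on feel, here's a minimal sketch using scikit-learn's RandomForestRegressor, an ensemble of decision trees analogous to Azure ML Studio's Decision Forest Regression; the synthetic data illustrates the non-linear relationships mentioned above.

```python
# A minimal sketch of a decision forest using scikit-learn's
# RandomForestRegressor, analogous to Azure ML Studio's module.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
X = rng.uniform(0, 10, size=(1000, 3))
y = np.where(X[:, 0] > 5, X[:, 1] ** 2, X[:, 2])  # step + non-linear terms

# Each tree is trained on a bootstrap sample; predictions are averaged
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)
print(model.predict(X[:3]))
```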

Boosted Decision Tree Regression

A Boosted Decision Tree Regression is also an ensemble algorithm that is similar to a decision forest, but it allows for distinct branches to reconverge later in the tree. This is ideal for independent variables that are highly correlated. The Boosted Decision Tree algorithm tends to improve accuracy for some value ranges at the possible risk of reducing accuracy for other ranges.
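
Here's a minimal sketch of boosted trees using scikit-learn's GradientBoostingRegressor as a stand-in for the Azure ML Studio module. In this open-source implementation, each new tree is fit to the residual errors of the ensemble built so far; the highly correlated synthetic features echo the use case described above.

```python
# A minimal sketch of boosted decision trees with scikit-learn's
# GradientBoostingRegressor, a stand-in for Azure ML Studio's Boosted
# Decision Tree Regression. Each new tree is fit to the residuals of
# the ensemble built so far.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(9)
X = rng.normal(size=(1000, 4))
X[:, 1] = X[:, 0] + rng.normal(0, 0.1, 1000)  # two highly correlated features
y = X[:, 0] ** 2 + X[:, 2] + rng.normal(0, 0.3, 1000)

model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                  max_depth=3, random_state=0)
model.fit(X, y)
print(model.predict(X[:3]))
```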

That was a LOT to take in! If you'd like to learn more about decision trees, bagged decision trees, random forests, boosted decision trees, or just ensemble algorithms in general, you can find a pretty good write-up here. But for now, all we care about is finding out which one offers the best predictive capabilities.

Now, let's turn our attention to classification-based models that allow us to predict categorical data.