Azure ML Studio Matchbox Recommender

The algorithm and technique used for association analysis (a.k.a. "market basket analysis") work differently from those you've used in prior chapters. Azure offers only one option (as opposed to the 8-12 options for regression and classification algorithms). In addition, the data used for the Matchbox Recommender needs to be in a different format. Rather than using the consumer as the unit of analysis, we use the instance of a movie being consumed, a restaurant being visited, an item appearing in a shopping cart, etc. Think, for example, of taking all of a store's receipts and taping them together so that you have a list of items appearing in a cart, with a unique ID for the consumer and the product in each row. As a result, the same consumer and the same product will each appear many times in this list. In summary, the bare minimum needed for this analysis is three data fields: a customerID-productID-rating triple.
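
To make the "taped-together receipts" format concrete, here is a minimal Python sketch that reshapes per-customer ratings into one row per customerID-productID-rating triple. The customer IDs, movie names, ratings, and the helper name to_triples are all made up for illustration; this only shows the shape of the data, not how Azure ML Studio ingests it.

```python
from collections import namedtuple

# One row of the long-format table: a customerID-productID-rating triple.
Rating = namedtuple("Rating", ["customer_id", "product_id", "rating"])

# Hypothetical input: each customer's ratings grouped together,
# i.e. the consumer is the unit of analysis.
ratings_by_customer = {
    "C1": {"MovieA": 5, "MovieB": 3},
    "C2": {"MovieA": 4, "MovieC": 2},
}


def to_triples(grouped):
    """Flatten grouped ratings into one row per (customer, product) instance."""
    rows = []
    for customer_id, items in grouped.items():
        for product_id, rating in items.items():
            rows.append(Rating(customer_id, product_id, rating))
    return rows


triples = to_triples(ratings_by_customer)
for row in triples:
    print(row.customer_id, row.product_id, row.rating)
```

Notice that customer C1 and product MovieA each appear on more than one row, which is exactly what the format expects.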

Follow along with the video below to re-create the movies recommender experiment in Azure ML Studio:

Let's summarize what we learned. At a minimum, a recommender algorithm like Azure ML Studio's Matchbox Recommender requires three data fields on each row. First, there must be an identifier for the grouping. This is typically a customer ID of some sort, so that all ratings or purchases can be grouped by the customer who made them. However, it could also be an order ID if a customer ID isn't available. An order ID also represents a grouping because many items can appear in a single order. Second, there must be some sort of product ID. This could represent a physical product, but also a movie, a restaurant, or something else. Finally, there must be an indicator of the level of interest in that product, such as a rating. The same Matchbox Recommender technique can also be used to predict purchase volume by including a "quantity" ordered in place of the rating.
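
As a rough illustration of the three fields, the Python sketch below builds grouping-product-quantity triples from order lines, once grouped by customer ID and once by order ID. The order data and the helper name quantity_triples are hypothetical, and summing the quantities of repeated group-product pairs is an assumption made for this example, not something Azure ML Studio does for you.

```python
from collections import defaultdict

# Hypothetical order lines: one row per item appearing in an order.
order_lines = [
    {"order_id": "O1", "customer_id": "C1", "product_id": "P1", "quantity": 2},
    {"order_id": "O1", "customer_id": "C1", "product_id": "P2", "quantity": 1},
    {"order_id": "O2", "customer_id": "C1", "product_id": "P1", "quantity": 3},
    {"order_id": "O3", "customer_id": "C2", "product_id": "P2", "quantity": 4},
]


def quantity_triples(lines, group_field):
    """Return sorted (group ID, product ID, total quantity) triples.

    group_field picks the grouping identifier: "customer_id" when it is
    available, otherwise "order_id".
    """
    totals = defaultdict(int)
    for line in lines:
        # Assumption: repeated (group, product) pairs are summed.
        totals[(line[group_field], line["product_id"])] += line["quantity"]
    return sorted((group, product, qty) for (group, product), qty in totals.items())


by_customer = quantity_triples(order_lines, "customer_id")
by_order = quantity_triples(order_lines, "order_id")
print(by_customer)
print(by_order)
```

Grouping by customer merges C1's two purchases of P1 into a single row, while grouping by order keeps them on separate rows because they came from different orders.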

The video example above also demonstrated several types of scoring techniques. Let's review all options:

  • Ratings Prediction

    • Web Service Input: One of Customer ID, User ID, or Reviewer ID

    • Web Service Output(s): A predicted rating for every item that the user has already rated

    • Purpose: To generate recommender evaluation metrics--RMSE and MAE

  • Item Recommendation

    • Web Service Input: One of Customer ID, User ID, or Reviewer ID

    • Web Service Output(s): A predicted rating for the top N items recommended. This output can be based on:

      • Rated items only: for the purpose of evaluating the model and generating an NDCG score

      • All items: this is useful for products that are desirable to "re-consume" (e.g. movies, restaurants, consumable goods like toilet paper)

      • Unrated items: this is useful for recommending products or services that have never been previously consumed. However, there is an additional input required for this setting (see video)

    • Purpose: To generate recommended items for a known user

  • Related Users

    • Web Service Input: One of Customer ID, User ID, or Reviewer ID

    • Web Service Output(s): A predicted rating for the N customers most similar (based on rating patterns) to the input customer.

    • Purpose: To generate recommended users for a known user

  • Related Items

    • Web Service Input: One Product ID, Movie ID, Restaurant ID, or some other product or service identifier

    • Web Service Output(s): A predicted rating for the N items most similar (based on rating patterns) to the input item.

    • Purpose: To generate recommended items for customers who are not known, but who have already indicated interest in at least one item (e.g. an unregistered customer visits our website and clicks on a product; this tool will generate recommended items to be associated with the item they clicked on)
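
The evaluation metrics named in the list above (RMSE and MAE for ratings prediction, NDCG for item recommendation on rated items) can be sketched in a few lines of Python. The ratings below are invented, and the NDCG shown is the common textbook form (the rating itself as gain, with a log2 rank discount); the exact formula Azure ML Studio uses may differ, so treat this as an approximation for building intuition only.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted ratings."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def dcg(gains):
    """Discounted cumulative gain: items lower in the ranking count for less."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranked_actual_ratings):
    """DCG of the recommended order divided by the DCG of the ideal order."""
    ideal = dcg(sorted(ranked_actual_ratings, reverse=True))
    return dcg(ranked_actual_ratings) / ideal if ideal > 0 else 0.0


# Hypothetical ratings for four items that one user has already rated.
actual = [5, 3, 4, 1]
predicted = [4, 3, 5, 2]

# Rank the items by predicted rating (highest first), then look up the
# user's actual ratings in that recommended order.
order = sorted(range(len(predicted)), key=lambda i: predicted[i], reverse=True)
ranked_actual = [actual[i] for i in order]

print("RMSE:", rmse(actual, predicted))
print("MAE:", mae(actual, predicted))
print("NDCG:", ndcg(ranked_actual))
```

Lower RMSE and MAE mean the predicted ratings are closer to the actual ones, while an NDCG closer to 1 means the recommended ordering is closer to the ordering the user's own ratings imply.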

One more thing: once you've evaluated your recommender, it's time to deploy it for use. Watch this video to see how to properly evaluate a recommender For Rated Items and then convert it to For Unrated Items: