Course Final Project

Instructions

The purpose of this project is to allow you to demonstrate each of the techniques you've learned in this course in a relevant context. You should begin by identifying an unstructured business problem you'd like to help solve and selecting a dataset that you can analyze to provide decision support for that problem. This could be something from your current job or a dataset you find online. Kaggle.com is a great resource for datasets and ideas for your final project. If you don't already have an idea of the business problem you want to tackle, create a Kaggle.com account and begin exploring the datasets for ideas.

Once you've decided on a business problem and accompanying dataset, proceed with each of the requirements below. In addition, document your process at each step; this documentation is a deliverable that should be submitted electronically as a PDF. For example, in the Data Preparation step, create a data dictionary to define each of your data fields, document what types of missing data issues you had with your dataset, record how you resolved those issues, etc. There are details on the Final Project Documentation deliverable at the end of these requirements.

High-Level Requirements

  1. Project Documentation: this will be in the form of an electronic PDF file; no need to print anything. The format is "as-short-as-you-can-make-it-while-meeting-all-requirements." There are no specific font, margin, or other style guidelines. Just make it as professional looking as possible.

  2. Tableau Story posted to Tableau Public (see instructions below): the URL for this story should be included in the documentation and submitted below

    • No minimum number of story points required, but it should be sufficient to reveal several useful insights

    • Include at least one dashboard in the story that users can interact with by using filters

  3. Azure ML Studio experiment published to the ML Studio Gallery: the URL for this gallery experiment should be included in the documentation and submitted below

    • Two of the following four analyses should be included: 1) regression, 2) classification, 3) text analysis, 4) matchbox recommender. If your unique context requires an exception to this rule, consult with your instructor.

    • Evidence of sufficient model comparison and evaluation should be documented. This includes both 1) variable selection, and if appropriate, 2) algorithm selection

  4. (contingent) Technical students will have an additional requirement to implement both the Tableau story and Azure ML Web Service in a live system (either website or mobile app)

Detailed Requirements

  1. Data Preparation

    • Resolve missing data: generally speaking, columns and rows with more than 50% of the data missing will likely need to be eliminated altogether. For those with less than 50% missing, determine whether the data is numeric or categorical. For numeric data, consider using Azure ML Studio's Clean Missing Data feature. For categorical fields that are missing, consider one of two approaches: 1) if the data is missing because it was forgotten, also use the Clean Missing Data feature, or 2) if the data is missing because there is no relevant value to use (e.g. a product's "weight" is left blank because it is a downloadable software product and weight is irrelevant), consider inputting a value like "blank", which will allow that record to be analyzed.

    • Convert data to a usable form: sometimes your data comes in formats that are unusable. For example, if you scrape product data from Amazon.com, you may get a price in the following format: <span class="currency">$18.99</span>. In this case, you'll need to remove those unnecessary HTML tags before you can use the data.

    • Generate or collect new variables: Are there new variables that could be generated from existing variables that would be useful? For example, if you're using Amazon.com product data, the "category" field may come back in the form: "Home, Electronics, Tablets, Apple, iPad, iPad Pro". In that case, you may want to turn that variable into six variables to record the category, sub-category, ..., etc. Similarly, if you collect data about crimes across a variety of locations, you may want to look up additional information about each location like how close it is to a business or residence, weather patterns, or other data relevant to a location.
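
The three preparation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a required tool for the project: the records, the field names (price, rating, color, category), and the helper functions are all hypothetical, and median substitution is just one possible way to fill a numeric gap.

```python
import re
from statistics import median

# Hypothetical scraped records; every field name here is illustrative only.
records = [
    {"price": '<span class="currency">$18.99</span>', "rating": 4.5, "color": "silver",
     "category": "Home, Electronics, Tablets, Apple, iPad, iPad Pro"},
    {"price": '<span class="currency">$5.00</span>', "rating": None, "color": None,
     "category": "Home, Software, Utilities"},
    {"price": '<span class="currency">$1,299.00</span>', "rating": 3.5, "color": "black",
     "category": "Home, Electronics, Laptops"},
]


def parse_price(raw):
    """Strip HTML tags, the dollar sign, and thousands separators; return a float."""
    text = re.sub(r"<[^>]*>", "", raw)
    return float(text.replace("$", "").replace(",", "").strip())


def split_category(raw, levels=6):
    """Turn one comma-separated category string into `levels` separate variables."""
    parts = [p.strip() for p in raw.split(",")][:levels]
    parts += [None] * (levels - len(parts))  # pad shallow category paths
    return {f"category_{i + 1}": part for i, part in enumerate(parts)}


def fill_numeric_with_median(rows, field):
    """Replace missing numeric values with the median of the known values."""
    fill = median(r[field] for r in rows if r[field] is not None)
    for r in rows:
        if r[field] is None:
            r[field] = fill


def fill_categorical_with_label(rows, field, label="blank"):
    """Mark categorical values that have no relevant value with an explicit label."""
    for r in rows:
        if r[field] is None:
            r[field] = label


cleaned = []
for r in records:
    row = {"price": parse_price(r["price"]), "rating": r["rating"], "color": r["color"]}
    row.update(split_category(r["category"]))
    cleaned.append(row)

fill_numeric_with_median(cleaned, "rating")
fill_categorical_with_label(cleaned, "color")
```

Whatever tool you use, record each of these decisions (what was stripped, what was split, what was filled and with which value) in your data dictionary.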

  2. Data Understanding (completed recursively with the Data Preparation stage)

    • Tableau: create a Tableau Story that includes at least one dashboard. This story should walk the user through all of the relevant charts (including forecasts where relevant and cluster analysis) that will describe what insights you can find from descriptive data analyses. There is no minimum number of story points required. However, your story--and your documentation--should adequately indicate why you created each visualization and what we can learn from it. You should post this story to Tableau Public and then copy the URL link to your Tableau Public story into your documentation. For instructions on how to do this, refer to this video:

    • Statistics: some of these can be calculated in Excel, some in Azure ML Studio, and some in both. This should include a correlation matrix for all numeric variables. You should identify strong correlations and provide a few bullet points on correlations of interest (e.g. Which independent variables are most correlated with the dependent variable? Which independent variables are possibly too highly correlated?). Next, you should calculate skewness and kurtosis for every numeric variable. For those which violate the limits, include a histogram (you can generate these in the data view of Azure ML Studio). Try a variety of transformations to determine whether any of them will improve the skewness and kurtosis issues. If these problems can be fixed, include the newly transformed versions of the variables in your model training in the next phase. Also, if you are using a regression model, check for non-linear relationships. You can do this in Tableau or Excel. Create new variables to test if your examination reveals the possibility of exponential, logarithmic, or polynomial terms.
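
If you want to check the statistics above outside Excel or Azure ML Studio, the following is a minimal Python sketch. It assumes the population (uncorrected) formulas for skewness and excess kurtosis; spreadsheet functions may apply a sample-size correction, so values can differ slightly. The data values and function names are hypothetical.

```python
import math


def _central_moment(xs, k):
    """k-th central moment of a list of numbers (population form)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def skewness(xs):
    """Population skewness: third central moment over variance to the 3/2 power."""
    return _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5


def excess_kurtosis(xs):
    """Population excess kurtosis: fourth moment over squared variance, minus 3."""
    return _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical right-skewed variable (one large outlier) and a log transform of it.
raw = [1, 2, 2, 3, 3, 4, 50]
logged = [math.log(x) for x in raw]

print("skewness raw:   ", round(skewness(raw), 3))
print("skewness logged:", round(skewness(logged), 3))
print("kurtosis raw:   ", round(excess_kurtosis(raw), 3))
print("kurtosis logged:", round(excess_kurtosis(logged), 3))

# A correlation matrix is just pearson() applied to every pair of numeric columns.
columns = {"raw": raw, "logged": logged}
matrix = {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}
```

Comparing the statistic before and after each candidate transformation, as in the two print lines for skewness, is one simple way to decide which transformed variable to carry into model training.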

  3. Model Testing

    • Your project should include two of the four types of analyses taught in this class: 1) Regression, 2) Classification, 3) Text Analysis, 4) Matchbox Recommender

    • You should demonstrate a comprehensive set of testing across 1) various sets of independent variables, and if appropriate, 2) various algorithms.

    • You should demonstrate your reasoning for which independent variables you chose (e.g. Permutation Feature Importance, Principal Component Analysis) and which algorithm you chose (e.g. R squared, Accuracy, NDCG)

  4. Model Evaluation

    • You should report the evaluation results of every relevant model that you build and test. This will justify why you chose your final set of variables and the particular algorithm used. This section does not need to be particularly long. Just summarize and sort the results.

    • Your evaluation will vary depending on the types of analyses you used. For example, Regression models should include at least R squared (coefficient of determination) and RMSE. Classification models should include at least accuracy and precision. Matchbox Recommender models should include at least RMSE and NDCG.
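
For reference, the metrics named above can be written out as follows. This is a hedged sketch rather than the way Azure ML Studio computes them: the function names are hypothetical, precision is shown for a single positive class, and NDCG is shown in its linear-gain form (other variants use an exponential gain).

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def accuracy(actual, predicted):
    """Share of predictions that match the actual label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def precision(actual, predicted, positive=1):
    """Of the records predicted positive, the share that really are positive."""
    tp = sum(1 for a, p in zip(actual, predicted) if p == positive and a == positive)
    fp = sum(1 for a, p in zip(actual, predicted) if p == positive and a != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


def ndcg(relevances):
    """NDCG for true relevance scores listed in the order the model ranked them."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0


# Summarize and sort: toy predictions from two hypothetical regression models.
actual = [10.0, 12.0, 15.0, 20.0]
candidates = {
    "model_a": [11.0, 12.5, 14.0, 19.0],
    "model_b": [14.0, 14.0, 14.0, 14.0],
}
summary = sorted(
    ({"model": name, "r2": r_squared(actual, preds), "rmse": rmse(actual, preds)}
     for name, preds in candidates.items()),
    key=lambda row: row["rmse"],
)
for row in summary:
    print(row["model"], round(row["r2"], 3), round(row["rmse"], 3))
```

A sorted table like `summary`, with one row per model you tested, is usually all this section of the documentation needs.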

  5. Deployment

    • Your Azure ML Experiments should be deployed to the Azure ML Gallery

    • Your Tableau Stories and Dashboards should be published in a single workbook to Tableau Public. You can follow a video tutorial on how to perform this here:

Documentation Requirements

  1. You should document every step outlined above. However, do not write more than is necessary. Bullet points are acceptable where appropriate. Include only what is necessary, but don't leave out anything that is useful to show your work.

  2. There is no specific styling format requirement. Just make it a professional business document with a consistent style.

  3. Save the documentation as a PDF and upload it below.

Additional Requirements for Technical Students

  1. Technical students will need to create a web app or native mobile app in which the Tableau dashboard and ML Studio Web Service are used in context.

  2. All projects need prior instructor approval. The high-level requirements for the web app (or mobile app) should be determined in advance.

  3. Technical students do not necessarily need to create a Tableau Story. They can create a dashboard (or a set of dashboards) only. However, the dashboard(s) should be embedded into a web page that is part of the web application. In addition, the Tableau JavaScript API should be used to develop additional functionality that is incorporated with the dashboard. While not every technique that is taught in this course needs to be implemented, students should be creative in determining which features would be useful in their specific application.

  4. The Azure ML Studio Web Service(s) should be used (called) in the web app according to high-level requirements agreed upon by the instructor.
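
As a rough illustration of what calling the web service from application code might look like, the sketch below builds (but does not send) an HTTP POST request. Everything specific in it is an assumption: the URL and API key are placeholders, the column names are hypothetical, and the JSON shape ("Inputs", "input1", "ColumnNames", "Values", "GlobalParameters") is modeled on the request-response sample that ML Studio typically shows on a web service's API help page. Check your own service's help page for the exact format it expects.

```python
import json
import urllib.request


def build_scoring_request(url, api_key, column_names, rows):
    """Build a POST request for a request-response scoring call (assumed format)."""
    payload = {
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,  # key issued with your own web service
    }
    return urllib.request.Request(url, data=body, headers=headers)


# Placeholder values; replace them with the details of your own web service.
request = build_scoring_request(
    url="https://example.invalid/score",
    api_key="YOUR_API_KEY",
    column_names=["price", "category_1"],
    rows=[["18.99", "Home"]],
)

# Sending the request is left commented out because the URL above is not real:
# with urllib.request.urlopen(request) as response:
#     result = json.loads(response.read())
```

In a web app, a call like this normally runs on the server side so that the API key is never exposed to the browser.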
