Machine Learning

So what is the difference between data analytics and machine learning? Data analytics, and predictive analyses in particular, are often initially performed "offline"--meaning that data is exported from a live data source and used to generate predictive analyses like those referred to in the prior section. However, at some point, that data becomes "stale." In other words, as more time passes from when the data was exported, it becomes a poorer and poorer representation of how current customers are behaving. This matters because consumer preferences constantly change as products, services, regulations, trends, and even cultures change.

So why would anyone export data (thus, making it "offline") to perform analyses? You'll be able to see the answer to this question yourself by the end of this book if you follow along. But basically, it's because initial data analysis is often exploratory--meaning that we are exploring the causes of a desired effect. For example, we may suspect that a new website look and feel might promote greater sales. We won't know for sure until we analyze sales data based on the old and new website formats. But to confirm our hypothesis, we have to perform a great many analyses to be quite sure. Imagine that Amazon is performing this analysis; they complete 1.6 million consumer orders per day. That kind of volume requires a lot of computing power to analyze. Therefore, rather than analyze live data, it makes more sense to export a smaller random sample of data for offline analysis. The data science team will try dozens of statistical tests before settling on a single "best" form of analysis.
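To make the idea of exporting a sample concrete, here is a minimal Python sketch of drawing a simple random sample of transactions for offline analysis. The transaction records and the 1% sample size are hypothetical stand-ins for illustration, not a real export process:

```python
import random

# Reproducible sampling for illustration purposes.
random.seed(42)

# Hypothetical stand-in for a day's worth of transactions
# exported from a live operational database.
transactions = [
    {"order_id": i, "total": round(random.uniform(5, 500), 2)}
    for i in range(1_000_000)
]

# A 1% simple random sample is far cheaper to analyze offline
# than the full set of live transactions.
sample = random.sample(transactions, k=len(transactions) // 100)

print(len(sample))  # 10000 rows instead of 1,000,000
```

The key design choice is that the sample is random, so statistical conclusions drawn from it can generalize to the full population of orders.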

However, once that best analysis is determined, it's time to turn our data analytics into true machine learning. Machine learning is accomplished when an analytical model is integrated into an information system and continually (and automatically) "retrained" with the latest consumer data and behaviors. When that happens, the machine "learns" the new behaviors of consumers automatically as those behaviors are captured in an information system. See the image below:

Consider the Amazon case again. Machine learning requires an automated loop with little to no human intervention. Let's review each step, beginning with the person icon on the left-middle of the image above:

1. A consumer--"Homer"--visits Amazon.com and makes a purchase.
2. Homer's purchase is recorded in Amazon's operational database.
3. ETL is automatically performed (e.g., once a day, week, or month) to copy Homer's transaction into the data warehouse for analysis.
4. Homer's latest transaction--now in the data warehouse--is used to recalibrate statistical predictions of, for example, which products Homer may want to purchase next.
5. The next time Homer visits Amazon.com, the website retrieves the latest product recommendations for Homer, updated based on his last purchase.
6. Homer's next decision--to purchase the recommended product or not--is recorded back into the same operational database.

This cycle repeats every time Homer makes a purchase. In summary, machine learning occurs when the predictive models trained during offline predictive analyses are integrated into information systems and automatically "retrained" with the latest data so that no human intervention is required to update and calibrate predictions; hence, the "machine" "learns" on its own.
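The automated loop just described can be sketched as a tiny in-memory simulation. Everything here is a hypothetical stand-in for illustration--the data stores are Python lists, the "model" is a simple purchase-frequency count, and the function names are invented--but the flow of steps matches the cycle above:

```python
from collections import Counter

operational_db = []   # steps 2 and 6: live transactions land here
data_warehouse = []   # step 3: ETL copies transactions here
model = Counter()     # step 4: trivial "model" of product popularity

def record_purchase(customer, product):
    """Steps 1-2: a purchase is written to the operational database."""
    operational_db.append({"customer": customer, "product": product})

def run_etl():
    """Step 3: periodically copy new rows into the data warehouse."""
    new_rows = operational_db[len(data_warehouse):]
    data_warehouse.extend(new_rows)
    return len(new_rows)

def retrain():
    """Step 4: recalibrate the model on the latest warehouse data."""
    model.clear()
    model.update(row["product"] for row in data_warehouse)

def recommend():
    """Step 5: serve the current top product back to the website."""
    return model.most_common(1)[0][0] if model else None

# One pass of the loop: Homer buys, ETL runs, the model retrains,
# and an updated recommendation is ready for his next visit (step 6
# then records his response back into operational_db).
record_purchase("Homer", "donut maker")
record_purchase("Homer", "donut maker")
record_purchase("Homer", "duff mug")
run_etl()
retrain()
print(recommend())  # donut maker
```

Notice that no human intervenes anywhere in the loop: ETL and retraining run on a schedule, which is exactly what distinguishes machine learning from one-off offline analysis.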

Remember, before machine learning can occur, the data science team must determine exactly what statistical calculations will be performed in step 4 above. That will be the focus of the second two-thirds of this book, including the statistics review in Excel and Azure Machine Learning Studio. But before that step 4 can even begin, you must explore, describe, and understand your data. That is the focus of the first third of this book, covering Tableau.

<{http://www.bookeducator.com/Textbook}learningobjectivelink target="ux60i">