In my previous blog I gave a general introduction to Machine Learning (ML). The focus was not on how Machine Learning implementations work, but rather on which application areas can be distinguished. This blog digs deeper into a couple of topics, mostly on measuring the success of your Machine Learning model.
How do you measure the success of your ML model? You might hear things like: ‘The model is so complex it is impossible to measure how well it really performs’. However, you can measure the performance, and in fact you really should. Looking back, the two main application areas where historical data is used (also called ‘supervised learning’) are:
Regression: model outputs a continuous value, e.g. ‘for a house this size the expected price is 300K’.
Classification: model outputs a discrete value or class, e.g. ‘this is a picture of a dog’ or ‘this is a picture of a cat’. However, models often output a probabilistic value, e.g. ‘this picture has a 0.6 chance of being a dog’.
Measure of error
First, let’s get an idea of how error is measured for regression. An often used term is ‘squared error’. This should not be confused with ‘R²’, the coefficient of determination, which is a related but different measure. The error is the difference between an actual value and the predicted value. This error is then squared to give a heavier penalty to large deviations. Finally, to evaluate the entire model, the sum of all squared errors is taken, often divided by the number of predictions to give the mean squared error.
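The squared-error calculation described above can be sketched in a few lines of Python. This is a minimal illustration only: the house prices (in thousands) are made up, and the function names are my own rather than those of any particular library.

```python
def squared_errors(actual, predicted):
    """Squared difference between each actual and predicted value."""
    return [(a - p) ** 2 for a, p in zip(actual, predicted)]


def sum_squared_error(actual, predicted):
    """Total squared error over all predictions."""
    return sum(squared_errors(actual, predicted))


def mean_squared_error(actual, predicted):
    """Sum of squared errors divided by the number of predictions."""
    return sum_squared_error(actual, predicted) / len(actual)


# Hypothetical house prices in thousands (illustrative data only).
actual_prices = [300.0, 250.0, 410.0]
predicted_prices = [310.0, 240.0, 350.0]

per_house = squared_errors(actual_prices, predicted_prices)
sse = sum_squared_error(actual_prices, predicted_prices)
mse = mean_squared_error(actual_prices, predicted_prices)

print(per_house)  # the single miss of 60 dominates the two misses of 10
print(sse, mse)
```

Note how the one prediction that is off by 60 contributes far more to the total than the two predictions that are off by 10. That is the heavier penalty for large deviations at work.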
For classification models similar measures are available. However, many classification models do not just output a class (‘this is a picture of a dog’), but actually give a likelihood for each class. The output would be, for example, a chance of 0.2 for class A and 0.8 for class B. This probabilistic form of output is extremely useful, but unfortunately it requires more complex error measures.
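One commonly used error measure for probabilistic output is log loss (also known as cross-entropy). The text above does not name a specific measure, so take this as one possible choice rather than the only one. The sketch below assumes a two-class problem where the label is 1 for ‘dog’ and 0 for ‘cat’, with made-up predictions.

```python
import math


def log_loss(true_labels, predicted_probs, eps=1e-15):
    """Average log loss for binary labels (0 or 1).

    predicted_probs holds the predicted chance that the label is 1.
    Probabilities are clipped to avoid taking log(0).
    """
    total = 0.0
    for y, p in zip(true_labels, predicted_probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(true_labels)


# Illustrative data: 1 = dog, 0 = cat.
labels = [1, 0, 1]

cautious = [0.6, 0.4, 0.6]         # right direction, not very sure
confident_right = [0.9, 0.1, 0.9]  # right direction, very sure
confident_wrong = [0.1, 0.9, 0.1]  # wrong direction, very sure

loss_cautious = log_loss(labels, cautious)
loss_right = log_loss(labels, confident_right)
loss_wrong = log_loss(labels, confident_wrong)

print(loss_right, loss_cautious, loss_wrong)
```

The measure rewards predictions that are both correct and confident, and punishes confident mistakes most heavily, which a simple count of right and wrong classes cannot express.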
Although regression and classification have different goals, they both use historical data. For a fair evaluation of the model, it is important that some of the data is reserved for evaluation and never used for training. In both regression and classification models this held-back data is both your only reliable indicator of performance and your main protection against overfitting your model.
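Reserving part of the data can be as simple as shuffling the rows and cutting off a slice. The sketch below is a minimal version under stated assumptions: the function name, the 20% fraction, the fixed seed and the house data are all illustrative.

```python
import random


def hold_out_split(rows, eval_fraction=0.2, seed=42):
    """Shuffle the rows and reserve a fraction of them for evaluation.

    Returns (training_rows, evaluation_rows). The fixed seed only makes
    the split repeatable; it has no other meaning.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_eval = int(round(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


# Hypothetical (size in m2, price in thousands) pairs.
houses = [(50 + 10 * i, 150 + 25 * i) for i in range(10)]

training_rows, evaluation_rows = hold_out_split(houses)

print(len(training_rows), len(evaluation_rows))
```

The evaluation rows are put aside at this point and only touched again once the model has been trained on the training rows.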
If a sufficiently flexible model were trained on all of your data, it would try to predict every value perfectly, no matter how strange the pattern needs to be to fit all the points. If you then used the same historical data to evaluate your model, you would find that it performs perfectly. However, the model would not generalize well beyond your limited dataset. In short, it would fail in the real world.
Even without visualizing a model, there is an easy way to determine whether your model is in danger of overfitting. Measuring the error on your training set and on your validation set separately gives a clear indicator.
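To make this indicator concrete, here is a small sketch with a deliberately overfitting model: it simply memorizes the training points and answers with the value of the nearest one. The model, the data (roughly price = 2 × size plus some noise) and all names are made up for illustration.

```python
def mean_squared_error(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


class MemorizingModel:
    """Stores every training point and predicts the value of the nearest one."""

    def fit(self, xs, ys):
        self.points = list(zip(xs, ys))
        return self

    def predict(self, xs):
        return [min(self.points, key=lambda pt: abs(pt[0] - x))[1] for x in xs]


# Training data: the trend y = 2x with some noise added (illustrative).
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [0.5, 1.5, 4.6, 5.4, 8.5, 9.5]

# Validation data: unseen points that follow the underlying trend.
val_x = [0.4, 1.6, 2.4, 3.6, 4.4]
val_y = [0.8, 3.2, 4.8, 7.2, 8.8]

model = MemorizingModel().fit(train_x, train_y)

train_error = mean_squared_error(train_y, model.predict(train_x))
validation_error = mean_squared_error(val_y, model.predict(val_x))

print("training error:  ", train_error)
print("validation error:", validation_error)
```

A training error of zero next to a clearly larger validation error is the warning sign: the model has learned the noise in the training points rather than the general trend.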
As your model gets more complex, it performs better on training data. This trend will continue until there are no more errors on your training sample. The same initial improvement should be seen on the validation sample. However, there will be a point where the validation error starts to increase again, as the model is no longer just accounting for general trends, but starts to account for ‘noise’, which does not generalize. This is also known as the ‘Bias-Variance tradeoff’.
The split between training and evaluation data is a bit arbitrary, with common splits like 80% training and 20% evaluation. Although it is important to never use your evaluation data during training, there are cross-validation methods which basically involve making multiple models, each with a different part of the data held out, and averaging their results. This lets you use all data for training, while each individual model still has a clear split between evaluation and training data.
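The multi-model approach described above is commonly implemented as k-fold cross-validation. The sketch below is one possible version, not a reference implementation: the helper names are my own, the ‘model’ is simply the average of the training values, and the data is made up. In practice you would normally shuffle the data before splitting it into folds.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs.

    Every sample ends up in the validation part of exactly one fold.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


def cross_validated_mse(targets, k):
    """Train one 'model' per fold and average the validation errors.

    The model here is deliberately trivial: it always predicts the mean
    of the training targets.
    """
    fold_errors = []
    for train_idx, val_idx in k_fold_indices(len(targets), k):
        train_targets = [targets[i] for i in train_idx]
        prediction = sum(train_targets) / len(train_targets)
        fold_error = sum((targets[i] - prediction) ** 2
                         for i in val_idx) / len(val_idx)
        fold_errors.append(fold_error)
    return sum(fold_errors) / len(fold_errors)


# Hypothetical target values (illustrative only).
targets = [3.0, 5.0, 4.0, 6.0, 7.0, 5.0]

folds = list(k_fold_indices(len(targets), 3))
cv_error = cross_validated_mse(targets, 3)

print(folds)
print(cv_error)
```

Each of the three models never sees its own validation fold during training, yet across the three folds every data point is used for both training and evaluation.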
In conclusion, as well as understanding the application of ML it makes sense to understand how you might evaluate your solution regardless of the technical implementation. Finally, just to revisit my last blogpost, the most important part of solving any ML problem is to first start with the Business Problem.
At The Next View we apply Design Thinking methods to all business problems and I see clear benefits in investing time and energy to define the business problem a.k.a. the Design Challenge correctly from the start. After these two blogs I’m convinced you’re really excited to find out about the technology capabilities of the SAP suite. Well good news, that will be the main subject of my next blog: the SAP Machine Learning product portfolio. Any questions so far? Feel free to contact me!
How to measure success of Machine Learning Models. nextview, 2017-03-02 19:57:58 (last modified 2018-01-17 09:01:53)