19-01-2017

Machine learning is often mentioned as the next big innovation, alongside industry terms like big data and data science, with applications ranging from self-driving cars to better fraud detection. This blog aims to shed some light on machine learning (ML) from a high-level perspective. In subsequent blogs I will go into more depth on implementations within the SAP world and finally show how to apply it in a simple example.

“Machine learning is the subfield of computer science that gives computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959).

ML differs from traditional programming in that the program discovers, or learns, rules from the data. When we think of how we learn ourselves, we tend to learn general concepts or broad skill sets. ML, however, is best applied to a specific task. In this sense ML is similar to a normal program, where the goal is well defined. The difference is that a typical ML problem is too complex for manually hardcoding the rules to be viable.

This is helpful when thinking of ML on a high level, which is often represented as just a mystery box. Even if we do not dive into the details of how it works, we should have a clear idea of what result we want to achieve.

So when thinking of the problem we want to solve, it makes sense to apply ML when there is a repetitive task that has a clear outcome but does not conform to clearly defined rules.
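To make the difference between a hardcoded rule and a learned rule concrete, here is a minimal sketch in Python. The fraud data is invented for illustration, and the names `hardcoded_rule` and `learn_threshold` are hypothetical. Real problems involve many more inputs than a single amount, which is exactly why writing the rules by hand stops being viable.

```python
# Hypothetical toy data: (transaction amount, is_fraud). All values are made up.
examples = [(12, False), (25, False), (40, False), (55, False),
            (70, True), (90, True), (120, True), (300, True)]

def hardcoded_rule(amount):
    # Traditional programming: a person picks the threshold up front.
    return amount > 100

def learn_threshold(data):
    # "Learning": try every observed amount as a threshold and keep the one
    # that makes the fewest mistakes on the labelled examples.
    best_threshold, best_errors = None, len(data) + 1
    for t in sorted(amount for amount, _ in data):
        errors = sum((amount >= t) != label for amount, label in data)
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold

threshold = learn_threshold(examples)  # 70 for this toy data

def learned_rule(amount):
    return amount >= threshold

# Count how often each rule disagrees with the labelled examples.
hardcoded_errors = sum(hardcoded_rule(a) != label for a, label in examples)
learned_errors = sum(learned_rule(a) != label for a, label in examples)
print(threshold, hardcoded_errors, learned_errors)
```

The task is the same in both cases (flag a transaction or not); only the source of the rule differs.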

What’s in the box?

Machine learning is often grouped into three learning styles:

  •      Supervised: We have examples of both the input and the desired output.
  •      Unsupervised: The program discovers structure (such as groups or clusters) in the input, without being given a desired output.
  •      Reinforcement: The program learns by trial and error, using the feedback that results from its actions.

What do we want it to do?

Three main pillars of ML are regression, classification and clustering.

  •      Regression: To predict a value, usually on a continuous scale. A simple example: can we predict the price of a house based on historical sales prices?
  •      Classification: To predict a class. For example, can we predict whether someone will respond to our mailing offer? The classes in this case are responder and non-responder.
  •      Clustering: To group items by similarity without having predefined groups, for instance finding which customers of a streaming service have similar tastes.

This is by no means an exhaustive list, as there are other fields in machine learning. However, many applications do fall into these categories.
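The three task types above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: all numbers are invented, and the methods shown (a least-squares line, a nearest-neighbour lookup and a two-group k-means) are simple stand-ins chosen for brevity, not recommendations.

```python
# All numbers below are invented for illustration.

# Regression: predict a continuous value (price) from house size.
sizes = [50, 80, 100, 120]      # square metres
prices = [150, 240, 300, 360]   # thousands

def fit_line(xs, ys):
    # Ordinary least squares for a single input variable.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 90 + intercept  # estimate for a 90 m2 house

# Classification: predict a class from labelled examples.
# Each example is ((age, past purchases), label).
past_mailings = [((22, 0), "non-responder"), ((35, 1), "non-responder"),
                 ((41, 6), "responder"), ((58, 9), "responder")]

def nearest_neighbour(point, labelled):
    # Return the label of the closest known example.
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    closest = min(labelled, key=lambda item: squared_distance(item[0], point))
    return closest[1]

# Clustering: group items by similarity, with no labels given.
# Each value is the share of a customer's viewing time spent on documentaries.
documentary_share = [0.05, 0.1, 0.15, 0.8, 0.9, 0.85]

def two_means(values, rounds=10):
    # k-means with k=2 in one dimension; assumes at least two distinct values.
    low, high = min(values), max(values)
    for _ in range(rounds):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return group_low, group_high

print(predicted_price)
print(nearest_neighbour((45, 7), past_mailings))
print(two_means(documentary_share))
```

Note that the regression and classification parts are given the answers for past cases (prices, responder labels), while the clustering part only receives the raw values and has to find the groups itself.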

Checking whether a problem fits one of these categories helps you see quickly whether there is a well-defined ML solution for it. Let’s say, for instance, you want to know if a customer review on a website is positive or negative. At its core this is a classification problem.
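As a rough sketch of the review example treated as classification, the snippet below counts how often each word appears in positive and negative reviews and scores a new review with those counts. The reviews are invented, the function names `train` and `classify_review` are hypothetical, and a real sentiment model would need far more data and a more careful method.

```python
from collections import Counter

# Hypothetical labelled reviews, invented for illustration.
labelled_reviews = [
    ("great product works perfectly", "positive"),
    ("really great service", "positive"),
    ("terrible quality broke quickly", "negative"),
    ("terrible support never again", "negative"),
]

def train(reviews):
    # Count how often each word occurs in each class.
    counts = {"positive": Counter(), "negative": Counter()}
    for text, label in reviews:
        counts[label].update(text.lower().split())
    return counts

def classify_review(counts, text):
    # Words seen more often in positive reviews push the score up,
    # words seen more often in negative reviews push it down.
    score = 0
    for word in text.lower().split():
        score += counts["positive"][word] - counts["negative"][word]
    # Ties (including reviews with only unknown words) default to "positive".
    return "positive" if score >= 0 else "negative"

model = train(labelled_reviews)
print(classify_review(model, "great service"))
print(classify_review(model, "terrible quality"))
```

The input is the review text and the output is one of two classes, which is what makes this a classification problem regardless of the algorithm used.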

Even a daunting list of complex-sounding algorithms becomes a bit easier to interpret once you look at which of these areas each algorithm falls into.

Cross-industry standard process for data mining

The name is a bit of a mouthful, so it is usually shortened to CRISP-DM. It is a data-mining process model that can help us tackle an ML problem. Data mining can be seen as an umbrella term, with ML as one of the tools used within it. By making sure that every problem is first evaluated from a business understanding, your project is more likely to deliver real value to your organization.

Data understanding covers both the data currently available in your organization and the data you might need to acquire to solve a problem. ML is very iterative by nature, and CRISP-DM reflects this: choosing a model, for instance, can affect how your data needs to be prepared.

So what about neural-networks and deep-learning?

Mentioned sometimes interchangeably with ML, neural networks are really an algorithmic implementation to solve a certain task. They fall under the Modeling part of CRISP-DM.

So why are they so often in the news with feats such as beating the world’s best Go player and enabling self-driving cars?

Advancements in computing power have made it feasible to work with large neural networks, and they have often bested other methods. Deep learning refers to neural networks built from many stacked layers, typically trained by propagating the prediction error back through those layers.

Appealing as they might sound, it does not make sense to jump straight to neural networks without clear tasks and a good understanding of the data, as they are definitely not a fix-all. Different neural-network implementations have different learning styles and different applications, so it is easy to get lost in a bottom-up approach.

Finally

For a further general introduction to ML, I can recommend the openSAP course Enterprise Machine Learning in a Nutshell. It takes about an hour and gives a really good overview without getting bogged down in details. If you want to dive into the details of ML, Udacity, Coursera and other MOOC providers offer some great course packages. For ML in the SAP landscape, be sure to come back for the next blog.