24-03-2017

Although I love going into the details of how to measure the success of Machine Learning models, you might be wondering how you can actually apply them in your own SAP landscape. SAP's offering is quite broad, and making the right choice can be tough. In this blog I will clarify the various options.

SAP HANA is a core technology in the SAP landscape. For Machine Learning (ML) this is true to a large degree as well. HANA has various architectural options for ML.

Automated Predictive Library (APL)

This library is aimed at automating several steps of ML. Among other things, it generates and picks the best model to do the job. Although it does run on HANA, it is not part of the standard HANA license.

Predictive Analysis Library (PAL)

This is an extensive set of ML algorithms available by default in a HANA installation.

It covers areas such as regression, clustering, and time-series analysis, to name a few. The procedures can be modeled graphically inside HANA as flowgraphs.

R-Server

HANA has the ability to run R scripts on an R-Server. The procedure is created and resides in SAP HANA, but the execution is handled by an R-Server, which runs on a separate machine. The clear upside here is that you're able to use the vast array of libraries that exist in R. Also, many data scientists have extensive R knowledge, and you might already have programs written in R. The real downside is that HANA does not do the heavy lifting: the data has to be sent to the R-Server and processed there, so performance can suffer.

The above solutions are all HANA-centric. Obviously Python, R, or other solutions can easily access data stored in HANA through an ODBC connection, but in this case HANA is strictly a high-performance database.
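To make that last option concrete, here is a minimal sketch of reading HANA data from Python over ODBC. It assumes the SAP HANA client ODBC driver is installed and registered under the name HDBODBC, and that the third-party pyodbc package is available. The host, port, user, and table name are placeholders, not values from a real system.

```python
def hana_connection_string(host, port, user, password):
    """Build an ODBC connection string for the SAP HANA client driver.

    Assumption: the driver is registered as HDBODBC on this machine.
    """
    return (
        "DRIVER={HDBODBC};"
        f"SERVERNODE={host}:{port};"
        f"UID={user};PWD={password}"
    )


def fetch_rows(connection_string, query):
    """Run a query on HANA and return all rows to the client."""
    # pyodbc is a third-party package; it is imported here so that the
    # connection-string helper above also works without it installed.
    import pyodbc

    connection = pyodbc.connect(connection_string)
    try:
        cursor = connection.cursor()
        cursor.execute(query)
        return cursor.fetchall()
    finally:
        connection.close()
```

A call such as fetch_rows(hana_connection_string("hana.example.com", 30015, "ML_USER", "secret"), "SELECT * FROM SALES_HISTORY") would pull the rows into Python. Any model training then happens on the client, which is exactly why HANA acts only as a database in this setup.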

SAP BusinessObjects Predictive Analytics is another SAP solution offered to apply ML and statistical models.

The goals of predictive analytics are quite ambitious. It not only focuses on allowing you to easily develop models, but also deploy and maintain them.

With the aim of creating a so-called ‘predictive factory’, Predictive Analytics offers a high level of automation. It can prepare data and perform many of the transformations needed to get it into the right format for the ML algorithm. For instance, categories often need to be converted to numerical values.
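To give an idea of what converting categories to numerical values means, below is a small sketch of one common technique, one-hot encoding, in plain Python. It only illustrates the kind of transformation involved; it is not how Predictive Analytics implements it, and the function name is made up for this example.

```python
def one_hot_encode(values):
    """Turn a list of category labels into 0/1 indicator columns.

    Returns the sorted list of categories and, for every input value,
    a row with a 1 in the position of its category and 0 elsewhere.
    """
    categories = sorted(set(values))
    rows = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, rows


categories, rows = one_hot_encode(["red", "green", "red", "blue"])
# categories is ["blue", "green", "red"]; the first row is [0, 0, 1]
```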

Furthermore, relevant variables can be automatically selected from large datasets and even combined to create derived features. Latitude and longitude are obvious examples of features that need to be combined. Another example is a timestamp, which could be a very weakly correlated feature until you look at the derived variable ‘day of week’.
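Both examples of derived features can be sketched in a few lines of Python. The timestamp format, the choice of a great-circle (haversine) distance as the way to combine latitude and longitude, and the function names are assumptions made for this illustration; the tool may derive such features differently.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def day_of_week(timestamp):
    """Derive the weekday name from a 'YYYY-MM-DD HH:MM:SS' timestamp."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return DAY_NAMES[parsed.weekday()]


def distance_km(lat1, lon1, lat2, lon2):
    """Combine two latitude/longitude pairs into one distance feature.

    Uses the haversine formula with a mean Earth radius of 6371 km.
    """
    phi1, phi2 = radians(lat1), radians(lat2)
    delta_phi = radians(lat2 - lat1)
    delta_lambda = radians(lon2 - lon1)
    a = sin(delta_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(delta_lambda / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


print(day_of_week("2017-03-24 09:30:00"))  # Friday
```

A distance to, say, the nearest store is a single number a model can use directly, whereas raw latitude and longitude on their own rarely are.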

Although these steps might seem trivial, it is often said that cleaning and preparing the data takes up the majority of a data scientist's time. Especially when looking at a broad dataset with 200+ variables, such automation can really pay dividends. Model training and selection can be automated as well, so that various techniques are tried and evaluated to find the best one.
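The idea behind automated model selection can be shown with a deliberately tiny sketch: fit several candidate models on training data, score each on held-out validation data, and keep the one with the lowest error. The two candidates (a constant mean predictor and a straight-line fit) and all names are chosen purely for illustration and say nothing about which techniques the product actually tries.

```python
def fit_mean(xs, ys):
    """Candidate 1: always predict the mean of the training targets."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Candidate 2: least-squares straight line through the training data."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mean_squared_error(model, xs, ys):
    """Average squared difference between predictions and actual values."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_best_model(candidates, train, validation):
    """Fit every candidate on train, score it on validation, pick the best."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(*train)
        scores[name] = mean_squared_error(model, *validation)
    best_name = min(scores, key=scores.get)
    return best_name, scores


best_name, scores = select_best_model(
    {"mean": fit_mean, "line": fit_line},
    train=([1, 2, 3, 4], [2, 4, 6, 8]),
    validation=([5, 6], [10, 12]),
)
print(best_name)  # line
```

Scoring on data the models have not seen during training is the important design choice here: it keeps the selection from simply rewarding the model that memorizes the training set best.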

HANA or Predictive Analytics?

There is a clear advantage to using Predictive Analytics in conjunction with HANA: it integrates tightly with the APL and PAL libraries, so processing can be done by HANA. Once finished, models can be deployed on HANA as database procedures without any extra rework.

However, having HANA is not a prerequisite. Predictive Analytics can connect to most other databases, and it can integrate with Spark and Hadoop, where it pushes down operations to avoid massive data transfers.

So, in conclusion, SAP HANA and Predictive Analytics offer some powerful options to apply ML in your business. With HANA, you have a database that offers a broad range of features and support. Predictive Analytics, on the other hand, is a tool focused on creating and maintaining ML models. If you'd like to know more about the opportunities or challenges ML might bring to your organization, please reach out to us. We would be happy to help you take the next steps.