Piotr Guzik | Devoxx

Piotr Guzik

From Allegro

I am an open-minded, genuine and passionate software engineer. I love coding and experimenting with data. Some time ago I joined Allegro, the biggest e-commerce platform in Central Europe.

I am responsible for real-time data processing and full-stack development around the clickstream (gathering data from web and mobile).

Not long ago I moved towards data science. I came to the conclusion that we should create an anomaly detector for our clickstream. The purpose was to check whether newly deployed services that have access to the frontend send events properly. The idea was to create a simple statistical model which can be easily

Blog: http://allegro.tech/

Big Data & Machine Learning

Anomaly detection in real-time a.k.a. simplicity is the ultimate sophistication


Imagine this situation: you have deployed a service to production and everything seems to work. After some time your phone rings and an analyst says ‘Could you help me find the latest clickstream produced by your application?’. Well, now it gets serious. To make matters worse, you have been notified about the error by your client. It shouldn’t have happened. It should be the other way round.

At Allegro we found a solution for this use case. I am going to tell you how we managed to detect anomalies (heavy web traffic after a successful commercial, a drop in search events, or no clicks on an ad).

We tested all available solutions (the Twitter detector, HTM algorithms) and came to the conclusion that all the machine learning models were too complicated. We didn’t understand them. So we created our own simple model. I will show you how we moved from a promising idea in R to a final working solution in Scala.

If you like buzzwords, these might be for you: #Machine Learning, #Scala, #R, #Statistics, #Simplicity, #Real-time processing