Sunday, March 6, 2016

Telstra Network Disruptions


The Telstra Network Disruptions competition was held on Kaggle from November 2015 to February 2016.

Objective
The objective was to predict the severity of a service disruption (whether it is a momentary glitch or a serious interruption of connectivity) on the Telstra network.

Data
The data consisted of disruption records along with feature files covering logs, events, resources, and severity types across various locations.
The target variable was the severity of the disruption, which falls into one of 3 classes.

Model
There was a golden insight in the data, and exploiting that became a very interesting challenge.

I ensembled several XGBoost models, each trained on a different subset of the data and features and with a different combination of parameters.
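
As a rough illustration of the ensembling step, here is a minimal sketch that blends the class probabilities of several models. The post does not say how the models were actually combined, so the (weighted) average is an assumption, and the function name `average_probabilities` is hypothetical.

```python
def average_probabilities(model_probs, weights=None):
    """Blend per-class probabilities from several models.

    model_probs: list (one entry per model) of lists of rows,
                 each row holding one probability per class.
    weights:     optional per-model weights (assumed equal if omitted).
    """
    if weights is None:
        weights = [1.0] * len(model_probs)
    total_weight = sum(weights)
    n_rows = len(model_probs[0])
    n_classes = len(model_probs[0][0])

    blended = []
    for r in range(n_rows):
        # Weighted mean of each class probability across models.
        row = [
            sum(w * probs[r][c] for w, probs in zip(weights, model_probs)) / total_weight
            for c in range(n_classes)
        ]
        blended.append(row)
    return blended


# Two hypothetical models predicting 3 severity classes for one observation.
model_a = [[0.6, 0.3, 0.1]]
model_b = [[0.2, 0.5, 0.3]]
blended = average_probabilities([model_a, model_b])
```

Averaging rows that each sum to 1 gives rows that also sum to 1, so the blended output can be scored directly with logloss.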

The features used were the one-hot encoded raw features along with some interesting features built using the golden insight.
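
For readers unfamiliar with one-hot encoding, the sketch below turns long-format rows of (id, category) into one binary column per category. The row layout and the example values are assumptions for illustration, not the competition's exact schema.

```python
def one_hot(rows):
    """One-hot encode long-format (id, category) rows.

    Returns the sorted category list (the column order) and a dict
    mapping each id to a 0/1 vector over those categories.
    """
    categories = sorted({category for _, category in rows})
    column = {category: i for i, category in enumerate(categories)}

    table = {}
    for obs_id, category in rows:
        vector = table.setdefault(obs_id, [0] * len(categories))
        vector[column[category]] = 1  # an id may have several categories
    return categories, table


# Hypothetical rows: an id can appear with more than one event type.
rows = [(1, "event_a"), (1, "event_c"), (2, "event_b")]
categories, table = one_hot(rows)
```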

GitHub
View My GitHub Repository

Results
I stood 10th on the public LB and 9th on the private LB, scoring 0.40735 (public) / 0.40267 (private) on the logloss metric. My username is 'Vopani' and I competed as 'Anonymous Ghost' during the competition.
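
For context on those scores, here is a minimal sketch of multi-class logloss: the average negative log of the probability assigned to the true class. The clipping constant and the row renormalisation are common conventions that I am assuming here; this is not Kaggle's exact implementation.

```python
import math


def multiclass_logloss(y_true, probs, eps=1e-15):
    """Mean negative log-probability of the true class.

    y_true: list of integer class labels (0, 1, 2 for three classes).
    probs:  list of rows, one probability per class.
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip to avoid log(0), then renormalise the row to sum to 1.
        clipped = [min(max(p, eps), 1 - eps) for p in row]
        row_sum = sum(clipped)
        total += -math.log(clipped[label] / row_sum)
    return total / len(y_true)


# A uniform guess over 3 classes scores ln(3), about 1.0986.
uniform_score = multiclass_logloss([0, 1, 2], [[1 / 3] * 3] * 3)
```

Lower is better, so a score around 0.40 is well below the uniform-guess baseline.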

Views
This contest was all about finding and hacking that golden feature. The feature was nothing complex: it was simply the ordering of the observations that mattered. It was hard to spot because the ordering held true in the feature files (log, event, resource, etc.) and not in the original train/test data.
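
To make the idea concrete, here is a minimal sketch of how order-based features could be derived from a feature file. It assumes each row has already been paired with its location and that rows are kept in file order. The function name `order_features` and the three output features are hypothetical illustrations, not the exact features I used.

```python
from collections import defaultdict


def order_features(rows):
    """Build order-based features from (id, location) rows in file order.

    Returns a dict: id -> {"global_rank", "loc_rank", "loc_rel"}.
    """
    # Rank each id by its first appearance in the file.
    first_seen = {}
    for obs_id, location in rows:
        if obs_id not in first_seen:
            first_seen[obs_id] = (len(first_seen), location)

    # Group ids by location, preserving first-appearance order.
    by_location = defaultdict(list)
    for obs_id, (_, location) in first_seen.items():
        by_location[location].append(obs_id)

    features = {}
    for ids in by_location.values():
        n = len(ids)
        for position, obs_id in enumerate(ids):
            features[obs_id] = {
                "global_rank": first_seen[obs_id][0],
                "loc_rank": position,
                # Relative position within the location, scaled to [0, 1].
                "loc_rel": position / (n - 1) if n > 1 else 0.0,
            }
    return features


# Hypothetical file rows; id 10 appears twice, as ids do in the log file.
rows = [(10, "A"), (10, "A"), (11, "B"), (12, "A"), (13, "A")]
features = order_features(rows)
```

Features like these only make sense if the file order is left untouched, which is why sorting or shuffling the feature files before building them would destroy the signal.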

After identifying the relevance of ordering, it was very interesting to work on feature engineering and build features to improve the model without overfitting.

I'm glad I finished in the Top-10; it is also my first Top-10 finish on Kaggle as an individual. I've moved up to 70th in the overall Kaggle rankings. I should easily be able to get into the Top-50 by the end of the year. Maybe even the Top-25.

Check out My Best Kaggle Performances