Tuesday, February 24, 2015

Avazu Click-Through Rate Prediction



The Avazu Click-Through Rate Prediction competition was held on Kaggle from Nov-2014 to Feb-2015.

Objective
Click-through rate is a very important measure of ad performance, and the challenge was to predict how likely an ad is to be clicked.

Data
The train data consisted of ~40 million ads (which is just 10 days of Avazu data!) along with a label indicating whether they were clicked or not. The variables were about the website/app where the ad appeared, some features of the ad (like size, position, etc.), demographics of the user to whom the ad was shown and some anonymous variables.
The test data consisted of ~6 million ads (11th day of Avazu data).

Approach/Model
This is the largest data set I've worked with to date, and 40 million rows of data meant memory issues right from the start.

Wait a minute. What about that awesome online-lr code? Of course... that's the same beauty I used for the Tradeshift competition and it's the same one I used for this competition too. Well, isn't it just fabulous?
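For readers who haven't seen this style of model, here is a minimal sketch of online logistic regression with the hashing trick. It is not the actual online-lr code; it only illustrates why the approach is light on memory: rows are processed one at a time and only two fixed-size arrays are kept. The table size `D`, the learning rate `ALPHA`, the use of `zlib.crc32` as the hash, and the field name `site` are all assumptions made for illustration.

```python
import math
import zlib

D = 2 ** 20   # number of hashed weight slots (assumed value)
ALPHA = 0.1   # base learning rate (assumed value)


def hash_features(row):
    """Map a dict of categorical fields to indices in [0, D)."""
    return [zlib.crc32(f"{k}={v}".encode()) % D for k, v in row.items()]


def predict(indices, w):
    """Predicted click probability for one row of active feature indices."""
    z = sum(w[i] for i in indices)
    z = max(min(z, 35.0), -35.0)  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def update(indices, w, n, p, y):
    """One gradient step with a per-weight decaying learning rate."""
    g = p - y  # log loss gradient for each active (value 1) feature
    for i in indices:
        w[i] -= ALPHA / (math.sqrt(n[i]) + 1.0) * g
        n[i] += 1.0  # count how often this weight has been updated


def train(rows):
    """Single pass over an iterable of (row_dict, label) pairs."""
    w = [0.0] * D  # weights
    n = [0.0] * D  # per-weight update counts
    for row, y in rows:
        idx = hash_features(row)
        p = predict(idx, w)
        update(idx, w, n, p, y)
    return w, n


if __name__ == "__main__":
    # Toy stream: ads on site "a" are always clicked, on site "b" never.
    stream = [({"site": "a"}, 1), ({"site": "b"}, 0)] * 50
    w, n = train(stream)
    print(predict(hash_features({"site": "a"}), w))
    print(predict(hash_features({"site": "b"}), w))
```

Because `rows` can be any iterable, the training data could be streamed from disk (for example with `csv.DictReader`) without ever loading all 40 million rows at once.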

I started off playing around with the parameters of the code and adding interaction variables and generating some features. Some of the anonymous variables were decoded (by some Kagglers) and I tried using them more smartly.
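As an illustration of what an interaction variable can look like for categorical data, here is a small sketch that crosses pairs of fields by concatenating their values into a new field. The helper `add_interactions` and the field names (`site`, `pos`, `size`) are hypothetical and not necessarily the features used in this competition.

```python
from itertools import combinations


def add_interactions(row, fields):
    """Return a copy of row with one crossed field per pair in fields."""
    out = dict(row)
    for a, b in combinations(fields, 2):
        # The crossed value is unique for each combination of the two inputs.
        out[f"{a}_x_{b}"] = f"{row[a]}_{row[b]}"
    return out


if __name__ == "__main__":
    row = {"site": "s1", "pos": "0", "size": "320x50"}
    print(add_interactions(row, ["site", "pos"]))
```

The crossed field is just another categorical value, so it can be fed to a linear model like any original column, letting the model learn a separate weight for each combination.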

There was a massive number of participants and after 2-3 weeks, I was ranked in the top-20 out of 600-700 teams. I had a work assignment for which I travelled to the US, and I wasn't sure if I would have time to try out new ideas, so I decided not to pursue it further.

Not much to share here, no particularly nice model ideas, but I still managed to secure 79th place out of a whopping 1604 teams, scoring 0.3908 / 0.3889 on the logloss metric.
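For reference, logloss is the average negative log-likelihood of the true labels under the predicted probabilities, so lower is better. Here is a minimal sketch of the standard binary formula; the clipping constant `eps` is an assumption, used only to avoid taking log(0).

```python
import math


def log_loss(y_true, y_pred, eps=1e-15):
    """Mean binary log loss for labels in {0, 1} and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


if __name__ == "__main__":
    print(log_loss([1, 0], [0.5, 0.5]))  # always predicting 0.5 gives ln(2)
    print(log_loss([1, 0], [0.9, 0.1]))  # confident and correct scores lower
```

Always predicting 0.5 scores ln(2), roughly 0.693, whatever the labels are.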

Views
It was a challenge to work with this data, and not having access to much RAM made it all the more tricky. Thanks again to pypy and tinrtgu for the online-lr code, and I'm glad I still made it into the top-10%.

Congrats to 4-Idiots, Owen and Random Walker for the top-3 spots. What can you say about Owen? He is leading the overall Kaggle rankings with more than double the points of 2nd place David Thaler. Some feat, that is!

As for me, I moved to 185th in the overall rankings. The race is on to finish in the top-100 (or top-50) by the end of this year.

Check out My Best Kaggle Performances