Saturday, April 18, 2015

Women's Healthcare Prediction

The Women's Healthcare Prediction competition was held on DrivenData from Feb-2015 to Apr-2015.

The challenge was to predict which healthcare services (household, pregnancy, family, medical, etc.) each woman opted for. Essentially, it was a multi-label classification problem.

The train data consisted of ~14600 rows (one per woman), with various numeric and categorical variables plus which of the 14 services each woman had opted for. Each woman could've opted for more than one service.
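To make the target format concrete, here is a minimal sketch of how a multi-label target can be laid out as a 0/1 matrix, one row per woman and one column per service. The service names and the helper `to_indicator_rows` are made up for illustration; they are not from the competition data.

```python
def to_indicator_rows(opted_services, all_services):
    """Turn each woman's set of services into a 0/1 row, one column per service."""
    return [
        [1 if service in opted else 0 for service in all_services]
        for opted in opted_services
    ]


# Hypothetical service names; the real data had 14 services.
services = ["service_a", "service_b", "service_c"]

# Three women: one opted for two services, one for none, one for a single service.
opted = [{"service_a", "service_c"}, set(), {"service_b"}]

labels = to_indicator_rows(opted, services)
print(labels)
```

Each column can then be treated as its own binary classification problem, which is what makes it possible to tune a model per label.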

The test data consisted of ~3600 rows (one per woman), for which we had to predict which services each woman would have opted for.

There were 1300+ variables, so my general approach was to do some form of feature selection (FS) along with an ensemble of classification models.

I started off with the usual suspects and found the tree-based models performing better than the linear models. None of the other models came even close to the scores achieved with XGBoost or RandomForest.

I tried multiple ways of doing feature selection and reducing the dimension, but they didn't improve the results significantly.

Once I had exhausted all my ideas, I used a brute-force approach to optimize model performance, tweaking the model parameters separately for each of the 14 labels.

My final model was an ensemble of XGBoost and RandomForest with some standard data cleaning and feature selection. I optimized the parameters for each of the 14 labels, but that gave only a very minor improvement.
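I haven't described exactly how the two models were combined, so treat the following as just one common way to do it: a weighted average of the predicted probabilities, with a separate weight per label. The function name `blend`, the weights, and the numbers are illustrative assumptions, not my actual settings.

```python
def blend(probs_a, probs_b, weights_a):
    """Per-label weighted average of two models' probability matrices.

    probs_a, probs_b: lists of rows, each row holding one probability per label.
    weights_a: one weight per label for model A; model B gets 1 - weight.
    """
    return [
        [w * pa + (1.0 - w) * pb for pa, pb, w in zip(row_a, row_b, weights_a)]
        for row_a, row_b in zip(probs_a, probs_b)
    ]


# Made-up probabilities for 2 women and 2 labels.
xgb_probs = [[0.9, 0.2], [0.1, 0.6]]
rf_probs = [[0.7, 0.4], [0.3, 0.8]]

# Equal weighting on the first label, leaning towards the second model on the other.
blended = blend(xgb_probs, rf_probs, weights_a=[0.5, 0.25])
print(blended)
```

The per-label weights would normally be picked on cross-validation scores rather than by hand.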

I finished 11th on the public LB out of 104 teams, just missing both the Top-10 and the Top-10%!
My model achieved a logloss of 0.2588, while the winner scored 0.2539.
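For anyone unfamiliar with the metric, here is a small sketch of logloss for a multi-label problem. It assumes the score is the binary logloss averaged over every row and every label; the exact competition formula may differ in details such as clipping, and `multilabel_log_loss` is just an illustrative name.

```python
import math


def multilabel_log_loss(y_true, y_prob, eps=1e-15):
    """Binary log loss averaged over all rows and all labels.

    y_true: lists of 0/1 values, one row per woman.
    y_prob: predicted probabilities in the same layout.
    """
    total = 0.0
    count = 0
    for true_row, prob_row in zip(y_true, y_prob):
        for y, p in zip(true_row, prob_row):
            # Clip so that log(0) never occurs for overconfident predictions.
            p = min(max(p, eps), 1.0 - eps)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count


# Predicting 0.5 everywhere scores log(2), roughly 0.693.
baseline = multilabel_log_loss([[1, 0], [0, 1]], [[0.5, 0.5], [0.5, 0.5]])
print(baseline)
```

Lower is better, and confident wrong predictions are punished heavily, which is part of why the top scores end up so close together.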

View Complete Results

This is the first competition where I really struggled for a long time. Tried lots of ideas, but nothing seemed to work. Ensembles hardly gave any improvement and I was literally stuck during the last 2 weeks.

The public/private LB split seemed excellent with the ranks remaining almost the same. Even the CV and LB scores moved in the same direction.

Feels like I missed out here, but it only motivates me to come back harder next time. This was my first competition on DrivenData, and I'm hoping there are better ones to come soon!

Wednesday, April 1, 2015

Unlucky 13

I authored a Sudoku contest, Unlucky 13, on LMI. It was held from 1st to 6th April, 2015 and consisted of 13 sudokus to be solved in 65 minutes.

View Championship Page

Download Instruction Booklet
Download Puzzle Booklet
Password is LuckyYou

View Forum

View Results

"13 is my favourite number and I created this themed test in late-2014 during some easy days at work. Incidentally, this is also the 13th test I'm authoring at LMI. A lot of special moments and memories along the way... and I hope players enjoy this set and make it a success!"

Congrats to Jan Zverina, Hideaki Jo and Jakub Ondrousek, the top-3 overall players.
Congrats to Prakhar Gupta, Kishore Kumar and Rishi Puri, the top-3 Indian players.

"Good artists copy, great artists steal" - Pablo Picasso.

A few months back, a couple of my friends created some sudoku variants and asked me to test-solve them. It was their first try at creating sudokus and they did quite a decent job, since every puzzle had a unique solution. The only problem was that all the variants could be solved like Classics without having to use the variant rule, and I had a hearty laugh while solving them. For example, there was an Odd Even Sudoku with 42 givens... a Non-Consecutive Sudoku with 33 givens... etc. :-)

That's how the idea for this test was formed. If I enjoyed it so much, maybe other solvers would enjoy it too, in its own humorous way. It was subtle April 'fooling', unlike last year's total surprise (which was awesome in its own way). Thanks to Deb Mohanty and Prasanna Seshadri for test-solving and other inputs and 'contributions'. I'm happy many players were able to complete the test and get the bonus. The test was intentionally kept longer, with the difficulty set so that a large portion of solvers would finish.

Thanks for all the messages, and hope to see some more exciting Sudoku solving in the months to come! :-)