
Saturday, April 18, 2015

Women's Healthcare Prediction


The Women's Healthcare Prediction competition was held on DrivenData from Feb-2015 to Apr-2015.

Objective
The challenge was to predict which healthcare services (like household, pregnancy, family, medical, etc.) each woman opted for. Essentially, it was a multi-label classification problem: every service is its own yes/no label, and a woman can have several of them at once.

Data
The training data consisted of ~14600 rows (one per woman) with various numeric and categorical variables, along with which of the 14 services each woman had opted for. Each woman could have opted for more than one service.

The test data consisted of ~3600 rows (one per woman), for which we had to predict which services each woman would have opted for.

Approach
There were 1300+ variables, so my general approach was to do some form of feature selection (FS) along with an ensemble of classification models.

I started off with the usual suspects and found the tree-based models performing better than the linear models. None of the other models came even close to the accuracy achieved with XGBoost or RandomForest.

I tried multiple ways of doing feature selection and reducing the dimension, but they didn't improve the results significantly.
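The post doesn't say which feature selection methods were tried, so the following is only a minimal sketch of one common first step: dropping columns whose variance is at or below a threshold. The function name `drop_low_variance`, the threshold, and the toy data are all assumptions for illustration, not the actual code used in the competition.

```python
from statistics import pvariance


def drop_low_variance(rows, threshold=0.0):
    """Keep only the columns whose population variance exceeds `threshold`.

    `rows` is a list of equal-length lists of numbers (one list per woman).
    Returns the kept column indices and the filtered rows.
    Hypothetical helper: the original post does not name its FS methods.
    """
    n_cols = len(rows[0])
    kept = [
        j for j in range(n_cols)
        if pvariance([row[j] for row in rows]) > threshold
    ]
    filtered = [[row[j] for j in kept] for row in rows]
    return kept, filtered


# Toy data: column 0 is constant, so it carries no information.
rows = [
    [1.0, 0.0, 3.2],
    [1.0, 1.0, 2.9],
    [1.0, 0.0, 3.1],
    [1.0, 1.0, 3.0],
]
kept, filtered = drop_low_variance(rows)
print(kept)  # [1, 2]
```

With 1300+ variables, a filter like this mainly removes dead weight. It would not be expected to change the score of tree-based models much, which fits the experience described above.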

Once I had exhausted all my ideas, I fell back on brute force: tuning the model parameters separately for each of the 14 labels.

Model
My final model was an ensemble of XGBoost and RandomForest with some standard data cleaning and feature selection. I optimized the parameters for each of the 14 labels, but that gave only a very minor improvement.
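To make the per-label ensembling idea concrete, here is a minimal sketch that brute-forces a blend weight between two models' predicted probabilities, one label at a time, using log loss on held-out data. The post doesn't say how the two models were combined, so the weighted average, the names `best_blend_weight` and `validation`, the label names, and all the numbers are assumptions for illustration.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Binary log loss, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def best_blend_weight(y_true, prob_a, prob_b, steps=10):
    """Grid-search w in [0, 1] for the blend w * prob_a + (1 - w) * prob_b."""
    best_w, best_loss = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        blended = [w * a + (1 - w) * b for a, b in zip(prob_a, prob_b)]
        loss = log_loss(y_true, blended)
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w, best_loss


# Made-up validation data: label -> (truth, model A probs, model B probs).
validation = {
    "service_a": ([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.4], [0.7, 0.1, 0.8, 0.3]),
    "service_b": ([0, 0, 1, 1], [0.3, 0.1, 0.7, 0.5], [0.2, 0.4, 0.9, 0.6]),
}

# One weight per label, mirroring the per-label tuning described above.
weights = {}
for label, (y, prob_a, prob_b) in validation.items():
    weights[label] = best_blend_weight(y, prob_a, prob_b)
    print(label, weights[label])
```

Because the grid includes w = 0 and w = 1, the blended loss on the validation data can never be worse than either model alone. The catch is that the gain is often tiny when the two models make similar mistakes.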

Results
I stood 11th on the public LB out of 104 teams. I just missed both the Top 10 and the top 10%!
My model achieved a logloss of 0.2588, while the top score was 0.2539.
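For a sense of scale, here is a minimal sketch of a multi-label log loss, assuming the score is the plain average of the binary log loss over every (woman, service) cell. The exact leaderboard formula isn't given in this post, so treat that averaging, the function name `multilabel_log_loss`, and the toy data as assumptions.

```python
import math


def multilabel_log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary log loss over all rows and all labels.

    `y_true` and `y_prob` are lists of rows; each row has one entry per label.
    Assumed form of the metric, not confirmed by the original post.
    """
    total, count = 0.0, 0
    for true_row, prob_row in zip(y_true, y_prob):
        for y, p in zip(true_row, prob_row):
            p = min(max(p, eps), 1 - eps)  # avoid log(0)
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
            count += 1
    return total / count


# Two women, three services each (toy data).
y_true = [[1, 0, 0], [0, 1, 1]]

# Predicting 0.5 everywhere scores exactly ln(2), about 0.693.
uninformed = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
baseline = multilabel_log_loss(y_true, uninformed)

# Confident, mostly correct predictions score much lower.
confident = [[0.9, 0.1, 0.2], [0.1, 0.8, 0.7]]
better = multilabel_log_loss(y_true, confident)
print(baseline, better)
```

Lower is better, and a score of 0 would mean every probability was exactly right.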


Views
This is the first competition where I really struggled for a long time. I tried lots of ideas, but nothing seemed to work. Ensembles hardly gave any improvement, and I was completely stuck during the last 2 weeks.

The public/private LB split seemed excellent with the ranks remaining almost the same. Even the CV and LB scores moved in the same direction.

It feels like I missed out here, but it only motivates me to come back stronger next time. This was my first competition on DrivenData, and I'm hoping there are better ones to come soon!
