
Friday, July 31, 2015

Exacerbation Prediction of COPD


The Exacerbation Prediction of COPD patients competition was held on CrowdAnalytix from May to July 2015.
It seems this was a sequel to the first Exacerbation Prediction competition, in which I finished 2nd.

Objective
Smoking-related diseases like chronic obstructive pulmonary disease (COPD) are a severe global medical problem, affecting over 50 million people worldwide. As their condition worsens, a fraction of patients experience “exacerbations”. An exacerbation is defined as a sudden worsening of symptoms, such as shortness of breath and increased airway inflammation, that often requires immediate medical treatment and emergency room visits.

The objective was to build a predictive model from medical data that predicts in advance which patients will experience an exacerbation, so that they can be given appropriate medical treatment to prevent or control it.

Data
The training data consisted of 1,935 patients and 62 variables related to medical and smoking history, demographics, lung function, etc., along with the true labels of whether each patient experienced an exacerbation.
The test data consisted of 1,324 patients, for whom we had to predict the probability of an exacerbation.

Approach
Having been one of the top finishers in the previous Exacerbation Prediction competition, I followed a similar approach: build 3-4 models and ensemble them.

Unfortunately, this was very hard because the CV and LB scores did not move together. In the end, I tried various subsets and combinations of XGBoost, Random Forest, Logistic Regression and k-Nearest Neighbours.
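For reference, a common way to get the CV side of that comparison is stratified k-fold cross-validation. The sketch below is a generic, hypothetical illustration (the function `stratified_kfold` is made up, and this is not necessarily the CV setup used in this competition): it splits row indices into k folds while keeping the proportion of exacerbation cases roughly equal in every fold, which matters when positives are a minority.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, valid_idx) pairs with roughly equal class ratios per fold.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        # Deal each class's indices round-robin so every fold gets a similar share.
        for j, i in enumerate(idx):
            folds[j % k].append(i)

    for f in range(k):
        valid = sorted(folds[f])
        valid_set = set(valid)
        train = [i for i in range(len(labels)) if i not in valid_set]
        yield train, valid
```

With made-up labels such as `[0] * 80 + [1] * 20` and `k=5`, each validation fold holds 20 rows, 4 of them positive, so every fold sees the same 20% positive rate as the full set.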

Model
My best model on the public LB was a simple average of XGBoost and Logistic Regression, the exact same ensemble I used in the previous Exacerbation contest.
My best model on the private LB was Logistic Regression on the PCA-transformed variables (using the top 7 components).
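A simple average blend just takes the mean of each model's predicted probability for every patient. The sketch below shows only that averaging step, in plain Python; the function name `average_blend` and the probabilities are hypothetical. In practice the inputs would typically be the positive-class column of each model's `predict_proba` output (for example from scikit-learn's `LogisticRegression` or XGBoost's `XGBClassifier`), and the PCA + Logistic Regression model is not shown here.

```python
def average_blend(*prob_lists):
    """Unweighted mean of per-patient probabilities from several models (hypothetical helper)."""
    n = len(prob_lists[0])
    if any(len(p) != n for p in prob_lists):
        raise ValueError("all models must score the same patients")
    # zip(*...) groups the scores for the same patient across models.
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]


# Made-up scores for three patients from two models.
xgb_probs = [0.20, 0.70, 0.55]
lr_probs = [0.30, 0.50, 0.65]
blend = average_blend(xgb_probs, lr_probs)
```

An unweighted average needs no extra tuning, which is one reason it is a reasonable default when CV and LB disagree and there is little reliable signal for choosing weights.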

Results
My best public LB submission scored an AUC of 0.767 (XGB + LR), putting me in 11th place, whereas my best private LB submission scored an AUC of 0.769 (LR), putting me in 4th place.
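AUC (area under the ROC curve) can be read as the probability that a randomly chosen patient who had an exacerbation gets a higher predicted score than a randomly chosen patient who did not, with ties counting half. The plain-Python sketch below computes exactly that pairwise quantity; the function name `roc_auc` is hypothetical, and a library routine would normally be used instead. The quadratic loop is fine for a test set of around 1,300 patients.

```python
def roc_auc(labels, scores):
    """AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Here 3 of the 4 positive/negative pairs are ordered correctly, giving 0.75. On this scale, 0.767 vs 0.769 is a very small gap.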

So, I finished 4th and won some more prize money! (Who wants a party?)
This also means I've been in the Top-5 in 3 of the 4 CrowdAnalytix competitions I've participated in.

Views
I think the evaluation system is absolutely useless. The winners were decided solely based on the best private LB score. Kaggle does the same, but forces players to choose two submissions for evaluation. Here, ALL private submissions were evaluated and the best one was chosen.

I see a lot of cons here:

1. Players can try out all sorts of models and submit them, and the more submissions a player makes, the more likely one of them is to land near the top.
2. Players don't know which model will be the final best model. So, if they made 100 submissions, are they supposed to track all 100 of them and submit the one that CA chooses as best? Are you kidding me? I had a tough time identifying which of my models finally gave the best private LB score.
3. What sense does it make when one model fits the public LB best and another fits the private LB best?
4. Winners are decided more by luck. The winning models are likely to be the luckiest fit to the private test set. I'm not sure how useful this would be to the client.
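Points 1 and 4 are a selection effect, and a toy model makes it concrete. The sketch below rests on strong simplifying assumptions that are not from the competition: every submission has the same true AUC, and its private LB score differs from that only by independent Gaussian noise (the 0.75 and 0.01 defaults are arbitrary). The function names are hypothetical. `prob_any_hit` gives the chance that at least one of n independent submissions clears a bar that each clears with probability p, and `expected_best_of` simulates the average best score out of n equally good models.

```python
import random


def prob_any_hit(p, n):
    """P(at least one of n independent submissions clears a bar each clears with prob p)."""
    if not 0.0 <= p <= 1.0 or n < 0:
        raise ValueError("p must be in [0, 1] and n must be non-negative")
    return 1.0 - (1.0 - p) ** n


def expected_best_of(n, true_auc=0.75, noise=0.01, trials=2000, seed=0):
    """Average best observed score when n equally good models are scored with Gaussian noise."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Every model has the same true AUC; only the noise differs.
        total += max(true_auc + rng.gauss(0.0, noise) for _ in range(n))
    return total / trials
```

With these made-up settings, `expected_best_of(1)` stays close to 0.75 while `expected_best_of(100)` comes out noticeably higher, even though no model is actually better. Capping the count at two, as Kaggle does, keeps this inflation small.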

Kaggle has a much better, more robust and more stable evaluation system, and I really hope CrowdAnalytix figures something out soon, or else it's just going to be a series of lottery competitions.

Nonetheless, I'm happy with my performance. Another win up my sleeve, and I'm looking forward to adding more in the future!

Read a blog post about the 7th place solution by Triskelion on ML Wave.

Check out My Best CrowdAnalytix Performances

Saturday, July 11, 2015

Times Sudoku Championship 2015


The Times Sudoku Championship will be held in July/August. This championship will select four players who will be sponsored to represent India at the World Sudoku Championship 2015 (WSC).

Note: This will be the 'sponsored' team for the WSC. The main A-Team that will represent India for the WSC has been selected from the Indian Sudoku Championship 2015 held last month.

--- Rishi Puri, Prasanna Seshadri, Kishore Kumar and Me are in the A-Team ---

Read Rules and Regulations

The schedule is as follows:

Regional Round in Delhi (12th July, 2015)
Regional Round in Mumbai (12th July, 2015)
Regional Round in Chennai (19th July, 2015)
Regional Round in Bengaluru (26th July, 2015)

National Finals in Mumbai (August, 2015)

Top-3 players from each regional round will be selected for the National Finals. Last year's TSC winners (Prasanna, Rishi, Sumit and Me) get wild cards for the National Finals.


Mumbai Regionals (12th July, 2015)
Who better to take the Mumbai crown than Tejal Phatak! Congratulations on finally making it onto the Mumbai podium! Congrats to Prabha Joshi and Jaykumar Patel for qualifying too. See you all at the finals!

Read the Mumbai Article

The regional rounds in Mumbai and Delhi will be held simultaneously on 12th July. I won the Mumbai regionals the last three years. Since I have a wild card this year, and so does Prasanna Seshadri, it's an open door for newcomers. I have no idea who could be in the top-3, unless someone is planning to travel from another city, maybe Tejal.


Delhi Regionals (12th July, 2015)
The Delhi results are not surprising. Congrats to Akash Doulani, who has been in good form lately, followed by Rajesh Aggarwal and Ritesh Gupta. Let's lock horns at the finals!

Read the Delhi Article

The regional rounds in Mumbai and Delhi will be held simultaneously on 12th July. Among the top known solvers of India, I expect Akash Doulani and Ritesh Gupta to qualify from Delhi (if they participate). Maybe Himani Shah, who hasn't been on the sudoku circuit recently, or even Dileep Singh.


Chennai Regionals (19th July, 2015)
The Chennai results were as expected too. Congrats to Rakesh Rai and Kishore Kumar for their consistent performances on the sudoku circuit, and to Pranav Kamesh for coming in third. Looking forward to the finals!

Read the Chennai Article

I'm expecting and hoping that Rakesh Rai and Kishore Kumar qualify from Chennai; they've been among the top solvers of India in recent times. There are some upcoming names from Chennai, so I wouldn't be surprised to see a new name in the top-3 this year.


Bengaluru Regionals (26th July, 2015)
The Bengaluru regionals are going to be a cracker! There are many potential players who could make it into the top-3 this year. Some usual suspects are Rajesh Kumar, Harmeet Singh, Kunal Verma, Jayant Ameta, Gaurav Kumar Jain, Zalak Ghetia... and some more whose names are not off the top of my head! I would've loved to be there to see you guys battle it out, but unfortunately, I'm in Mumbai over the weekend.
Good Luck and may the best-3 qualify!