Saturday, September 12, 2015

Carcinogenicity Prediction of Compounds


The Carcinogenicity Prediction competition was held on CrowdAnalytix in Jul-Sep, 2015.

Objective
Carcinogenicity (the tendency of an agent or exposure to increase the incidence of cancer) is one of the most crucial aspects of evaluating drug safety.

The objective was to predict the amount of carcinogenicity in compounds, which is measured through TD50 (Tumorigenic Dose rate).

Data
The train data consisted of compounds described by over 500 variables covering physical, chemical and medical features, along with their corresponding TD50 values. About 60% of the TD50 values were 0; the rest were non-zero, with a few outliers.

The test data consisted of compounds with these features for which we had to predict the TD50 value.

Approach
This was a weird contest. On exploring the data, within 3-4 days, I found a key insight, and that proved to be a game changer.

So, what was this golden insight? It was the evaluation metric: RMSE.

The target variable (TD50) had many zeros and the rest were positive continuous values. RMSE as a metric can very easily get skewed due to outliers.

The train data had two values above 20,000. Predicting them accurately (greater than 20,000) would reduce the RMSE by more than 50%. So, assuming there are these outliers in the test data too, I knew this would give the maximum boost in score.
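To see why two rows can dominate RMSE like this, here is a minimal sketch on made-up data. The row count, the range of the ordinary non-zero values and the two outlier values are all assumptions for illustration; only the 60% share of zeros and the "two values above 20,000" come from the contest description.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

rng = np.random.default_rng(0)
n_rows = 500                                   # hypothetical size, not the real one
y = np.zeros(n_rows)                           # 60% of the targets stay at zero
nonzero_idx = rng.choice(n_rows, size=200, replace=False)
y[nonzero_idx] = rng.uniform(1, 500, size=200) # made-up "ordinary" TD50 values
outlier_idx = nonzero_idx[:2]
y[outlier_idx] = [22000.0, 30000.0]            # two made-up outliers above 20,000

# Benchmark: predict zero everywhere.
all_zeros = np.zeros(n_rows)
baseline_rmse = rmse(y, all_zeros)

# Same submission, except the two outlier rows are predicted as 25,000.
with_outliers = all_zeros.copy()
with_outliers[outlier_idx] = 25000.0
hacked_rmse = rmse(y, with_outliers)

print(f"all zeros:            {baseline_rmse:.1f}")
print(f"zeros + two outliers: {hacked_rmse:.1f}")
```

Because the errors are squared before averaging, the two outlier rows account for almost all of the squared error in the all-zeros submission, so getting just those two roughly right cuts the score by far more than half in this toy setup.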

All the participants were stuck at scores in the 1700s... and most of the usual models were not performing better than the benchmark 'all zeros' submission! That was a proxy validation that there had to be outliers in the test set too.

I built a model to classify outliers. The train data had only two rows (the ones with TD50 > 20,000) with target value '1' and the rest as '0'. Scored the classifier on the test set. Took the top-3 predicted rows of the test set and used 25,000 as the prediction. And BINGO! The 2nd one dropped my RMSE from 1700's to ~900. Almost a 50% drop!
That's what you call a game-changer :-)
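The outlier step could be sketched roughly as below. The post does not say which classifier or library was used, so scikit-learn's RandomForestClassifier, the function name `flag_top_outliers` and the synthetic data are all assumptions; only the threshold (20,000), the top-3 cut and the 25,000 fill value come from the description above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def flag_top_outliers(X_train, y_train, X_test,
                      threshold=20000.0, top_k=3, fill_value=25000.0, seed=0):
    """Hypothetical helper: rank test rows by outlier probability and
    predict `fill_value` for the top_k of them, zero for everything else."""
    is_outlier = (np.asarray(y_train) > threshold).astype(int)
    if is_outlier.sum() == 0:
        raise ValueError("no training rows above the threshold")
    # Assumed model choice; class_weight helps with the extreme imbalance.
    clf = RandomForestClassifier(n_estimators=100, class_weight="balanced",
                                 random_state=seed)
    clf.fit(X_train, is_outlier)
    proba = clf.predict_proba(X_test)[:, 1]    # probability of the '1' class
    top_rows = np.argsort(proba)[::-1][:top_k] # indices of the top_k scores
    preds = np.zeros(len(X_test))
    preds[top_rows] = fill_value
    return preds, top_rows

# Synthetic stand-in for the contest data (made up for illustration).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 5))
y_train = np.abs(rng.normal(scale=50.0, size=300))
X_train[:2, 0] += 8.0                          # make the two outlier rows stand out
y_train[:2] = [22000.0, 30000.0]
X_test = rng.normal(size=(100, 5))
X_test[7, 0] += 8.0                            # one test row that resembles them

preds, top_rows = flag_top_outliers(X_train, y_train, X_test)
print("rows set to 25,000:", sorted(top_rows.tolist()))
```

With only two positive rows, a classifier like this is more of a ranking device than a reliable model, which is why only the few highest-scoring test rows are touched.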

There are pros and cons.
Pros are that it was definitely a 'smart trick', and not really a 'sophisticated model', which I accepted and mentioned on the forum too. It was a neat hack applied on a poor evaluation criterion.
Cons are, of course, that it doesn't lead to the best model. And worse, the result was technically determined by just one or a few rows, making the rest of the test set worthless.

Model
For the remaining observations, I used a two-step model approach.

I first built a binary classifier to predict zeros vs non-zeros. Used RandomForest for this.
I then built a regressor to predict the amount of TD50, only using it for the observations which were classified as non-zeros from the binary classifier. Used RandomForest for this too.

For the binary classifier and regressor, I subsetted the train data by removing all rows where the TD50 values were > 1000 (considering them as outliers).
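A rough sketch of this two-step setup is below. The post names RandomForest but not the library or any settings, so scikit-learn, the hyperparameters, the function names and the synthetic data are assumptions. Training the regressor only on the non-zero rows is also an assumption; the post only says it was applied to rows classified as non-zero.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

def fit_two_step(X, y, cap=1000.0, seed=0):
    """Fit a zero-vs-non-zero classifier and a regressor for the amount,
    after dropping rows whose target is above `cap` (treated as outliers)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = y <= cap
    X_trim, y_trim = X[keep], y[keep]

    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_trim, (y_trim > 0).astype(int))

    nonzero = y_trim > 0                       # assumption: regress on non-zeros only
    reg = RandomForestRegressor(n_estimators=100, random_state=seed)
    reg.fit(X_trim[nonzero], y_trim[nonzero])
    return clf, reg

def predict_two_step(clf, reg, X):
    """Predict 0 for rows classified as zero, the regressor's value otherwise."""
    X = np.asarray(X, dtype=float)
    preds = np.zeros(len(X))
    is_nonzero = clf.predict(X) == 1
    if is_nonzero.any():
        preds[is_nonzero] = reg.predict(X[is_nonzero])
    return preds

# Synthetic stand-in for the contest data (made up for illustration).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 6))
y_demo = np.where(X_demo[:, 0] > 0.25, 50.0 * np.abs(X_demo[:, 1]) + 1.0, 0.0)
y_demo[0] = 25000.0                            # an outlier that the cap removes

clf, reg = fit_two_step(X_demo, y_demo)
preds = predict_two_step(clf, reg, X_demo[:20])
print(preds.round(1))
```

Since a random forest regressor averages training targets, its predictions here can never exceed the 1000 cap, which is why the large values have to come from the separate outlier step.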

Results
I was 1st on the Public LB and 1st on the Private LB too.

This is my first Data Science contest where I stood 1st. Yay!
Not a really good one, but I'll take it :-)

Congrats to Sanket Janewoo and Prarthana Bhatt for 2nd and 3rd. Nice to see all Indians on the podium!

Views
The evaluation metric became the decider for this contest. A learning for me, that sometimes a simple approach can make a BIG DIFFERENCE.

Which makes it VERY IMPORTANT to explore the data, understand the objective, the evaluation and always do some sanity checks before diving deep into models and analysis. I've learnt a lot of these things from top Kagglers, and I'm sharing one of these here today, hoping someone else learns and helps in the development, improvement and future of Data Science.

Data can do magical things sometimes :-)

Check out My Best CrowdAnalytix Performances

Saturday, September 5, 2015

Puzzle Ramayan 2016

The online rounds of Puzzle Ramayan 2015-2016 have ended! This is a national level event aimed at encouraging puzzle solvers of India to participate and compete with the top solvers to gain experience and improve competition in the years to come.

NOTE: This event serves as a qualifier to participate in the Indian Puzzle Championship 2016

The championship consisted of 8 online rounds (Sep-2015 to Mar-2016) from which the top solvers will be invited to participate in the national finals.
Championship Page

National Finals
The finals will be held on 17th July, 2016 in Chennai.

View Finals Page


Online Top-10
1. Rohan Rao - 597.3
2. Amit Sowani - 575.8
3. Swaroop Guggilam - 477.4
4. Rajesh Kumar - 458.3
5. Rakesh Rai - 411.2
6. Ashish Kumar - 374.2
7. Kishore Kumar - 372.1
8. Jayant Ameta - 344.2
9. Jaipal Reddy - 302.7
10. Devarajan D - 277.2

View Complete Results

P.S. Prasanna's name is removed from the list since he has a wild card for the WPC next year for being the best Indian performer at the WPC this year.


Round 8: Placement (26th - 28th Mar, 2016)
Author: Rajesh Kumar
Download Instruction Booklet
Download Puzzle Booklet

View Results
View Forum

Nice puzzles, but on the harder side. A little disappointed that I wasn't able to finish the set.
The answer keys were horrible. I struggled a lot with them, and so did a few other players.

Overall, a decent end to PR.

The Top-10 look more-or-less as expected, but it is really good to see Ashish and Kishore improving, and a great job by Devarajan for maintaining his top-10 position throughout the rounds. Looking forward to an interesting and fun-filled finals in Chennai in July.


Round 7: Loops (27th - 29th Feb, 2016)
Author: Prasanna Seshadri
Download Instruction Booklet
Download Puzzle Booklet

View Results
View Forum

Wow! Another wonderful round. I'm not very good at Loops, but I could solve this very smoothly. Puzzles were excellent, and a very well-balanced set and PR round. Probably the best so far.

I finished the set in 47mins, with Swaroop in 57mins and Amit in 65mins. Swaroop has now increased his lead over Rajesh in 3rd place and should be able to hold on to it since the last round is authored by Rajesh.

Hope to end it well.


Round 6: Shading (23rd - 26th Jan, 2016)
Author: Swaroop Guggilam
Download Instruction Booklet
Download Puzzle Booklet

View Results
View Forum

Wonderful! What a perfectly balanced round this was. Kudos to Swaroop for authoring this set, my favourite round of PR so far. Prasanna finished the set in 44mins, I finished in 58mins and Amit in 65mins.

Lots of swaps in the points table after this round, partly due to the rankings being updated after discarding the worst two scores. I regain the top spot over Amit. Rajesh is less than a point above Swaroop. A disappointing round by Rakesh allowed Kishore to move above him.

With the last two rounds to go, it will be an interesting finish, especially crucial for Swaroop, who needs to be in the Top-3 to be eligible for the NRI wildcard.


Round 5: Snake (26th - 28th Dec, 2015)
Author: Ashish Kumar
Download Instruction Booklet
Download Puzzle Booklet

View Results
View Forum

After a great Round 4, I had an absolutely disastrous Round 5. Snakes is not a type I really enjoy, and it showed here. I scored a poor 57 points, compared to Amit's 88. Prasanna did well by finishing all puzzles just within 90 minutes.

Puzzles were top-notch quality from Ashish, but they were too hard for PR. I'm not surprised to see the participation low, but a little surprised by some regular names missing, including ones in the current Top-10.

Lots of changes in the top-10 after this round. Amit takes the top spot with a good lead, Swaroop moves to 3rd, above Rajesh, and finally, Rakesh moves above Kishore.
It's getting interesting, and I hope the remaining 3 rounds are better, way better.


Round 4: Regions (28th - 30th Nov, 2015)
Author: Rakesh Rai
Download Instruction Booklet
Download Puzzle Booklet

View Results
View Forum

This was one of my better performances in a puzzle contest in recent times. The puzzles were of my liking. Yin Yang, Spiral Galaxies, Area Division are in my all-time favourites, and it was wonderful to solve this set. Puzzles were really fun and it was a better set than the last 3 PR rounds.

I topped the round by finishing in 48mins and was ranked 11th internationally, which is my best rank after Twist way back in 2011. Amit did well by finishing in 56mins, Prasanna finished in 64mins and Swaroop in 83mins.

That put me on top in PR rankings and also got me my best LMI Rating in Puzzles! So, a pretty good weekend!


Round 3: Evergreens (31st Oct - 2nd Nov, 2015)
Author: Amit Sowani
Download Instruction Booklet
Download Puzzle Booklet

View Results
View Forum

That was hard! Especially for the type of rounds expected in PR. Well, it was time to improvise. Since this resulted in a very low scoring round, the scoring system was changed to add a bit of normalization so that such variability in the difficulty of tests can be overcome to some extent.

Even though I topped the round (among Indians), it didn't feel like a smooth performance. Felt like I could've added some 8-10 points more to my score of 73.

Prasanna tested the puzzles, so you won't see his name on the scorepage. Congrats to Rajesh and Swaroop for their good performances.


Round 2: Number Placement (26th - 28th Sep, 2015)
Author: Deb Mohanty
Download Instruction Booklet
Download Puzzle Booklet

View Results
View Forum

I didn't do too well. Couldn't finish the round, and got stuck on too many puzzles during the test.
Puzzles were really nice. Much better than Deb's SM round :-)

Congrats to Prasanna who finished the set in 72 minutes and Amit who just managed to finish it before time. I scored 97.4 points.

This was supposed to be one of the rounds I was most comfortable with, and it bombed. Hope to make up for it in the next few rounds. I also hope this is the worst performance which will get discarded (along with R1 which I authored).


Round 1: Classics (5th - 7th Sep, 2015)
Author: Rohan Rao
Download Instruction Booklet
Download Puzzle Booklet

View Results
View Forum

Congrats to Prasanna, Swaroop and Amit for completing the set. Prasanna finished in 50mins which put him in 12th place worldwide. Swaroop and Amit were very close and finished just one second apart in the 77th minute.

It's nice to see my three team-mates, who will represent India at the WPC along with me, performing at the top among the Indians.

Congrats to Endo, Ulrich and Hideaki who take the top-3 international spots.

Overall, I'm glad the feedback was positive and most participants enjoyed the puzzles. There was some discussion around one puzzle, Hitori Blocks, being a tad harder than the rest for this set. I agree it was a little outlier, but it didn't affect rankings and performances much. Most of the results were as expected.

Seems like a good start to PR... 57 Indians with non-zero scores and 304 participants in total. I hope these numbers increase in subsequent rounds. And I'll be participating in the coming rounds! :-)

Friday, September 4, 2015

Sudoku Mahabharat 2016


The online rounds of Sudoku Mahabharat 2015-2016 have ended! This is a national level event aimed at encouraging sudoku solvers of India to participate and compete with the top solvers to gain experience and improve competition in the years to come.

NOTE: This event serves as a qualifier to participate in the Indian Sudoku Championship 2016

The championship consisted of 8 online rounds (Aug-2015 to Mar-2016) from which the top solvers will be invited to participate in the national finals.

Championship Page

National Finals
The finals will be held on 17th July, 2016 in Chennai.

View Finals Page


Online Top-10
1. Rohan Rao - 600.0
2. Kishore Kumar - 529.6
3. Rakesh Rai - 517.8
4. Jayant Ameta - 468.3
5. Jaipal Reddy - 441.8
6. Amit Sowani - 435.1
7. Suvarna - 413.6
8. Gaurav Kumar Jain - 406.5
9. Rajesh Kumar - 397.2
10. Shaheer Rahman - 383.3

View Complete Results

P.S. Prasanna's name is removed from the list since he has a wild card for the WSC next year for being the best Indian performer at the WSC this year.


Round 8: Irregular (12th - 14th Mar, 2016)
Authors: Akash Doulani, Amit Sowani and Gaurav Kumar Jain
Download Instruction Booklet
Download Puzzle Booklet

View Results
View Forum

Smooth ending. Really good set of sudokus considering a majority of them were authored by first-time author Akash.

Overall, a good end to SM and I'm glad I was able to top every round from the eligible participants.

The Top-10 look more-or-less as I was expecting except for 7th place Suvarna (don't know who she/he is), but looking forward to a grand national finals in Chennai in mid-July.


Round 7: Converse (12th - 15th Feb, 2016)
Authors: Harmeet Singh and Rakesh Rai
Download Instruction Booklet
Download Puzzle Booklet

View Results
View Forum

I started badly, struggling on the Average sudokus... Took 11 minutes for the 6x6 and 13 minutes for the 9x9. But from there, it went really smoothly and I was able to make up some lost time. I finished the set in 67mins, ahead of Prasanna in 84mins.

Really good performance by Shaheer Rahman, who gets his first podium in SM, scoring 84 points.
We have a newcomer 'Suvarna' in 2nd place with 88 points, who also enters the overall Top-10. Quite an unknown player, and it remains to be seen how she will perform at the national finals.

Well, since the final score is Best 6 out of 8 rounds and I have topped 6 rounds, I will have a perfect score of 600 irrespective of the outcome of the last round. I can't say I wasn't expecting this, with Rishi's and Prasanna's absence, but it's good to have achieved it.


Round 6: Twisted Classics (9th - 11th Jan, 2016)
Author: Rajesh Kumar
Download Instruction Booklet
Download Puzzle Booklet

View Results
View Forum

This went well. Had a couple of minor stumbles, but overall it was a smooth solve. The test was quite easy, especially compared to some of the previous rounds.

I completed the set in 41mins. Kishore finished in 58mins and Rakesh in 63mins.
I must mention a standout performance by Hemant Malani, who is one of our Sudoku Champs toppers, who finished the set in 80mins, ranking him 7th among Indians, which places him above some of the regular experienced folks. Hope to see some more strong performances like this in the future.

It's nice to see many Indians completing this set and a better participation level.

Now with 6 rounds completed, the top-10 look more-or-less stable, with few changes expected after the last two rounds.


Round 5: Outside (14th - 16th Nov, 2015)
Author: Rishi Puri
Download Instruction Booklet
Download Puzzle Booklet

View Results
View Forum

I was on a good streak before this contest, with some of my best performances in LMI puzzle test and Fed Sudoku. Unfortunately, it didn't continue here. I started well, but then broke the 9x9 Skyscraper. It was hard to find the error, so I just erased and started again. Lost over 10mins here.

I made mistakes while solving a 9x9 Classic too. And the icing on the cake was when I submitted the first 6x6 Classic incorrectly. And it drove me mad that I had to restart it twice to finally solve it correctly. I know I don't like 6x6, but what was wrong with me?

Overall, the participation was low. I still managed to finish the set in 76mins and be the top Indian after Prasanna, who finished in a great time of 62mins. Rakesh and Kishore missed out on a couple of sudokus but were not far behind.

With that, the top-10 remain the same with a couple of swaps.


Round 4: Math (14th - 16th Nov, 2015)
Author: Rohan Rao
Download Instruction Booklet
Download Puzzle Booklet

View Results
View Forum

Congrats to Prasanna for a very good score and to Kishore for topping my round among the eligible Indians. Jaipal and Rajesh were 2nd and 3rd with their strong performances. Nice to see them back among the top.

I really enjoyed creating this set and personally liked it much better than most of the other contests I've authored on LMI.
There is some feedback on the sudokus being hard. Yes, it was intended and since the scoring system has changed to normalize the points, tests can have varying difficulty without much effect on the score distribution.

The GroupSum 6x6 was my favourite of the set while the GroupSum 9x9 and Equal Product 9x9 were the hardest puzzles.

I'm glad most solvers enjoyed the round and it's delightful to see close to 350 participants worldwide and over 100 Indians.


Round 3: Odd-Even (24th - 26th Oct, 2015)
Authors: Ashish Kumar and Swaroop Guggilam
Download Instruction Booklet
Download Puzzle Booklet

View Results
View Forum

I finished the set in 46 minutes. It was one of my smoothest solves on LMI. I started slowly, taking 13 minutes for the Odd-Even Count variants, but after that it was brisk.
So far, 3/3. Next SM Round is authored by me.

Wonderful sudokus by Swaroop and Ashish. I must mention that I loved the Quadro 9x9. Fantastic sudoku by Swaroop. Odd or Even 9x9 and Odd Sum 9x9 were good too.

Congrats to Prasanna, Rakesh, and Jayant for the other top Indians. A below par performance by Kishore who finished 6th among Indians.


Round 2: Neighbours (12th - 14th Sep, 2015)
Authors: Aditi Seshadri and Prasanna Seshadri (P.S. - They are not related :-) )
Download Instruction Booklet
Download Puzzle Booklet

View Results
View Forum

I finished the set in 52 minutes, made a small mistake in Touchy 6x6, which I was quickly able to correct. It was my best online contest this year.

Wonderful sudokus by Aditi and Prasanna. I loved the Touchy 9x9, Quadruple 9x9 and Repeated Neighbours 9x9. Touchy was just fantastic for rule usage whereas Quadruple had a slightly different rule which turned into a very nice variant.

The set was a little harder than SM1, but had a very interesting set of sudokus.

Congrats to Jayant for completing in 59 minutes. I think this is one of his strongest performances in a sudoku contest. I hope his form continues. Kishore completed in 73 minutes to take the 3rd spot among the Indians.


Round 1: Standard (22nd - 24th Aug, 2015)
Author: Deb Mohanty
Download Instruction Booklet
Download Puzzle Booklet

View Results
View Forum

I finished the set in 49 minutes, stumbled slightly in a couple of grids, but overall it was a smooth solve.

Nice set of sudokus, but it didn't give that 'Deb' feeling :-)
A fun solve though.

Congrats to Kishore who finished the set in 52 minutes and a shocking performance by Prasanna who finished in 64 minutes. He certainly messed up, but we all have our bad days. Rishi, the fourth member of our Indian team at WSC, has quit sudoku solving and tested the set. It's sad to hear that, but I hope he continues to help LMI and the puzzle community in India.