
Saturday, February 13, 2016

AirBnB New User Bookings


The AirBnB New User Bookings competition was held on Kaggle from November 2015 to February 2016.

Objective
The objective was to predict in which country a new user on AirBnB would make their first booking.

There were 11 potential destination countries, along with a 12th class, NDF (No Destination Found), indicating that the user did not make any booking.

Data
The data consisted of user characteristics such as language, age, browser, date of account creation and OS for the train and test users.

There was also sessions data on the actions taken by users on the website, along with the action details and durations.
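As a rough illustration of the kind of per-user features that can be built from this, the sketch below counts each action per user and aggregates durations. The column names user_id, action and secs_elapsed are assumed from the competition's sessions.csv; the actual feature engineering is in the GitHub repo linked below.

import pandas as pd

# A minimal sketch, assuming the competition's sessions.csv with
# user_id, action and secs_elapsed columns.
sessions = pd.read_csv("sessions.csv")

# Count of each action per user -- a simple count/one-hot style encoding
# of the actions a user took on the site.
action_counts = pd.crosstab(sessions["user_id"], sessions["action"])
action_counts.columns = ["action_" + str(c) for c in action_counts.columns]

# Total and average time spent, as basic duration features.
durations = sessions.groupby("user_id")["secs_elapsed"].agg(["sum", "mean"])
durations.columns = ["secs_total", "secs_mean"]

session_features = action_counts.join(durations).fillna(0)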

Model
It was evident that the best way to get a good score quickly was to focus on separating NDF from non-NDF users. So, I built a Logistic Regression on the one-hot encoded action features from the sessions data as a binary NDF vs non-NDF classifier, considering only users present in the sessions data. This was the base classifier.
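A minimal sketch of that base classifier is below, assuming train_users_2.csv from the competition and the session_features frame from the previous sketch; the exact setup in my code differs.

import pandas as pd
from sklearn.linear_model import LogisticRegression

# train_users_2.csv has one row per user with the country_destination label.
train_users = pd.read_csv("train_users_2.csv", index_col="id")

# Keep only users that appear in the sessions data and attach the
# session_features built in the earlier sketch.
train = train_users.join(session_features, how="inner")

# Binary target: 1 if the user booked somewhere (non-NDF), 0 otherwise.
y_binary = (train["country_destination"] != "NDF").astype(int)

lr = LogisticRegression(max_iter=1000)
lr.fit(train[session_features.columns], y_binary)

# Probability of booking, later used as a meta-feature. In practice this
# should be generated out-of-fold to avoid leaking labels into the meta model.
train["p_booking"] = lr.predict_proba(train[session_features.columns])[:, 1]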

I then built a meta classifier using, well, everyone's favourite nowadays, XGBoost. It used the raw user features, along with the one-hot encoded features from sessions data, and finally, the LR predictions.
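In the same spirit, here is an illustrative sketch of such a meta classifier, continuing from the previous snippets. The user columns and XGBoost parameters are placeholders, not the ones I actually used.

import pandas as pd
import xgboost as xgb
from sklearn.preprocessing import LabelEncoder

# Illustrative meta-feature matrix: one-hot encoded raw user features,
# the session count features, and the LR booking probability.
user_feats = pd.get_dummies(train[["gender", "language", "signup_method",
                                   "first_device_type", "first_browser"]])
X_meta = pd.concat([user_feats,
                    train[session_features.columns],
                    train[["p_booking"]]], axis=1)

# Multi-class target: the 12 destination classes, NDF included.
le = LabelEncoder()
y = le.fit_transform(train["country_destination"])

clf = xgb.XGBClassifier(objective="multi:softprob", n_estimators=200,
                        max_depth=6, learning_rate=0.1, subsample=0.9)
clf.fit(X_meta, y)

# Rank classes by predicted probability and keep the top 5 per user,
# which is the format scored by NDCG@5.
proba = clf.predict_proba(X_meta)
top5_idx = proba.argsort(axis=1)[:, ::-1][:, :5]
top5 = le.inverse_transform(top5_idx.ravel()).reshape(top5_idx.shape)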

I did not complicate the model or the ensemble too much due to lack of time, and also because the CV and LB scores were not correlating perfectly. Hence, I chose fairly simple models with some feature engineering.

GitHub
View GitHub Repository for the complete code, results and output.

Results
This model scored 0.88081 on the public LB (rank 89) and 0.88625 on the private LB (rank 23).
The metric used was NDCG@5, computed over the top five predicted countries per user.
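For reference, a small sketch of how NDCG@5 works here: each user has exactly one true destination, so the ideal DCG is 1 and the score reduces to 1/log2(position + 1) for the position at which the true country appears in the top five predictions, or 0 if it does not appear.

import numpy as np

def ndcg_at_5(ranked_predictions, truth):
    # Each user has a single relevant label, so IDCG = 1 and the score is
    # 1 / log2(position + 1) for the 1-based position of the true country
    # in the top 5 predictions, or 0 if it is not there.
    scores = []
    for preds, true_label in zip(ranked_predictions, truth):
        score = 0.0
        for rank, pred in enumerate(preds[:5]):
            if pred == true_label:
                score = 1.0 / np.log2(rank + 2)
                break
        scores.append(score)
    return float(np.mean(scores))

# The true country in second place scores 1 / log2(3), roughly 0.63.
print(ndcg_at_5([["NDF", "US", "other", "FR", "IT"]], ["US"]))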

View Public LB
View Final Results

Views
It was a very interesting dataset and good practice in building features from the sessions data; without those features, it wasn't possible to get a good score. It was disappointing that I had many more ideas that needed a lot more time to code and try out, but I wasn't able to.

So, I think it was a simple, stable model with less overfitting than those of many other competitors, who dropped on the private LB.
In the end, I'm happy with the result, and this improves my overall Kaggle rank to 96th. So I finally get into the Top-100 and onto the first page of the rankings :-)

I'm hoping to improve on this further this year, and to get into the Top-50 or Top-25 some day.

Check out My Best Kaggle Performances
