Tudor Lapusan's Blog

My first experience with Kaggle kernels

When I’m playing on Kaggle, I usually choose Python and sklearn. My default tool for writing the code is a Jupyter notebook, but this time I decided to try Kaggle kernels for the first time.

It is pretty easy to create a new kernel. All you need to do is choose a competition, click the ‘Kernels’ submenu and then click ‘New Kernel’.

[Screenshot: creating a new kernel]

As you can see from the picture above, I chose the well-known Titanic competition.

After you click the ‘New Kernel’ button, you can choose how you will write your next kernel: ‘Script’ or ‘Notebook’.

[Screenshot: choosing between ‘Script’ and ‘Notebook’]

The script option simulates your local Python command line, and the notebook option simulates a Jupyter notebook running on your personal computer. Both options support Python, R and Markdown.

As you have probably guessed, I went with the notebook option. Here you can find the final version of it!
Below you can see how the Jupyter kernel looks right after you create a new one.

[Screenshot: a freshly created notebook kernel]

This interface contains the basic functionality for writing your kernel, like choosing the programming language (Python or R), running a particular cell or the entire kernel, and adding external datasets.
Behind each kernel is a Docker container which has the competition’s input datasets mounted, so you don’t have to bother with downloading and saving the datasets anymore.
The Docker container is also pre-loaded with the most common data science libraries; you can read its Dockerfile to see everything that is available. Maybe you are asking yourself: what if I need a library which is not pre-loaded into the kernel? I haven’t found myself in this situation yet, but after a quick Google search it seems not to be so straightforward. There seems to be a question about this on the Kaggle blog.
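For example, in the Titanic kernel the data is already there under ../input; a minimal sketch (train.csv and test.csv are the files the Titanic competition provides):

```python
# Sketch: the competition datasets are mounted read-only under ../input,
# so you can load them without downloading anything.
import os
import pandas as pd

print(os.listdir("../input"))            # e.g. ['train.csv', 'test.csv', ...]

train = pd.read_csv("../input/train.csv")
test = pd.read_csv("../input/test.csv")
print(train.shape, test.shape)
```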
I was pretty surprised to see that kernels have a lot of RAM, around 17 GB, which is more than the majority of laptops have. Plans for 2018 include enabling Kaggle users to run kernels on their own private datasets, access GPUs and support more complex pipelines. Sounds great!
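If you want to check the available memory yourself from inside a kernel, a quick sketch (assuming psutil is among the pre-loaded libraries, which it normally is in the standard image):

```python
# Print the total RAM visible inside the kernel container.
import psutil

total_gb = psutil.virtual_memory().total / 1024 ** 3
print(f"Total RAM: {total_gb:.1f} GB")
```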

Another nice feature that comes with kernels is that you can submit your solution directly from them!
To submit your solution to a Kaggle competition, you usually need to upload your results as a CSV file.

This is how I saved the results to a CSV file from my kernel for the Titanic competition.
[Screenshot: saving the output to a CSV file]
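The idea is simple: build a DataFrame in the format the Titanic competition expects, with PassengerId and Survived columns, and write it to a CSV file. A sketch of such a helper (the exact code is in the kernel; the usage line with model and X_test is hypothetical):

```python
import pandas as pd

def save_submition_file(filename, passengerId, predictions):
    """Write predictions to a CSV file in the Titanic submission format."""
    submission = pd.DataFrame({
        "PassengerId": passengerId,
        "Survived": predictions,
    })
    # Files written to the working directory show up in the 'Output' tab
    # after you commit the kernel.
    submission.to_csv(filename, index=False)

# Hypothetical usage: 'model' is whatever classifier was trained above.
# save_submition_file("decision_tree_v1.csv", test["PassengerId"], model.predict(X_test))
```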
After you press the “Commit & Run” button, you will see the “Output” tab in the kernel menu.

[Screenshot: the “Output” tab with the result files]

In my case, I saved seven result files. You can choose one and press “Submit to Competition” to submit it directly to the Titanic competition.
During my first days using kernels I wanted to stop using them because I was getting inconsistent output files. It was really annoying. After a week I discovered it was because of my incorrect use of the kernel, but it wasn’t very obvious. Let me explain why: I thought that the output files were saved after each run of the “save_submition_file(filename, passengerId, predictions)” function during the development of the kernel (meaning while the kernel was in edit mode). I used to train many versions of the decision tree model and rerun the same kernel cell, calling the save_submition_file function to save my results under different filenames. At that time the “Commit & Run” button was named “Publish”.
After each press of “Publish”, instead of seeing all the expected output files, I could see only one file. After many attempts, I came to the conclusion that with each “Publish” the entire kernel is also run again from the top, not only published. I had thought that publishing meant only saving a new version of the kernel…
At least the name of the button has since been changed from “Publish” to “Commit & Run”, which makes it a little more intuitive that all the code in the kernel will be run again. I don’t know who was to blame, me or the old kernels UX :))

Another thing I didn’t like was the error logging after pressing the “Commit & Run” button. I had an error in the middle of my kernel that I didn’t notice, and because of it the kernel didn’t generate any output files after the commit. I looked in the logs, but there was nothing to flag my error. It would be very helpful to see a message with the error in the logs, or to get statistics after each run about how many notebook cells succeeded or failed.

I have to admit that Kaggle engineers are working hard to fix and improve kernels. The issues above were identified around January 2018 while I was working on the Titanic kernel. Maybe by the time you read this article, they have already been solved 😉

In conclusion, working in kernels was a cool experience.
I do like that Kaggle provides this service, especially for sharing machine learning knowledge. I learned many things and got new ideas from reading other kernels. You can also receive comments and votes on your kernel, which you can see as a reward from the Kaggle community. From here you can read how to increase your rating for kernels, competitions or discussions.
I would be glad to hear about your experience using Kaggle kernels! You can leave a comment if you want to share 😉