Revisiting the Titanic in R

David Raxén

About a year ago I finished my first machine learning project in Python. I did it on the "classic" Titanic dataset (classic in the same sense that the Iris dataset is classic when it comes to visualisation) and it was a cool experience. The kernel can be found here: https://www.kaggle.com/davidraxen/titanic-my-first-real-machine-learning-model and I learned a lot by doing it.


I'm getting better at R, but I'm pretty much at the same place as I was with Python a year ago, so I felt that revisiting the dataset, this time using R, would be a good way to get an idea of any major differences.


The result can be found here: https://www.kaggle.com/davidraxen/titanic-2-this-time-it-s-r and I actually managed to get a better score! On my first try I correctly predicted survival for 77% of the passengers, and this time I improved that all the way to 80%, which isn't half bad!


The improvement doesn't come from the change of language ;-) but rather from some slight tweaks to the input to the models.


Something that always gets me when using R is that I still have big problems "guessing" how to tackle a problem if I don't already know an OK approach. In Python I'm still a lot quicker at coming up with solutions. Having said that, I'm often amazed by how "clean" the solutions are in R.


Want to stack two data frames on top of each other?

full2 <- bind_rows(train, test)

Boom, done! There are people with names with fewer letters than that!
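
For anyone curious what that one line does with mismatched columns, here is a minimal sketch. It assumes the dplyr package is loaded, and the two tiny data frames are made-up stand-ins for the real Kaggle files, not actual Titanic rows. The point is that bind_rows() matches columns by name and fills whatever is missing with NA, which is why the test rows (which have no Survived column) can be stacked under the training rows at all.

```r
library(dplyr)

# Toy stand-ins for the Kaggle train/test files (values are made up for illustration)
train <- data.frame(PassengerId = 1:3, Survived = c(0, 1, 1), Age = c(22, 38, 26))
test  <- data.frame(PassengerId = 4:5, Age = c(34.5, 47))

# bind_rows() matches columns by name; columns missing from one input are filled with NA
full <- bind_rows(train, test)

nrow(full)            # 5 rows: 3 from train, 2 from test
is.na(full$Survived)  # FALSE FALSE FALSE TRUE TRUE
```

The NA values in Survived are also a handy way to tell the test rows apart again later.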

A table with the mean value of one column, grouped by the categories in another?

aggregate(full$Survived, by=list(full$Title), FUN=mean, na.rm=TRUE)

Boom, one-liner!
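
As a rough comparison, the same table can also be built with dplyr's group_by() and summarise(). This is only a sketch on made-up toy data (the Title and Survived values below are invented, and the column name SurvivalRate is my own choice), and it assumes Survived is stored as a number, since mean() does not work on a factor.

```r
library(dplyr)

# Made-up toy data, NOT the real Titanic values
full <- data.frame(
  Title    = c("Mr", "Mr", "Mrs", "Miss", "Miss", "Master"),
  Survived = c(0, 0, 1, 1, NA, 1)
)

# Base R: result columns are named Group.1 and x
by_title_base <- aggregate(full$Survived, by = list(full$Title), FUN = mean, na.rm = TRUE)

# dplyr: same numbers, but the output columns get readable names
by_title <- full %>%
  group_by(Title) %>%
  summarise(SurvivalRate = mean(Survived, na.rm = TRUE))

by_title
```

The base R version is shorter, while the dplyr version is a bit easier to extend with more summary columns.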

And there are many more examples to be found - and I'm more than certain that there are waaay cleaner ways to do things than how I'm doing them right now.


But more than anything, R shines when it comes to plotting graphs. The R way of just "adding" features to a plot comes really naturally. And even though the Seaborn package in Python can make some really pretty plots, I've got to say that the ggplot2 plots have a really satisfying look to them.


And I mean... being able to create this beauty, which makes it clear that kids had a significantly higher survival rate (and that people over 60 did not), with just these lines of R is pretty amazing (see the full kernel for more information):


ggplot(full[1:891,], aes(x = Age, fill = Survived)) +
      geom_density(alpha = 0.4)  + 
      ggtitle("Density Plot for Age in relation to Survival")
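
To make the "adding" idea a bit more concrete, here is a small sketch that builds the same kind of plot one step at a time. The data is randomly generated toy data (not the Titanic passengers), the name toy is hypothetical, and I am assuming Survived has been converted to a factor, since that is what gives each group its own fill colour.

```r
library(ggplot2)

# Randomly generated toy data, NOT the real Titanic values
set.seed(1)
toy <- data.frame(
  Age      = c(rnorm(50, mean = 25, sd = 12), rnorm(50, mean = 32, sd = 14)),
  Survived = factor(rep(c(1, 0), each = 50))  # factor, so each level gets its own fill
)

p <- ggplot(toy, aes(x = Age, fill = Survived))  # data + aesthetics, nothing drawn yet
p <- p + geom_density(alpha = 0.4)               # add a semi-transparent density layer
p <- p + ggtitle("Density Plot for Age in relation to Survival")  # add a title

print(p)
```

Each + returns a new plot object, so the pieces can be added in separate lines or chained in one expression as above.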



©2019 by David Raxén.
