About a year ago I finished my first machine learning project in Python. I did it on the "classic" Titanic dataset (classic in the same sense that the Iris dataset is classic when it comes to visualisation) and it was a cool experience. The kernel can be found here: https://www.kaggle.com/davidraxen/titanic-my-first-real-machine-learning-model and I learned a lot by doing it.
I'm getting better at R, but I'm pretty much where I was with Python a year ago, so I felt that revisiting the dataset, this time using R, would be a good way to get an idea of any major differences.
The result can be found here: https://www.kaggle.com/davidraxen/titanic-2-this-time-it-s-r and I actually managed to get a better score! On my first try I could correctly predict survival for 77% of the passengers, and this time I improved my result all the way to 80%, which isn't half bad!
The improvement doesn't come from the change of language ;-) but rather from some slight tweaks to the input to the models.
Something that always gets me when using R is that I still have big problems "guessing" how to tackle a problem if I don't already know an OK approach. In Python I'm still a lot quicker at coming up with solutions. Having said that, I'm often amazed by how "clean" the solutions are in R.
Want to stack two dataframes on top of each other? (bind_rows comes from the dplyr package.)
full2 <- bind_rows(train, test)
Boom, done! There are people whose names have more letters than that!
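To make it a bit clearer what that line does, here is a minimal sketch with two tiny made-up data frames standing in for the Kaggle train and test sets (the values are invented for illustration, and dplyr is assumed to be installed). The part worth noticing is that the test set has no Survived column, and bind_rows simply fills it with NA for those rows:

```r
library(dplyr)

# Toy stand-ins for the Kaggle files; the values are made up.
train <- data.frame(PassengerId = 1:3, Survived = c(0, 1, 1), Age = c(22, 38, 26))
test  <- data.frame(PassengerId = 4:5, Age = c(34.5, 47))

# Columns are matched by name; columns missing from one input become NA.
full2 <- bind_rows(train, test)

nrow(full2)                    # 3 train rows + 2 test rows
is.na(full2$Survived[4:5])     # the test rows have no known outcome
```

Matching by name is what makes this handy here: the two inputs don't need the same set of columns.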
A table with the mean value from one column, grouped by the categories in another?
aggregate(full$Survived, by=list(full$Title), FUN=mean, na.rm=TRUE)
Boom, oneliner!
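Here is the same kind of call on a small made-up data frame, as a rough sketch of what comes back (the Title and Survived values are invented, and the column name by_title is just my choice for illustration). It only uses base R:

```r
# Toy data; values are made up. One Survived value is missing on purpose.
full <- data.frame(
  Title    = c("Mr", "Mrs", "Miss", "Mr", "Mrs", "Mr"),
  Survived = c(0, 1, 1, 1, 0, NA)
)

# Naming the list element gives the grouping column a readable header
# instead of the default "Group.1". na.rm = TRUE is passed on to mean().
by_title <- aggregate(full$Survived, by = list(Title = full$Title),
                      FUN = mean, na.rm = TRUE)
by_title
```

The result is a regular data frame with one row per title and the mean in a column called x, so it can be sorted or plotted like any other data frame.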
And there are many more examples to be found, and I'm more than certain that there are waaay cleaner ways to do things than how I'm doing them right now.
But more than anything, R shines when it comes to plotting graphs. The R way of just "adding" features to a plot comes really naturally. And even though the Seaborn package in Python can make some really pretty plots, I've got to say that the ggplot2 plots have a really satisfying look to them.
And I mean... being able to create this beauty, which makes it clear that kids had a significantly higher survival rate (and that people over 60 did not), with just these lines of R is pretty amazing (see the full kernel for more information):
ggplot(full[1:891, ], aes(x = Age, fill = Survived)) +
  geom_density(alpha = 0.4) +
  ggtitle("Density Plot for Age in relation to Survival")
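For anyone who wants to try the "adding layers" idea without downloading the data, here is a small self-contained sketch. The ages and outcomes are made up, and the toy data frame and its labels are my own assumptions, not what the kernel uses. One thing I'm assuming here is that Survived starts out as 0/1 numbers, so the sketch turns it into a factor first, since ggplot2 needs a discrete variable to draw one filled density per group:

```r
library(ggplot2)

# Made-up ages and outcomes, NOT the Kaggle data.
toy <- data.frame(
  Age      = c(2, 5, 8, 24, 31, 35, 19, 28, 40, 47, 62, 70),
  Survived = c(1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0)
)

# A factor gives one density curve (and one fill colour) per outcome.
toy$Survived <- factor(toy$Survived, levels = c(0, 1), labels = c("No", "Yes"))

# Each "+" adds one more piece to the plot: the geometry, then the title.
p <- ggplot(toy, aes(x = Age, fill = Survived)) +
  geom_density(alpha = 0.4) +
  ggtitle("Density Plot for Age in relation to Survival")
p
```

Because the plot is just an object, more layers (axis labels, a theme) can be added to p later in the same way.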