Tackling the same problem with Python or R

David Raxén
Apr 12, 2020
4 min read

So I've been learning Python off and on for over a year, and I think I can probably say that I'm getting sort of "okay" at it! Which is a big step for someone like me who hasn't done any coding at all since like... '03. And back then it was only to make my profile page on the early social media sites look prettier using some basic HTML and JavaScript.

But for a year or so I've been trying to up my skills in statistics, both for the fun of it and to help me with my job. That led me to Python, which I found to be a good language to just get started with, there being loads of material online and good apps to use, like SoloLearn. (Which was very helpful for someone like myself who didn't really know anything about it.)

Anyways, after taking a mental break for a couple of months (being on parental leave and being bummed out by the covid-19 situation), I finally decided to pick up a book again. I landed on Bayesian Statistics the Fun Way by Will Kurt, which I picked up on Humble Bundle a couple of weeks ago. I chose it because it uses R for the exercises, and I thought it would be a good way to rehearse some statistics and to get better at using R in addition to Python.

Just some 30 pages or so into the book there was a small exercise where I was supposed to calculate the probability of getting more than 7 when rolling two six-sided dice, and later on when rolling three dice. So I figured this would be a good excuse to write a function in both Python and R where the number of dice and the number of sides on the dice could be varied.

Using Python I already had a pretty straightforward way of doing this: use a for loop to create a list of all the desired dice, and then use the itertools module to combine all the different combinations of dice tosses into a (sometimes) long list of tuples.

I had used the itertools module before when making a script that could find the password for a locked PDF file by trying out all the different combinations. Which was surprisingly easy to do, and if the password had 5 characters or fewer I could find it in just minutes! And I think that I could've cracked longer ones too... but then we would be talking years and not minutes, so I never tested it properly.
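A rough sketch of why the length matters so much. The 62-character alphabet (lowercase, uppercase and digits) and the name `search_space` are assumptions made up for this illustration, not details of the script mentioned above:

```python
from itertools import product


def search_space(alphabet_size, max_length):
    """Number of candidate strings of length 1..max_length."""
    return sum(alphabet_size ** length for length in range(1, max_length + 1))


# Tiny demonstration: product enumerates exactly alphabet_size ** length strings.
candidates = ["".join(chars) for chars in product("ab", repeat=3)]
assert len(candidates) == 2 ** 3

# Hypothetical alphabet: 26 lowercase + 26 uppercase + 10 digits.
for max_length in (5, 8):
    print(max_length, search_space(62, max_length))
```

Every extra character multiplies the number of candidates by the size of the alphabet, which is how a search goes from minutes to years.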

But when I had my list of combinations I'd just go through all the items in it, and if the sum was greater than r (what I wanted to roll higher than) I'd add 1 to the count. When I'd gone through the whole list I'd have a count of all possible combinations of n dice with t sides that were greater than r. Finally I'd divide that count by the total number of possible combinations, t^n, and BOOM, there was my probability.

And here is the code snippet:

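''' r = roll higher than, n = no. dices, t = no. sides on dices '''

import itertools


def prob_dice(r=7, n=2, t=6):
    count = 0

    # one list of faces (1..t) per die
    dices = []
    for dice in range(n):
        dices.append(list(range(1, t + 1)))

    # every possible roll, one tuple per combination of faces
    comb = list(itertools.product(*dices))

    # count the rolls whose total is higher than r
    for c in comb:
        if sum(c) > r:
            count += 1
    return count / (t ** n)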
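The two-dice case is small enough to check by hand, which gives a value to compare the function's default output against. A minimal sketch (the names `sum_counts` and `favourable` are only illustrative) that tallies how often each total occurs:

```python
from collections import Counter
from itertools import product

# Tally how many of the 36 equally likely outcomes give each total.
faces = range(1, 7)
sum_counts = Counter(a + b for a, b in product(faces, faces))

# Totals above 7 are 8, 9, 10, 11 and 12.
favourable = sum(count for total, count in sum_counts.items() if total > 7)

print(favourable, "of", sum(sum_counts.values()))  # 15 of 36
print(favourable / 36)
```

The totals 8 through 12 occur 5, 4, 3, 2 and 1 times, so the probability of rolling more than 7 with two six-sided dice is 15/36, a little under 42%.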

After that I tried writing a function using R instead. I haven't really written many functions in R yet; I've mostly used it for linear algebra or to graph data. So my first instinct was to use the same approach. I googled to find out if there was an itertools package for R, and there was: https://cran.r-project.org/web/packages/itertools/itertools.pdf. And the product function seemed to be included.

But from what I gathered, R is an excellent language for vector-based calculation. And this example is very much a vector-based problem, where the dice could be seen as vectors with the length being their number of sides. So I figured I might as well google how to "find all different combinations of any number of vector elements". And behold! There was a built-in function in R called expand.grid that does exactly this.

When testing it out and seeing the grid it produced, I remembered that another built-in function called "apply" had worked really well when I was doing an R tutorial a couple of months ago, where I was applying a function to each row in a grid. So I tried it out and ended up with a function that looked like this:

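prob_dice <- function(r=7, n=2, t=6){
    # one vector of faces (1..t) per die
    dice <- list()
    for(die in c(1:n)){
        dice[[die]] <- c(1:t)
    }
    # expand.grid gives one row per possible roll; sum each row and count those above r
    wanted_outcome <- sum(apply(expand.grid(dice),1,sum) > r)
    total_outcomes <- t ^ n
    return(wanted_outcome/total_outcomes)
}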

It pretty much does the same thing. It starts out by creating a list of all "n" dice with their "t" sides, expands that into every possible roll, and then counts the outcomes where the combined roll is larger than "r".
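For comparison, the same shape can be written in Python as well. This is only a sketch and not the version from earlier in the post: `prob_dice_compact` is a made-up name, `itertools.product(..., repeat=n)` plays the role of `expand.grid`, and summing a generator of booleans stands in for `sum(apply(...) > r)`. `Fraction` is used here so the result comes out exact rather than as a float.

```python
from fractions import Fraction
from itertools import product


def prob_dice_compact(r=7, n=2, t=6):
    """Probability that n dice with t sides sum to more than r."""
    faces = range(1, t + 1)
    # product(faces, repeat=n) yields every roll as a tuple, like a row of expand.grid
    wanted_outcomes = sum(sum(roll) > r for roll in product(faces, repeat=n))
    return Fraction(wanted_outcomes, t ** n)


print(prob_dice_compact())      # 5/12
print(prob_dice_compact(n=3))   # 181/216
```

Like the loop version, this still walks through all t^n rolls, so it is no faster; it only changes how the counting is expressed.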

But I do think that there are differences, and I think that I prefer the R version a bit more. It's shorter and I didn't have to import a module/package to achieve what I wanted (even if itertools does ship with Python's standard library). But it got me thinking. I guess it's pretty common to try to solve a problem in a new language by translating the way one would do it in a language that one is more comfortable with. That probably isn't a very smart way to do it, though, since it means missing out on perks that the second language might have over the first.

How do you go about not falling into this trap? If you have any tips and tricks for me I'd love to hear from you!
