Creating and customizing a boxplot in R

David Raxén

(The datasets I use to create my plots, and the code to scrape the data from Wikipedia using Python, can be found here: https://github.com/davidraxen/wikiscraping/ )



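First I read my dataset using the read.delim() function. Note that the code after this step works on a data frame called events, so the match events have to be read in the same way.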


players <- read.delim("Chelsea_players.csv", header=TRUE, sep=",", fill = TRUE)

I know that the "Minute" column in this dataset has extra time written in as "+X", which messes up the values a bit. It's not a perfect solution, but to make the column easier to work with I set all of those goals to minute "45" or "90" by removing the + and everything that follows.


events$Minutes2 <- as.character(events$Minute)
events$Minutes2 <- sapply(strsplit(events$Minutes2,"\\+"), `[`, 1)
events$Minutes2 <- as.numeric(events$Minutes2)
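
To see what the three lines above do, here is a small sketch on a made-up vector (minute_raw is a hypothetical stand-in for the real column, not data from the repository). It also shows sub() as a possible one-step alternative that should give the same result.

```r
# Hypothetical sample of "Minute" values; "+X" marks extra time.
minute_raw <- c("17", "45+2", "90+4", "63")

# Approach from the post: split on "+" and keep the first piece.
minute_split <- as.numeric(sapply(strsplit(minute_raw, "\\+"), `[`, 1))

# Possible alternative: delete the "+" and everything after it in one step.
minute_sub <- as.numeric(sub("\\+.*$", "", minute_raw))

print(minute_split)
print(identical(minute_split, minute_sub))
```

Either way, a goal in added time ends up counted at minute 45 or 90, which is the simplification described above.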

The boxplot function is pretty straightforward in R. You just need to write boxplot(data) and it will create a boxplot for your data.

The calls below add two features: a title (main) and a custom y-axis label (ylab):


boxplot(events$Minutes2, main = "Minute for Goal", ylab = "Minute")
boxplot(events$Age, main = "Age for goalscorer", ylab = "Age")
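
If you want to check the numbers behind a box, boxplot() can return them instead of drawing. The sketch below uses made-up minutes (goal_minutes is a hypothetical vector, not the real data).

```r
# Made-up goal minutes, only for illustration.
goal_minutes <- c(3, 12, 27, 45, 45, 58, 66, 74, 88, 90)

# plot = FALSE returns the summary behind the box instead of drawing it.
bp <- boxplot(goal_minutes, plot = FALSE)

# bp$stats holds: lower whisker, lower hinge, median, upper hinge, upper whisker.
print(bp$stats)

# bp$out lists any points that would be drawn individually as outliers.
print(bp$out)
```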

I'm going to filter out all goals scored by teams other than Chelsea and focus on three competitions: the Premier League, the Champions League and the FA Cup.

events2 <- events[which(events$Event == "Goal" &
    (events$Competition == "Premier League" | events$Competition == "FA Cup" | events$Competition == "UEFA Champions League") &
    ((events$HomeTeam == "Chelsea" & events$Home.Away == "H") |
     (events$AwayTeam == "Chelsea" & events$Home.Away == "A"))), ]
events2 <- droplevels(events2) # Needed after filtering, otherwise every removed level still gets an "empty" box.
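
The droplevels() step is easy to overlook, so here is a minimal sketch of what it changes. The toy data frame and the "League Cup" rows are made up for illustration.

```r
# Hypothetical mini version of the events data.
toy <- data.frame(
  Competition = factor(c("Premier League", "FA Cup", "League Cup", "Premier League")),
  Minutes2 = c(12, 45, 70, 88)
)

# Filtering removes the rows, but the factor still remembers the old level.
kept <- toy[toy$Competition %in% c("Premier League", "FA Cup"), ]
print(levels(kept$Competition))

# droplevels() keeps only the levels that actually occur in the data.
kept <- droplevels(kept)
print(levels(kept$Competition))
```

Without the second step, a grouped boxplot would reserve an empty slot for "League Cup".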

Let's have a closer look at when Chelsea scores!

I'm going to create 4 different plots where I add some features to each plot.


First plot is the same as above with a title and custom y-label - but I'm also adding ~Home.Away to create distinct boxplots for Home & Away games.


Second plot will add *Competition to the ~Home.Away-group to create further distinct boxplots for each competition.


The third plot will add the feature "varwidth=TRUE", which makes each box wider or narrower depending on the number of goals in its group (the widths are proportional to the square root of the number of observations).


And finally the fourth plot will add some fancy colors to the boxes and make the ticks on the axis smaller!


attach(events2) # First we attach the data frame so we can access the columns without the "events2$" prefix.

par(mfrow = c(2,2)) # I'm going to create 2 x 2 plots where I add features to each one.

#1. 
boxplot(Minutes2~Home.Away, main = "Minute for Goal", ylab = "Minute", xlab="")
#2
boxplot(Minutes2~Home.Away*Competition, main = "Minute for Goal", ylab = "Minute", xlab="")
#3
boxplot(Minutes2~Home.Away*Competition, varwidth=TRUE, main = "Minute for Goal", ylab = "Minute", xlab="")
#4
boxplot(Minutes2~Home.Away*Competition, varwidth=TRUE, main = "Minute for goal per competition", col = c("red", "blue"), ylab = "Minute", xlab="", cex.axis = 0.4)
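
To see which groups the Home.Away*Competition formula creates, and the counts that varwidth uses, here is a sketch on a made-up data frame (toy is hypothetical and only borrows the column names). It passes the data frame through the data argument, which is one way to avoid attach().

```r
# Hypothetical toy data with the same column names as events2.
toy <- data.frame(
  Minutes2 = c(10, 20, 30, 40, 50, 60, 70, 80),
  Home.Away = factor(c("H", "A", "H", "A", "H", "H", "A", "H")),
  Competition = factor(c("FA Cup", "FA Cup", "Premier League", "Premier League",
                         "Premier League", "FA Cup", "Premier League", "FA Cup"))
)

# data = toy replaces attach(); plot = FALSE returns the summary without drawing.
bp <- boxplot(Minutes2 ~ Home.Away * Competition, data = toy, plot = FALSE)

print(bp$names)  # group labels, with Home.Away varying fastest
print(bp$n)      # observations per group; varwidth scales widths by sqrt(n)
```

The order of bp$names is also the order in which custom labels have to be supplied if you rename the boxes.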

The fourth plot is starting to look like something! But the labels on the axis are way too small, and being a Blue, the red color hurts my eyes. So I'm going to make some further changes, which I will note in comments! :)


boxplot(Minutes2~Home.Away*Competition, horizontal=TRUE, # Making the boxplots horizontal instead.
        las = 1, varwidth=TRUE, main = "Minute for goal per competition", # las = 1 draws all axis labels horizontally.
        col = c("white", "blue"), xlab = "Minute", ylab="", 
        cex = 0.5, cex.axis = 0.4) # cex.axis sets the size of the tick labels (cex.main would set the title size).
abline(v = 45, lty = "dotted", col ="lightgray") # Adding a light gray dotted vertical line at the 45 minute mark.

This is an ok plot. But I'm going to make some more changes to it, this time in 2 steps:

1. Adding the color arguments to the plot in the par function (but keeping the default white background and black frame, and setting the axis tick labels to the color gray47). Changing the box colors to two shades of blue and renaming the competitions to shorter versions of their names.

2. Making changes to the margins of the plot by using the "mar" argument and changing the size of the text with the cex arguments. Then reintroducing the vertical line, but this time at the 50 minute mark.


par(mfrow = c(1,2)) # Two plots side by side (1 row, 2 columns), one for each step.
#1
par(bg = "white", fg = "black", col.axis = "gray47", las=1) #Features to the plot 'canvas'
boxplot(Minutes2~Home.Away*Competition, horizontal=TRUE, border = "black",
        varwidth=TRUE, main = "Minute for goal per competition",
        col = c("deepskyblue", "blue"), xlab = "Minute", ylab="",
        names=c("FA Cup (A)", "FA Cup (H)", "PL (A)", "PL (H)", "CL (A)", "CL (H)")) # The names follow the order of the boxes: Away then Home within each competition.
#2
par(bg = "white", fg = "white", col.axis = "gray47", mar = c(7,3,3,3), cex = 0.7, las = 1)
boxplot(Minutes2~Home.Away*Competition, horizontal=TRUE, border = "black",
        varwidth=TRUE, main = "Minute for goal per competition",
        col = c("deepskyblue", "blue"), xlab = "Minute", ylab="",
        names=c("FA Cup (A)", "FA Cup (H)", "PL (A)", "PL (H)", "CL (A)", "CL (H)"))
abline(v = 50, lty = "dotted", col ="firebrick2")
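
Since par() changes settings for every plot that follows, it can be handy to save the old values and put them back afterwards. This is a sketch of that pattern, not part of the original script: old_par and the two small vectors are made up, and the plots go to a temporary PDF so it also runs without a display.

```r
# Draw to a throwaway PDF device so the sketch runs without a screen.
pdf(file = tempfile(fileext = ".pdf"))

# par() invisibly returns the previous values of the settings it changes.
old_par <- par(mfrow = c(1, 2), mar = c(7, 3, 3, 3), cex = 0.7, las = 1)

boxplot(c(5, 20, 45, 60, 90), horizontal = TRUE, main = "Left panel")
boxplot(c(10, 30, 50, 70, 85), horizontal = TRUE, main = "Right panel")

# Restore the saved settings so later plots start from the old values again.
par(old_par)
restored_mfrow <- par("mfrow")

dev.off()
```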

The plot on the right side actually looks pretty good! But I'm going to add a few extra things just to make it even easier to follow. The changes will be described with comments this time! :)


par(bg = "white", fg = "white", col.axis = "gray47", mar = c(7,12,5,4), cex = 0.65, las = 1)
boxplot(Minutes2~Home.Away*Competition, horizontal=TRUE, border = "black",
        varwidth=TRUE, main = "Minute for goal per competition",
        col = c("deepskyblue", "blue"), xlab = "Minute", ylab="",
        names=c("FA Cup (A)", "FA Cup (H)", "PL (A)", "PL (H)", "CL (A)", "CL (H)"), cex = 0.5)
abline(v = 50, lty = "dotted", col ="firebrick2")
#-------- New things
axis(1, col = "gray47", at = c(0,20,40,60,80,100,120)) # Redrawing the x-axis in gray, since fg = "white" made the default one invisible.
legend("right", title = "Home or Away", title.col = "black", c("Home","Away"),
       text.col = c("blue","deepskyblue"), text.font = 2, cex = 1.2) # Adding a legend and setting each text color to match its box.
mtext(" 50 minute mark -->", 
      side = 3, line =-2, adj=0.28, cex =0.7, col="firebrick2") # Adds text next to the vertical line, annotating that it is the 50 minute mark.
mtext("Data source: The Wikipedia pages for each Chelsea-season from 00/01 to present.",
    side = 1, line = 3, adj = 1.2, col = "dodgerblue4", cex = .4) # Adds a source note at the lower right. side = 1 is the bottom (2, 3, 4 are left, top, right), line sets the distance from the plot and adj the position along the margin.
detach(events2)

Now those are some fine looking boxes!

©2019 by David Raxén. Proudly created with Wix.com