I'm going to keep using the dataset I created by scraping Chelsea data from Wikipedia. (A post about it can be found here: https://davidraxen.wixsite.com/website/post/data-cleaning-in-pandas-using-regex-s-and-more-regex-s where there is also a link to both the dataset I'm using and the code to get it yourself.)
In the dataset where I've stored information about each individual player's number of games and goals (per competition), I've also stored each player's age on August 1st (I decided that's when the seasons start) to be able to do age-specific analysis for the different Chelsea teams over the years.
It can also be used to create a cool plot of an individual player's goal distribution per season! And that's what I'm going to show here, on the one and only Super Frank!
First I need to read in my csv-file and filter out all players that are not named Frank Lampard from the dataset.
players <- read.delim("Chelsea_players.csv", header=TRUE, sep=",", fill = TRUE)
Lampard <- players[which(players$Name == "Frank Lampard"),]
Next I need to clean the data a bit. I'm going to add the goals and fixtures together - so I need to get rid of all the "NA" values. (A player in this dataset gets an NA value for Europa League games in the years when Chelsea didn't play in the Europa League, for instance.) I'm also going to convert the column "AgeAugust" into a format that R recognises as dates using the as.Date()-function.
(Just to be sure nothing gets weird I also use the order()-function to order my Lampard-dataset by date, and I make sure to drop all unused levels that linger from reading the original dataset.)
Lampard[is.na(Lampard)] <- 0
Lampard$AgeAugust <- as.Date(Lampard$AgeAugust)
Lampard <- Lampard[order(Lampard$AgeAugust),]
Lampard <- droplevels(Lampard)
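To see what these four cleaning steps do without the real csv-file, here is a minimal sketch on a made-up data frame. The names toy and one, and all the values in it, are hypothetical and not taken from the Chelsea dataset. It also replaces NA in a single column only, which is a narrower variant of the blanket replacement above.

```r
# Hypothetical toy data, NOT the real Chelsea dataset
toy <- data.frame(
  Name = factor(c("Player A", "Player A", "Player B")),
  AgeAugust = c("2005-08-01", "2004-08-01", "2004-08-01"),
  EuropaLeagueGoals = c(NA, 2, 1),
  stringsAsFactors = FALSE
)

# Keep one player only
one <- toy[toy$Name == "Player A", ]

# Replace missing values with 0 so the column can be summed later
one$EuropaLeagueGoals[is.na(one$EuropaLeagueGoals)] <- 0

# Turn the text dates into Date objects and sort by them
one$AgeAugust <- as.Date(one$AgeAugust)
one <- one[order(one$AgeAugust), ]

# Drop the factor level "Player B" that is no longer used
one <- droplevels(one)
levels(one$Name)
```

Before droplevels() the Name column would still list "Player B" as a level even though no row uses it, which can show up as empty groups in later tables or plots.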
When that is done I'm going to add all goals and fixtures together - and make a column where I add all European cup goals together.
Lastly I'm going to "attach" the dataset, which makes it possible to use its columns without first specifying which dataset they belong to. (Lampard$Goals can be written as just Goals.)
Lampard$Goals <- Lampard$PremierLeagueGoals + Lampard$FaCupGoals + Lampard$LeagueCupGoals + Lampard$UEFACupGoals + Lampard$ChampionsLeagueGoals + Lampard$EuropaLeagueGoals
Lampard$Apps <- Lampard$PremierLeagueApps + Lampard$FaCupApps + Lampard$LeagueCupApps + Lampard$UEFACupApps + Lampard$ChampionsLeagueApps + Lampard$EuropaLeagueApps
Lampard$EU.Goals <- Lampard$UEFACupGoals + Lampard$ChampionsLeagueGoals + Lampard$EuropaLeagueGoals
attach(Lampard) #remember to detach!
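If you would rather not attach (and then have to remember to detach), a possible alternative is to sum the columns with rowSums() and to use with() when you want to refer to columns by their bare names. This is only a sketch on made-up numbers. The data frame toy, the vector goal_cols and the variable share are hypothetical names, and only three of the goal columns are included to keep it short.

```r
# Hypothetical toy data with a few of the goal columns
toy <- data.frame(
  PremierLeagueGoals = c(10, 16),
  FaCupGoals = c(1, 2),
  ChampionsLeagueGoals = c(3, 2)
)

# Pick every column whose name ends in "Goals" and sum them row by row
goal_cols <- grep("Goals$", names(toy), value = TRUE)
toy$Goals <- rowSums(toy[, goal_cols])

# with() evaluates an expression inside the data frame, no attach() needed
share <- with(toy, PremierLeagueGoals / Goals)
share
```

The trade-off is a bit more typing per call, but nothing is left on the search path afterwards.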
Next I'm going to make four small plots where I go through each step to make a pretty good plot! And after that I'm going to add a little more flair and make a final plot.
First off I use the par()-function, which sets graphical parameters for the plots that follow. Here I say that I'm going to draw 4 plots in a 2x2-grid (mfrow = c(2,2)).
In plot "a" (you can see all four plots below) I make a basic line chart, which is how many plots that follow a KPI over a time period look. The plot()-function takes a couple of arguments: the dates for the x-axis first (AgeAugust), then the total number of Goals for the y-axis.
I also specify a type ("b" stands for "both", which means both lines and points), pch is the plotting symbol (20 is a small filled circle, 17 is a filled triangle etc.), lty specifies the line type and main specifies the title of the plot.
par(mfrow = c(2,2))
plot(AgeAugust, Goals, type = "b", pch = 20,
lty = "solid", main = "a. Line chart of Lampards Goals")
For the "b"-plot I'm going to add two lines that show how many of the total goals were scored in the league and how many in the European competitions. I'm also going to change the type to "l" - which removes the dots - and I'm going to set the y-axis limits (ylim) to make sure that 0 is included.
After the plot()-call I add the two lines by using the lines()-function. Most arguments are the same as in the main plot - but I'm going to change both the color (col) and the width of the lines (lwd).
plot(AgeAugust, Goals, type = "l", ylim = c(0,30),
lty = "solid", main = "b. Add lines for Prem. League and Euro. Comp")
lines(AgeAugust, PremierLeagueGoals, lty = "dashed", col = "red", lwd = 2)
lines(AgeAugust, EU.Goals, lty = "dashed", col = "navyblue", lwd = 1)
The "c" plot is where the magic happens! Here I'm going to change the line that indicates the total amount of goals per season into majestic bars instead, by setting type to "h", which draws a vertical line from each point down to zero. The bars also get a width (lwd) and a color (col). In order to get rectangular bars I also set the lend-argument (line end style) to "butt".
plot(AgeAugust, Goals, type = "h", ylim = c(0,30),
lty = "solid", lwd = 17, main = "c. Change to Histogram", col = "gray67",
lend = "butt")
lines(AgeAugust, PremierLeagueGoals, lty = "dashed", col = "red", lwd = 2)
lines(AgeAugust, EU.Goals, lty = "dashed", col = "navyblue", lwd = 1)
And for the final "d"-plot I'm going to add a legend that I place using "topleft", together with a vector with the "names" for each line and a vector (text.col) with the same colors, in the same order, as the lines. Lastly there is the argument bty, the box type, which I set to "n" for none.
plot(AgeAugust, Goals, type = "h", ylim = c(0,30),
lty = "solid", lwd = 17, main = "d. Add Legend", col = "gray67",
lend = "butt")
lines(AgeAugust, PremierLeagueGoals, lty = "dashed", col = "red", lwd = 2)
lines(AgeAugust, EU.Goals, lty = "dashed", col = "navyblue", lwd = 1)
legend("topleft", c("Total Goals", "Prem. League", "Europ. Comp"),
text.col = c("gray67", "red", "navyblue"), bty = "n" )
And there you have it! A perfectly fine plot.
But - with a little bit of polish it can look even cooler.
I'm going to add another plot below where I change the colors of the plot and add a third axis on the right that shows the number of games, plus a fourth line to go with it.
I also swap the original x-axis (with dates) for the season instead to make the plot a bit clearer. If the code above was easy to follow and you want to try it out yourself I'm going to add the code and the plot below!
#--- Storing some args as variables to make the code less messy
main = "Frank Lampard - Goals per Season"
labels = c("Total Goals", "Prem. League", "Euro. Comp" , "Total Apps")
colors = c("gray67", "red", "dodgerblue2", "goldenrod1")
#--- Using par() for main plot.
par(mar = c(6,6,5,7), cex = .8,
bg = 'gray7', fg = "white", col.axis = "white",
col.lab = "white", col.main = "dodgerblue1")
#--- Creating the Histogram for goals per season.
plot(AgeAugust, Goals, type = "h", lty = "solid",
lwd = 35, main = main, col = "gray67", ylim = c(0,30),
lend = "butt", bty="l", ylab="Goals", xlab = "",
xaxt = "n") # xaxt = "n" suppresses the x-axis
#---- Adding a third axis to the right
axis(4, at = seq(0, 30, length.out=5),
labels = c("0", "", "30", "", "60"),
las = 1, tick = TRUE,
cex.lab = 0.5, col= "goldenrod1", col.axis= "goldenrod2")
#---- Adding a new X-axis with seasons instead of dates.
axis(1, at=AgeAugust, labels=Season, las = 2)
#-- Adding the three extra lines. Note that the "apps"-line..
# is divided by 2 to fit the primary Y-axis.
lines(AgeAugust, PremierLeagueGoals, lty = "dashed", col = "red", lwd = 2)
lines(AgeAugust, EU.Goals, lty = "dashed", col = "dodgerblue2", lwd = 1) # same color as in the legend
lines(AgeAugust, Apps/2, lty = "dotdash", col = "goldenrod1", lwd = 3)
#--- Adding a legend.
legend("topright", labels,
text.col = colors, bty = "n", cex = 0.8)
#--- Adding a text to the bottom right indicating the source...
# for the data
mtext("Data source: The Wikipedia pages for each Chelsea season from 00/01 to present.",
side = 1, line = 4.4, adj = 1.8, col = "dodgerblue3", cex = .6)
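The division by 2 for the Apps-line and the labels "0", "30" and "60" on the right-hand axis both come from the same scale factor. The sketch below shows that arithmetic on its own. The names goals_max, apps_max, scale_factor, tick_pos, tick_labels and the appearance counts are hypothetical and only chosen to match the limits used in the plot above.

```r
goals_max <- 30   # upper limit of the left (goals) axis, as in ylim = c(0,30)
apps_max  <- 60   # assumed upper limit for the right (apps) axis

# How many apps one unit on the goals axis represents
scale_factor <- apps_max / goals_max

# Tick positions are given in goal units, labels in apps units
tick_pos    <- seq(0, goals_max, length.out = 5)
tick_labels <- tick_pos * scale_factor

# Hypothetical appearance counts, rescaled to fit the goals axis
apps <- c(49, 58, 36)
apps_scaled <- apps / scale_factor

tick_labels
apps_scaled
```

With these values tick_pos and tick_labels could be passed as the at and labels arguments of axis(4, ...), and apps_scaled is what would be drawn with lines(). If the right-hand axis needed a different range, only apps_max would have to change.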