
R: Pie chart with percentage as labels using ggplot2

From a data frame I want to plot a pie chart for five categories with their percentages as labels in the same graph in order from highest to lowest, going clockwise.

My code is:

League<-c("A","B","A","C","D","E","A","E","D","A","D")
data<-data.frame(League) # I have more variables 

p<-ggplot(data,aes(x="",fill=League))
p<-p+geom_bar(width=1)
p<-p+coord_polar(theta="y")
p<-p+geom_text(data,aes(y=cumsum(sort(table(data)))-0.5*sort(table(data)),label=paste(as.character(round(sort(table(data))/sum(table(data)),2)),rep("%",5),sep="")))
p

I use

cumsum(sort(table(data)))-0.5*sort(table(data))

to place the label in the corresponding portion and

label=paste(as.character(round(sort(table(data))/sum(table(data)),2)),rep("%",5),sep="")

for the labels, which are the percentages.

I get the following output:

Error: ggplot2 doesn't know how to deal with data of class uneval
Asked by pescobar on Oct 15 '14 at 21:10


1 Answer

I've preserved most of your code. I found this pretty easy to debug by leaving out the coord_polar... easier to see what's going on as a bar graph.

The main thing was to reorder the factor from highest to lowest to get the plotting order correct, then just playing with the label positions to get them right. I also simplified your code for the labels (you don't need the as.character or the rep, and paste0 is a shortcut for sep = "".)
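
On the error itself, a likely explanation based on the argument order of the two functions: ggplot() takes data first, but geom_text() takes mapping first and data second. In geom_text(data, aes(...)) the aes() object (class uneval) therefore ends up in the data slot. A minimal sketch of the call shape with named arguments follows; labels_df is a hypothetical one-row data frame used only for illustration, not something from the original code.

```r
library(ggplot2)

League <- c("A","B","A","C","D","E","A","E","D","A","D")
data <- data.frame(League)

# Hypothetical label data, only to illustrate the call shape
labels_df <- data.frame(y = nrow(data) / 2, label = "all leagues")

p <- ggplot(data, aes(x = "", fill = League)) +
  geom_bar(width = 1) +
  # Name the arguments so the data frame is not taken for the mapping
  geom_text(data = labels_df,
            mapping = aes(x = "", y = y, label = label),
            inherit.aes = FALSE)
p
```

Note that the labels need their own data (or annotate(), as used here), because the five label values do not line up with the eleven rows of the original data frame.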

library(ggplot2)

League <- c("A","B","A","C","D","E","A","E","D","A","D")
data <- data.frame(League) # I have more variables

# Reorder the factor levels from most to least frequent
data$League <- reorder(data$League, X = data$League, FUN = function(x) -length(x))

# Centers of the wedges, measured from the top of the stacked bar
at <- nrow(data) - as.numeric(cumsum(sort(table(data))) - 0.5 * sort(table(data)))

# Percentage labels, in the same order as the positions
label <- paste0(round(sort(table(data)) / sum(table(data)), 2) * 100, "%")

p <- ggplot(data, aes(x = "", fill = League)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y") +
  annotate(geom = "text", y = at, x = 1, label = label)
p

The at calculation is finding the centers of the wedges. (It's easier to think of them as the centers of bars in a stacked bar plot, just run the above plot without the coord_polar line to see.) The at calculation can be broken out as follows:

table(data) is the number of rows in each group, and sort(table(data)) puts them in the order they'll be plotted. Taking the cumsum() of that gives us the upper edge of each bar when stacked on top of each other, and multiplying by 0.5 gives us half the height of each bar in the stack (or half the width of each wedge of the pie).

as.numeric() simply ensures we have a numeric vector rather than an object of class table.
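
The pieces described so far can be inspected one at a time. This is a small base R sketch on the example data; the names counts, edges and half are introduced here for readability and do not appear in the code above.

```r
League <- c("A","B","A","C","D","E","A","E","D","A","D")

counts <- sort(table(League))  # ascending counts: B=1, C=1, E=2, D=3, A=4
edges  <- cumsum(counts)       # upper edge of each stacked bar: 1, 2, 4, 7, 11
half   <- 0.5 * counts         # half the height of each bar: 0.5, 0.5, 1, 1.5, 2

# as.numeric() drops the table class and the names
as.numeric(edges)
as.numeric(half)
```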

Subtracting the half-widths from the cumulative heights gives the center of each bar when stacked up. But ggplot will stack the bars with the biggest on the bottom, whereas all our sort()ing puts the smallest first, so we need to do nrow - everything because what we've actually calculated are the label positions relative to the top of the bar, not the bottom. (And, with the original disaggregated data, nrow() is the total number of rows, hence the total height of the bar.)
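
The flip described above depends on the order in which ggplot2 stacks the bars. That default was reversed in ggplot2 2.2.0, so the hand-computed positions may need adjusting on newer versions. Here is a minimal sketch of an alternative, assuming ggplot2 2.2.0 or later: aggregate the counts first and let position_stack(vjust = 0.5) center each label in its wedge. The counts data frame and its n and label columns are names introduced here for illustration.

```r
library(ggplot2)

League <- c("A","B","A","C","D","E","A","E","D","A","D")

# One row per league, largest count first
counts <- as.data.frame(table(League), stringsAsFactors = FALSE)
names(counts) <- c("League", "n")
counts <- counts[order(-counts$n), ]
counts$League <- factor(counts$League, levels = counts$League)

# Percentage labels computed from the aggregated counts
counts$label <- paste0(round(100 * counts$n / sum(counts$n)), "%")

p <- ggplot(counts, aes(x = "", y = n, fill = League)) +
  geom_col(width = 1) +
  # vjust = 0.5 places each label in the middle of its stacked segment
  geom_text(aes(label = label), position = position_stack(vjust = 0.5)) +
  coord_polar(theta = "y")
p
```

Because the label positions come from the same stacking as the bars, they stay attached to the right wedge whichever way the stack is ordered. If the wedges run in the opposite rotational direction from the one you want, coord_polar() has a direction argument (1 or -1) that reverses it.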

Answered by Gregor Thomas on Sep 21 '22 at 06:09