
ggplot holes in stacked area chart

Here is a link to my data.

I use the following code:

#load packages
library(dplyr)
library(ggplot2)

#read in data
data = read.csv("ggplot_data.csv")

#order by group then year
data = arrange(data, group, year)

#generate ggplot stacked area chart
plot = ggplot(data, aes(x=year,y=value, fill=group)) +
  geom_area()
plot

That produces the following chart: [image]

As you can see, there are odd holes in three different parts of this chart.

I previously had this issue and asked about it. The answer then was that I needed to sort my data by group and then by year, and that fixed the holes at the time. This time, however, sorting does not eliminate all of them. Any help?

Asked by Jim, Apr 08 '16 22:04


1 Answer

The reason for the gaps is that some time series start later than others. When the first non-zero value of a series appears, its area starts with a discontinuous jump. The area just above it, however, is connected to its next point by linear interpolation. This results in the gap.

For example, look at the left-most gap. The olive region starts just after the gap with a vertical jump in 1982. The green area, however, increases linearly from the value in 1981 (where the olive area is zero) to the value in 1982 (where the olive area suddenly contributes).
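To check which series are affected, one can look for groups whose first year is later than the earliest year in the data. The following is a minimal sketch on made-up toy data (the `toy` data frame and its values are hypothetical, not taken from the linked file); it only assumes that dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data: group "a" covers 1980-1983, group "b" only starts in 1982.
toy <- data.frame(
  group = c(rep("a", 4), rep("b", 2)),
  year  = c(1980:1983, 1982:1983),
  value = c(5, 6, 7, 8, 3, 4)
)

# First year of each group; keep only groups that start after the
# earliest year in the whole data set. These are the candidates for gaps.
starts <- toy %>%
  group_by(group) %>%
  summarise(first_year = min(year)) %>%
  filter(first_year > min(toy$year))

starts
```

In this toy case only group "b" is reported, with a first year of 1982, so a gap would be expected between 1981 and 1982.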

What you could do is, for instance, add a value of zero at the beginning of each time series that starts after 1975. I use dplyr functionality to create a data frame of these additional first years:

first_years <- group_by(data, group, group_id) %>%
               summarise(year = min(year) - 1) %>%
               filter(year > 1974) %>%
               mutate(value = 0, value_pct = 0)
first_years
## Source: local data frame [3 x 5] 
## Groups: group [3]
## 
##    group group_id  year value value_pct
##   (fctr)    (int) (dbl) (dbl)     (dbl)
## 1      c    10006  1981     0         0
## 2      e    10022  2010     0         0
## 3      i    24060  2002     0         0

As you can see, these three new rows correspond exactly to the three gaps in your plot. Now you can combine this new data frame with your data and sort the result:

data_complete <- bind_rows(data, first_years) %>%
                 arrange(year, group)

And the plot then has no gaps:

ggplot(data_complete, aes(x=year,y=value, fill=group)) +
  geom_area()
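A possible alternative to building the extra rows by hand is `tidyr::complete()`, which adds a row for every group/year combination that is missing and can fill the new rows with zero. This is a sketch on hypothetical toy data, not the approach used above, and it assumes the tidyr package is available. Note that it also fills years missing in the middle of a series, which may or may not be what you want.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: group "b" has no rows for 1980 and 1981.
toy <- data.frame(
  group = c(rep("a", 4), rep("b", 2)),
  year  = c(1980:1983, 1982:1983),
  value = c(5, 6, 7, 8, 3, 4)
)

# Add one row per missing group/year combination and set its value to 0,
# then sort by group and year.
toy_complete <- toy %>%
  complete(group, year, fill = list(value = 0)) %>%
  arrange(group, year)

toy_complete
```

The completed data frame can then be passed to the same `ggplot(...) + geom_area()` call as before.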


Answered by Stibu, Oct 12 '22 12:10