
Grouped bar plot in ggplot

I have a survey file in which the rows are observations and the columns are questions.

Here is some fake data showing what it looks like:

People,Food,Music,People
P1,Very Bad,Bad,Good
P2,Good,Good,Very Bad
P3,Good,Bad,Good
P4,Good,Very Bad,Very Good
P5,Bad,Good,Very Good
P6,Bad,Good,Very Good

My aim is to create this kind of plot with ggplot2.

  • I don't care at all about the colors, design, etc.
  • The plot doesn't correspond to the fake data

enter image description here

Here is how I load my fake data:

raw <- read.csv("http://pastebin.com/raw.php?i=L8cEKcxS", sep = ",")
raw[,2] <- factor(raw[,2], levels = c("Very Bad","Bad","Good","Very Good"), ordered = FALSE)
raw[,3] <- factor(raw[,3], levels = c("Very Bad","Bad","Good","Very Good"), ordered = FALSE)
raw[,4] <- factor(raw[,4], levels = c("Very Bad","Bad","Good","Very Good"), ordered = FALSE)

But if I choose the count as Y, I don't know what to use for the X and the Group values. I don't know whether I can succeed without reshape2. I've also tried to use reshape2 with the melt function, but I don't understand how to use it.

S12000 asked Aug 10 '13 03:08



1 Answer

EDIT: Eight years later...

This needs a tidyverse solution, so here is one. Every non-base package is stated explicitly so that you know where each function comes from (the exception is read.csv, which is from utils and ships with base R):

library(magrittr) # needed for %>% if dplyr is not attached

"http://pastebin.com/raw.php?i=L8cEKcxS" %>%
  read.csv(sep = ",") %>%
  tidyr::pivot_longer(cols = c(Food, Music, People.1),
                      names_to = "variable",
                      values_to = "value") %>%
  dplyr::group_by(variable, value) %>%
  dplyr::summarise(n = dplyr::n()) %>%
  dplyr::mutate(value = factor(
    value,
    levels = c("Very Bad", "Bad", "Good", "Very Good"))
  ) %>%
  ggplot2::ggplot(ggplot2::aes(variable, n)) +
  ggplot2::geom_bar(ggplot2::aes(fill = value),
                    position = "dodge",
                    stat = "identity")
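As a possible variation (not part of the pipeline above), the counting step can be left to ggplot2: when no y aesthetic is given, geom_bar() uses stat = "count" and tallies the rows itself. The sketch below assumes the data is already in long form; the data frame `long` is typed inline from the six fake rows in the question rather than downloaded, and its name is only illustrative.

```r
library(ggplot2)

lv <- c("Very Bad", "Bad", "Good", "Very Good")

# Long-format version of the six fake rows from the question (assumed, typed inline).
long <- data.frame(
  variable = rep(c("Food", "Music", "People.1"), each = 6),
  value = factor(
    c("Very Bad", "Good", "Good", "Good", "Bad", "Bad",        # Food
      "Bad", "Good", "Bad", "Very Bad", "Good", "Good",        # Music
      "Good", "Very Bad", "Good", "Very Good", "Very Good", "Very Good"),  # People.1
    levels = lv
  )
)

# No y aesthetic: geom_bar() defaults to stat = "count" and counts rows per bar.
p <- ggplot(long, aes(variable, fill = value)) +
  geom_bar(position = "dodge")
print(p)
```

Answer levels that never occur for a question (for example "Very Good" for Food in these six rows) simply get no bar, so the remaining bars in that group are drawn wider.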

The original answer:

First you need to get the counts for each category, i.e. how many Bads and Goods and so on are there for each group (Food, Music, People). This would be done like so:

raw <- read.csv("http://pastebin.com/raw.php?i=L8cEKcxS", sep = ",")
raw[,2] <- factor(raw[,2], levels = c("Very Bad","Bad","Good","Very Good"), ordered = FALSE)
raw[,3] <- factor(raw[,3], levels = c("Very Bad","Bad","Good","Very Good"), ordered = FALSE)
raw[,4] <- factor(raw[,4], levels = c("Very Bad","Bad","Good","Very Good"), ordered = FALSE)

raw = raw[,c(2,3,4)] # getting rid of the "people" variable as I see no use for it

freq = table(col(raw), as.matrix(raw)) # get the counts of each factor level
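To make the table(col(raw), as.matrix(raw)) idiom less opaque, here is a small sketch on the six fake rows from the question, typed inline so that nothing has to be downloaded (the real file has more rows, so the counts differ from the ones shown further down). col() returns the column index of every cell and as.matrix() returns the answer in that cell, so cross-tabulating the two gives one row of counts per question.

```r
# The six fake rows from the question (ID column already dropped).
raw <- data.frame(
  Food     = c("Very Bad", "Good", "Good", "Good", "Bad", "Bad"),
  Music    = c("Bad", "Good", "Bad", "Very Bad", "Good", "Good"),
  People.1 = c("Good", "Very Bad", "Good", "Very Good", "Very Good", "Very Good"),
  stringsAsFactors = TRUE
)

# Rows of freq are column indices (1 = Food, 2 = Music, 3 = People.1),
# columns of freq are the answer labels.
freq <- table(col(raw), as.matrix(raw))
print(freq)
```

Note that as.matrix() turns the factors back into character strings, so the columns of freq come out in alphabetical order (Bad, Good, Very Bad, Very Good) rather than in factor-level order. That is why the columns are reordered by hand in the next step.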

Then you need to create a data frame out of it, melt it and plot it:

library(reshape2) # for melt()
library(ggplot2)  # for ggplot() and geom_bar()

Names=c("Food","Music","People")     # create vector of names
data=data.frame(cbind(freq),Names)   # combine them into a data frame
data=data[,c(5,3,1,2,4)]             # sort columns

# melt the data frame for plotting
data.m <- melt(data, id.vars='Names')

# plot everything
ggplot(data.m, aes(Names, value)) +
  geom_bar(aes(fill = variable), position = "dodge", stat="identity")

Is this what you're after?

enter image description here

To clarify a little: in the question "ggplot multiple grouping bar", the data frame looked like this:

> head(df)
  ID Type Annee X1PCE X2PCE X3PCE X4PCE X5PCE X6PCE
1  1    A  1980   450   338   154    36    13     9
2  2    A  2000   288   407   212    54    16    23
3  3    A  2020   196   434   246    68    19    36
4  4    B  1980   111   326   441    90    21    11
5  5    B  2000    63   298   443   133    42    21
6  6    B  2020    36   257   462   162    55    30

Since you have numerical values in columns 4-9, which would later be plotted on the y axis, this can be easily transformed with reshape and plotted.
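A minimal sketch of that transformation, assuming reshape2 is installed and rebuilding the six rows shown above by hand: melt() keeps the id columns as they are and stacks every other column into a variable/value pair, which is the long shape ggplot expects.

```r
library(reshape2)

# The six rows printed by head(df) above, typed inline.
df <- data.frame(
  ID    = 1:6,
  Type  = c("A", "A", "A", "B", "B", "B"),
  Annee = c(1980, 2000, 2020, 1980, 2000, 2020),
  X1PCE = c(450, 288, 196, 111, 63, 36),
  X2PCE = c(338, 407, 434, 326, 298, 257),
  X3PCE = c(154, 212, 246, 441, 443, 462),
  X4PCE = c(36, 54, 68, 90, 133, 162),
  X5PCE = c(13, 16, 19, 21, 42, 55),
  X6PCE = c(9, 23, 36, 11, 21, 30)
)

# ID, Type and Annee identify a row; the six XnPCE columns are stacked.
df.m <- melt(df, id.vars = c("ID", "Type", "Annee"))
print(head(df.m))
```

Each of the 6 original rows becomes 6 rows in df.m (one per XnPCE column), with the former column name in variable and the number in value.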

For our current data set, we needed something similar, so we used freq=table(col(raw), as.matrix(raw)) to get this:

> data
   Names Very.Bad Bad Good Very.Good
1   Food        7   6    5         2
2  Music        5   5    7         3
3 People        6   3    7         4

Just imagine you have Very.Bad, Bad, Good and so on instead of X1PCE, X2PCE, X3PCE. See the similarity? But we needed to create such a structure first, hence the freq=table(col(raw), as.matrix(raw)).

jakub answered Oct 01 '22 22:10