
Restructure csv data for r and ggplot2

Tags: r, ggplot2

I'm new to R and ggplot2. I have a csv file with beverage consumption data. The first column is Year, and the next 9 columns are beverage types (coffee, tea, soda, etc.), each holding the consumption amount for that row's year. The data covers a 41-year period. I've been researching this and trying many things. I can easily create a dot plot for any one type of beverage with ggplot.

However, I want to create a vertical stack of dot plots, each with Year on the x axis. So there'd be a plot for coffee, then right below it one for tea, and so on. I think I want to use facets. I'm also thinking I need to restructure my data so it has 3 columns: one for year, one for "category" (i.e., coffee, tea, soda, etc.), and one for the value. My thinking is that once I get the data in that form, faceting should be straightforward.

Problem is, I can't seem to figure out how to get my data in that form. Here is how the first few rows of the data look:

Year  Whole Milk  Other Milk  Total Milk  Tea  Coffee  Diet Soda  Regular Soda  Total Soda  Juice
1970        25.5         5.8        31.3  6.8    33.4        2.1          22.2        24.3    5.5
1971          25         6.3        31.3  7.2    32.2        2.2          23.3        25.5    5.8
1972        24.1         6.9          31  7.3    33.6        2.3          23.9        26.2      6

Can someone help me?

dput of the data is:

structure(list(Year = 1970:1972, `Whole Milk` = c(25.5, 25, 24.1
), `Other Milk` = c(5.8, 6.3, 6.9), `Total Milk` = c(31.3, 31.3, 
31), Tea = c(6.8, 7.2, 7.3), Coffee = c(33.4, 32.2, 33.6), `Diet Soda` = c(2.1, 
2.2, 2.3), `Regular Soda` = c(22.2, 23.3, 23.9), `Total Soda` = c(24.3, 
25.5, 26.2), Juice = c(5.5, 5.8, 6)), .Names = c("Year", "Whole Milk", 
"Other Milk", "Total Milk", "Tea", "Coffee", "Diet Soda", "Regular Soda", 
"Total Soda", "Juice"), class = "data.frame", row.names = c(NA, 
-3L))
user1739283 asked Oct 11 '12 20:10


1 Answer

I have a little saying that I use often for ggplot2: "When in doubt, melt". The reshape package has a function, melt(), that does exactly this.

tmp <- structure(list(Year = 1970:1972, `Whole Milk` = c(25.5, 25, 24.1
), `Other Milk` = c(5.8, 6.3, 6.9), `Total Milk` = c(31.3, 31.3, 
31), Tea = c(6.8, 7.2, 7.3), Coffee = c(33.4, 32.2, 33.6), `Diet Soda` = c(2.1, 
2.2, 2.3), `Regular Soda` = c(22.2, 23.3, 23.9), `Total Soda` = c(24.3, 
25.5, 26.2), Juice = c(5.5, 5.8, 6)), .Names = c("Year", "Whole Milk", 
"Other Milk", "Total Milk", "Tea", "Coffee", "Diet Soda", "Regular Soda", 
"Total Soda", "Juice"), class = "data.frame", row.names = c(NA, 
-3L))

library(reshape) 

melt(tmp, id.vars="Year")

   Year     variable value
1  1970   Whole Milk  25.5
2  1971   Whole Milk  25.0
3  1972   Whole Milk  24.1
4  1970   Other Milk   5.8
5  1971   Other Milk   6.3
6  1972   Other Milk   6.9
7  1970   Total Milk  31.3
8  1971   Total Milk  31.3
9  1972   Total Milk  31.0
10 1970          Tea   6.8
11 1971          Tea   7.2
12 1972          Tea   7.3
13 1970       Coffee  33.4
...
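
Once the data is in this long form, the faceted plot the question describes should be a short step. The following is only a sketch, assuming a subset of the question's columns for brevity; the names long and p are made up for illustration, and facet_grid() with scales = "free_y" is one reasonable choice, not the only one.

```r
library(reshape)   # melt(), as used above
library(ggplot2)

# A few columns from the question's sample data (subset chosen for brevity).
# check.names = FALSE keeps the space in "Whole Milk".
tmp <- data.frame(
  Year = 1970:1972,
  `Whole Milk` = c(25.5, 25, 24.1),
  Tea = c(6.8, 7.2, 7.3),
  Coffee = c(33.4, 32.2, 33.6),
  check.names = FALSE
)

# Long form: one row per Year/beverage pair, in columns Year, variable, value
long <- melt(tmp, id.vars = "Year")

# One panel per beverage, stacked vertically, all sharing the Year axis.
# scales = "free_y" lets each panel pick its own y range.
p <- ggplot(long, aes(x = Year, y = value)) +
  geom_point() +
  facet_grid(variable ~ ., scales = "free_y")

print(p)
```

If a single-column grid is not what you want, facet_wrap(~ variable, ncol = 1) gives a similar layout with the panel labels on top instead of on the side.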
Brandon Bertelsen answered Oct 11 '22 01:10