I have a data frame like the following:
year income group
1 2008 27907 Under25
2 2009 25522 Under25
3 2010 26777 Under25
4 2008 58809 Age25_34
5 2009 57239 Age25_34
6 2010 58558 Age25_34
7 2008 75677 Age35_44
8 2009 74900 Age35_44
9 2010 74136 Age35_44
10 2008 78537 Age45_54
11 2009 77460 Age45_54
12 2010 76266 Age45_54
13 2008 69009 Age55_64
14 2009 67586 Age55_64
15 2008 44402 Age65_74
16 2009 46147 Age65_74
17 2010 48595 Age65_74
18 2008 32747 Over75
19 2009 31272 Over75
20 2010 31638 Over75
> str(df)
'data.frame': 20 obs. of 3 variables:
$ year : int 2008 2009 2010 2008 2009 2010 2008 2009 2010 2008 ...
$ income: int 27907 25522 26777 58809 57239 58558 75677 74900 74136 78537 ...
$ group : Factor w/ 7 levels "Age25_34","Age35_44",..: 7 7 7 1 1 1 2 2 2 3 ...
I would like to use cast to find the mean by group. In addition, I would like to create a wide data.frame from this df, where the first column is year and the following columns are the incomes for the different groups. For example:
year Under25 Age25_34 Age35_44 Age45_54 ...
2008 27907 58809 75677 78537 ...
2009 25522 57239 74900 77460 ...
...
When I try cast I get the following error:
> cast(df, income ~ group, mean)
Using group as value column. Use the value argument to cast to override this choice
Error in `[.data.frame`(data, , variables, drop = FALSE) :
  undefined columns selected
What am I doing wrong with the cast command?
How would I convert this to the wide format as shown in the example?
My R version information is listed below.
> unlist(R.Version())
platform arch os
"x86_64-pc-mingw32" "x86_64" "mingw32"
system status major
"x86_64, mingw32" "" "2"
minor year month
"13.1" "2011" "07"
day svn rev language
"08" "56322" "R"
version.string
"R version 2.13.1 (2011-07-08)"
The melt() and cast() functions from the reshape package reshape the data held in a data frame.
Melting converts a data frame to long format. melt() takes the data set and the columns that have to be kept constant (the id variables) and stacks the remaining columns, so the result has more rows and fewer columns.
In other words, melt() converts a data frame with several measurement columns into a long data frame with one row for every observed (measured) value. cast() goes the other way and spreads a long data frame into the shape described by a formula.
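As a rough illustration of the melting step described above, here is a minimal sketch. It assumes the reshape package is installed, and the small `wide` table is a hypothetical example built from a few of the values in the question, not data the asker supplied in this shape.

```r
library(reshape)  # assumption: the reshape package is installed

# Hypothetical wide table: one income column per age group
wide <- data.frame(
  year     = c(2008, 2009),
  Under25  = c(27907, 25522),
  Age25_34 = c(58809, 57239)
)

# year is kept constant (id variable); every other column is stacked
# into a pair of columns named "variable" and "value"
long <- melt(wide, id.vars = "year")
long
```

Each row of `long` holds one measured income, which is the shape cast() expects as input.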
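The formula should be year ~ group, with income named as the value column. Without a value argument, cast guessed group as the value column (as the message says), which is why the formula could no longer find it. Try this with cast: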
cast(df, year ~ group, mean, value = 'income')
year Age25_34 Age35_44 Age45_54 Age55_64 Age65_74 Over75 Under25
1 2008 58809 75677 78537 69009 44402 32747 27907
2 2009 57239 74900 77460 67586 46147 31272 25522
3 2010 58558 74136 76266 NaN 48595 31638 26777
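If the reshape package is not available, the same two results can probably be reached in base R with tapply(). This is only a sketch of an alternative, not the cast approach shown above: the data frame is rebuilt by hand from the question, and the names `group_means` and `wide` are made up for the example. Note that tapply() leaves the missing Age55_64 cell for 2010 as NA, where cast with mean showed NaN.

```r
# Rebuild the data frame from the question (Age55_64 has no 2010 row)
df <- data.frame(
  year = c(2008, 2009, 2010, 2008, 2009, 2010, 2008, 2009, 2010,
           2008, 2009, 2010, 2008, 2009, 2008, 2009, 2010,
           2008, 2009, 2010),
  income = c(27907, 25522, 26777, 58809, 57239, 58558, 75677, 74900, 74136,
             78537, 77460, 76266, 69009, 67586, 44402, 46147, 48595,
             32747, 31272, 31638),
  group = factor(rep(
    c("Under25", "Age25_34", "Age35_44", "Age45_54",
      "Age55_64", "Age65_74", "Over75"),
    times = c(3, 3, 3, 3, 2, 3, 3)
  ))
)

# Mean income per group, averaged over all available years
group_means <- tapply(df$income, df$group, mean)

# Year-by-group table: rows are years, columns are groups;
# combinations with no data (Age55_64 in 2010) are left as NA
wide <- tapply(df$income, list(year = df$year, group = df$group), mean)

group_means
wide
```

The result of the second call is a matrix rather than a data.frame; wrapping it in as.data.frame() would give columns named after the groups, with the years as row names.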