Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Reshape multiple value columns to wide format

Tags:

r

r-faq

reshape

I have the following data frame and i want to use cast to create a "pivot table" with columns for two values (value and percent). Here is the data frame:

expensesByMonth <- structure(list(month = c("2012-02-01", "2012-02-01", "2012-02-01",  "2012-02-01", "2012-02-01", "2012-02-01", "2012-02-01", "2012-02-01",  "2012-02-01", "2012-02-01", "2012-02-01", "2012-02-01", "2012-03-01",  "2012-03-01", "2012-03-01", "2012-03-01", "2012-03-01", "2012-03-01",  "2012-03-01", "2012-03-01", "2012-03-01", "2012-03-01", "2012-03-01",  "2012-03-01", "2012-03-01", "2012-03-01", "2012-03-01", "2012-04-01",  "2012-04-01", "2012-04-01", "2012-04-01", "2012-04-01", "2012-04-01",  "2012-04-01", "2012-04-01", "2012-04-01", "2012-04-01", "2012-04-01",  "2012-04-01", "2012-04-01", "2012-04-01", "2012-04-01", "2012-04-01",  "2012-04-01", "2012-04-01", "2012-05-01", "2012-05-01", "2012-05-01",  "2012-05-01", "2012-05-01", "2012-05-01", "2012-05-01", "2012-05-01",  "2012-05-01", "2012-05-01", "2012-05-01", "2012-05-01", "2012-05-01",  "2012-05-01", "2012-05-01", "2012-05-01", "2012-05-01", "2012-05-01",  "2012-06-01", "2012-06-01", "2012-06-01", "2012-06-01", "2012-06-01",  "2012-06-01", "2012-06-01", "2012-06-01", "2012-06-01", "2012-06-01",  "2012-06-01", "2012-06-01", "2012-06-01", "2012-06-01", "2012-06-01",  "2012-06-01", "2012-06-01", "2012-06-01", "2012-06-01", "2012-06-01",  "2012-07-01", "2012-07-01", "2012-07-01", "2012-07-01", "2012-07-01",  "2012-07-01", "2012-07-01", "2012-07-01", "2012-07-01", "2012-07-01",  "2012-07-01", "2012-07-01", "2012-07-01"),  expense_type = c("Adjustment", "Bank Service Charge", "Cable", "Clubbing", "Dining", "Education",  "Gifts", "Groceries", "Lunch", "Personal Care", "Rent", "Transportation",  "Adjustment", "Bank Service Charge", "Cable", "Clubbing", "Dining",  "Gifts", "Groceries", "Lunch", "Medical Expenses", "Miscellaneous",  "Personal Care", "Phone", "Recreation", "Rent", "Transportation",  "Adjustment", "Bank Service Charge", "Clothes", "Clubbing", "Computer",  "Dining", "Gifts", "Groceries", "Lunch", "Maintenance", "Medical Expenses",  "Miscellaneous", "Personal Care", "Phone", "Recreation", "Rent",  "Transportation", "Travel", "Bank Service Charge", "Cable", "Clothes",  "Clubbing", "Computer", "Dining", "Electric", "Gifts", "Groceries",  "Lunch", "Maintenance", "Medical Expenses", "Miscellaneous",  "Personal Care", "Phone", "Recreation", "Rent", "Transportation",  "Adjustment", "Bank Service Charge", "Cable", "Charity", "Clothes",  "Computer", "Dining", "Education", "Electric", "Gifts", "Groceries",  "Lunch", "Maintenance", "Medical Expenses", "Miscellaneous",  "Personal Care", "Phone", "Recreation", "Rent", "Transportation",  "Computer", "Gifts", "Groceries", "Lunch", "Maintenance", "Medical Expenses",  "Miscellaneous", "Personal Care", "Phone", "Recreation", "Rent",  "Repair and Maintenance", "Transportation"),  value = c(442.37, 200, 21.33, 75, 22.5, 1800, 10, 233.33, 154.75, 30, 545, 32.5,  2, 200, 36.33, 206.55, 74.5, 89, 372.68, 383.75, 144.19, 508.11,  30, 38.4, 81.75, 1746.7, 35, 16.37, 200, 806.9, 324.81, 756,  80.5, 100, 398.37, 326.25, 151, 29.95, 101, 90, 38.45, 61, 743.75,  129, 228.53, 200, 39.05, 237, 40, 283.83, 141.32, 32.88, 30,  424.4, 412, 142.75, 86.55, 1051.5, 30, 38.9, 51.5, 749.7, 35,  10, 200, 16, 32.59, 149.81, 100, 80, 60, 31.91, 55, 397.25, 486.4,  115.6, 47.08, 1000, 120, 41.11, 256, 761.6, 55, 10.54, 10, 342.11,  291, 76.5, 66.8, 1008, 30, 41.11, 316, 765, 65, 62),  percent = c(0.124025030980324, 0.0560729845967511, 0.00598018380724351, 0.0210273692237817,  0.0063082107671345, 0.50465686137076, 0.00280364922983756, 0.0654175474797997,  0.0433864718317362, 0.00841094768951267, 0.152798883026147, 0.00911185999697206,  0.000506462461002391, 0.0506462461002391, 0.00919989060410842,  0.0523049106600219, 0.018865726672339, 0.0225375795146064, 0.0943742149831854,  0.0971774847048337, 0.0365134111259673, 0.128669320529962, 0.00759693691503586,  0.0097240792512459, 0.0207016530934727, 0.442318990316438, 0.00886309306754183,  0.00357276925628781, 0.0436502047194601, 0.176106750940662, 0.0708901149746392,  0.164997773839559, 0.0175692073995827, 0.0218251023597301, 0.0869446602704567,  0.0712043964486193, 0.0329559045631924, 0.00653661815673915,  0.0220433533833274, 0.0196425921237571, 0.00839175185731621,  0.0133133124394353, 0.162324198800492, 0.0281543820440518, 0.0498769064226911,  0.0496724104530621, 0.00969853814096037, 0.0588618063868785,  0.00993448209061241, 0.070492601294463, 0.0350985252261336, 0.0081661442784834,  0.00745086156795931, 0.105404854981398, 0.102325165533308, 0.035453682960873,  0.0214957356235626, 0.261152697956974, 0.00745086156795931, 0.00966128383312057,  0.0127906456916635, 0.186197030583303, 0.00869267182928586, 0.00249044292527426,  0.0498088585054852, 0.00398470868043882, 0.00811635349346881,  0.0373093254635337, 0.0249044292527426, 0.0199235434021941, 0.0149426575516456,  0.00794700337455016, 0.0136974360890084, 0.09893284520652, 0.12113514388534,  0.0287895202161704, 0.0117250052921912, 0.249044292527426, 0.0298853151032911,  0.0102382108658025, 0.0637553388870211, 0.189672133188888, 0.0136974360890084,  0.00341757293956667, 0.0032424790697976, 0.110928451456846, 0.0943561409311103,  0.0248049648839517, 0.021659760186248, 0.326841890235599, 0.00972743720939281,  0.013329831455938, 0.102462338605604, 0.248049648839517, 0.0210761139536844,  0.0201033702327451)),  .Names = c("month", "expense_type", "value", "percent"),  row.names = c(NA, -96L),  class = "data.frame" ) 

This is what i would like to create (of course, with different header names like: [month]_value, [month]_percent):

expenses   value     percent value.1   percent.1 value.2   percent.2 value.3   percent.3 value.4   percent.4 value.5   percent.5 1              Adjustment  442.37 0.124025031    2.00 0.000506462   16.37 0.003572769    0.00 0.000000000   10.00 0.002490443    0.00 0.000000000 2     Bank Service Charge  200.00 0.056072985  200.00 0.050646246  200.00 0.043650205  200.00 0.049672410  200.00 0.049808859    0.00 0.000000000 3                   Cable   21.33 0.005980184   36.33 0.009199891    0.00 0.000000000   39.05 0.009698538   16.00 0.003984709    0.00 0.000000000 4                 Charity    0.00 0.000000000    0.00 0.000000000    0.00 0.000000000    0.00 0.000000000   32.59 0.008116353    0.00 0.000000000 5                 Clothes    0.00 0.000000000    0.00 0.000000000  806.90 0.176106751  237.00 0.058861806  149.81 0.037309325    0.00 0.000000000 6                Clubbing   75.00 0.021027369  206.55 0.052304911  324.81 0.070890115   40.00 0.009934482    0.00 0.000000000    0.00 0.000000000 7                Computer    0.00 0.000000000    0.00 0.000000000  756.00 0.164997774  283.83 0.070492601  100.00 0.024904429   10.54 0.003417573 8                  Dining   22.50 0.006308211   74.50 0.018865727   80.50 0.017569207  141.32 0.035098525   80.00 0.019923543    0.00 0.000000000 9               Education 1800.00 0.504656861    0.00 0.000000000    0.00 0.000000000    0.00 0.000000000   60.00 0.014942658    0.00 0.000000000 10               Electric    0.00 0.000000000    0.00 0.000000000    0.00 0.000000000   32.88 0.008166144   31.91 0.007947003    0.00 0.000000000 11                  Gifts   10.00 0.002803649   89.00 0.022537580  100.00 0.021825102   30.00 0.007450862   55.00 0.013697436   10.00 0.003242479 12              Groceries  233.33 0.065417547  372.68 0.094374215  398.37 0.086944660  424.40 0.105404855  397.25 0.098932845  342.11 0.110928451 13                  Lunch  154.75 0.043386472  383.75 0.097177485  326.25 0.071204396  412.00 0.102325166  486.40 0.121135144  291.00 0.094356141 14            Maintenance    0.00 0.000000000    0.00 0.000000000  151.00 0.032955905  142.75 0.035453683  115.60 0.028789520   76.50 0.024804965 15       Medical Expenses    0.00 0.000000000  144.19 0.036513411   29.95 0.006536618   86.55 0.021495736   47.08 0.011725005   66.80 0.021659760 16          Miscellaneous    0.00 0.000000000  508.11 0.128669321  101.00 0.022043353 1051.50 0.261152698 1000.00 0.249044293 1008.00 0.326841890 17          Personal Care   30.00 0.008410948   30.00 0.007596937   90.00 0.019642592   30.00 0.007450862  120.00 0.029885315   30.00 0.009727437 18                  Phone    0.00 0.000000000   38.40 0.009724079   38.45 0.008391752   38.90 0.009661284   41.11 0.010238211   41.11 0.013329831 19             Recreation    0.00 0.000000000   81.75 0.020701653   61.00 0.013313312   51.50 0.012790646  256.00 0.063755339  316.00 0.102462339 20                   Rent  545.00 0.152798883 1746.70 0.442318990  743.75 0.162324199  749.70 0.186197031  761.60 0.189672133  765.00 0.248049649 21 Repair and Maintenance    0.00 0.000000000    0.00 0.000000000    0.00 0.000000000    0.00 0.000000000    0.00 0.000000000   65.00 0.021076114 22         Transportation   32.50 0.009111860   35.00 0.008863093  129.00 0.028154382   35.00 0.008692672   55.00 0.013697436   62.00 0.020103370 23                 Travel    0.00 0.000000000    0.00 0.000000000  228.53 0.049876906    0.00 0.000000000    0.00 0.000000000    0.00 0.000000000 

I also encountered the following error while using cast on a single value column: it does not take into account the "value" parameter. So, even if i specify value = "percent" it still displays the values from "value" column.

cast(expensesByMonth, expense_type ~ month, fun.aggregate = sum, value = "percent") 
like image 811
Alex Burdusel Avatar asked Jul 23 '12 07:07

Alex Burdusel


People also ask

How do I change my data frame from long to wide?

To reshape the dataframe from long to wide in Pandas, we can use Pandas' pd. pivot() method. columns : Column to use to make new frame's columns (e.g., 'Year Month'). values : Column(s) to use for populating new frame's values (e.g., 'Avg.

What is reshape data?

Data Reshaping in R is about changing the way data is organized into rows and columns. Most of the time data processing in R is done by taking the input data as a data frame.

What is reshape package in R?

reshape2 is an R package written by Hadley Wickham that makes it easy to transform data between wide and long formats.


2 Answers

Your best option is to reshape your data to long format, using melt, and then to dcast:

library(reshape2)  meltExpensesByMonth <- melt(expensesByMonth, id.vars=1:2) dcast(meltExpensesByMonth, expense_type ~ month + variable, fun.aggregate = sum) 

The first few lines of output:

             expense_type 2012-02-01_value 2012-02-01_percent 2012-03-01_value 2012-03-01_percent 1              Adjustment           442.37        0.124025031             2.00       0.0005064625 2     Bank Service Charge           200.00        0.056072985           200.00       0.0506462461 3                   Cable            21.33        0.005980184            36.33       0.0091998906 4                 Charity             0.00        0.000000000             0.00       0.0000000000 
like image 121
Andrie Avatar answered Sep 20 '22 05:09

Andrie


data.table can cast on multiple value.var variables. This is quite direct (and efficient).

Therefore:

library(data.table) # v1.9.5+ dcast(setDT(expensesByMonth), expense_type ~ month, value.var = c("value", "percent")) 
like image 23
Arun Avatar answered Sep 21 '22 05:09

Arun