Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Data Transformation With R

Here is my original output.

BizDate     Description                             TotalAmount TotalQty
2013-09-01 NIN JIOM COUGH SYRUP 75ML                11.8        2
2013-09-01 BREACOL COUGH SYRUP ADULTS 120ML         15.8        2
2013-09-02 AFRICAN SEA-COCONUT COUGH SYRUP 177ML    8.5         1
2013-09-03 AFRICAN SEA COCONUT COUGH SYRUP 177ML    8.2         1
2013-09-01 THREE LEGS COOLING WATER 200ML           21.0        14
2013-09-01 SEAHORSE BRAND COOLING WATER 200ML       4.4         4
2013-09-05 AFRICAN SEA-COCONUT COUGH SYRUP 177ML    8.5         1
2013-09-06 AFRICAN SEA COCONUT COUGH SYRUP 177ML    8.2         1

How would I transform it into something like this

 Description                              2013-09-01   2013-09-02 .......
 NIN JIOM COUGH SYRUP 75ML                some number some number .......
 BREACOL COUGH SYRUP ADULTS 120ML         some number some number .......
 AFRICAN SEA-COCONUT COUGH SYRUP 177ML    some number some number .......
 AFRICAN SEA COCONUT COUGH SYRUP 177ML    some number some number .......
 THREE LEGS COOLING WATER 200ML           some number some number .......
 SEAHORSE BRAND COOLING WATER 200ML       some number some number .......
 AFRICAN SEA-COCONUT COUGH SYRUP 177ML    some number some number .......
 AFRICAN SEA COCONUT COUGH SYRUP 177ML    some number some number .......

Here is some fake data.

set.seed(1)
dat <- data.frame(product = rep(letters[1:4], each = 4)
    , date = sample(seq(as.Date("2013-01-01"), as.Date("2013-01-10"), by = 1), 16, TRUE)
    , amount = sample(1:100, 16, TRUE)
    , qty = sample(1:4, 16, TRUE))

dat
   product       date amount qty
1        a 2013-01-03     72   2
2        a 2013-01-04    100   1
3        a 2013-01-06     39   4
4        a 2013-01-10     78   3
5        b 2013-01-03     94   4
6        b 2013-01-09     22   1
7        b 2013-01-10     66   3
8        b 2013-01-07     13   2
9        c 2013-01-07     27   4
10       c 2013-01-01     39   3
11       c 2013-01-03      2   4
12       c 2013-01-02     39   3
13       d 2013-01-07     87   3
14       d 2013-01-04     35   4
15       d 2013-01-08     49   1
16       d 2013-01-05     60   2

My dumb way of doing this. Is there any smarter to do this? I am thinking to use plyr. just direct me to the right package or way.

dat1<-subset(dat,date=="2013-01-01")
colnames(dat1)[c(3,4)]<-c("2013-01-01.amount","2013-01-01.qty")
dat1<-dat1[-2]
dat2<-subset(dat,date=="2013-01-02")
colnames(dat2)[c(3,4)]<-c("2013-01-02.amount","2013-01-02.qty")
dat2<-dat2[-2]
dat3<-subset(dat,date=="2013-01-03")
colnames(dat3)[c(3,4)]<-c("2013-01-03.amount","2013-01-03.qty")
dat3<-dat3[-2]

new.dat<-merge(dat1,dat2,by="product",all.x=T,all.y=T)
new.dat<-merge(new.dat,dat3,by="product",all.x=T,all.y=T)
new.dat[is.na(new.dat)]<-0

new.dat
  product 2013-01-01.amount 2013-01-01.qty 2013-01-02.amount 2013-01-02.qty
1       a                 0              0                 0              0
2       b                 0              0                 0              0
3       c                39              3                39              3
  2013-01-03.amount 2013-01-03.qty
1                72              2
2                94              4
3                 2              4
like image 744
chee.work.stuff Avatar asked Mar 21 '23 16:03

chee.work.stuff


1 Answers

Considering the current form of your data, just use reshape from base R:

> reshape(dat, direction = "wide", idvar="product", timevar="date")
   product amount.2013-01-03 qty.2013-01-03 amount.2013-01-04 qty.2013-01-04 amount.2013-01-06
1        a                72              2               100              1                39
5        b                94              4                NA             NA                NA
9        c                 2              4                NA             NA                NA
13       d                NA             NA                35              4                NA
   qty.2013-01-06 amount.2013-01-10 qty.2013-01-10 amount.2013-01-09 qty.2013-01-09
1               4                78              3                NA             NA
5              NA                66              3                22              1
9              NA                NA             NA                NA             NA
13             NA                NA             NA                NA             NA
   amount.2013-01-07 qty.2013-01-07 amount.2013-01-01 qty.2013-01-01 amount.2013-01-02
1                 NA             NA                NA             NA                NA
5                 13              2                NA             NA                NA
9                 27              4                39              3                39
13                87              3                NA             NA                NA
   qty.2013-01-02 amount.2013-01-08 qty.2013-01-08 amount.2013-01-05 qty.2013-01-05
1              NA                NA             NA                NA             NA
5              NA                NA             NA                NA             NA
9               3                NA             NA                NA             NA
13             NA                49              1                60              2

Alternatively, you may consider the "reshape2" package:

library(reshape2)
datL <- melt(dat, id.vars=c("product", "date"))
dcast(datL, product ~ date + variable, value.var="value")
like image 196
A5C1D2H2I1M1N2O1R2T1 Avatar answered Apr 19 '23 15:04

A5C1D2H2I1M1N2O1R2T1