How can I reshape a data.table when the order of the rows determines the category?

Let's say I have the following data table:

library(data.table)

dt=data.table(type=c('big','medium','small','small'
                     ,'medium','small','small'
                     ,'big','medium','small','small')
             ,category=letters[1:11])

      type category
 1:    big        a
 2: medium        b
 3:  small        c
 4:  small        d
 5: medium        e
 6:  small        f
 7:  small        g
 8:    big        h
 9: medium        i
10:  small        j
11:  small        k

In this case I have a category hierarchy: the 'big' category applies to every row until the next 'big' row appears, and the same rule holds for every type.

The reshaped table I want is:

   big medium small
1:   a      b     c
2:   a      b     d
3:   a      e     f
4:   a      e     g
5:   h      i     j
6:   h      i     k

As you can see, each category only changes when a new row of the same type is found, so the order of the rows determines the categories.

Is there a way to do this without using a for loop?

Aldo Pareja asked Mar 18 '16 21:03


1 Answer

Here's an approach that you can use. You'll need na.locf from the "zoo" package:

library(data.table)
library(zoo)

First, we need to figure out which row of the final table each input row belongs to. To do this, we need to explicitly define the order of the types, because the same dt can give different results if that order changes (that's what the match part does). Once each type has a numeric rank, a diff less than or equal to zero means we did not move deeper in the hierarchy, so that record starts a new row in the new table:

dt[, rid := match(type, c('big', 'medium', 'small'))][, row := cumsum(diff(c(0, rid)) <= 0)]

This is what the data looks like now:

dt
#      type category rid row
# 1:    big        a   1   0
# 2: medium        b   2   0
# 3:  small        c   3   0
# 4:  small        d   3   1
# 5: medium        e   2   2
# 6:  small        f   3   2
# 7:  small        g   3   3
# 8:    big        h   1   4
# 9: medium        i   2   4
#10:  small        j   3   4
#11:  small        k   3   5
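To see the grouping rule in isolation, and to check the point that the level order passed to match changes the result, here is a minimal base R sketch on the plain type vector. The helper name `group_rows` is hypothetical; it only wraps the same match/diff/cumsum expression used above.

```r
# Hypothetical helper: the answer's row-numbering expression on a plain vector
group_rows <- function(type, lvls) {
  # Rank each type by its position in the hierarchy (1 = top level)
  rid <- match(type, lvls)
  # A step that does not go deeper (diff <= 0) starts a new output row
  cumsum(diff(c(0, rid)) <= 0)
}

type <- c("big", "medium", "small", "small",
          "medium", "small", "small",
          "big", "medium", "small", "small")

print(group_rows(type, c("big", "medium", "small")))
# [1] 0 0 0 1 2 2 3 4 4 4 5

# Reversing the hierarchy groups the same records differently
print(group_rows(type, c("small", "medium", "big")))
# [1] 0 1 2 3 3 4 5 5 6 7 8
```

The first call reproduces the row column shown above; the second shows why the order has to be stated explicitly.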

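Here it is in the form you've requested. dcast spreads each type into its own column (leaving NA where a row has no entry of that type), and na.locf fills each NA with the last non-missing value above it: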

na.locf(dcast(dt, row ~ type, value.var = "category"))
#    row big medium small
# 1:   0   a      b     c
# 2:   1   a      b     d
# 3:   2   a      e     f
# 4:   3   a      e     g
# 5:   4   h      i     j
# 6:   5   h      i     k
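If you would rather not depend on zoo (or data.table), the same cast-then-fill idea can be sketched in base R without a for loop. This is an illustration, assuming, as the diff <= 0 rule guarantees, that each output row holds at most one value per type. `lvls`, `wide`, and `locf` are hypothetical names, and `locf` is a simplified stand-in for na.locf that just keeps any leading NAs as NA.

```r
type <- c("big", "medium", "small", "small",
          "medium", "small", "small",
          "big", "medium", "small", "small")
category <- letters[1:11]
lvls <- c("big", "medium", "small")

# Same row numbering as in the answer
row <- cumsum(diff(c(0, match(type, lvls))) <= 0)

# Base R stand-in for dcast: one row per group, one column per type,
# NA where a group has no entry of that type
groups <- sort(unique(row))
wide <- data.frame(row = groups)
wide[lvls] <- lapply(lvls, function(t) {
  category[type == t][match(groups, row[type == t])]
})

# Last observation carried forward without a loop: cummax() tracks the
# index of the most recent non-NA entry at each position
locf <- function(x) {
  idx <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
  idx[idx == 0L] <- NA_integer_  # leading NAs stay NA
  x[idx]
}
wide[lvls] <- lapply(wide[lvls], locf)
print(wide)
```

The resulting big, medium, and small columns match the table above.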
A5C1D2H2I1M1N2O1R2T1 answered Sep 28 '22 07:09