Let's say I have the following data table:
library(data.table)
dt <- data.table(type = c('big', 'medium', 'small', 'small',
                          'medium', 'small', 'small',
                          'big', 'medium', 'small', 'small'),
                 category = letters[1:11])
type category
1: big a
2: medium b
3: small c
4: small d
5: medium e
6: small f
7: small g
8: big h
9: medium i
10: small j
11: small k
In this case I have a category hierarchy: the 'big' value stays the same for all rows until the next 'big' row appears, and the same rule applies to every type.
The reshaped result I want is:
big medium small
1: a b c
2: a b d
3: a e f
4: a e g
5: h i j
6: h i k
As you can see, each column only changes when a new row of that same type is found, so the row order is what determines these categories.
Is there a way to do this without using a for loop?
Here's an approach you can use. You'll need na.locf from the "zoo" package:
library(data.table)
library(zoo)
First, we need to figure out which row of the final table each input row belongs to. To do this, we have to define the order of the types explicitly, because the same dt gives different results if that order changes (that's what the match part does). Once each type has a numeric level, a diff less than or equal to zero (the level stayed the same or moved back up the hierarchy) means the row starts a new row in the output table:
dt[, rid := match(type, c('big', 'medium', 'small'))][, row := cumsum(diff(c(0, rid)) <= 0)]
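The one-liner above packs two steps together. As a minimal sketch, here is the same computation traced in base R on the example type vector; the intermediate names step and new_row are hypothetical and only there to make each stage visible:

```r
# Same type sequence as in the question
type <- c('big', 'medium', 'small', 'small', 'medium', 'small', 'small',
          'big', 'medium', 'small', 'small')

# Level of each type in the hierarchy: big = 1, medium = 2, small = 3
rid <- match(type, c('big', 'medium', 'small'))

# Change in level from the previous row; the leading 0 makes the first step positive
step <- diff(c(0, rid))

# A step <= 0 (same level or back up the hierarchy) starts a new output row
row <- cumsum(step <= 0)

print(data.frame(type, rid, step, new_row = step <= 0, row))
```

Going deeper (step > 0) keeps filling the current output row; anything else opens the next one, which is why the order passed to match matters.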
This is what the data looks like now:
dt
# type category rid row
# 1: big a 1 0
# 2: medium b 2 0
# 3: small c 3 0
# 4: small d 3 1
# 5: medium e 2 2
# 6: small f 3 2
# 7: small g 3 3
# 8: big h 1 4
# 9: medium i 2 4
#10: small j 3 4
#11: small k 3 5
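To see why a fill step is needed, it may help to look at what dcast alone produces from this long table. A minimal sketch (the name wide is just for illustration): each output row only holds the types that actually appeared in it, so the parent levels that did not change come out as NA.

```r
library(data.table)

dt <- data.table(type = c('big', 'medium', 'small', 'small', 'medium', 'small', 'small',
                          'big', 'medium', 'small', 'small'),
                 category = letters[1:11])
dt[, rid := match(type, c('big', 'medium', 'small'))][, row := cumsum(diff(c(0, rid)) <= 0)]

# Cast to wide without filling: cells for types absent from an output row are NA
wide <- dcast(dt, row ~ type, value.var = "category")
print(wide)
```

Carrying the last non-NA value down each column then gives the requested layout.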
Here it is in the form you've requested:
na.locf(dcast(dt, row ~ type, value.var = "category"))
# row big medium small
# 1: 0 a b c
# 2: 1 a b d
# 3: 2 a e f
# 4: 3 a e g
# 5: 4 h i j
# 6: 5 h i k
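If you would rather not depend on zoo, the last-observation-carried-forward step can also be written in base R. This is a swapped-in technique, not what the answer above uses: locf is a hypothetical helper that uses cummax to find the index of the most recent non-NA value, and the helper row column is dropped at the end to match the layout asked for in the question.

```r
library(data.table)

dt <- data.table(type = c('big', 'medium', 'small', 'small', 'medium', 'small', 'small',
                          'big', 'medium', 'small', 'small'),
                 category = letters[1:11])
dt[, rid := match(type, c('big', 'medium', 'small'))][, row := cumsum(diff(c(0, rid)) <= 0)]

# Hypothetical helper: carry the last non-NA value forward (base R, no zoo).
# idx is, for each position, the index of the most recent non-NA value (0 if none yet);
# prepending NA means a leading gap stays NA instead of being dropped.
locf <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))
  c(NA, x)[idx + 1L]
}

wide <- dcast(dt, row ~ type, value.var = "category")
cols <- c('big', 'medium', 'small')
wide[, (cols) := lapply(.SD, locf), .SDcols = cols]  # fill each type column in place
wide[, row := NULL]                                  # drop the helper row id
print(wide)
```

The result has only the big, medium and small columns, with the same six rows as the zoo version.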