
r data.table group by with no aggregate

Tags: r, data.table

How do I get a data.table in R to return just the column of grouping values, without applying any aggregate function? Say I have:

test<-data.table(x=c(rep("a",2),rep("b",3)),y=1:5)

And I just want to return:

a
b

When I use:

test[,,by=x]

I get back:

   x y
1: a 1
2: a 2
3: b 3
4: b 4
5: b 5

And when I do:

test[,x,by=x]

I get back:

   x x
1: a a
2: b b

I know I can use:

test[,.(unique(x))]

But that doesn't seem like the right way to do it. Besides, what if I wanted to group by two columns and return both?

data_science_actuary asked May 28 '15 04:05



1 Answer

I'd accomplish this by applying unique() to a data.table containing just the subset of grouping columns in which I was interested. Handing a data.table to unique(), as below, will trigger a call to unique.data.table(), which works just as well for two or more columns as for one:

unique(test[, .(x)]) ## .() is data.table shorthand for list()
#    x
# 1: a
# 2: b

## Add another column to see that unique.data.table() works fine in that case as well 
test[, z:=c(1,1,1,2,2)]
unique(test[, .(x,z)])   
#    x z
# 1: a 1
# 2: b 1
# 3: b 2
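
If the grouping columns are only known as a character vector of names, the same idea can be sketched as below. This is a minimal example rather than the only way to do it: `grp_cols` is a hypothetical variable name introduced for illustration, and the sketch assumes `with = FALSE` is used so that `j` is treated as a vector of column names.

```r
library(data.table)

# Rebuild the example table from the question, plus the extra column z
test <- data.table(x = c(rep("a", 2), rep("b", 3)), y = 1:5)
test[, z := c(1, 1, 1, 2, 2)]

# grp_cols is a hypothetical name holding the grouping columns of interest
grp_cols <- c("x", "z")

# with = FALSE tells data.table to read j as column names, not an expression;
# unique() then dispatches to unique.data.table() on the resulting subset
groups <- unique(test[, grp_cols, with = FALSE])
print(groups)
```

Changing `grp_cols` to `"x"` alone should give the one-column result asked for in the question.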
Josh O'Brien answered Oct 08 '22 13:10
