Drop columns when splitting data frame in R

Tags: r, data.table

I am trying to split a data.table by a column, but the data.tables in the resulting list still contain the column the table was split by. How can I drop this column once the split is complete? Or, preferably, is there a way to drop multiple columns?

This is my code:

library(data.table)

x <- rnorm(10, mean = 5, sd = 2)
y <- rnorm(10, mean = 5, sd = 2)
z <- sample(5, 10, replace = TRUE)
dt <- data.table(x, y, z)

split(dt, dt$z)

The resulting data.table subsets look like this:

$`1`
          x        y z
1: 6.179790 5.776683 1
2: 5.725441 4.896294 1
3: 8.690388 5.394973 1

$`2`
          x        y z
1: 5.768285 3.951733 2
2: 4.572454 5.487236 2

$`3`
          x        y z
1: 5.183101 8.328322 3
2: 2.830511 3.526044 3

$`4`
          x        y z
1: 5.043010 5.566391 4
2: 5.744546 2.780889 4

$`5`
          x          y z
1: 6.771102 0.09301977 5

Thanks

Laurynas Stašys asked Nov 07 '25 20:11


2 Answers

Splitting a data.table is really not worthwhile unless you have some fancy parallelization step to follow. And even then, you might be better off sticking with a single table.
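To illustrate what "sticking with a single table" can look like, here is a minimal sketch of grouped computation with by. The seed and the summary columns (mean_x, mean_y, n) are assumptions made up for this example, not part of the question.

```r
library(data.table)

set.seed(1)  # hypothetical seed, only so the example is reproducible
dt <- data.table(
  x = rnorm(10, mean = 5, sd = 2),
  y = rnorm(10, mean = 5, sd = 2),
  z = sample(5, 10, replace = TRUE)
)

# Per-group summary computed on the single table; no split() involved
group_means <- dt[, .(mean_x = mean(x), mean_y = mean(y), n = .N), by = z]

# Inside a grouped j, .SD holds the group's rows without the grouping column z
group_cols <- dt[, .(cols = paste(names(.SD), collapse = ",")), by = z]
```

Because .SD already excludes the by column, per-group work done this way never sees z, which is often what the split was meant to achieve.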

That said, I think you want

split( dt[, !"z"], dt$z )

# or more generally

mysplitDT <- function(x, bycols) 
  split( x[, !..bycols], x[, ..bycols] )

mysplitDT(dt, "z")

You would run into the same problem if you had a data.frame:

df = data.frame(dt)
split( df[-which(names(df)=="z")], df$z )
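For dropping several columns from a data.frame at once, a variant based on setdiff() may be easier to reason about. This is only a sketch: the extra column w and the vector drop_cols are hypothetical names introduced for illustration.

```r
df <- data.frame(
  x = c(1.5, 2.5, 3.5, 4.5),
  y = c(10, 20, 30, 40),
  z = c(1, 1, 2, 2),
  w = c("a", "b", "a", "b")
)

drop_cols <- c("z", "w")  # hypothetical set of columns to remove

# setdiff() keeps every column not listed in drop_cols. Unlike
# -which(...), it still keeps all columns when no name matches.
parts <- split(df[setdiff(names(df), drop_cols)], df$z)
```

The difference matters when the name is missing: which() then returns integer(0), and indexing with -integer(0) selects zero columns instead of all of them.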

Frank answered Nov 10 '25 09:11


First thing that came to mind was to iterate through the list and drop the z column.

lapply(split(dt, dt$z), function(d) { d$z <- NULL; d })

And I just noticed that you use the data.table package, so there is probably a better, data.table way of achieving your desired result.
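One candidate for that data.table way is the by and keep.by arguments of data.table's own split method. The sketch below assumes a reasonably recent data.table version that provides them; the sample values and the extra column w are made up for illustration.

```r
library(data.table)

dt <- data.table(
  x = c(1.5, 2.5, 3.5, 4.5),
  y = c(10, 20, 30, 40),
  z = c(1L, 1L, 2L, 2L),
  w = c("a", "b", "a", "b")
)

# Split by z and drop z from every piece
by_z <- split(dt, by = "z", keep.by = FALSE)

# Split by two columns and drop both of them from every piece
by_zw <- split(dt, by = c("z", "w"), keep.by = FALSE)
```

With keep.by = FALSE the columns named in by are removed from each element, so this covers the multiple-column case too, as long as the columns to drop are the ones being split on.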

ialm answered Nov 10 '25 11:11