
Splitting a dataframe by column name indices

Tags:

r

This is a variation of an earlier question.

df <- data.frame(matrix(rnorm(9*9), ncol=9))
names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")

I want to split the dataframe by the index given in the column names after the underscore "_". (The indices can be any characters or numbers, of varying length; these are just random examples.)

indx <- gsub(".*_", "", names(df))

and name the resulting dataframes accordingly. In the end I would like to get three dataframes, called:

  • df_1
  • df_p
  • df_o1

Thank you!

nouse asked Dec 16 '14 09:12


2 Answers

Here, you can split the column names by indx, subset the data for each group using lapply and [, set the names of the list elements using setNames, and use list2env if you need them as individual datasets. That last step is not really recommended: most operations can be done within the list, and if you want, the pieces can be saved later using write.table with lapply.

 list2env(
   setNames(
     lapply(split(colnames(df), indx), function(x) df[x]),
     paste('df', sort(unique(indx)), sep="_")),
   envir=.GlobalEnv)


head(df_1,2)
#      c_1        d_1        e_1
#1  1.0085829 -0.7219199  0.3502958
#2 -0.9069805 -0.7043354 -1.1974415


head(df_o1,2)
#     1_o1      2_o1       3_o1
#1 0.7924930  0.434396  1.7388130
#2 0.9202404 -2.079311 -0.6567794

head(df_p,2)
#      a_p       b_p        c_p
#1 -0.12392272 -1.183582  0.8176486
#2  0.06330595 -0.659597 -0.6350215
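
If the pieces do not need to live in the global environment, the list-based route mentioned above can be sketched roughly as follows. The name `df_list`, the seed, and the output file names in a temporary directory are assumptions made for illustration; they are not part of the original answer.

```r
# Recreate the example data from the question
set.seed(1)  # assumption: seed added only to make the sketch reproducible
df <- data.frame(matrix(rnorm(9 * 9), ncol = 9))
names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")
indx <- gsub(".*_", "", names(df))

# Split the column names by index and keep the sub-data-frames in one list
df_list <- lapply(split(colnames(df), indx), function(x) df[x])
names(df_list) <- paste("df", names(df_list), sep = "_")

# Work on every piece at once, e.g. count the columns in each
sapply(df_list, ncol)

# Write each piece to its own file (hypothetical file names in a temp directory)
out_dir <- tempdir()
invisible(lapply(names(df_list), function(nm) {
  write.table(df_list[[nm]],
              file = file.path(out_dir, paste0(nm, ".txt")),
              row.names = FALSE)
}))
```

Taking the names from `names(df_list)` rather than recomputing them keeps each name tied to the element it labels.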

Or using Map. This is similar to the above approach, i.e. split the column names by indx and use [ to extract the columns; the rest is as above.

list2env(
  setNames(
    Map(`[`, list(df), split(colnames(df), indx)),
    paste('df', unique(sort(indx)), sep="_")),
  envir=.GlobalEnv)

Update

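If you want the groups in their original order of appearance rather than in sorted order, you can convert indx to a factor whose levels follow unique(indx) and split on that: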

 indx1 <- factor(indx, levels=unique(indx))
 split(colnames(df), indx1)
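
To make the effect of the factor concrete, here is a small sketch using the column names from the question. Splitting on a factor returns the groups in the order of its levels, so the order of first appearance is kept. Building the names from `levels(indx1)` is an assumption of this sketch (so that names and data stay aligned), and `res` is a hypothetical name.

```r
# Example data from the question
df <- data.frame(matrix(rnorm(9 * 9), ncol = 9))
names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")
indx <- gsub(".*_", "", names(df))

# Factor whose levels follow the order in which the indices first appear
indx1 <- factor(indx, levels = unique(indx))
levels(indx1)
# "1"  "p"  "o1"

# Split on the factor and name the pieces from its levels
res <- lapply(split(colnames(df), indx1), function(x) df[x])
names(res) <- paste("df", levels(indx1), sep = "_")
names(res)
# "df_1"  "df_p"  "df_o1"
```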

akrun answered Oct 08 '22 21:10


You can try this:

 invisible(sapply(unique(indx),
                  function(x)
                     assign(paste("df",x,sep="_"),
                            # drop=FALSE keeps a data.frame even if only one column matches
                            df[,grepl(paste0("_",x,"$"),colnames(df)),drop=FALSE],
                            envir=.GlobalEnv)))

# For each unique element of indx, the code assigns (in the global environment)
# the columns matching that index to a new data.frame named after the index.
# invisible() keeps the data.frames from being printed on screen.

> ls()
[1] "df"    "df_1"  "df_o1" "df_p"  "indx"  

> df_1[1:3,]
         c_1        d_1        e_1
1  1.8033188  0.5578494  2.2458750
2  1.0095556 -0.4042410 -0.9274981
3  0.7122638  1.4677821  0.7770603

> df_o1[1:3,]
         1_o1        2_o1       3_o1
1 -2.05854176 -0.92394923 -0.4932116
2 -0.05743123 -0.24143979  1.9060076
3  0.68055653 -0.70908036  1.4514368

> df_p[1:3,]
         a_p        b_p        c_p
1 -0.2106823 -0.1170719  2.3205184
2 -0.1826542 -0.5138504  1.9341230
3 -1.0551739 -0.2990706  0.5054421
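
A possible variation on the same idea, offered as a sketch rather than as part of the original answer: since `indx` already holds the index of each column, the columns can be selected with `indx == x` instead of a regular expression, so the result does not depend on how characters in the index are interpreted as a pattern. The column names `a_x`, `b_x` and `c_y.z` below are made up for illustration, and `drop = FALSE` is used so that a group with a single column still comes back as a data frame.

```r
# Hypothetical data: one group with two columns, one group with a single column
df <- data.frame(matrix(rnorm(3 * 3), ncol = 3))
names(df) <- c("a_x", "b_x", "c_y.z")
indx <- gsub(".*_", "", names(df))

# Assign one data frame per index, selecting columns by exact comparison
for (x in unique(indx)) {
  assign(paste("df", x, sep = "_"),
         df[, indx == x, drop = FALSE],
         envir = .GlobalEnv)
}

names(df_x)
is.data.frame(df_y.z)
```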

Cath answered Oct 08 '22 20:10