This is a variation of an earlier question.
df <- data.frame(matrix(rnorm(9*9), ncol=9))
names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")
I want to split the dataframe by the index that is given in the column names after the underscore "_". (The indices can be any characters/numbers of different lengths; these are just random examples.)
indx <- gsub(".*_", "", names(df))
and name the resulting dataframes accordingly. In the end I would like to get three dataframes, each named after its index.
Thank you!
Here, you can split the column names by indx, get the subset of data within the list using lapply and `[`, set the names of the list elements using setNames, and use list2env if you need them as individual datasets. This is not really recommended, as most operations can be done within the list, and the list elements can later be saved using write.table with lapply if you want.
list2env(
setNames(
lapply(split(colnames(df), indx), function(x) df[x]),
paste('df', sort(unique(indx)), sep="_")),
envir=.GlobalEnv)
head(df_1,2)
# c_1 d_1 e_1
#1 1.0085829 -0.7219199 0.3502958
#2 -0.9069805 -0.7043354 -1.1974415
head(df_o1,2)
# 1_o1 2_o1 3_o1
#1 0.7924930 0.434396 1.7388130
#2 0.9202404 -2.079311 -0.6567794
head(df_p,2)
# a_p b_p c_p
#1 -0.12392272 -1.183582 0.8176486
#2 0.06330595 -0.659597 -0.6350215
Or using Map. This is similar to the above approach, i.e. split the column names by indx and use `[` to extract the columns; the rest is as above.
list2env(setNames(Map(`[` ,
list(df), split(colnames(df), indx)),
paste('df',unique(sort(indx)), sep="_")), envir=.GlobalEnv)
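As noted above, keeping the pieces in a named list is usually easier than creating separate objects. A minimal sketch of that, assuming the df and indx from the question; the names df_list and out_dir are illustrative choices and not part of the original answer:

```r
# Rebuild the example data from the question
df <- data.frame(matrix(rnorm(9 * 9), ncol = 9))
names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")
indx <- gsub(".*_", "", names(df))

# split() returns a list named by group, so the names can be derived
# from the result itself instead of from sort(unique(indx))
df_list <- lapply(split(colnames(df), indx), function(x) df[x])
names(df_list) <- paste("df", names(df_list), sep = "_")

# Operate on all pieces at once, e.g. check their dimensions
sapply(df_list, dim)

# Save each piece to its own file; out_dir is a hypothetical location
out_dir <- tempdir()
invisible(lapply(names(df_list), function(nm) {
  write.table(df_list[[nm]],
              file = file.path(out_dir, paste0(nm, ".txt")),
              row.names = FALSE, quote = FALSE)
}))
```

Individual pieces are then available as df_list$df_1, df_list$df_p and df_list$df_o1 without adding objects to the global environment.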
To keep the groups in their original order of appearance instead of sorted order, you can do:
indx1 <- factor(indx, levels=unique(indx))
split(colnames(df), indx1)
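Turning indx into a factor with levels=unique(indx) changes the order of the groups, so any names built with sort(unique(indx)) would no longer line up with the list elements. A minimal sketch, assuming the df and indx from the question, that takes the names from the split result itself (cols and pieces are illustrative names):

```r
# Rebuild the example data from the question
df <- data.frame(matrix(rnorm(9 * 9), ncol = 9))
names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")
indx <- gsub(".*_", "", names(df))

# Levels in order of first appearance rather than sorted order
indx1 <- factor(indx, levels = unique(indx))
cols <- split(colnames(df), indx1)
names(cols)  # "1" "p" "o1"

# Subset the columns and name each piece from the split result,
# so names and contents always stay matched
pieces <- lapply(cols, function(x) df[x])
names(pieces) <- paste("df", names(cols), sep = "_")
```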
You can try this:
invisible(sapply(unique(indx),
function(x)
assign(paste("df",x,sep="_"),
df[,grepl(paste0("_",x,"$"),colnames(df)),drop=FALSE], # drop=FALSE keeps a data.frame even if only one column matches
envir=.GlobalEnv)))
# For each unique element of indx, the code assigns (in the global environment)
# the columns corresponding to that index to a new data.frame named after the index.
# invisible() prevents the data.frames from being printed on screen.
> ls()
[1] "df" "df_1" "df_o1" "df_p" "indx"
> df_1[1:3,]
c_1 d_1 e_1
1 1.8033188 0.5578494 2.2458750
2 1.0095556 -0.4042410 -0.9274981
3 0.7122638 1.4677821 0.7770603
> df_o1[1:3,]
1_o1 2_o1 3_o1
1 -2.05854176 -0.92394923 -0.4932116
2 -0.05743123 -0.24143979 1.9060076
3 0.68055653 -0.70908036 1.4514368
> df_p[1:3,]
a_p b_p c_p
1 -0.2106823 -0.1170719 2.3205184
2 -0.1826542 -0.5138504 1.9341230
3 -1.0551739 -0.2990706 0.5054421
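Since the indices can be arbitrary characters, building a regular expression from them may misfire when an index contains a metacharacter such as "." or "+". A possible variation on the assign() approach, sketched under the assumption that indx was computed as in the question, compares against indx directly instead:

```r
# Rebuild the example data from the question
df <- data.frame(matrix(rnorm(9 * 9), ncol = 9))
names(df) <- c("c_1", "d_1", "e_1", "a_p", "b_p", "c_p", "1_o1", "2_o1", "3_o1")
indx <- gsub(".*_", "", names(df))

for (x in unique(indx)) {
  # indx == x is a plain string comparison, so no regex escaping is needed;
  # drop = FALSE keeps a data.frame even when only one column matches
  assign(paste("df", x, sep = "_"),
         df[, indx == x, drop = FALSE],
         envir = .GlobalEnv)
}
```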