I have a data frame (dfrm) of over 100 columns and 150 rows. I need to merge the contents of every 4 columns into 1 (preferably separated by a "/", although that is not essential). For a single group of 4 this is simple enough: apply(dfrm[, 1:4], 1, paste, collapse = "/"). My difficulty is scaling that solution to the whole data frame. In other words:
How can I go from this:
      loc1 loc1.1 loc1.2 loc1.3 loc2 loc2.1 loc2.2 loc2.3
ind.1  257    262    228    266  204    245    282    132
ind.2  244    115    240    187  196    133    189    251
ind.3  298    139    216    225  219    276    192    254
ind.4  129    176    180    182  215    250    227    186
ind.5  238    217    284    240  131    184    247    168
To something like this:
                 loc1            loc2
ind.1 257/262/228/266 204/245/282/132
ind.2 244/115/240/187 196/133/189/251
ind.3 298/139/216/225 219/276/192/254
ind.4 129/176/180/182 215/250/227/186
ind.5 238/217/284/240 131/184/247/168
This needs to work on a data frame of over 100 rows and columns. I've tried indexing the data frame as presented in the solution of this question, but after creating said index of every 4 columns I find myself lost while trying to perform do.call over my data frame. I'm sure there must be an easy solution for this, but please keep in mind that I'm far from proficient in R.
Also, the column names are not a real problem as long as the body is in shape: I can extract the list of names with loc <- colnames(dfrm) and loc <- loc[c(T, F, F, F)], and then assign them with colnames(dfrm) <- loc. It would be nice if that were incorporated, though.
This is certainly not pretty, but it works:
do.call(cbind, lapply(1:ceiling(ncol(df)/4), function(i)
  apply(df[, seq(4*(i-1)+1, min(4*i, ncol(df))), drop = FALSE],
        1, paste, collapse = "/")))
#      [,1]              [,2]
#ind.1 "257/262/228/266" "204/245/282/132"
#ind.2 "244/115/240/187" "196/133/189/251"
#ind.3 "298/139/216/225" "219/276/192/254"
#ind.4 "129/176/180/182" "215/250/227/186"
#ind.5 "238/217/284/240" "131/184/247/168"
The ceiling and drop = FALSE are there to survive edge cases when the number of columns is not divisible by 4 (drop = FALSE keeps a single leftover column as a data frame, so apply still works on it). Also, note that the end result here is a matrix (thanks to apply); you can convert it back to a data.frame if you like and assign whatever column names you want.
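For what it's worth, here is a minimal sketch of that last step, using the sample data from the question. The names n_groups and merged are my own, and taking the first column name of each group as the new name is an assumption based on what the question asks for, not part of the answer above.

```r
# Sample data copied from the question (two groups of four columns).
df <- data.frame(
  loc1   = c(257, 244, 298, 129, 238),
  loc1.1 = c(262, 115, 139, 176, 217),
  loc1.2 = c(228, 240, 216, 180, 284),
  loc1.3 = c(266, 187, 225, 182, 240),
  loc2   = c(204, 196, 219, 215, 131),
  loc2.1 = c(245, 133, 276, 250, 184),
  loc2.2 = c(282, 189, 192, 227, 247),
  loc2.3 = c(132, 251, 254, 186, 168),
  row.names = paste0("ind.", 1:5)
)

# Number of groups of (up to) four columns; n_groups is a hypothetical name.
n_groups <- ceiling(ncol(df) / 4)

# Paste each group row-wise and bind the results into a character matrix.
merged <- do.call(cbind, lapply(seq_len(n_groups), function(i) {
  cols <- seq(4 * (i - 1) + 1, min(4 * i, ncol(df)))
  apply(df[, cols, drop = FALSE], 1, paste, collapse = "/")
}))

# Convert the matrix back to a data frame; row names carry over from df.
merged <- as.data.frame(merged, stringsAsFactors = FALSE)

# Assumption: name each merged column after the first column of its group.
colnames(merged) <- colnames(df)[seq(1, ncol(df), by = 4)]

print(merged)
```

Picking the names with seq(1, ncol(df), by = 4) also works when the last group has fewer than four columns, since it only looks at where each group starts.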
Way late to the party, but I think this is a little cleaner (and robust to column counts that are not a multiple of 4):
as.data.frame(
  lapply(
    split.default(df, (1:ncol(df) - 1) %/% 4),
    function(x) do.call(paste, c(x, list(sep = "/")))
  )
)
Splitting the data frame by columns using (1:ncol(df) - 1) %/% 4 creates groups of four columns (or fewer in the last group if the column count is not a multiple of four), which then makes it trivial to pass each group on to paste. Note we have to use split.default because split.data.frame will attempt to split by row instead of by column. Produces:
               X0              X1
1 257/262/228/266 204/245/282/132
2 244/115/240/187 196/133/189/251
3 298/139/216/225 219/276/192/254
4 129/176/180/182 215/250/227/186
5 238/217/284/240 131/184/247/168
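If you also want the original names back, as the question mentions, something like the following sketch might do. The names groups and merged are mine, and reusing the first column name of each group plus the original row names is an assumption about the desired result.

```r
# Sample data from the question; the first field of each row becomes the row name
# because the header line has one fewer field than the data lines.
df <- read.table(text = "loc1 loc1.1 loc1.2 loc1.3 loc2 loc2.1 loc2.2 loc2.3
ind.1 257 262 228 266 204 245 282 132
ind.2 244 115 240 187 196 133 189 251
ind.3 298 139 216 225 219 276 192 254
ind.4 129 176 180 182 215 250 227 186
ind.5 238 217 284 240 131 184 247 168", header = TRUE)

# Group index per column: 0,0,0,0,1,1,1,1,... (groups is a hypothetical name).
groups <- (seq_len(ncol(df)) - 1) %/% 4

# Split by column, then paste the columns of each group together row-wise.
merged <- as.data.frame(
  lapply(
    split.default(df, groups),
    function(x) do.call(paste, c(x, list(sep = "/")))
  ),
  stringsAsFactors = FALSE
)

# Assumption: label each merged column with the first name in its group,
# and restore the row names that paste() dropped.
names(merged) <- names(df)[!duplicated(groups)]
rownames(merged) <- rownames(df)

print(merged)
```

!duplicated(groups) marks the first column of every group, so this is the same idea as loc[c(T, F, F, F)] from the question but does not rely on the column count being a multiple of four.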