
Paste every "X" columns to a single column in a dataframe

Tags: dataframe, r, paste

I have a data frame (dfrm) with over 100 columns and 150 rows. I need to merge the contents of every 4 columns into 1 (preferably separated by a "/", although that is dispensable). For a single group this is simple enough with apply(dfrm[, 1:4], 1, paste, collapse = "/"), but I have difficulty scaling that solution to my whole data frame. In other words:

How can I go from this:

        loc1   loc1.1 loc1.2 loc1.3 loc2  loc2.1 loc2.2  loc2.3
ind.1    257    262    228    266    204    245    282    132
ind.2    244    115    240    187    196    133    189    251
ind.3    298    139    216    225    219    276    192    254
ind.4    129    176    180    182    215    250    227    186
ind.5    238    217    284    240    131    184    247    168

To something like this:

                 loc1            loc2
ind.1 257/262/228/266 204/245/282/132
ind.2 244/115/240/187 196/133/189/251
ind.3 298/139/216/225 219/276/192/254
ind.4 129/176/180/182 215/250/227/186
ind.5 238/217/284/240 131/184/247/168

This is for a data frame of over 100 rows and columns. I've tried indexing the data frame as presented in the solution of this question, but after creating said index of every 4 columns I find myself lost while trying to perform do.call over my data frame. I'm sure there must be an easy solution for this, but please keep in mind that I'm far from proficient in R.

Also, the column names are not a real problem as long as the body is in shape, since I can extract the list of names with loc <- colnames(dfrm) and loc <- loc[c(T, F, F, F)], and then set them on the merged result with colnames(...) <- loc, although it would be nice if that were incorporated.

Panchito asked Feb 21 '14 23:02



2 Answers

This is certainly not pretty, but it works:

do.call(cbind, lapply(1:ceiling(ncol(df)/4), function(i)
                      apply(df[,seq(4*(i-1)+1, min(4*i, ncol(df))), drop = FALSE],
                            1, paste, collapse = "/")))
#      [,1]              [,2]             
#ind.1 "257/262/228/266" "204/245/282/132"
#ind.2 "244/115/240/187" "196/133/189/251"
#ind.3 "298/139/216/225" "219/276/192/254"
#ind.4 "129/176/180/182" "215/250/227/186"
#ind.5 "238/217/284/240" "131/184/247/168"

The ceiling and drop are there to survive edge cases when the number of columns is not divisible by 4. Also, note that the end result here is a matrix (because of the apply); you can convert it back to a data.frame if you like and assign whatever column names you want.
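
As a minimal sketch of that last step (an illustration, not part of the original answer), the matrix can be converted back to a data frame and named after the first column of each group. The toy data copies the first two rows of the question's example; the name `res` is only illustrative.

```r
# Toy data: first two rows of the example in the question
df <- data.frame(
  loc1 = c(257, 244), loc1.1 = c(262, 115),
  loc1.2 = c(228, 240), loc1.3 = c(266, 187),
  loc2 = c(204, 196), loc2.1 = c(245, 133),
  loc2.2 = c(282, 189), loc2.3 = c(132, 251),
  row.names = c("ind.1", "ind.2")
)

# Same approach as above: paste each block of 4 columns row by row
res <- do.call(cbind, lapply(1:ceiling(ncol(df) / 4), function(i)
  apply(df[, seq(4 * (i - 1) + 1, min(4 * i, ncol(df))), drop = FALSE],
        1, paste, collapse = "/")))

# Convert the character matrix to a data frame (row names are kept)
res <- as.data.frame(res, stringsAsFactors = FALSE)

# Reuse the name of the first column in each group of 4
colnames(res) <- colnames(df)[seq(1, ncol(df), by = 4)]
res
```

Taking every 4th name with `seq(1, ncol(df), by = 4)` is equivalent to the `c(T, F, F, F)` recycling trick from the question and also works when the last group is short.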

eddi answered Oct 02 '22 01:10


Way late to the party, but I think this is a little cleaner (and robust to column counts that are not a multiple of 4):

as.data.frame(
  lapply(
    split.default(df, (1:ncol(df) - 1) %/% 4),
    function(x) do.call(paste, c(x, list(sep = "/")))
  )
)

Splitting the data frame by columns using (1:ncol(df) - 1) %/% 4 creates groups of four columns (or fewer for the last group if the column count is not a multiple of four), which then makes it trivial to pass each group on to paste. Note we have to use split.default because split.data.frame will attempt to split by row instead of column. Produces:

               X0              X1
1 257/262/228/266 204/245/282/132
2 244/115/240/187 196/133/189/251
3 298/139/216/225 219/276/192/254
4 129/176/180/182 215/250/227/186
5 238/217/284/240 131/184/247/168
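
To see why the split works, it can help to print the grouping vector on its own. The sketch below is an illustration rather than part of the original answer: it uses a hypothetical 6-column data frame so that the last group is short, and then restores the row names and group names that the output above has lost. The names `grp` and `out` are made up for the example.

```r
# Hypothetical 6-column data frame: the last group has only 2 columns
df <- data.frame(a = 1:2, a.1 = 3:4, a.2 = 5:6, a.3 = 7:8,
                 b = 9:10, b.1 = 11:12,
                 row.names = c("ind.1", "ind.2"))

# Integer division puts columns 1-4 in group 0 and columns 5-6 in group 1
grp <- (seq_len(ncol(df)) - 1) %/% 4
grp
# [1] 0 0 0 0 1 1

# Same technique as above: split by column group, paste within each group
out <- as.data.frame(
  lapply(split.default(df, grp),
         function(x) do.call(paste, c(x, list(sep = "/")))),
  stringsAsFactors = FALSE
)

# paste() drops row names and the groups are named X0, X1, so restore both
rownames(out) <- rownames(df)
colnames(out) <- colnames(df)[!duplicated(grp)]
out
```

One caveat with this approach: the column names are passed to `paste` as argument names, so a column literally named `sep` or `collapse` would be misread as an argument.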

BrodieG answered Oct 02 '22 03:10