Paste every "X" columns to a single column in a dataframe

Q: How do I convert multiple columns to one column in Python?

DataFrames consist of rows, columns, and data. To combine the values of all the columns into a single column, use the apply() method and pass it a function that joins the values. apply() runs that function along one axis of the DataFrame, that is, once per row or once per column.

Q: How do I merge all columns into one in pandas?

You can use DataFrame.apply() to concatenate multiple column values into a single column. It takes slightly less typing and scales better when you want to join many columns.

Q: How do I return multiple columns in a data frame?

Return multiple columns from pandas apply(): the function you pass to apply() can return a Series that contains the new data. Pass axis=1 so that the function is applied to each row of the DataFrame; when it returns a Series, apply() returns a DataFrame with one column per element of that Series.

Q: How do I concatenate columns in pandas?

You can concatenate two or more text/string columns in a pandas DataFrame simply by using the + operator. Note that applying + to numeric columns performs addition instead of concatenation.

Tags: dataframe, r, paste

I have a data frame (dfrm) of over 100 columns and 150 rows. I need to merge the contents of every 4 columns into 1 (preferably separated by a "/", although that is dispensable). For a single group this is simple enough: apply(dfrm[ ,1:4], 1, paste, collapse="/"). I have difficulty scaling that solution to my whole data frame. In other words:

How can I go from this:

        loc1   loc1.1 loc1.2 loc1.3 loc2  loc2.1 loc2.2  loc2.3
ind.1    257    262    228    266    204    245    282    132
ind.2    244    115    240    187    196    133    189    251
ind.3    298    139    216    225    219    276    192    254
ind.4    129    176    180    182    215    250    227    186
ind.5    238    217    284    240    131    184    247    168

To something like this:

                 loc1            loc2
ind.1 257/262/228/266 204/245/282/132
ind.2 244/115/240/187 196/133/189/251
ind.3 298/139/216/225 219/276/192/254
ind.4 129/176/180/182 215/250/227/186
ind.5 238/217/284/240 131/184/247/168

I need this for a data frame of over 100 rows and columns. I've tried indexing the data frame as presented in the solution to this question, but after creating said index of every 4 columns I find myself lost while trying to run do.call over my data frame. I'm sure there must be an easy solution for this, but please keep in mind that I'm anything but proficient in R.

Also, the column names are not a real problem as long as the body is in shape, since a list of names can be extracted with loc <- colnames(dfrm) and loc <- loc[c(T, F, F, F)], and then assigned with colnames(dfrm) <- loc, although it would be nice if that were incorporated.


asked Feb 21 '14 23:02

Panchito

2 Answers

This is certainly not pretty, but it works:

do.call(cbind, lapply(seq_len(ceiling(ncol(df)/4)), function(i)
                      apply(df[, seq(4*(i-1)+1, min(4*i, ncol(df))), drop = FALSE],
                            1, paste, collapse = "/")))
#      [,1]              [,2]             
#ind.1 "257/262/228/266" "204/245/282/132"
#ind.2 "244/115/240/187" "196/133/189/251"
#ind.3 "298/139/216/225" "219/276/192/254"
#ind.4 "129/176/180/182" "215/250/227/186"
#ind.5 "238/217/284/240" "131/184/247/168"

The ceiling and drop are there to survive edge cases when the number of columns is not divisible by 4. Also, note that the end result here is a matrix (thanks to apply and cbind); you can convert it back to a data.frame if you like and assign whatever column names you want.
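As a rough sketch of that last step, the matrix can be turned back into a data.frame and named after the first column of each group. The variable names n, m and res are made up for illustration, and the data is just the example from the question rebuilt by hand.

```r
# Example data from the question, filled column by column.
vals <- c(257, 244, 298, 129, 238,  262, 115, 139, 176, 217,
          228, 240, 216, 180, 284,  266, 187, 225, 182, 240,
          204, 196, 219, 215, 131,  245, 133, 276, 250, 184,
          282, 189, 192, 227, 247,  132, 251, 254, 186, 168)
df <- as.data.frame(matrix(
  vals, nrow = 5,
  dimnames = list(paste0("ind.", 1:5),
                  c("loc1", "loc1.1", "loc1.2", "loc1.3",
                    "loc2", "loc2.1", "loc2.2", "loc2.3"))))

n <- 4  # group size (hypothetical name, not from the question)

# Same approach as above: paste each block of n columns row by row.
m <- do.call(cbind, lapply(seq_len(ceiling(ncol(df) / n)), function(i) {
  cols <- seq(n * (i - 1) + 1, min(n * i, ncol(df)))
  apply(df[, cols, drop = FALSE], 1, paste, collapse = "/")
}))

# Back to a data.frame; name each column after the first column of its group.
res <- as.data.frame(m, stringsAsFactors = FALSE)
names(res) <- names(df)[seq(1, ncol(df), by = n)]
res
```

The row names (ind.1, ind.2, ...) survive because apply keeps them as names on each pasted vector and cbind carries them into the matrix.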

answered Oct 02 '22 01:10

eddi

Way late to the party, but I think this is a little cleaner (and robust to column counts that are not a multiple of 4):

as.data.frame(
  lapply(
    split.default(df, (1:ncol(df) - 1) %/% 4),
    function(x) do.call(paste, c(x, list(sep = "/")))
  )
)

Splitting the data frame by columns using (1:ncol(df) - 1) %/% 4 creates groups of four columns (or fewer for the last group if the column count is not a multiple of four), which then makes it trivial to pass each group on to paste. Note we have to use split.default because split.data.frame will attempt to split by row instead of by column. Produces:

               X0              X1
1 257/262/228/266 204/245/282/132
2 244/115/240/187 196/133/189/251
3 298/139/216/225 219/276/192/254
4 129/176/180/182 215/250/227/186
5 238/217/284/240 131/184/247/168
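If you also want the original row names and the loc1, loc2, ... column names that the question asks about, one possible way is to wrap the same idea in a small helper. This is only a sketch: paste_every and its arguments n and sep are hypothetical names, not an existing function.

```r
# Hypothetical helper: paste every n columns of d into one column.
paste_every <- function(d, n = 4, sep = "/") {
  grp <- (seq_along(d) - 1) %/% n           # 0,0,0,0,1,1,1,1,...
  parts <- lapply(split.default(d, grp),     # split by column, not by row
                  function(x) do.call(paste, c(x, list(sep = sep))))
  out <- as.data.frame(parts, stringsAsFactors = FALSE)
  names(out) <- names(d)[!duplicated(grp)]   # first column name of each group
  rownames(out) <- rownames(d)               # keep ind.1, ind.2, ...
  out
}

# Example data from the question, filled column by column.
vals <- c(257, 244, 298, 129, 238,  262, 115, 139, 176, 217,
          228, 240, 216, 180, 284,  266, 187, 225, 182, 240,
          204, 196, 219, 215, 131,  245, 133, 276, 250, 184,
          282, 189, 192, 227, 247,  132, 251, 254, 186, 168)
df <- as.data.frame(matrix(
  vals, nrow = 5,
  dimnames = list(paste0("ind.", 1:5),
                  c("loc1", "loc1.1", "loc1.2", "loc1.3",
                    "loc2", "loc2.1", "loc2.2", "loc2.3"))))

out <- paste_every(df)
out

# A column count that is not a multiple of 4: the last group is just shorter.
partial <- paste_every(df[, 1:6])
```

One caveat with this sketch: because the columns are passed to paste as named arguments, a column literally called sep or collapse would clash with paste's own arguments.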

answered Oct 02 '22 03:10

BrodieG
