
Concatenate rows of a data frame

I would like to take a data frame with characters and numbers and concatenate all of the elements of each row into a single string, stored as a single element of a vector. As an example, I make a data frame of letters and numbers and try to concatenate the first row with the paste function, hoping to get back the value "A1":

    df <- data.frame(letters = LETTERS[1:5], numbers = 1:5)
    df
    ##   letters numbers
    ## 1       A       1
    ## 2       B       2
    ## 3       C       3
    ## 4       D       4
    ## 5       E       5

    paste(df[1,], sep =".")
    ## [1] "1" "1"

So paste is converting each element of the row into an integer that corresponds to the index of the corresponding level, as if it were a factor, and it returns a vector of length two. (I believe that factors coerced to character can behave this way, but since R does not report df[1,] as a factor when tested with is.factor(), I can't verify that the value is actually the index of a level.)

    is.factor(df[1,])
    ## [1] FALSE
    is.vector(df[1,])
    ## [1] FALSE

So if it is not a vector, then it makes sense that it is behaving oddly, but I can't coerce it into a vector either:

    is.vector(as.vector(df[1,]))
    ## [1] FALSE

Using as.character did not seem to help in my attempts either.

Can anyone explain this behavior?

Sam asked Dec 19 '12 01:12



2 Answers

While others have focused on why your code isn't working and how to improve it, I'm going to try and focus more on getting the result you want. From your description, it seems you can readily achieve what you want using paste:

    df <- data.frame(letters = LETTERS[1:5], numbers = 1:5, stringsAsFactors=FALSE)
    paste(df$letters, df$numbers, sep="")
    ## [1] "A1" "B2" "C3" "D4" "E5"

You can change df$letters to character using df$letters <- as.character(df$letters) if you don't want to use the stringsAsFactors argument.
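As a small sketch of the conversion just described: the explicit stringsAsFactors = TRUE below is an assumption added to reproduce the question's setup. Since R 4.0.0 the default is FALSE, so newer versions create a character column in the first place.

```r
# Recreate the question's data frame with a factor column.
# stringsAsFactors = TRUE is set explicitly (it was the default before R 4.0.0).
df <- data.frame(letters = LETTERS[1:5], numbers = 1:5, stringsAsFactors = TRUE)
is.factor(df$letters)

# Convert the factor column to plain character strings.
df$letters <- as.character(df$letters)
is.character(df$letters)

# Element-wise paste of the two columns, with no separator.
result <- paste(df$letters, df$numbers, sep = "")
print(result)
```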

But let's assume that's not what you want. Let's assume you have hundreds of columns and you want to paste them all together. We can do that with your minimal example too:

    df_args <- c(df, sep="")
    do.call(paste, df_args)
    ## [1] "A1" "B2" "C3" "D4" "E5"

EDIT: Alternative method and explanation:

I realised the problem you're having is a combination of the fact that you're using a factor and that you're using the sep argument instead of collapse (as @adibender picked up). The difference is that sep gives the separator between the separate vectors passed to paste, while collapse gives the separator between the elements within a vector. When you use df[1,], you supply a single object to paste, and hence you must use the collapse argument. Using your idea of getting every row and concatenating them, the following line of code will do exactly what you want:

    apply(df, 1, paste, collapse="")
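To make the sep versus collapse distinction concrete, here is a minimal sketch. The vector names (letters_vec, numbers_vec, one_row) are made up for illustration and do not appear in the question.

```r
letters_vec <- c("A", "B")
numbers_vec <- c(1, 2)

# sep joins corresponding elements of *separate* vectors.
by_sep <- paste(letters_vec, numbers_vec, sep = ".")

# With a single vector, sep has nothing to separate,
# so the elements come back unjoined.
one_row <- c("A", "1")
no_effect <- paste(one_row, sep = ".")

# collapse joins the elements *within* the result into one string.
by_collapse <- paste(one_row, collapse = ".")

# The row-wise version from the answer, on the example data frame.
df <- data.frame(letters = LETTERS[1:5], numbers = 1:5, stringsAsFactors = FALSE)
rows <- apply(df, 1, paste, collapse = "")

print(by_sep)
print(no_effect)
print(by_collapse)
print(rows)
```

One thing to keep in mind: apply first converts the data frame to a character matrix, and that conversion can pad numbers of different widths with spaces, so it is worth checking the output on your real data.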

Ok, now for the explanations:

Why won't as.list work?

as.list converts an object to a list, so it does work: it will convert your dataframe to a list and subsequently ignore the sep="" argument. c combines objects together. Technically, a dataframe is just a list where every column is an element and all elements have to have the same length. So when I combine it with sep="", it becomes a regular list with the columns of the dataframe as elements, plus one more element named sep.
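A quick way to check this for yourself, as a sketch using the same example data frame:

```r
df <- data.frame(letters = LETTERS[1:5], numbers = 1:5, stringsAsFactors = FALSE)

# A data frame is a list of equal-length columns.
is.list(df)

# Combining it with sep = "" yields a plain list, no longer a data frame.
df_args <- c(df, sep = "")
class(df_args)

# The columns are kept as elements, and sep is appended as a third element.
names(df_args)
```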

Why use do.call?

do.call allows you to call a function using a list (optionally named) as its arguments. You can't just throw the list straight into paste, because it doesn't like dataframes; it's designed for concatenating vectors. So remember that df_args is a list containing a vector of letters, a vector of numbers and sep, which is a length 1 vector containing only "". When I use do.call, the resulting call is essentially paste(letters, numbers, sep = "").

But what if my original dataframe had columns "letters", "numbers", "squigs", "blargs", after which I added the separator like I did before? Then the paste call through do.call would look like:

    paste(letters, numbers, squigs, blargs, sep = "")

So you see it works for any number of columns.
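Here is a sketch of that four-column case. The contents of squigs and blargs are invented purely for illustration, and a visible separator is used so the joins are easy to see.

```r
# Hypothetical wider data frame; the squigs and blargs values are made up.
wide <- data.frame(
  letters = LETTERS[1:3],
  numbers = 1:3,
  squigs  = c("x", "y", "z"),
  blargs  = c("p", "q", "r"),
  stringsAsFactors = FALSE
)

# Every column becomes one argument to paste; sep is added as the last one.
combined <- do.call(paste, c(wide, sep = "-"))
print(combined)
```

Because the column names are passed along as argument names, a column that happened to be called sep or collapse would clash with paste's own arguments. Wrapping the data frame in unname() before combining is one way to avoid that.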

sebastian-c answered Oct 02 '22 20:10



For those using library(tidyverse), you can simply use the unite function (from the tidyr package).

    new.df <- df %>%
      unite(together, letters, numbers, sep="")

This will give you a new column called together with A1, B2, etc.
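For comparison, a rough base R equivalent that needs no extra packages might look like the sketch below. Unlike unite, which drops the input columns by default (its remove argument defaults to TRUE), this version keeps them alongside the new column.

```r
df <- data.frame(letters = LETTERS[1:5], numbers = 1:5, stringsAsFactors = FALSE)

# Paste the chosen columns row by row and store the result as a new column.
df$together <- do.call(paste, c(df[c("letters", "numbers")], sep = ""))
print(df)
```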

Shirley answered Oct 02 '22 18:10
