
combining two data frames of different lengths [closed]

Tags:

dataframe

r

I have two data frames.
The first has only one column and 10 rows.
The second has 3 columns and 50 rows.

When I try to combine them using cbind, I get this error:

Error in data.frame(..., check.names = FALSE) :

Can anyone suggest another function to do this?
P.S. I have tried this using lists too, but it gives the same error.

The data frame consisting of 3 columns should be the first 3 columns in a CSV file, whereas the data frame with one column should be the fourth column in that file, when I write with the write.table function. The first 3 columns have 50 rows and the fourth column should occupy the first 10 rows.

Matt asked Aug 08 '11 20:08



2 Answers

In the plyr package there is a function rbind.fill that will merge data.frames and introduce NA for empty cells:

    library(plyr)
    combined <- rbind.fill(mtcars[c("mpg", "wt")], mtcars[c("wt", "cyl")])
    combined[25:40, ]
        mpg    wt cyl
    25 19.2 3.845  NA
    26 27.3 1.935  NA
    27 26.0 2.140  NA
    28 30.4 1.513  NA
    29 15.8 3.170  NA
    30 19.7 2.770  NA
    31 15.0 3.570  NA
    32 21.4 2.780  NA
    33   NA 2.620   6
    34   NA 2.875   6
    35   NA 2.320   4

Andrie answered Oct 16 '22 06:10


It's not clear to me at all what the OP is actually after, given the follow-up comments. It's possible they are actually looking for a way to write the data to file.

But let's assume that we're really after a way to cbind multiple data frames of differing lengths.

cbind will eventually call data.frame, whose help file says:

Objects passed to data.frame should have the same number of rows, but atomic vectors, factors and character vectors protected by I will be recycled a whole number of times if necessary (including as from R 2.9.0, elements of list arguments).

so in the OP's actual example, there shouldn't be an error, as R ought to recycle the shorter vectors to be of length 50. Indeed, when I run the following:

    set.seed(1)
    a <- runif(50)
    b <- 1:50
    c <- rep(LETTERS[1:5], length.out = 50)
    dat1 <- data.frame(a, b, c)
    dat2 <- data.frame(d = runif(10), e = runif(10))
    cbind(dat1, dat2)

I get no errors and the shorter data frame is recycled as expected. However, when I run this:

    set.seed(1)
    a <- runif(50)
    b <- 1:50
    c <- rep(LETTERS[1:5], length.out = 50)
    dat1 <- data.frame(a, b, c)
    dat2 <- data.frame(d = runif(9), e = runif(9))
    cbind(dat1, dat2)

I get the following error:

    Error in data.frame(..., check.names = FALSE) :
      arguments imply differing number of rows: 50, 9
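The difference between the two runs seems to come down to divisibility: 50 is a whole multiple of 10 but not of 9. Here is a minimal sketch of that rule, using small made-up data frames (`long`, `ok`, and `msg` are illustrative names, not from the question):

```r
# Recycling only happens when the longer row count is a whole
# multiple of the shorter one.
long <- data.frame(a = 1:50)

# 50 %% 10 == 0, so column d is repeated 5 times to fill 50 rows.
ok <- cbind(long, data.frame(d = 1:10))
nrow(ok)
identical(ok$d, rep(1:10, 5))

# 50 %% 9 != 0, so data.frame() refuses to recycle.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(
  cbind(long, data.frame(d = 1:9)),
  error = function(e) conditionMessage(e)
)
msg
```

Note that recycling repeats the short column's values rather than leaving the remaining rows empty, which is probably not what the question wants for the fourth CSV column.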

But the wonderful thing about R is that you can make it do almost anything you want, even if you shouldn't. For example, here's a simple function that will cbind data frames of uneven length and automatically pad the shorter ones with NAs:

    cbindPad <- function(...) {
      args <- list(...)
      n <- sapply(args, nrow)
      mx <- max(n)  # row count of the longest data frame

      # Append rows of NA to x until it has mx rows
      pad <- function(x, mx) {
        if (nrow(x) < mx) {
          nms <- colnames(x)
          padTemp <- matrix(NA, mx - nrow(x), ncol(x))
          colnames(padTemp) <- nms
          if (ncol(x) == 0) {
            return(padTemp)
          } else {
            return(rbind(x, padTemp))
          }
        } else {
          return(x)
        }
      }

      rs <- lapply(args, pad, mx)
      return(do.call(cbind, rs))
    }

which can be used like this:

    set.seed(1)
    a <- runif(50)
    b <- 1:50
    c <- rep(LETTERS[1:5], length.out = 50)
    dat1 <- data.frame(a, b, c)
    dat2 <- data.frame(d = runif(10), e = runif(10))
    dat3 <- data.frame(d = runif(9), e = runif(9))
    cbindPad(dat1, dat2, dat3)

I make no guarantees that this function works in all cases; it is meant as an example only.
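For comparison, here is a shorter sketch of the same padding idea. It relies on the fact that indexing a data frame past its last row yields rows of NA. The names `pad_rows` and `cbind_pad` are hypothetical helpers written for this illustration, and like the function above it is not tested against every kind of column:

```r
# Extend df to n rows; row indices beyond nrow(df) come back as NA rows.
pad_rows <- function(df, n) {
  out <- df[seq_len(n), , drop = FALSE]
  rownames(out) <- NULL  # discard the "NA", "NA.1", ... row names
  out
}

# Pad every data frame to the longest row count, then bind the columns.
cbind_pad <- function(...) {
  dfs <- list(...)
  n <- max(vapply(dfs, nrow, integer(1)))
  do.call(cbind, lapply(dfs, pad_rows, n = n))
}

dat1 <- data.frame(a = 1:5, b = letters[1:5])
dat2 <- data.frame(d = c(0.5, 1.5))
res <- cbind_pad(dat1, dat2)
dim(res)  # 5 rows, 3 columns
res$d     # 0.5 1.5 NA NA NA
```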

EDIT

If the primary goal is to create a CSV or text file, all you need to do is alter the function to pad using "" rather than NA and then do something like this:

    dat <- cbindPad(dat1, dat2, dat3)
    rs <- as.data.frame(apply(dat, 1, function(x) {
      paste(as.character(x), collapse = ",")
    }))

and then use write.table on rs.
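A possible shortcut, offered as an assumption rather than as part of the approach above: keep the NA padding and let the `na` argument of write.table turn missing values into empty fields. The sketch below uses small made-up data frames and a temporary file in place of the 50-row and 10-row frames from the question:

```r
# Toy stand-ins for the question's data frames (hypothetical values).
dat1 <- data.frame(a = 1:5, b = letters[1:5])
dat2 <- data.frame(d = c(0.1, 0.2))

# Pad the short column with NA so both pieces have the same row count.
padded <- cbind(dat1, d = c(dat2$d, rep(NA, nrow(dat1) - nrow(dat2))))

# na = "" writes each NA as an empty field instead of the text "NA".
out_file <- tempfile(fileext = ".csv")
write.table(padded, out_file, sep = ",", na = "",
            row.names = FALSE, quote = FALSE)

csv_lines <- readLines(out_file)
csv_lines
```

This leaves the padded cells blank in the file, so the last column occupies only the first rows, as the question asks.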

joran answered Oct 16 '22 06:10