
rbind data.frames without names

I am trying to figure out why the rbind function does not work as intended when joining data.frames that have no names. Here is my test:

test <- data.frame(
            id=rep(c("a","b"),each=3),
            time=rep(1:3,2),
            black=1:6,
            white=1:6,
            stringsAsFactors=FALSE
            )

# take some subsets with different names
pt1 <- test[,c(1,2,3)]
pt2 <- test[,c(1,2,4)]

# method 1 - rename to same names - works
names(pt2) <- names(pt1)
rbind(pt1,pt2)

# method 2 - works - even with duplicate names
names(pt1) <- letters[c(1,1,1)]
names(pt2) <- letters[c(1,1,1)]
rbind(pt1,pt2)

# method 3 - works - with a vector of NAs as names
names(pt1) <- rep(NA,ncol(pt1))
names(pt2) <- rep(NA,ncol(pt2))
rbind(pt1,pt2)

# method 4 - but... does not work without names at all?
pt1 <- unname(pt1)
pt2 <- unname(pt2)
rbind(pt1,pt2)

This seems a bit odd to me. Am I missing a good reason why this shouldn't work out of the box?

Edit for additional info:

Using @JoshO'Brien's suggestion to debug, I can identify the error as occurring in this if statement within the rbind.data.frame function:

if (is.null(pi) || is.na(jj <- pi[[j]]))

(The online version of the code is at http://svn.r-project.org/R/trunk/src/library/base/R/dataframe.R, starting at: "### Here are the methods for rbind and cbind.")

Stepping through the function, the local pi does not appear to have been set at this point, so the code ends up indexing the built-in constant pi (e.g. pi[[3]]) and errors out.
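The last step of that trace can be reproduced on its own: base R's pi is a length-one numeric constant, so indexing it with [[ at any position other than 1 fails. A minimal check:

```r
# Base R's built-in constant `pi` has length 1
length(pi)

# Indexing past position 1 with [[ raises an error; capture its message
msg <- tryCatch(pi[[3]], error = function(e) conditionMessage(e))
msg  # "subscript out of bounds"
```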

As far as I can tell, the local pi object is never set because of this earlier line, where clabs has been initialized as NULL:

if (is.null(clabs)) clabs <- names(xi) else { #pi gets set here
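The branching described here can be imitated with a small sketch. The function below is hypothetical and is not the actual rbind.data.frame() source; it only mirrors the `if (is.null(clabs))` logic, with match() used as a stand-in for whatever the real else branch does when it assigns the local pi:

```r
# Hypothetical, simplified imitation of the branching quoted above.
# NOT the real rbind.data.frame() code: match() is only a stand-in for the
# step that assigns the local `pi` in the else branch.
reaches_matching_step <- function(...) {
  clabs <- NULL
  matches <- NULL
  for (xi in list(...)) {
    if (is.null(clabs)) {
      clabs <- names(xi)                  # stays NULL if xi has no names
    } else {
      matches <- match(names(xi), clabs)  # only reached once clabs is non-NULL
    }
  }
  !is.null(matches)
}

df <- data.frame(x = 1:2, y = 3:4)
na_df <- df
names(na_df) <- rep(NA, ncol(na_df))

reaches_matching_step(df, df)                  # TRUE
reaches_matching_step(na_df, na_df)            # TRUE: NA names are not NULL
reaches_matching_step(unname(df), unname(df))  # FALSE: clabs never leaves NULL
```

Under this simplification, NA names (method 3) still reach the matching step, while names removed by unname() (method 4) never do.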

I am in a tangle trying to figure this out, but will update as it comes together.

asked Nov 28 '12 06:11 by thelatemail




1 Answer

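Because unname() and explicitly assigning NA as column names are not identical operations. unname() removes the names attribute altogether, so names() returns NULL, whereas names(x) <- rep(NA, ncol(x)) keeps a names vector whose elements are all NA. When the column names are all NA, rbind() works. Since rbind() relies on the names/colnames of the data frames to line up columns, it fails when the data frames have no names at all.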
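To see why the names matter in the first place: the data frame method of rbind() matches columns by name rather than by position (this is described in ?rbind). A small sketch with made-up data:

```r
# rbind()'s data frame method lines columns up by name, not by position
first  <- data.frame(a = 1, b = 2)
second <- data.frame(b = 3, a = 4)   # same names, different order

combined <- rbind(first, second)
combined$a  # 1 4
combined$b  # 2 3
```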

Here is some code to help see what I mean:

> c1 <- c(1,2,3)
> c2 <- c('A','B','C')
> df1 <- data.frame(c1,c2)
> df1
  c1 c2
1  1  A
2  2  B
3  3  C
> df2 <- data.frame(c1,c2) # df1 & df2 are identical
>
> #Let's perform unname on one data frame &
> #replacement with NA on the other
>
> unname(df1)
  NA NA
1  1  A
2  2  B
3  3  C
> tem1 <- names(unname(df1))
> tem1
NULL
>
> #Note above that the column headers, though printed as NA, are NULL
>
> names(df2) <- rep(NA,ncol(df2))
> df2
  NA NA
1  1  A
2  2  B
3  3  C
> tem2 <- names(df2)
> tem2
[1] NA NA
> 
> #Though unname(df1) & df2 look identical, they aren't
> #Also note difference in tem1 & tem2
>
> identical(unname(df1),df2)
[1] FALSE
> 

I hope this helps. The names print as NA in both cases, but the two operations produce different objects.

Hence, two data frames whose column names have been set to NA can be combined with rbind(), but two data frames with no column names at all (as produced by unname()) cannot.
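A workaround that generalizes method 1 from the question is to give every piece the same temporary names, bind, and then drop the names again. The helper below is hypothetical (not part of base R) and assumes all inputs have the same number of columns in the same order:

```r
# Hypothetical helper, not part of base R.
# Assumes every data frame has the same number of columns, in the same order.
rbind_unnamed <- function(...) {
  dfs <- list(...)
  placeholder <- paste0("V", seq_len(ncol(dfs[[1]])))  # temporary names V1, V2, ...
  dfs <- lapply(dfs, setNames, placeholder)           # same names on every piece
  unname(do.call(rbind, dfs))                         # bind, then drop names again
}

pt1 <- unname(data.frame(id = c("a", "b"), time = 1:2))
pt2 <- unname(data.frame(id = c("c", "d"), time = 3:4))
res <- rbind_unnamed(pt1, pt2)
nrow(res)            # 4
is.null(names(res))  # TRUE
```

unname() leaves a data frame's row names in place unless force = TRUE, so only the temporary column names are removed at the end.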

answered Oct 20 '22 05:10 by A_K