Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

cbind: is there a way to have missing values set to NA?

Tags:

r

cbind

Please forgive me if I missed an answer to such a simple question.

I want to use cbind() to bind two columns. One of them is a single entry shorter in length.

Can I have R supply an NA for the missing value?

The documentation discusses a deparse.level argument but this doesn't seem to be my solution.

Further, if I may be so bold, would there also be a quick way to prepend the shorter column with NA's?

like image 308
tumultous_rooster Avatar asked Sep 29 '13 03:09

tumultous_rooster


People also ask

What is the difference between Cbind and Rbind?

cbind() and rbind() both create matrices by combining several vectors of the same length. cbind() combines vectors as columns, while rbind() combines them as rows.

What does the Cbind command do in R?

cbind() function in R Language is used to combine specified Vector, Matrix or Data Frame by columns.

Does Cbind create Dataframe?

Data frame methodsThe cbind data frame method is just a wrapper for data. frame(..., check. names = FALSE) . This means that it will split matrix columns in data frame arguments, and convert character columns to factors unless stringsAsFactors = FALSE is specified.

What does Cbind mean?

cbind(my_data, new_column) The name of the cbind R function stands for column-bind. The cbind function is used to combine vectors, matrices and/or data frames by columns.


2 Answers

Try this:

x <- c(1:5)
y <- c(4:1)
length(y) = length(x)
cbind(x,y)
     x  y
[1,] 1  4
[2,] 2  3
[3,] 3  2
[4,] 4  1
[5,] 5 NA

or this:

x <- c(4:1)
y <- c(1:5)
length(x) = length(y)
cbind(x,y)
      x y
[1,]  4 1
[2,]  3 2
[3,]  2 3
[4,]  1 4
[5,] NA 5

I think this will do something similar to what DWin suggested and work regardless of which vector is shorter:

x <- c(4:1)
y <- c(1:5)

lengths <- max(c(length(x), length(y)))
length(x) <- lengths
length(y) <- lengths
cbind(x,y)

The code above can also be condensed to:

x <- c(4:1)
y <- c(1:5)
length(x) <- length(y) <- max(c(length(x), length(y)))
cbind(x,y)

EDIT

Here is what I came up with to address the question:

"Further, if I may be so bold, would there also be a quick way to prepend the shorter column with NA's?"

inserted into the original post by Matt O'Brien.

x <- c(4:1)
y <- c(1:5)

first <- 1   # 1 means add NA to top of shorter vector
             # 0 means add NA to bottom of shorter vector

if(length(x)<length(y)) {
     if(first==1) x = c(rep(NA, length(y)-length(x)),x);y=y
     if(first==0) x = c(x,rep(NA, length(y)-length(x)));y=y
} 

if(length(y)<length(x)) {
     if(first==1) y = c(rep(NA, length(x)-length(y)),y);x=x
     if(first==0) y = c(y,rep(NA, length(x)-length(y)));x=x
} 

cbind(x,y)

#       x y
# [1,] NA 1
# [2,]  4 2
# [3,]  3 3
# [4,]  2 4
# [5,]  1 5

Here is a function:

x <- c(4:1)
y <- c(1:5)

first <- 1   # 1 means add NA to top of shorter vector
             # 0 means add NA to bottom of shorter vector

my.cbind <- function(x,y,first) {

  if(length(x)<length(y)) {
     if(first==1) x = c(rep(NA, length(y)-length(x)),x);y=y
     if(first==0) x = c(x,rep(NA, length(y)-length(x)));y=y
  } 

  if(length(y)<length(x)) {
     if(first==1) y = c(rep(NA, length(x)-length(y)),y);x=x
     if(first==0) y = c(y,rep(NA, length(x)-length(y)));x=x
  } 

  return(cbind(x,y))

}

my.cbind(x,y,first)

my.cbind(c(1:5),c(4:1),1)
my.cbind(c(1:5),c(4:1),0)
my.cbind(c(1:4),c(5:1),1)
my.cbind(c(1:4),c(5:1),0)
my.cbind(c(1:5),c(5:1),1)
my.cbind(c(1:5),c(5:1),0)

This version allows you to cbind two vectors of different mode:

x <- c(4:1)
y <- letters[1:5]

first <- 1   # 1 means add NA to top of shorter vector
             # 0 means add NA to bottom of shorter vector

my.cbind <- function(x,y,first) {

  if(length(x)<length(y)) {
     if(first==1) x = c(rep(NA, length(y)-length(x)),x);y=y
     if(first==0) x = c(x,rep(NA, length(y)-length(x)));y=y
  } 

  if(length(y)<length(x)) {
     if(first==1) y = c(rep(NA, length(x)-length(y)),y);x=x
     if(first==0) y = c(y,rep(NA, length(x)-length(y)));x=x
  } 

  x <- as.data.frame(x)
  y <- as.data.frame(y)

  return(data.frame(x,y))

}

my.cbind(x,y,first)

#    x y
# 1 NA a
# 2  4 b
# 3  3 c
# 4  2 d
# 5  1 e

my.cbind(c(1:5),letters[1:4],1)
my.cbind(c(1:5),letters[1:4],0)
my.cbind(c(1:4),letters[1:5],1)
my.cbind(c(1:4),letters[1:5],0)
my.cbind(c(1:5),letters[1:5],1)
my.cbind(c(1:5),letters[1:5],0)
like image 135
Mark Miller Avatar answered Oct 15 '22 01:10

Mark Miller


A while back I had put together a function called Cbind that was meant to do this sort of thing. In its current form, it should be able to handle vectors, data.frames, and matrices as the input.

For now, the function is here: https://gist.github.com/mrdwab/6789277

Here is how one would use the function:

x <- 1:5
y <- letters[1:4]
z <- matrix(1:4, ncol = 2, dimnames = list(NULL, c("a", "b")))
Cbind(x, y, z)
#   x    y z_a z_b
# 1 1    a   1   3
# 2 2    b   2   4
# 3 3    c  NA  NA
# 4 4    d  NA  NA
# 5 5 <NA>  NA  NA
Cbind(x, y, z, first = FALSE)
#   x    y z_a z_b
# 1 1 <NA>  NA  NA
# 2 2    a  NA  NA
# 3 3    b  NA  NA
# 4 4    c   1   3
# 5 5    d   2   4

The two three functions required are padNA, dotnames, and Cbind, which are defined as follows:

padNA <- function (mydata, rowsneeded, first = TRUE) {
## Pads vectors, data.frames, or matrices with NA
  temp1 = colnames(mydata)
  rowsneeded = rowsneeded - nrow(mydata)
  temp2 = setNames(
    data.frame(matrix(rep(NA, length(temp1) * rowsneeded), 
                      ncol = length(temp1))), temp1)
  if (isTRUE(first)) rbind(mydata, temp2)
  else rbind(temp2, mydata)
}

dotnames <- function(...) {
## Gets the names of the objects passed through ...
  vnames <- as.list(substitute(list(...)))[-1L]
  vnames <- unlist(lapply(vnames,deparse), FALSE, FALSE)
  vnames
}

Cbind <- function(..., first = TRUE) {
## cbinds vectors, data.frames, and matrices together
  Names <- dotnames(...)
  datalist <- setNames(list(...), Names)
  nrows <- max(sapply(datalist, function(x) 
    ifelse(is.null(dim(x)), length(x), nrow(x))))
  datalist <- lapply(seq_along(datalist), function(x) {
    z <- datalist[[x]]
    if (is.null(dim(z))) {
      z <- setNames(data.frame(z), Names[x])
    } else {
      if (is.null(colnames(z))) {
        colnames(z) <- paste(Names[x], sequence(ncol(z)), sep = "_")
      } else {
        colnames(z) <- paste(Names[x], colnames(z), sep = "_")
      }
    }
    padNA(z, rowsneeded = nrows, first = first)
  })
  do.call(cbind, datalist)
}

Part of the reason I stopped working on the function was that the gdata package already has a function called cbindX that handles cbinding data.frames and matrices with different numbers of rows. It will not work directly on vectors, so you need to convert them to data.frames first.

library(gdata)
cbindX(data.frame(x), data.frame(y), z)
#   x    y  a  b
# 1 1    a  1  3
# 2 2    b  2  4
# 3 3    c NA NA
# 4 4    d NA NA
# 5 5 <NA> NA NA
like image 33
A5C1D2H2I1M1N2O1R2T1 Avatar answered Oct 15 '22 01:10

A5C1D2H2I1M1N2O1R2T1