 

Add Columns to an empty data frame in R

I have searched extensively but not found an answer to this question on Stack Overflow.

Let's say I have a data frame a.

I define:

a <- NULL
a <- as.data.frame(a)

If I try to add a column to this data frame like so:

a$col1 <- c(1,2,3) 

I get the following error:

Error in `$<-.data.frame`(`*tmp*`, "a", value = c(1, 2, 3)) :
  replacement has 3 rows, data has 0

Why is the row dimension fixed but the column is not?

How do I change the number of rows in a data frame?

If I do this (inputting the data into a list first and then converting to a df), it works fine:

a <- NULL
a$col1 <- c(1,2,3)
a <- as.data.frame(a)
Michal asked Oct 31 '14 22:10


2 Answers

The row dimension is not fixed, but a data.frame is stored as a list of vectors that are constrained to have the same length. You cannot add col1 to a because col1 has three values (rows) and a has zero, which breaks that constraint. By default, R does not auto-vivify values (fill in the missing rows) when you try to add a column that is longer than the data.frame. The second example works because a is still a plain list at the point where col1 is added, so no length constraint applies yet; as.data.frame() then builds a data.frame whose only column is col1, so it is initialized with three rows.
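The constraint described above can be checked directly. The following is a minimal sketch, not part of the original answer; the variable names err, cols and b are made up for illustration, and the second column col2 is an assumed example value.

```r
# Start from an empty data frame: zero rows, zero columns.
a <- data.frame()
dim(a)

# `$<-` refuses a column that is longer than the existing row count.
# The error is caught here so that the script keeps running.
err <- tryCatch(
  {
    a$col1 <- c(1, 2, 3)
    NULL
  },
  error = function(e) e
)
inherits(err, "error")

# Collecting the columns in a plain list first avoids the problem:
# a list has no row count to violate, and the conversion takes the
# number of rows from the length of the columns.
cols <- list()
cols$col1 <- c(1, 2, 3)
cols$col2 <- c("x", "y", "z")  # assumed second column, for illustration only
b <- as.data.frame(cols)
dim(b)
```

If the column values are known up front, calling data.frame(col1 = c(1, 2, 3)) directly has the same effect as the list detour.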

If you want to automatically have the data.frame expand, you can use the following function:

cbind.all <- function (...) {
    nm <- list(...)
    nm <- lapply(nm, as.matrix)
    n <- max(sapply(nm, nrow))
    do.call(cbind, lapply(nm, function(x) rbind(x, matrix(, n -
        nrow(x), ncol(x)))))
}

This fills the missing values with NA. You would use it like this: cbind.all(df, a). Note that the result is a matrix, so wrap it in as.data.frame() if you need a data.frame.
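Because cbind.all goes through as.matrix, columns of mixed types end up coerced to a single type. A related sketch, which is not the function from the answer, pads plain vectors with NA and keeps each column's own type. The helper name pad_to_df is hypothetical, and it assumes every argument is a named atomic vector.

```r
# Hypothetical helper (not from the answer): pad each vector with NA
# up to the length of the longest one, then build a data frame.
pad_to_df <- function(...) {
  cols <- list(...)
  n <- max(lengths(cols))
  padded <- lapply(cols, function(v) {
    length(v) <- n  # extending a vector this way fills with NA
    v
  })
  as.data.frame(padded)
}

df <- pad_to_df(x = 1:3, y = c("a", "b"))
df
```

Here x keeps its integer type and y stays character, with NA in the third row of y.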

ctbrown answered Sep 22 '22 04:09


You could also do something like this, where I read in data from multiple files, grab the column I want from each, and store it in the data frame. I check whether the data frame has anything in it and, if it doesn't, create a new one instead of getting the error about a mismatched number of rows:

readCounts = data.frame()
for(f in names(files)){
    d = read.table(files[f], header=T, as.is=T)
    d2 = round(data.frame(d$NumReads))
    colnames(d2) = f
    if(ncol(readCounts) == 0){
        readCounts = d2
        rownames(readCounts) = d$Name
    } else{
        readCounts = cbind(readCounts, d2)
    }
}
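The same result can probably be reached without the emptiness check by collecting the columns in a list and converting once at the end. This is only a sketch, not the answer's code: the in-memory list tables stands in for the files read with read.table(), its sample names and values are invented, and it assumes, as the loop above does, that every table lists the same rows in the same order.

```r
# Hypothetical stand-ins for the per-file tables (names and values invented).
tables <- list(
  sampleA = data.frame(Name = c("g1", "g2", "g3"), NumReads = c(10.4, 0.6, 3.4)),
  sampleB = data.frame(Name = c("g1", "g2", "g3"), NumReads = c(7.2, 1.1, 9.9))
)

# Collect one rounded column per table in a named list ...
cols <- lapply(tables, function(d) round(d$NumReads))

# ... and convert once, so no column is ever added to an empty data frame.
readCounts <- as.data.frame(cols)
rownames(readCounts) <- tables[[1]]$Name
readCounts
```

With real files, the lapply() call would wrap read.table() instead of indexing into tables.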
user2820516 answered Sep 21 '22 04:09