R: Find missing columns, add to data frame if missing

Tags:

r

I'd like to write some code that would take a given data frame, check to see if any columns are missing, and if so, add the missing columns filled with 0 or NA. Here's what I've got:

> df
   x1 x2 x4
1   0  1  3
2   3  1  3
3   1  2  1

> nameslist <- c("x1","x2","x3","x4")
> miss.names <- !nameslist %in% colnames(df)
> holder <- rbind(nameslist,miss.names)
> miss.cols <- subset(holder[1,], holder[2,] == "TRUE")

Beyond this point, I can't figure out how to add in the missing column ("x3") without hardcoding it. Ideally, I'd want the new, complete data frame to have columns in the same order as nameslist as well.

Any ideas? My current code can be ignored, no problem.

bosbmgatl asked Feb 11 '12 01:02

1 Answer

Here's a straightforward approach:

df <- data.frame(a=1:4, e=4:1)
nms <- c("a", "b", "d", "e")   # Vector of columns you want in this data.frame

Missing <- setdiff(nms, names(df))  # Find names of missing columns
df[Missing] <- 0                    # Add them, filled with '0's
df <- df[nms]                       # Put columns in desired order
#   a b d e
# 1 1 0 0 4
# 2 2 0 0 3
# 3 3 0 0 2
# 4 4 0 0 1

Josh O'Brien answered Nov 10 '22 07:11
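
Since the question asks for a fill of either 0 or NA, the same setdiff() idea can be wrapped in a small function with the fill value as an argument. This is only a sketch: the function name add_missing_cols and its fill argument are made up for illustration and are not part of the answer above. Note that the final subsetting step also drops any columns of df that are not listed in wanted.

```r
# Hypothetical helper (name and arguments are assumptions, not from the answer):
# add every column named in `wanted` that `df` lacks, filled with `fill`,
# then return the columns in the order given by `wanted`.
add_missing_cols <- function(df, wanted, fill = NA) {
  missing <- setdiff(wanted, names(df))  # names in `wanted` absent from df
  if (length(missing) > 0) {
    df[missing] <- fill                  # create the missing columns
  }
  df[wanted]                             # reorder; drops columns not in `wanted`
}

# Usage with the data frame from the question
df <- data.frame(x1 = c(0, 3, 1), x2 = c(1, 1, 2), x4 = c(3, 3, 1))
nameslist <- c("x1", "x2", "x3", "x4")

with_na   <- add_missing_cols(df, nameslist)            # x3 filled with NA
with_zero <- add_missing_cols(df, nameslist, fill = 0)  # x3 filled with 0
```

Defaulting to NA rather than 0 keeps the added columns distinguishable from real zero values; pass fill = 0 if zeros are what you want.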
