Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to combine columns within one data.frame that contain NA's in order to remove NA's

Tags:

dataframe

r

tidyr

I have a data.frame all_data with multiple columns, like this

VoS            Value            Total.Value
1              NA               NA
NA             NA               41
NA             13               NA
76             NA               NA
4              NA               NA
NA             7                NA
NA             NA               22

I want to combine these columns into one column Total VoS that looks like this:

Total VoS
1
41
13
76
4
7
22

I have tried

all_data <- unite( all_data, Total VoS, VoS, 
                                        Value, 
                                        Total.Value )

which is close to what I want, but end up with this:

Total VoS
1_NA_NA
41_NA_NA
13_NA_NA
76_NA_NA
4_NA_NA
7_NA_NA
22_NA_NA

I tried doing na.rm = TRUE within the unite function, but that returns Error: `TRUE` must evaluate to column positions or names, not a logical vector.

like image 435
Maridee Weber Avatar asked Mar 02 '23 09:03

Maridee Weber


1 Answers

With unite, there is na.rm argument which is FALSE by default

library(tidyr)
unite( all_data, Total, VoS,  Value, Total.Value, na.rm = TRUE )
#  Total
#1     1
#2    41
#3    13
#4    76
#5     4
#6     7
#7    22

In the OP's original data, convert the columns of interest to character from factor and then do the unite

library(dplyr)
all_data_new %>%
     mutate_at(c(3, 6, 7, 11), as.character) %>% 
     unite(New, names(.)[c(3, 6, 7, 11)], na.rm = TRUE)
#  Geographic.area.name Year         New X2007.NAICS.codes.and.NAICS.based.rollup.code
#1              Alabama 2009  90,530,746                                         31-33
#2              Alabama 2008 116,401,285                                         31-33
#3              Alabama 2009   9,932,542                                           311
#4              Alabama 2008   9,661,432                                           311
#5              Alabama 2009   1,819,728                                          3111
#6              Alabama 2008   1,744,928                                          3111
#  Meaning.of.2007.NAICS.codes.and.NAICS.based.rollup.code
#1                                           Manufacturing
#2                                           Manufacturing
#3                                      Food manufacturing
#4                                      Food manufacturing
#5                               Animal food manufacturing
#6                               Animal food manufacturing
  #Relative.standard.error.for.estimate.of.total.value.of.shipments.and.receipts.for.services.... X2012.NAICS.code
#1                                                                                           <NA>             <NA>
#2                                                                                           <NA>             <NA>
#3                                                                                           <NA>             <NA>
#4                                                                                           <NA>             <NA>
#5                                                                                           <NA>             <NA>
#6                                                                                           <NA>             <NA>
#  Meaning.of.2012.NAICS.code
#1                       <NA>
#2                       <NA>
#3                       <NA>
#4                       <NA>
#5                       <NA>
#6                       <NA>

Or another option is coalesce

library(dplyr)
all_data %>%
    transmute(Total = coalesce(!!! .))
#  Total
#1     1
#2    41
#3    13
#4    76
#5     4
#6     7
#7    22

Or in base R with pmax

do.call(pmax, c(all_data, na.rm = TRUE))

Or using pmin

do.call(pmin, c(all_data, na.rm = TRUE))

data

all_data <- structure(list(VoS = c(1L, NA, NA, 76L, 4L, NA, NA), Value = c(NA, 
NA, 13L, NA, NA, 7L, NA), Total.Value = c(NA, 41L, NA, NA, NA, 
NA, 22L)), class = "data.frame", row.names = c(NA, -7L))

all_data_new <- structure(list(Geographic.area.name = structure(c(1L, 1L, 1L,
1L, 1L, 1L), .Label = "Alabama", class = "factor"), Year = c(2009L,
2008L, 2009L, 2008L, 2009L, 2008L), Total.value.of.shipments...1.000. = c("90,530,746",
"116,401,285", "9,932,542", "9,661,432", "1,819,728", "1,744,928"
), X2007.NAICS.codes.and.NAICS.based.rollup.code = structure(c(1L,
1L, 2L, 2L, 3L, 3L), .Label = c("31-33", "311", "3111"), class = "factor"),
Meaning.of.2007.NAICS.codes.and.NAICS.based.rollup.code = structure(c(3L,
3L, 2L, 2L, 1L, 1L), .Label = c("Animal food manufacturing",
"Food manufacturing", "Manufacturing"), class = "factor"),
X.Total.value.of.shipments...1.000.. = structure(c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), .Label = character(0), class = "factor"), X.Total.value.of.shipments.and.receipts.for.services...1.000.. = structure(c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), .Label = character(0), class = "factor"), Relative.standard.error.for.estimate.of.total.value.of.shipments.and.receipts.for.services.... = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), X2012.NAICS.code = structure(c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), .Label = character(0), class = "factor"), Meaning.of.2012.NAICS.code = structure(c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), .Label = character(0), class = "factor"), Total.value.of.shipments.and.receipts.for.services...1.000. = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_)), row.names = c(NA, 6L), class = "data.frame")
like image 188
akrun Avatar answered Mar 04 '23 23:03

akrun