I have a bunch of data frames with different variables. I want to read them into R and add columns to those that are short of a few variables so that they all have a common set of standard variables, even if some are unobserved.
In other words, is there a way to add columns of NA in the tidyverse when a column does not exist? My current attempt works for adding new variables where the column doesn't exist (top_speed) but fails when the column already exists (mpg): it sets all observations to the first value, the one for the Mazda RX4.
library(tidyverse)

mtcars %>%
  as_tibble() %>%
  rownames_to_column("car") %>%
  mutate(top_speed = ifelse("top_speed" %in% names(.), top_speed, NA),
         mpg = ifelse("mpg" %in% names(.), mpg, NA)) %>%
  select(car, top_speed, mpg, everything())
# # A tibble: 32 x 13
#    car               top_speed   mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#    <chr>             <lgl>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#  1 Mazda RX4         NA           21     6 160.0   110  3.90 2.620 16.46     0     1     4     4
#  2 Mazda RX4 Wag     NA           21     6 160.0   110  3.90 2.875 17.02     0     1     4     4
#  3 Datsun 710        NA           21     4 108.0    93  3.85 2.320 18.61     1     1     4     1
#  4 Hornet 4 Drive    NA           21     6 258.0   110  3.08 3.215 19.44     1     0     3     1
#  5 Hornet Sportabout NA           21     8 360.0   175  3.15 3.440 17.02     0     0     3     2
#  6 Valiant           NA           21     6 225.0   105  2.76 3.460 20.22     1     0     3     1
#  7 Duster 360        NA           21     8 360.0   245  3.21 3.570 15.84     0     0     3     4
#  8 Merc 240D         NA           21     4 146.7    62  3.69 3.190 20.00     1     0     4     2
#  9 Merc 230          NA           21     4 140.8    95  3.92 3.150 22.90     1     0     4     2
# 10 Merc 280          NA           21     6 167.6   123  3.92 3.440 18.30     1     0     4     4
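The collapse to a single value in the attempt above comes from how ifelse() works: it returns a result with the same length as its test, and "mpg" %in% names(.) is a single TRUE, so only the first element of mpg is returned and then recycled down the column. A minimal base R sketch of the difference, using the built-in mtcars data (the variable names collapsed and kept are only illustrative):

```r
# ifelse() is vectorised over its test, so a length-one test
# yields a length-one result, no matter how long the "yes" value is.
collapsed <- ifelse("mpg" %in% names(mtcars), mtcars$mpg, NA)
length(collapsed)  # 1: only the first mpg value survives

# A plain if / else evaluates the condition once and returns
# the chosen value whole, so the full column is kept.
kept <- if ("mpg" %in% names(mtcars)) mtcars$mpg else NA
length(kept)       # 32: one value per row of mtcars
```

Inside mutate(), swapping ifelse() for if / else in the same way would likely keep an existing column intact, although the answers below avoid the per-column test altogether.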
In SQL Server, a column's existence can be checked with the COL_LENGTH() function. COL_LENGTH() returns the defined length of a column in bytes, or NULL when the column does not exist, so it can be used in an IF ... ELSE condition to check whether the column exists.
The Postgres IF NOT EXISTS syntax: first, we specify the name of the table to which we want to add a column. We supply the IF NOT EXISTS option after the ADD COLUMN clause, and then we specify the name of the column and its data type.
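Another option that does not require creating a helper function (or an already complete data.frame) is tibble's add_column:

library(tibble)

# Named vector of the standard columns, each with a typed NA as its fill value
cols <- c(top_speed = NA_real_, nhj = NA_real_, mpg = NA_real_)

# Splice in only those columns that mtcars does not already have
add_column(mtcars, !!!cols[setdiff(names(cols), names(mtcars))])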
We could create a helper function to create the column:

fncols <- function(data, cname) {
  # Names in cname that are not yet columns of data
  add <- cname[!cname %in% names(data)]
  # Add the missing columns, filled with NA
  if (length(add) != 0) data[add] <- NA
  data
}

fncols(mtcars, "mpg")
fncols(mtcars, c("topspeed", "nhj", "mpg"))
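To get back to the original goal of giving several data frames a common set of variables, the helper can be applied over a list. The following is a rough sketch in base R, not a definitive recipe: the standard_vars vector, the two small data frames, and the top_speed value are all made up for illustration.

```r
# Helper from the answer above, repeated so the example is self-contained
fncols <- function(data, cname) {
  add <- cname[!cname %in% names(data)]
  if (length(add) != 0) data[add] <- NA
  data
}

# Hypothetical set of standard variables every data frame should end up with
standard_vars <- c("car", "mpg", "top_speed")

# Two illustrative data frames, each missing one of the standard variables
frames <- list(
  a = data.frame(car = "Mazda RX4", mpg = 21),
  b = data.frame(car = "Datsun 710", top_speed = 150)  # made-up value
)

# Pad each data frame with NA columns, then put the columns in a common order
padded <- lapply(frames, function(df) fncols(df, standard_vars)[standard_vars])

# With identical column names and order, the pieces can be stacked
combined <- do.call(rbind, padded)
```

Reordering with [standard_vars] after padding is a design choice that makes the later rbind() step straightforward; if any data frame carries extra columns beyond the standard set, this subsetting drops them, which may or may not be what you want.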