Merge rows in a dataframe where the rows are disjoint and contain NAs

Tags:

dataframe

r

data.table

dplyr

coalesce

I have a dataframe that has two rows:

| code | name  | v1 | v2 | v3 | v4 |
|------|-------|----|----|----|----|
| 345  | Yemen | NA | 2  | 3  | NA |
| 346  | Yemen | 4  | NA | NA | 5  |

Is there an easy way to merge these two rows? What if I rename "345" to "346", would that make things easier?

asked Jan 10 '13 22:01

LukasKawerau

2 Answers

You can use aggregate. Assuming that you want to merge rows with identical values in column name:

aggregate(x = DF[c("v1", "v2", "v3", "v4")], by = list(name = DF$name), FUN = min, na.rm = TRUE)
   name v1 v2 v3 v4
1 Yemen  4  2  3  5

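This is like the SQL query SELECT name, min(v1) FROM DF GROUP BY name. The choice of min is arbitrary: you could also use max or mean, since with na.rm = TRUE each of them returns the non-NA value when a group contains one NA and one non-NA value. If every value of a column within a group is NA, however, min returns Inf (with a warning), max returns -Inf, and mean returns NaN. (An SQL-like coalesce() would express the intent better; base R has none, but the dplyr package provides dplyr::coalesce().)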
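Base R has no coalesce(), but a small helper that keeps the first non-NA value can play that role inside aggregate. This is a minimal sketch: first_non_na() is a hypothetical helper, not a base R function, and DF is rebuilt from the question's table. Unlike min, it returns NA rather than Inf when a group has no non-NA value.

```r
# Rebuild the example data from the question
DF <- data.frame(
  code = c(345, 346), name = c("Yemen", "Yemen"),
  v1 = c(NA, 4), v2 = c(2, NA), v3 = c(3, NA), v4 = c(NA, 5)
)

# Hypothetical helper: first non-NA value of x, or NA if there is none
# (indexing an empty vector with [1] yields NA)
first_non_na <- function(x) x[!is.na(x)][1]

merged <- aggregate(x = DF[c("v1", "v2", "v3", "v4")],
                    by = list(name = DF$name),
                    FUN = first_non_na)
merged
#>    name v1 v2 v3 v4
#> 1 Yemen  4  2  3  5
```

Like min, it silently picks one value when the non-NA values disagree, so it is still worth checking that they agree.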

However, you should first check that all non-NA values for a given name are identical. For example, run aggregate with both min and max and compare the results, or run it once with range.
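A minimal sketch of the min/max comparison, assuming the question's data as DF; the names vars, by_name, mins, maxs, agree, and conflicting are illustrative. A group/column pair whose non-NA values all agree has equal minimum and maximum. One caveat: if a column is entirely NA within a group, min and max return Inf and -Inf (with warnings), so that pair is flagged as well.

```r
DF <- data.frame(
  code = c(345, 346), name = c("Yemen", "Yemen"),
  v1 = c(NA, 4), v2 = c(2, NA), v3 = c(3, NA), v4 = c(NA, 5)
)

vars <- c("v1", "v2", "v3", "v4")
by_name <- list(name = DF$name)

# Smallest and largest non-NA value of every column within each group
mins <- aggregate(DF[vars], by = by_name, FUN = min, na.rm = TRUE)
maxs <- aggregate(DF[vars], by = by_name, FUN = max, na.rm = TRUE)

# Logical matrix: TRUE where all non-NA values in that group/column agree
agree <- as.matrix(mins[vars] == maxs[vars])

# Groups with at least one conflicting column
conflicting <- mins$name[rowSums(!agree) > 0]
conflicting
```

For the question's data, conflicting is empty, so merging with min (or max) is safe.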

Finally, if you have many more variables than just v1 to v4, you could use DF[,!(names(DF) %in% c("code","name"))] to select the columns.
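A sketch of that column selection plugged into the same aggregate call, assuming the question's data as DF (value_cols and merged are illustrative names). Adding drop = FALSE is a small safeguard: without it, R would return a plain vector if only one column were left.

```r
DF <- data.frame(
  code = c(345, 346), name = c("Yemen", "Yemen"),
  v1 = c(NA, 4), v2 = c(2, NA), v3 = c(3, NA), v4 = c(NA, 5)
)

# Every column except the identifier and grouping columns;
# drop = FALSE keeps a data frame even if only one column remains
value_cols <- DF[, !(names(DF) %in% c("code", "name")), drop = FALSE]

merged <- aggregate(x = value_cols, by = list(name = DF$name),
                    FUN = min, na.rm = TRUE)
merged
#>    name v1 v2 v3 v4
#> 1 Yemen  4  2  3  5
```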

answered Oct 01 '22 21:10

Daniel Sparing

Adding dplyr & data.table solutions for completeness

Using dplyr (an NA-aware sum, or dplyr::coalesce())

library(dplyr)

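# NA-aware sum: NA (of the column's type) if every value is NA, otherwise the
# sum of the non-NA values. Only a true merge when each group has at most one
# non-NA value per column (`code` below becomes 345 + 346 = 691).
sum_NA <- function(x) {if (all(is.na(x))) x[NA_integer_] else sum(x, na.rm = TRUE)}

df %>% 
  group_by(name) %>% 
  summarise_all(sum_NA)  # dplyr >= 1.0.0: summarise(across(everything(), sum_NA))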
#> # A tibble: 1 x 6
#>   name   code    v1    v2    v3    v4
#>   <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Yemen   691     4     2     3     5
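To see why code comes out as 691, here is how sum_NA() (copied unchanged from the answer) behaves on a few small vectors. It merges correctly only when at most one value is non-NA; two non-NA values are added together.

```r
sum_NA <- function(x) {if (all(is.na(x))) x[NA_integer_] else sum(x, na.rm = TRUE)}

sum_NA(c(NA, 4))          # one non-NA value: returned as is
#> [1] 4
sum_NA(c(NA_real_, NA))   # all NA: NA of the same type (double)
#> [1] NA
sum_NA(c(345, 346))       # two non-NA values: summed, not merged
#> [1] 691
```

Returning x[NA_integer_] instead of a bare NA keeps the column's type, so an all-NA integer column stays integer and a double column stays double.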

# Ref: https://stackoverflow.com/a/45515491
# Supply lists by splicing them into dots:
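coalesce_by_column <- function(x) {
  # x holds one column's values within a group; splicing them as separate
  # arguments makes coalesce() return the first non-NA value
  dplyr::coalesce(!!! as.list(x))
}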

df %>% 
  group_by(name) %>% 
  summarise_all(coalesce_by_column)
#> # A tibble: 1 x 6
#>   name   code    v1    v2    v3    v4
#>   <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Yemen   345     4     2     3     5
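In dplyr 1.0.0 and later, summarise_all() is superseded by across(). A minimal sketch of the same first-non-NA merge with across(), assuming dplyr >= 1.0.0 is installed; the lambda ~ .x[!is.na(.x)][1] is an illustration rather than part of the answer, but like coalesce_by_column() it keeps the first non-NA value of each column.

```r
library(dplyr)

df <- data.frame(
  code = c(345, 346), name = c("Yemen", "Yemen"),
  v1 = c(NA, 4), v2 = c(2, NA), v3 = c(3, NA), v4 = c(NA, 5)
)

res <- df %>%
  group_by(name) %>%
  # across() skips the grouping column `name` automatically;
  # every other column keeps its first non-NA value
  summarise(across(everything(), ~ .x[!is.na(.x)][1]))
res
```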

Using data.table

# Ref: https://stackoverflow.com/q/28036294/
library(data.table)
setDT(df)[, lapply(.SD, na.omit), by = name]
#>     name code v1 v2 v3 v4
#> 1: Yemen  345  4  2  3  5
#> 2: Yemen  346  4  2  3  5

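# .SDcols = !"code" leaves `code` out of .SD without deleting it from df
setDT(df)[, lapply(.SD, na.omit), by = name, .SDcols = !"code"]
#>     name v1 v2 v3 v4
#> 1: Yemen  4  2  3  5

# Note: code := NULL deletes the column from df itself (by reference)
setDT(df)[, code := NULL][, lapply(.SD, sum_NA), by = name]
#>     name v1 v2 v3 v4
#> 1: Yemen  4  2  3  5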

answered Oct 01 '22 20:10

Tung
