Merging rows with shared information

Question

I have a data.frame containing several rows that came out of a merge but were not completely merged:

b <- read.table(text = "
      ID   Age    Steatosis       Mallory Lille_dico Lille_3 Bili.AHHS2cat
68 HA-09   16   <NA>          <NA>       <NA>       5             NA
69 HA-09   16   <33% no/occasional       <NA>      NA             1")

How can I merge them by a column?

Expected output:

      ID  Age     Steatosis       Mallory Lille_dico Lille_3 Bili.AHHS2cat
69 HA-09   16  <33% no/occasional       <NA>       5             1

Note that some columns (other than ID) have the same value on both rows. As far as I know, these columns aren't part of the "primary key" of the database, so rows that hold several different values in them shouldn't be merged. Things I tried:

 merge(b[1, ], b[2, ], all = T) # Doesn't merge the rows, just the data.frames
 cast(b, ID ~ .) # I can count them but not merging them into a single row
 aggregate(b, by = list("ID", "Age"), c) # Error

aichao · Accepted Answer

A dplyr approach using summarise_all:

## using `na.strings` to identify NA entries in posted data
b <- read.table(text = "
      ID   Age    Steatosis       Mallory Lille_dico Lille_3 Bili.AHHS2cat
68 HA-09   16   <NA>          <NA>       <NA>       5             NA
69 HA-09   16   <33% no/occasional       <NA>      NA             1", na.strings = c("NA", "<NA>"))

library(dplyr)
f <- function(x) {
  x <- na.omit(x)
  if (length(x) > 0) first(x) else NA
}
## pass f directly; funs() has been deprecated since dplyr 0.8.0
res <- b %>% group_by(ID, Age) %>% summarise_all(f)
##Source: local data frame [1 x 7]
##Groups: ID [?]
##
##      ID   Age Steatosis       Mallory Lille_dico Lille_3 Bili.AHHS2cat
##  <fctr> <int>    <fctr>        <fctr>      <lgl>   <int>         <int>
##1  HA-09    16      <33% no/occasional         NA       5             1

The function f is defined this way to handle the case where all values of a column within a group are NA.
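As a quick check of that behaviour, here is a minimal sketch that calls the same helper on two small vectors. The input vectors are made up for illustration and are not part of the posted data.

```r
library(dplyr)

# Same helper as in the answer: drop NAs, then keep the first remaining value.
f <- function(x) {
  x <- na.omit(x)
  if (length(x) > 0) first(x) else NA
}

# Hypothetical inputs, chosen only to exercise both branches.
one_value <- f(c(NA, 5L, NA))  # one non-NA value survives na.omit()
all_na    <- f(c(NA, NA))      # nothing survives, so the else branch returns NA

stopifnot(one_value == 5L, is.na(all_na))
```

The explicit length check means a group with no usable value still yields a single NA, so every group contributes exactly one row.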

As @jdobres suggests, if a column has more than one non-NA value per group and you want to keep all of them, you can flatten them into a single string instead:

library(dplyr)
f <- function(x) {
  x <- na.omit(x)
  if (length(x) > 0) paste(x, collapse = "-") else NA
}
res <- b %>% group_by(ID, Age) %>% summarise_all(f)

In your posted data, the result would be the same as above because every summarized column has at most one non-NA value per group.
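To see where the two helpers diverge, the sketch below uses a small hypothetical data frame in which the two rows disagree on one column. The data frame `d`, its `Bili` column, and the names `keep_first` and `collapse_all` are illustrative assumptions. It also uses `summarise(across(...))`, which assumes dplyr 1.0.0 or later; `summarise_all()` should behave the same way here.

```r
library(dplyr)

# Hypothetical data: the two rows for HA-09 disagree on Lille_3 (5 vs 7).
d <- data.frame(
  ID      = c("HA-09", "HA-09"),
  Age     = c(16L, 16L),
  Lille_3 = c(5L, 7L),
  Bili    = c(NA, 1L),
  stringsAsFactors = FALSE
)

# First helper from the answer: keep only the first non-NA value.
keep_first <- function(x) {
  x <- na.omit(x)
  if (length(x) > 0) first(x) else NA
}

# Second helper from the answer: paste all non-NA values together.
collapse_all <- function(x) {
  x <- na.omit(x)
  if (length(x) > 0) paste(x, collapse = "-") else NA
}

res_first <- d %>%
  group_by(ID, Age) %>%
  summarise(across(everything(), keep_first), .groups = "drop")

res_paste <- d %>%
  group_by(ID, Age) %>%
  summarise(across(everything(), collapse_all), .groups = "drop")

# The conflicting column differs: one value kept vs. both values in a string.
stopifnot(res_first$Lille_3 == 5L, res_paste$Lille_3 == "5-7")
```

Note that the pasted version turns every summarized column into character, so it is mostly useful for spotting conflicting values rather than for further numeric work.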

Tags:

merge

r

llrs
