 

Data.frame filtering

Tags: dataframe, r

I have the following data.frame df:

df = data.frame(col1    = c('a','a','a','a','a','b','b','c','d'),
                col2    = c('a','a','a','b','b','b','b','a','a'),
                height1 = c(NA,32,NA,NA,NA,NA,NA,25,NA),
                height2 = c(31,31.5,NA,NA,11,12,13,NA,NA),
                col3    = 1:9)

#  col1 col2 height1 height2 col3
#1    a    a      NA    31.0    1
#2    a    a      32    31.5    2
#3    a    a      NA      NA    3
#4    a    b      NA      NA    4
#5    a    b      NA    11.0    5
#6    b    b      NA    12.0    6
#7    b    b      NA    13.0    7
#8    c    a      25      NA    8
#9    d    a      NA      NA    9

For each pair of values (col1, col2), I want to build a column height whose value follows these rules:

  • If height1 and height2 contain only NA, return NA.
  • If there is a value in height1, take that value (for a given (col1, col2) pair, height1 contains at most one non-NA value).
  • If height1 contains only NA and height2 contains some non-NA values, take the first non-NA value in height2.

I also need to keep the corresponding value in column col3.
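The rules above amount to choosing one row per (col1, col2) group. As an illustration only, here is a small base R helper applied to a single group. The name `pick_height_row` is hypothetical, and for a group whose heights are all NA it assumes col3 should come from the group's first row (the example data only has a single-row all-NA group, so the question does not settle this).

```r
df <- data.frame(col1    = c('a','a','a','a','a','b','b','c','d'),
                 col2    = c('a','a','a','b','b','b','b','a','a'),
                 height1 = c(NA,32,NA,NA,NA,NA,NA,25,NA),
                 height2 = c(31,31.5,NA,NA,11,12,13,NA,NA),
                 col3    = 1:9)

# Hypothetical helper: apply the three rules to the rows of one (col1, col2) group
pick_height_row <- function(tab) {
  h1 <- which(!is.na(tab$height1))   # rows with a height1 value
  h2 <- which(!is.na(tab$height2))   # rows with a height2 value
  if (length(h1) > 0) {
    i <- h1[1]                        # rule 2: the (at most one) height1 row
    height <- tab$height1[i]
  } else if (length(h2) > 0) {
    i <- h2[1]                        # rule 3: first row with a height2 value
    height <- tab$height2[i]
  } else {
    i <- 1                            # rule 1: all NA; assumption: use the first row's col3
    height <- NA_real_
  }
  data.frame(col1 = tab$col1[i], col2 = tab$col2[i],
             height = height, col3 = tab$col3[i])
}

# The (a, b) group has no height1 value, so height2 = 11 from row 5 is used:
# a one-row data.frame with col1 "a", col2 "b", height 11, col3 5
pick_height_row(df[df$col1 == "a" & df$col2 == "b", ])
```

Running it over every group is then a matter of `split()` followed by `do.call(rbind, ...)`, which is the strategy of the base R answer.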

The new data.frame new.df will look like:

#  col1 col2 height col3
#1    a    a     32    2
#2    a    b     11    5
#3    b    b     12    6
#4    c    a     25    8
#5    d    a     NA    9

I would prefer a concise data.frame approach, but I have not been able to find one.

asked Mar 09 '15 08:03 by Colonel Beauvel




3 Answers

This may not be the elegant solution you are looking for, but here is a base R option:

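do.call("rbind",
        lapply(split(df, paste0(df$col1, df$col2)),    # one data.frame per (col1, col2) pair
               function(tab) {
                 colnames(tab)[3:4] <- "height"          # rename height1 and height2
                 out <- if (any(!is.na(tab[, 3]))) {
                          tab[which(!is.na(tab[, 3])), -4]               # rule 2: the height1 row
                        } else if (any(!is.na(tab[, 4]))) {
                          tab[which(!is.na(tab[, 4]))[1], c(1:2, 4:5)]   # rule 3: first height2 row
                        } else {
                          tab[1, -4]                                     # rule 1: all NA, first row
                        }
                 return(out)
               }
        )
)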

#       col1 col2 height col3
#    aa    a    a     32    2
#    ab    a    b     11    5
#    bb    b    b     12    6
#    ca    c    a     25    8
#    da    d    a     NA    9
answered Oct 21 '22 23:10 by Cath


With dplyr:

library(dplyr)

df %>%
  mutate(
    row    = row_number(),    # original row position, used to find the "first" height2
    order  = ifelse(!is.na(height1), 1, ifelse(!is.na(height2), 2, 3)),
    height = ifelse(!is.na(height1), height1, height2)
    ) %>%
  arrange(col1, col2, order, row) %>%
  distinct(col1, col2, .keep_all = TRUE) %>%   # keep the best-ranked row per pair
  select(col1, col2, height, col3)
answered Oct 21 '22 23:10 by bergant
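The same ranking idea also works in plain base R, which may be closer to the data.frame-only approach the question asks for. This is a sketch rather than part of the answer: rows are sorted by group, then by whether height1 and height2 are missing (`FALSE` sorts before `TRUE`), then by original row position, and `duplicated()` keeps the first row of each (col1, col2) pair. It assumes the original row order defines which height2 value counts as "first".

```r
df <- data.frame(col1    = c('a','a','a','a','a','b','b','c','d'),
                 col2    = c('a','a','a','b','b','b','b','a','a'),
                 height1 = c(NA,32,NA,NA,NA,NA,NA,25,NA),
                 height2 = c(31,31.5,NA,NA,11,12,13,NA,NA),
                 col3    = 1:9)

# Rank rows within each (col1, col2) group: a height1 value first, then a
# height2 value, then all-NA rows; remaining ties keep the original row order
o <- order(df$col1, df$col2, is.na(df$height1), is.na(df$height2), seq_len(nrow(df)))
ranked <- df[o, ]

# Keep the best-ranked row of each (col1, col2) pair
best <- ranked[!duplicated(ranked[c("col1", "col2")]), ]

# height1 if present, otherwise height2 (NA when both are missing)
best$height <- ifelse(is.na(best$height1), best$height2, best$height1)

new.df <- best[c("col1", "col2", "height", "col3")]
rownames(new.df) <- NULL
new.df   # same values as new.df in the question
```

Sorting on row position rather than on height matters here: ordering by height would select the smallest height2 value in a group instead of the first one.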


I currently use data.table (although here I would prefer a data.frame option), and I find my solution inelegant:

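library(data.table)

# Apply the three rules to the rows of one (col1, col2) group
func = function(df)
{
    # rule 1: only NA in height1 and height2, keep the first row
    if(all(is.na(subset(df, select=c(height1,height2)))))
        return(df[1,])

    # rule 2: keep the (single) row with a height1 value
    if(any(!is.na(df$height1)))
        return(df[!is.na(df$height1),])

    # rule 3: keep the first row with a height2 value
    df[!is.na(df$height2),][1,]
}

setDT(df)                                     # converts df to a data.table in place
new.df=df[,func(.SD),by=list(col1,col2)]
new.df = data.frame(new.df)

# height1 if present, otherwise height2
new.df$height = ifelse(is.na(new.df$height1), new.df$height2, new.df$height1)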

#> new.df
#  col1 col2 height1 height2 col3 height
#1    a    a      32    31.5    2     32
#2    a    b      NA    11.0    5     11
#3    b    b      NA    12.0    6     12
#4    c    a      25      NA    8     25
#5    d    a      NA      NA    9     NA
answered Oct 22 '22 00:10 by Colonel Beauvel
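If data.table is acceptable after all, the helper function can probably be avoided. A sketch, assuming data.table 1.12.4 or later (for `fcoalesce()`): order the rows so that a non-missing height1 comes first and a non-missing height2 second, take the first row of each group with `.SD[1]`, and use `keyby` so the groups come out sorted by col1 and col2. It relies on ordering in `i` being stable, so tied rows keep their original order; `as.data.table()` is used instead of `setDT()` so that `df` itself is not modified.

```r
library(data.table)

df <- data.frame(col1    = c('a','a','a','a','a','b','b','c','d'),
                 col2    = c('a','a','a','b','b','b','b','a','a'),
                 height1 = c(NA,32,NA,NA,NA,NA,NA,25,NA),
                 height2 = c(31,31.5,NA,NA,11,12,13,NA,NA),
                 col3    = 1:9)

dt <- as.data.table(df)   # a copy, so df stays a plain data.frame

# Rows with a height1 value first, then rows with a height2 value; ties keep
# their original order. keyby groups and sorts the result by col1, col2.
res <- dt[order(is.na(height1), is.na(height2)), .SD[1], keyby = .(col1, col2)]

# height1 if present, otherwise height2 (NA when both are missing)
res[, height := fcoalesce(height1, height2)]

new.df <- as.data.frame(res[, .(col1, col2, height, col3)])
new.df
```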