Data.frame filtering

I have the following data.frame df:

df = data.frame(col1    = c('a','a','a','a','a','b','b','c','d'),
                col2    = c('a','a','a','b','b','b','b','a','a'),
                height1 = c(NA,32,NA,NA,NA,NA,NA,25,NA),
                height2 = c(31,31.5,NA,NA,11,12,13,NA,NA),
                col3    = 1:9)

#  col1 col2 height1 height2 col3
#1    a    a      NA    31.0    1
#2    a    a      32    31.5    2
#3    a    a      NA      NA    3
#4    a    b      NA      NA    4
#5    a    b      NA    11.0    5
#6    b    b      NA    12.0    6
#7    b    b      NA    13.0    7
#8    c    a      25      NA    8
#9    d    a      NA      NA    9

For each pair of values (col1, col2), I want to build a column height such that:

- If there are only NA values in height1 and height2, return NA.
- If there is a value in height1, take this value. (For a given pair col1, col2, there is at most one non-NA value in column height1.)
- If there are only NA values in height1 and some non-NA values in height2, take the first non-NA value in height2.

I also need to keep the corresponding values in column col3.

The new data.frame new.df should look like:

#  col1 col2 height col3
#1    a    a     32    2
#2    a    b     11    5
#3    b    b     12    6
#4    c    a     25    8
#5    d    a     NA    9

I would prefer a concise data.frame approach, but I have not been able to find one.

This may not be the elegant solution you are looking for, but here is a base R option:

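do.call("rbind",
        # one sub-table per (col1, col2) pair
        lapply(split(df, paste0(df$col1, df$col2)),
               function(tab) {
                 # name both height columns "height"; each branch keeps only one
                 colnames(tab)[3:4] <- "height"
                 out <- if (any(!is.na(tab[, 3]))) {
                   # height1 has a value: keep that row, drop height2
                   tab[which(!is.na(tab[, 3])), -4]
                 } else if (any(!is.na(tab[, 4]))) {
                   # only height2 has values: keep the first one, drop height1
                   tab[which(!is.na(tab[, 4]))[1], c(1:2, 4:5)]
                 } else {
                   # everything is NA: keep the first row
                   tab[1, -4]
                 }
                 return(out)
               }
        )
)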

#       col1 col2 height col3
#    aa    a    a     32    2
#    ab    a    b     11    5
#    bb    b    b     12    6
#    ca    c    a     25    8
#    da    d    a     NA    9

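With dplyr:

library(dplyr)

df %>%
  mutate(
    # 1 = height1 present, 2 = only height2 present, 3 = both NA
    order  = ifelse(!is.na(height1), 1, ifelse(!is.na(height2), 2, 3)),
    height = ifelse(!is.na(height1), height1, ifelse(!is.na(height2), height2, NA))
  ) %>%
  # sort by priority only, so that tied rows keep their original order
  # and the first (not the smallest) height2 value is retained
  arrange(col1, col2, order) %>%
  # .keep_all = TRUE keeps height and col3 in current dplyr versions
  distinct(col1, col2, .keep_all = TRUE) %>%
  select(col1, col2, height, col3)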
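
For readers who want to stay with a plain data.frame, the same idea (rank each row, sort by rank within each pair, keep the first row per pair) can be sketched in base R. This is only an illustration under a few assumptions: the names tmp and prio are made up for the example, and the row index is added as an explicit last sort key so that ties resolve to the earliest row.

```r
df <- data.frame(col1    = c('a','a','a','a','a','b','b','c','d'),
                 col2    = c('a','a','a','b','b','b','b','a','a'),
                 height1 = c(NA,32,NA,NA,NA,NA,NA,25,NA),
                 height2 = c(31,31.5,NA,NA,11,12,13,NA,NA),
                 col3    = 1:9)

tmp <- df  # work on a copy (hypothetical name) so df itself is left untouched

# priority of each row: 1 = height1 present, 2 = only height2 present, 3 = both NA
prio <- ifelse(!is.na(tmp$height1), 1, ifelse(!is.na(tmp$height2), 2, 3))

# height1 when available, otherwise height2 (which may itself be NA)
tmp$height <- ifelse(!is.na(tmp$height1), tmp$height1, tmp$height2)

# sort by pair, then priority, then original row position to break ties
tmp <- tmp[order(tmp$col1, tmp$col2, prio, seq_len(nrow(tmp))), ]

# keep only the first row of each (col1, col2) pair
new.df <- tmp[!duplicated(tmp[c("col1", "col2")]),
              c("col1", "col2", "height", "col3")]
rownames(new.df) <- NULL
new.df
```

On the sample data this should reproduce the new.df shown in the question. Unlike the split-based approach, it does not build a pasted key, so pairs such as ('ab', 'c') and ('a', 'bc') cannot be confused.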

I normally use data.table (although, exceptionally, I would prefer a data.frame option here), and I find my solution inelegant:

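library(data.table)

# pick the row to keep within one (col1, col2) group
func = function(df)
{
    # both height columns are entirely NA: keep the first row
    if(all(is.na(subset(df, select=c(height1,height2)))))
        return(df[1,])

    # height1 has a value: keep that row
    if(any(!is.na(df$height1)))
        return(df[!is.na(df$height1),])

    # otherwise keep the first row with a non-NA height2
    df[!is.na(df$height2),][1,]
}

setDT(df)
new.df = df[, func(.SD), by=list(col1,col2)]
new.df = data.frame(new.df)

# height1 when available, otherwise height2
new.df$height = ifelse(is.na(new.df$height1), new.df$height2, new.df$height1)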

#> new.df
#  col1 col2 height1 height2 col3 height
#1    a    a      32    31.5    2     32
#2    a    b      NA    11.0    5     11
#3    b    b      NA    12.0    6     12
#4    c    a      25      NA    8     25
#5    d    a      NA      NA    9     NA

Tags: dataframe, r

Colonel Beauvel

3 Answers: Cath, bergant, Colonel Beauvel
