
Dataframe create new column based on other columns

Tags: dataframe, r, apply

I have a dataframe:

df <- data.frame('a'=c(1,2,3,4,5), 'b'=c(1,20,3,4,50))
df
    a    b
1   1    1
2   2   20
3   3    3
4   4    4
5   5   50

and I want to create a new column based on existing columns. Something like this:

if (df[['a']] == df[['b']]) {
  df[['c']] <- df[['a']] + df[['b']]
} else {
  df[['c']] <- df[['b']] - df[['a']]
}

The problem is that the if condition is checked only for the first row. If I wrap the above if statement in a function and call it with apply() (or mapply()...), the result is the same.

In Python/pandas I can use this:

df['c'] = df[['a', 'b']].apply(lambda x: x['a'] + x['b'] if (x['a'] == x['b']) \
    else x['b'] - x['a'], axis=1)

I want something similar in R. So the result should look like this:

    a    b    c
1   1    1    2
2   2   20   18
3   3    3    6
4   4    4    8
5   5   50   45

ragesz asked Aug 26 '16 11:08



1 Answer

One option is ifelse(), which is a vectorized version of if/else. The row-by-row if/else shown in the OP's pandas code could be done in a for loop or with lapply/sapply, but that would be inefficient in R.

df <- transform(df, c= ifelse(a==b, a+b, b-a))
df
#  a  b  c
#1 1  1  2
#2 2 20 18
#3 3  3  6
#4 4  4  8
#5 5 50 45

This can also be written as

df$c <- with(df, ifelse(a==b, a+b, b-a))

which creates the 'c' column in the original dataset.
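
To see why the vectorized form works where a plain if does not, it can help to look at the pieces. The following is a minimal sketch using the same df as in the question; the intermediate names same, c_ifelse and c_index are only illustrative.

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(1, 20, 3, 4, 50))

# The comparison is evaluated element-wise and returns one logical per row
same <- df$a == df$b
same
# [1]  TRUE FALSE  TRUE  TRUE FALSE

# ifelse() picks from the second argument where the test is TRUE,
# and from the third argument where it is FALSE
c_ifelse <- ifelse(same, df$a + df$b, df$b - df$a)

# Equivalent result with logical indexing: start from the "else" values,
# then overwrite the rows where the condition holds
c_index <- df$b - df$a
c_index[same] <- (df$a + df$b)[same]

identical(c_ifelse, c_index)
# [1] TRUE
```

A plain if, by contrast, expects a single TRUE or FALSE, which is why it cannot handle the whole column at once.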


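As the OP wants a similar option in R using if/else, apply() over the rows also works. Note that apply() first converts the data frame to a matrix, so this assumes all columns are numeric: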

df$c <- apply(df, 1, FUN = function(x) if(x[1]==x[2]) x[1]+x[2] else x[2]-x[1])
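
If a scalar if/else is preferred, for instance because the per-row logic is more involved, mapply() can pass the values of each row to a function one pair at a time, which is close in spirit to the pandas apply(..., axis=1) call in the question. A minimal sketch, assuming the same df; the helper name row_rule is hypothetical:

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(1, 20, 3, 4, 50))

# Hypothetical helper: receives one value of a and one value of b at a time,
# so the scalar if/else is safe here
row_rule <- function(a, b) {
  if (a == b) a + b else b - a
}

# mapply() calls row_rule(df$a[i], df$b[i]) for each row i
df$c <- mapply(row_rule, df$a, df$b)
df$c
# [1]  2 18  6  8 45
```

This calls the function once per row, so on large data frames the ifelse() version will usually be faster.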

akrun answered Sep 26 '22 06:09