
Adding new column with conditional values using ifelse

I have a data frame with more than 400,000 observations, and I'm trying to add a column whose values depend on another column, and sometimes on several.

Here is a simplified example of what I'm trying to do:

# Creating a data frame 

M <- data.frame(c("A","B","C"),c(5,100,60))

names(M) <- c("Letter","Number")

#adding a column 

M$Size <- NA

# if Number <= 50 Size is small, 
# if Number is between 50 and 70, Size is Medium
# if Number is Bigger than 70, Size is Big

ifelse (M$Number <=50, M$Size <-"Small",
        ifelse(M$Number <= 70,
        M$Size <- "Medium",
        M$Size <- "Big"
        ))

When I run the code, the output I get is:

[1] "Small"  "Big"    "Medium"

But the "Size" column in M always holds the value from the last branch of the ifelse call:

> print (M)
  Letter Number Size
1      A      5  Big
2      B    100  Big
3      C     60  Big

The result that I want:

> print (M)
  Letter Number Size
1      A      5  Small
2      B    100  Big
3      C     60  Medium

I could solve the problem by subsetting the data frame once per condition and using rbind to recombine the pieces, but the code would be very long, and since the original data frame is big, it would take more time to run. How can I fix this?

NouiNou asked May 19 '16 10:05


3 Answers

This will help you out:

# Creating a data frame 

M <- data.frame(c("A","B","C"),c(5,100,60))

names(M) <- c("Letter","Number")

#adding a column 


# if Number <= 50 Size is small, 
# if Number is between 50 and 70, Size is Medium
# if Number is Bigger than 70, Size is Big

# M$Size[M$Number <= 50] <- "Small"
# Edit: No need to subset "Small"
M$Size <- "Small"
M$Size[M$Number > 50 & M$Number <= 70] <- "Medium"
M$Size[M$Number > 70] <- "Big"

#      Letter Number   Size
# 1      A      5      Small
# 2      B    100      Big
# 3      C     60      Medium

See this on R-Fiddle
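
With logical indexing, it is worth checking that the conditions cover the boundary values (50 and 70 here) exactly once. The sketch below is only an illustration: the data frame `B` and its values are hypothetical, and it assumes the rule from the question (Number <= 50 is Small, Number above 50 and up to 70 is Medium, anything larger is Big).

```r
# Hypothetical test data covering both boundaries (not from the question)
B <- data.frame(Number = c(50, 51, 70, 71))

B$Size <- "Small"                                   # default: Number <= 50
B$Size[B$Number > 50 & B$Number <= 70] <- "Medium"  # 50 < Number <= 70
B$Size[B$Number > 70] <- "Big"                      # Number > 70

print(B)
```

Each assignment touches only the rows selected by its condition, so the order of the two subset assignments does not matter as long as the conditions do not overlap.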

PRYM answered Nov 04 '22 05:11


Use cut:

M$Size <- cut(M$Number, breaks = c(-Inf, 50, 70, Inf), 
                        labels = c("small", "medium", "large"))
#  Letter Number   Size
#1      A      5  small
#2      B    100  large
#3      C     60 medium
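
Two details of cut may be worth knowing. By default its intervals are closed on the right, so the breaks above give (-Inf, 50], (50, 70] and (70, Inf], and the result is a factor rather than a character vector. A minimal sketch, rebuilding the example data frame so it runs on its own; the conversion with as.character is an optional extra step, not part of the answer above:

```r
M <- data.frame(Letter = c("A", "B", "C"), Number = c(5, 100, 60))

# Default right = TRUE: intervals are (-Inf,50], (50,70], (70,Inf]
M$Size <- cut(M$Number, breaks = c(-Inf, 50, 70, Inf),
              labels = c("small", "medium", "large"))

class(M$Size)    # the column is a factor
levels(M$Size)   # levels follow the order of the labels

# Optional: convert to plain character strings if a factor is not wanted
M$Size <- as.character(M$Size)
```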

Roland answered Nov 04 '22 05:11


Same idea but assign it like this instead. No package needed.

M$Size <- ifelse(M$Number <= 50, 'Small', ifelse(M$Number <= 70, 'Medium', 'Big'))

Result:

  Letter Number   Size
1      A      5  Small
2      B    100    Big
3      C     60 Medium
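
As for why the code in the question leaves "Big" everywhere: ifelse returns a vector, and the branch arguments are ordinary expressions that get evaluated when needed. An assignment such as M$Size <- "Big" inside a branch overwrites the whole column with a single value, so whichever branch is evaluated last wins, while the vector that ifelse returns is only printed and then discarded. A small sketch that reproduces this; the variable name `res` is just for illustration:

```r
M <- data.frame(Letter = c("A", "B", "C"), Number = c(5, 100, 60))

# Each assignment inside ifelse() overwrites the entire Size column;
# the last branch evaluated ("Big") is what remains in M.
res <- ifelse(M$Number <= 50, M$Size <- "Small",
              ifelse(M$Number <= 70, M$Size <- "Medium", M$Size <- "Big"))

res              # "Small" "Big" "Medium": correct, but not stored in M
unique(M$Size)   # "Big"

# Storing the returned vector gives the intended column
M$Size <- res
```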

elevendollar answered Nov 04 '22 07:11