I have a data frame with more than 400,000 observations and I'm trying to add a column whose values depend on another column, and sometimes on several columns.
Here is a simpler example of what I'm trying to do :
# Creating a data frame
M <- data.frame(c("A","B","C"),c(5,100,60))
names(M) <- c("Letter","Number")
#adding a column
M$Size <- NA
# if Number <= 50 Size is small,
# if Number is between 50 and 70, Size is Medium
# if Number is Bigger than 70, Size is Big
ifelse (M$Number <=50, M$Size <-"Small",
ifelse(M$Number <= 70,
M$Size <- "Medium",
M$Size <- "Big"
))
When I run the Code, the output I get is :
[1] "Small" "Big" "Medium"
But the "Size" column in M is always the last condition in the ifelse function :
> print (M)
Letter Number Size
1 A 5 Big
2 B 100 Big
3 C 60 Big
The Result that I want :
> print (M)
Letter Number Size
1 A 5 Small
2 B 100 Big
3 C 60 Medium
I can solve the problem by subsetting on each condition with subset and combining the pieces with rbind to get the result I want, but the code will be very long, and since the original data frame I'm working on is big, it will take more time to run. So I'm wondering how I can fix this issue?
This will help you out -
# Creating a data frame
M <- data.frame(c("A","B","C"),c(5,100,60))
names(M) <- c("Letter","Number")
#adding a column
# if Number <= 50 Size is small,
# if Number is between 50 and 70, Size is Medium
# if Number is Bigger than 70, Size is Big
# M$Size[M$Number <= 50] <- "Small"
# Edit: No need to subset "Small"
M$Size <- "Small"
# use <= 70 so that a value of exactly 70 is labelled "Medium" rather than left as "Small"
M$Size[M$Number > 50 & M$Number <= 70] <- "Medium"
M$Size[M$Number > 70] <- "Big"
# Letter Number Size
# 1 A 5 Small
# 2 B 100 Big
# 3 C 60 Medium
Use cut:
M$Size <- cut(M$Number, breaks = c(-Inf, 50, 70, Inf),
labels = c("small", "medium", "large"))
#  Letter Number Size
#1 A 5 small
#2 B 100 large
#3 C 60 medium
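A quick way to see how cut treats the edges of each bin is to feed it the boundary values themselves. This is only a sketch: the vector x below is made up for illustration and is not part of the original data. With the default right = TRUE the intervals are closed on the right, so 50 lands in "small" and 70 lands in "medium", which matches the conditions in the question. Note that cut returns a factor, not a character vector.

```r
# Hypothetical boundary values, chosen only to probe the bin edges
x <- c(5, 50, 60, 70, 100)

# Same breaks and labels as above; right = TRUE is the default,
# so the bins are (-Inf, 50], (50, 70], (70, Inf)
size <- cut(x, breaks = c(-Inf, 50, 70, Inf),
            labels = c("small", "medium", "large"))

# cut() returns a factor; convert if plain strings are needed
size_chr <- as.character(size)
print(size_chr)
```

If a character column is preferred, wrap the call in as.character() before assigning it to M$Size.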
Same idea but assign it like this instead. No package needed.
M$Size <- ifelse(M$Number <= 50, 'Small', ifelse(M$Number <= 70, 'Medium', 'Big'))
Result:
Letter Number Size
1 A 5 Small
2 B 100 Big
3 C 60 Medium
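As for why the code in the question ends up with "Big" everywhere: ifelse returns a new vector and does not modify M by itself. Each M$Size <- "..." written as an argument is an ordinary assignment that overwrites the whole column when that argument gets evaluated, and the last one evaluated here is "Big". The printed vector is still correct because that is the return value of ifelse. A minimal sketch of the difference (the name sizes is just an illustrative intermediate variable):

```r
M <- data.frame(Letter = c("A", "B", "C"), Number = c(5, 100, 60))

# ifelse() is vectorised: it builds and returns a vector of labels,
# one per row, without touching M
sizes <- ifelse(M$Number <= 50, "Small",
                ifelse(M$Number <= 70, "Medium", "Big"))

# The assignment happens once, outside ifelse()
M$Size <- sizes
print(M)
```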