I'm sure this has been asked before, but I don't know what to search for, so I apologise in advance.
Let's say that I have the following data frame:
grades <- data.frame(a = 1:40, b = sample(45:100, 40))
Using dplyr, I want to create a new variable that indicates the grade the student received, based on the following criteria: 90-100 = excellent, 80-90 = very good, etc.
I thought I could get that result by nesting ifelse() inside of mutate():
grades %>%
mutate(ifelse(b >= 90, "excellent"),
ifelse(b >= 80 & b < 90, "very_good"),
ifelse(b >= 70 & b < 80, "fair"),
ifelse(b >= 60 & b < 70, "poor", "fail"))
This doesn't work, as I get the error message 'argument "no" is missing, with no default'. I thought the "no" would be the "fail" at the end, but obviously I'm getting the syntax wrong.
I can get this to work if I first filter the original data individually, and then call ifelse, as follows:
a <- grades %>%
filter( b >= 90) %>%
mutate(final = ifelse(b >= 90, "excellent"))
and then rbind a, b, c, etc. Obviously, this isn't how I want to do it, but I wanted to understand the syntax of ifelse(). I'm guessing the latter works because there aren't any values that don't meet the criteria, but I still can't figure out how to get it to work when there is more than one ifelse.
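On the ifelse() syntax itself: the function takes three arguments, test, yes and no, and when nesting, each further ifelse() has to sit in the no slot of the one before it rather than being passed to mutate() as a separate argument. A minimal sketch of that, assuming the cut-offs from the question; the set.seed() call and the column name final are only there for illustration:

```r
library(dplyr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
grades <- data.frame(a = 1:40, b = sample(45:100, 40))

# Each ifelse() sits in the `no` argument of the previous one, so a later
# test only decides the result for rows that failed all earlier tests.
# That is why the upper bounds (b < 90, b < 80, ...) can be dropped.
grades <- grades %>%
  mutate(final = ifelse(b >= 90, "excellent",
                 ifelse(b >= 80, "very_good",
                 ifelse(b >= 70, "fair",
                 ifelse(b >= 60, "poor", "fail")))))

head(grades)
```

Only the innermost call carries the final "fail", which is the no value the error message was asking for.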
Define vectors with the levels and labels and then use cut on the b column:
levels <- c(-Inf, 60, 70, 80, 90, Inf)
labels <- c("Fail", "Poor", "fair", "very good", "excellent")
grades %>% mutate(x = cut(b, levels, labels = labels))
a b x
1 1 66 Poor
2 2 78 fair
3 3 97 excellent
4 4 46 Fail
5 5 89 very good
6 6 57 Fail
7 7 80 fair
8 8 98 excellent
9 9 100 excellent
10 10 93 excellent
11 11 59 Fail
12 12 51 Fail
13 13 69 Poor
14 14 75 fair
15 15 72 fair
16 16 48 Fail
17 17 74 fair
18 18 54 Fail
19 19 62 Poor
20 20 64 Poor
21 21 88 very good
22 22 70 Poor
23 23 85 very good
24 24 58 Fail
25 25 95 excellent
26 26 56 Fail
27 27 65 Poor
28 28 68 Poor
29 29 91 excellent
30 30 76 fair
31 31 82 very good
32 32 55 Fail
33 33 96 excellent
34 34 83 very good
35 35 61 Poor
36 36 60 Fail
37 37 77 fair
38 38 47 Fail
39 39 73 fair
40 40 71 fair
Or using data.table:
library(data.table)
setDT(grades)[, x := cut(b, levels, labels)]
Or simply in base R:
grades$x <- cut(grades$b, levels, labels)
After taking another close look at your initial approach, I noticed that you would need to include right = FALSE in the cut call, because, for example, 90 points should be "excellent", not just "very good". The right argument defines on which side each interval is closed (left or right); the default is the right side, which is slightly different from OP's initial approach. So in dplyr, it would then be:
grades %>% mutate(x = cut(b, levels, labels, right = FALSE))
and accordingly in the other options.
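To see the difference that right = FALSE makes, it may help to run cut on just the scores that sit exactly on a break. This is a small base R sketch using the same levels and labels as above; the names boundary, closed_right and closed_left are made up for the illustration:

```r
levels <- c(-Inf, 60, 70, 80, 90, Inf)
labels <- c("Fail", "Poor", "fair", "very good", "excellent")

boundary <- c(60, 70, 80, 90)  # scores that fall exactly on a break

# Default (right = TRUE): intervals are (lower, upper], so 90 falls in (80, 90]
closed_right <- as.character(cut(boundary, levels, labels))

# right = FALSE: intervals are [lower, upper), so 90 falls in [90, Inf)
closed_left <- as.character(cut(boundary, levels, labels, right = FALSE))

print(data.frame(score = boundary, closed_right, closed_left))
```

With the default, each boundary score gets the lower label (90 becomes "very good"); with right = FALSE it moves up one category (90 becomes "excellent"), which matches the b >= 90 style conditions in the question.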