Create column with grouped values based on another column

I'm sure this has been asked before, but I don't know what to search for, so I apologise in advance.

Let's say that I have the following data frame:

grades <- data.frame(a = 1:40, b = sample(45:100, 40))

Using dplyr, I want to create a new variable that indicates the grade the student received, based on the following criteria: 90-100 = excellent, 80-90 = very good, etc.

I thought I could get that result by nesting ifelse() inside of mutate():

grades %>%
mutate(ifelse(b >= 90, "excellent"), 
       ifelse(b >= 80 & b < 90, "very_good"),
       ifelse(b >= 70 & b < 80, "fair"),
       ifelse(b >= 60 & b < 70, "poor", "fail"))

This doesn't work: I get the error message 'argument "no" is missing, with no default'. I thought the "no" would be the "fail" at the end, but obviously I'm getting the syntax wrong.

I can get this to work if I first filter the original data individually, and then call ifelse, as follows:

a <- grades %>%
     filter( b >= 90) %>%
     mutate(final = ifelse(b >= 90, "excellent"))

and then rbind a, b, c, etc. Obviously, this isn't how I want to do it, but I wanted to understand the syntax of ifelse(). I'm guessing the latter works because there aren't any values that fail the condition, but I still can't figure out how to get it to work when there is more than one ifelse.
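For reference, a minimal sketch of a nested ifelse() that does run. Each ifelse() call needs all three arguments (test, yes, no), so every call except the innermost takes the next ifelse() as its no argument. The helper name grade_label is hypothetical and only used for illustration; the cut-offs follow the criteria stated in the question.

```r
# grade_label() is a hypothetical helper name, not part of the original post.
# Each ifelse() gets test, yes and no; the next ifelse() is the `no`
# of the one before it, and "fail" is the `no` of the innermost call.
grade_label <- function(b) {
  ifelse(b >= 90, "excellent",
  ifelse(b >= 80, "very_good",
  ifelse(b >= 70, "fair",
  ifelse(b >= 60, "poor", "fail"))))
}

# Quick check on a few scores, including boundary values
result <- grade_label(c(95, 90, 85, 70, 65, 59))
print(result)
```

Because a later branch is only reached when the earlier tests were FALSE, the upper bounds (b < 90 and so on) are not needed. Inside mutate() the same expression could be used as mutate(final = grade_label(b)).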

JoeF asked Jan 12 '15 13:01

1 Answer

Define vectors with the levels and labels and then use cut on the b column:

library(dplyr)
# break points for the intervals and one label per interval
levels <- c(-Inf, 60, 70, 80, 90, Inf)
labels <- c("Fail", "Poor", "fair", "very good", "excellent")
grades %>% mutate(x = cut(b, levels, labels = labels))
    a   b         x
1   1  66      Poor
2   2  78      fair
3   3  97 excellent
4   4  46      Fail
5   5  89 very good
6   6  57      Fail
7   7  80      fair
8   8  98 excellent
9   9 100 excellent
10 10  93 excellent
11 11  59      Fail
12 12  51      Fail
13 13  69      Poor
14 14  75      fair
15 15  72      fair
16 16  48      Fail
17 17  74      fair
18 18  54      Fail
19 19  62      Poor
20 20  64      Poor
21 21  88 very good
22 22  70      Poor
23 23  85 very good
24 24  58      Fail
25 25  95 excellent
26 26  56      Fail
27 27  65      Poor
28 28  68      Poor
29 29  91 excellent
30 30  76      fair
31 31  82 very good
32 32  55      Fail
33 33  96 excellent
34 34  83 very good
35 35  61      Poor
36 36  60      Fail
37 37  77      fair
38 38  47      Fail
39 39  73      fair
40 40  71      fair

Or using data.table:

library(data.table)
setDT(grades)[, x := cut(b, levels, labels)]

Or simply in base R:

grades$x <- cut(grades$b, levels, labels)

Note

After taking another close look at your initial approach, I noticed that you would need to include right = FALSE in the cut call, because, for example, 90 points should be "excellent", not just "very good". The right argument defines on which side each interval is closed. The default is right = TRUE (closed on the right), which differs slightly from the OP's initial approach and is why a score of 80 appears as "fair" in the output above. So in dplyr, it would then be:

grades %>% mutate(x = cut(b, levels, labels, right = FALSE))

and accordingly in the other options.
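The effect of right = FALSE can be checked on the scores that sit exactly on a break. The sketch below reuses the levels and labels vectors from above; the names boundary, default_closed and left_closed are only introduced here for illustration.

```r
# Breaks and labels as defined earlier in this answer
levels <- c(-Inf, 60, 70, 80, 90, Inf)
labels <- c("Fail", "Poor", "fair", "very good", "excellent")

# Scores that sit exactly on a break (illustrative values)
boundary <- c(60, 70, 80, 90)

# Default (right = TRUE): intervals are (lower, upper], so 90 falls in (80, 90]
default_closed <- as.character(cut(boundary, levels, labels = labels))

# right = FALSE: intervals are [lower, upper), so 90 falls in [90, Inf)
left_closed <- as.character(cut(boundary, levels, labels = labels, right = FALSE))

# Side-by-side comparison of the two settings
comparison <- data.frame(score = boundary,
                         default = default_closed,
                         right_false = left_closed)
print(comparison)
```

With the default, each boundary score is assigned to the lower category; with right = FALSE it moves up one category, which matches the criteria in the question.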

talat answered Oct 21 '22 03:10