Create categories by comparing a numeric column with a fixed value

Tags:

dataframe

r

Consider the iris data:

iris
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
7          4.6         3.4          1.4         0.3  setosa

I want to create a new column based on a comparison of the values in variable Sepal.Length with a fixed limit / cut-off, e.g. check if the values are larger or smaller than 5:

if Sepal.Length >= 5 assign "UP" else assign "DOWN" to a new column "Regulation".

What's the way to do that?

neversaint asked Feb 22 '13 04:02



2 Answers

Try ifelse(), which is vectorized over the condition:

iris$Regulation <- ifelse(iris$Sepal.Length >= 5, "UP", "DOWN")
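
As a quick check of the one-liner above, here is a minimal sketch on a small hand-built data frame holding the seven Sepal.Length values shown in the question. The name `df` is only for illustration and does not come from the original post.

```r
# Small stand-in for the first seven rows of iris shown in the question
# (df is a hypothetical name used only for this illustration).
df <- data.frame(Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6))

# ifelse() evaluates the condition for every row at once and picks
# "UP" where it is TRUE and "DOWN" where it is FALSE.
df$Regulation <- ifelse(df$Sepal.Length >= 5, "UP", "DOWN")

print(df)

# Count how many rows fall into each category.
print(table(df$Regulation))
```

Note that a missing Sepal.Length would give NA in Regulation, because ifelse() returns NA wherever the condition itself is NA.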
Oscar de León answered Sep 28 '22 06:09


To bring this question up to date as a possible canonical: the dplyr package provides the function mutate, which creates a new column in a data.frame in a vectorized fashion:

library(dplyr)

iris_new <- iris %>%
    mutate(Regulation = if_else(Sepal.Length >= 5, 'UP', 'DOWN'))

This makes a new column called Regulation which consists of either 'UP' or 'DOWN' based on applying the condition to the Sepal.Length column.
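
One practical difference from base ifelse() is that if_else() is stricter about types and accepts a `missing` argument for rows where the condition is NA. A minimal sketch, assuming dplyr is installed; the data frame `toy` and the label "UNKNOWN" are made up for illustration and are not part of the original answer.

```r
library(dplyr)

# Hypothetical toy data with one missing measurement.
toy <- data.frame(Sepal.Length = c(5.1, 4.9, NA, 5.0))

# `missing` supplies the value used where the condition evaluates to NA;
# without it, those rows would get NA in the new column.
toy_new <- toy %>%
    mutate(Regulation = if_else(Sepal.Length >= 5, "UP", "DOWN",
                                missing = "UNKNOWN"))

print(toy_new)
```

The true, false and missing values all have to be of a compatible type (here character), which tends to catch accidental mixing of numbers and strings early.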

The case_when function (also from dplyr) provides an easy to read way to chain together multiple conditions:

iris %>%
    mutate(Regulation = case_when(Sepal.Length >= 5 ~ 'High',
                                  Sepal.Length >= 4.5 ~ 'Mid',
                                  TRUE ~ 'Low'))

This works like if_else, except that instead of one condition with return values for TRUE and FALSE, each line has a condition (left side of ~) and a value (right side of ~) that is returned if the condition is TRUE. If it is FALSE, case_when moves on to the next condition.

In this case, rows where Sepal.Length >= 5 return 'High'; rows where Sepal.Length < 5 (since the first condition had to fail) and Sepal.Length >= 4.5 return 'Mid'; all other rows return 'Low'. Since TRUE is always TRUE, it serves as a final catch-all condition that provides a default value.
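
Because the first matching condition wins, the order of the conditions matters. The sketch below illustrates this on a hypothetical four-row data frame (`toy`, with values chosen here to hit every branch) by comparing the ordering from the answer with a swapped ordering; it assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy values chosen to hit all three branches.
toy <- data.frame(Sepal.Length = c(5.1, 4.9, 4.5, 4.3))

result <- toy %>%
    mutate(
        # Conditions ordered from most to least restrictive, as in the answer.
        Ordered = case_when(Sepal.Length >= 5   ~ "High",
                            Sepal.Length >= 4.5 ~ "Mid",
                            TRUE                ~ "Low"),
        # Same conditions with the first two swapped: "High" can never be
        # reached, because every value >= 5 already matches >= 4.5.
        Swapped = case_when(Sepal.Length >= 4.5 ~ "Mid",
                            Sepal.Length >= 5   ~ "High",
                            TRUE                ~ "Low")
    )

print(result)
```

In the Swapped column the 5.1 row comes out as "Mid", which is why the most restrictive condition should be listed first.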

divibisan answered Sep 28 '22 05:09