
r - copy value based on match in another column

Tags:

r

In this data frame:

Item <- c("A","B","A","A","A","A","A","B")
Trial <- c("Fam","Fam","Test","Test","Test","Test","Test","Test")
Condition <-c("apple","cherry","Trash","Trash","Trash","Trash","Trash","Trash")
ID <- c(rep("01",8))


df <- data.frame(cbind(Item,Trial,Condition,ID))

I would like to replace the "Trash" values of df$Condition in the rows where df$Trial == "Test". The new value should be a copy of df$Condition from the row where df$Trial == "Fam", with Fam and Test rows matched on df$Item.

So my final data frame should look like this

  Item Trial Condition ID
1    A   Fam     apple 01
2    B   Fam    cherry 01
3    A  Test     apple 01
4    A  Test     apple 01
5    A  Test     apple 01
6    A  Test     apple 01
7    A  Test     apple 01
8    B  Test    cherry 01

Ultimately I would like to do this separately for each unique ID in my original data frame, so I guess I will have to apply the function within ddply or similar later on.

Laura asked Nov 30 '15 12:11


3 Answers

You could do a binary self-join of df against its own rows where Trial != "Test", and update the Condition column by reference using the data.table package, for instance

library(data.table) ## V 1.9.6+
setDT(df)[df[Trial != "Test"], Condition := i.Condition, on = c("Item", "ID")]
df
#    Item Trial Condition ID
# 1:    A   Fam     apple 01
# 2:    B   Fam    cherry 01
# 3:    A  Test     apple 01
# 4:    A  Test     apple 01
# 5:    A  Test     apple 01
# 6:    A  Test     apple 01
# 7:    A  Test     apple 01
# 8:    B  Test    cherry 01

Or, with some modification of @docendo's suggestion, simply

setDT(df)[, Condition := Condition[Trial != "Test"], by = .(Item, ID)]

David Arenburg answered Nov 04 '22 14:11


Here is an option using dplyr

library(dplyr)
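# Build a lookup of the Fam condition per Item and ID, then join it back
df %>% 
    filter(Trial=='Fam') %>% 
    distinct(Item, ID, Condition) %>% 
    left_join(df, ., by = c('Item', 'ID')) %>% 
    # Condition.x comes from df, Condition.y from the Fam lookup
    mutate(Condition = ifelse(Condition.x=='Trash',
            as.character(Condition.y), as.character(Condition.x))) %>% 
    select(Item, Trial, Condition, ID)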

Or as suggested by @docendodiscimus

df %>% 
    group_by(ID, Item) %>%
    mutate(Condition = Condition[Condition != "Trash"])
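Both grouped one-liners (the data.table and the dplyr version) pick the single non-"Trash" value in each group, so they assume exactly one Fam row per Item and ID. A minimal base R sketch of a check for that assumption, run on the question's data before the replacement; the names grp, fam_keys, dup_fam and missing_fam are hypothetical helpers, not part of either answer:

```r
# The example data from the question
df <- data.frame(
  Item      = c("A", "B", "A", "A", "A", "A", "A", "B"),
  Trial     = c("Fam", "Fam", "Test", "Test", "Test", "Test", "Test", "Test"),
  Condition = c("apple", "cherry", rep("Trash", 6)),
  ID        = rep("01", 8),
  stringsAsFactors = FALSE
)

# One key per row, combining ID and Item (hypothetical helper)
grp      <- paste(df$ID, df$Item)
fam_keys <- grp[df$Trial == "Fam"]

# Groups with more than one Fam row, and groups with none
dup_fam     <- unique(fam_keys[duplicated(fam_keys)])
missing_fam <- setdiff(unique(grp), fam_keys)

# Stops with an error if the one-Fam-row-per-group assumption is violated
stopifnot(length(dup_fam) == 0, length(missing_fam) == 0)
```

If either vector is non-empty, the affected groups need to be handled before using the grouped shortcuts.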

akrun answered Nov 04 '22 14:11


You could also just create a for-loop and loop through all the values that need to be changed. This setup makes it easy to add other items and/or change the type of condition later on.

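df2 <- df  # work on a copy so the original df is kept

for (i in seq_len(nrow(df))) {
    # Hard-coded mapping from Item to Condition
    if (df$Item[i] == "A") {
        df2$Condition[i] <- "apple"
    } else if (df$Item[i] == "B") {
        df2$Condition[i] <- "cherry"
    }
}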
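
The hard-coded branches in the loop can also be derived from the Fam rows themselves, which extends to several IDs without editing the code. A minimal base R sketch using match(), not taken from any of the answers; the rows for ID "02" and the value "pear" are hypothetical, added only to show matching within each ID:

```r
# Small example in the shape of the question's data;
# the ID "02" rows are hypothetical
df <- data.frame(
  Item      = c("A", "B", "A", "B", "A", "A"),
  Trial     = c("Fam", "Fam", "Test", "Test", "Fam", "Test"),
  Condition = c("apple", "cherry", "Trash", "Trash", "pear", "Trash"),
  ID        = c("01", "01", "01", "01", "02", "02"),
  stringsAsFactors = FALSE
)

fam     <- df[df$Trial == "Fam", ]   # rows holding the values to copy
is_test <- df$Trial == "Test"        # rows to overwrite

# Composite keys so that items are matched within each ID
key_fam <- paste(fam$ID, fam$Item)
key_all <- paste(df$ID, df$Item)

# For every Test row, look up the Fam row with the same ID and Item
df$Condition[is_test] <- fam$Condition[match(key_all[is_test], key_fam)]
```

A Test row with no matching Fam row would end up as NA here, which may or may not be the desired behaviour.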

Mick answered Nov 04 '22 15:11