Match and replace columns of dataframe by multiple conditions

Cheers, I have two data frames with the following structure.

DF1:
Airlines           HeadQ      Date           Cost_Index
American           PHX        07-31-2016     220
American           ATL        08-31-2016     150
American           ATL        10-31-2016     150
Delta              ATL        10-31-2016     180
American           ATL        08-31-2017     200

Second data frame DF2 has the following structure:

DF2:
Airlines           HeadQ      Date          
American           ATL        09-30-2016
Delta              ATL        03-31-2017

Using DF2 as a lookup table, I would like to change DF1 into the following data frame.

DF1:
Airlines           HeadQ      Date           Cost_Index
American           PHX        07-31-2016     220
American           ATL        08-31-2016     0
American           ATL        10-31-2016     150
Delta              ATL        10-31-2016     180
American           ATL        08-31-2017     200

The condition is: look up the Airlines and HeadQ of each DF1 row in DF2, and if DF1$Date < DF2$Date for the matching row, set Cost_Index to 0; otherwise leave Cost_Index unchanged.

I tried, unsuccessfully, with:

DF1$Cost_Index <- ifelse(DF1$Airlines == DF2$Airlines & DF1$HeadQ == DF2$HeadQ 
        & DF1$Date < DF2$Date, 0, DF1$Cost_Index)


Warning:
1: In DF1$Airlines == DF2$Airlines :
  longer object length is not a multiple of shorter object length
2: In <=.default(DF1$Date, DF2$Date) :
  longer object length is not a multiple of shorter object length

Resulting DF1 (not what I want):
Airlines           HeadQ      Date           Cost_Index
American           PHX        07-31-2016     220
American           ATL        08-31-2016     0
American           ATL        10-31-2016     0
Delta              ATL        10-31-2016     0
American           ATL        08-31-2017     200

Can anyone point me in the right direction?

Note:

str(DF1$Date): Date, format: "2016-10-31"
str(DF2$Date): Date, format: "2016-08-31"
Sairam Reddy asked Aug 29 '16 19:08

1 Answer

Using data.table's conditional (non-equi) join feature, available since v1.9.8, I'd do this as follows:

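library(data.table) # v1.9.8+
# Convert to data.tables by reference, and the Date column to Date class.
# R is case-sensitive, so the names match the question's DF1 and DF2.
setDT(DF1)[, Date := as.Date(Date, format = "%m-%d-%Y")]
setDT(DF2)[, Date := as.Date(Date, format = "%m-%d-%Y")]

# Non-equi join: for each DF2 row, find the DF1 rows with the same
# Airlines and HeadQ whose Date is earlier than the DF2 Date ...
DF1[DF2, on = .(Airlines, HeadQ, Date < Date),
      Cost_Index := 0L]  # ... and set Cost_Index to 0 for those rows, by reference

DF1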
#    Airlines HeadQ       Date Cost_Index
# 1: American   PHX 2016-07-31        220
# 2: American   ATL 2016-08-31          0
# 3: American   ATL 2016-10-31        150
# 4:    Delta   ATL 2016-10-31        180
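
For readers without data.table, here is a minimal base R sketch of the same lookup-and-update. The original ifelse() attempt compares DF1's five rows against DF2's two rows element by element, so R recycles the shorter vectors and emits the "longer object length" warnings. It never matches rows by key. The sketch below first uses match() to find the DF2 row for each DF1 row, then compares dates. The helper name zero_before_cutoff is hypothetical (it is not from the question or the answer above), and the approach assumes DF2 holds at most one row per Airlines/HeadQ pair and that neither column contains the "|" separator.

```r
# Sample data copied from the question, with dates parsed into Date class.
DF1 <- data.frame(
  Airlines   = c("American", "American", "American", "Delta", "American"),
  HeadQ      = c("PHX", "ATL", "ATL", "ATL", "ATL"),
  Date       = as.Date(c("07-31-2016", "08-31-2016", "10-31-2016",
                         "10-31-2016", "08-31-2017"), format = "%m-%d-%Y"),
  Cost_Index = c(220L, 150L, 150L, 180L, 200L),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(
  Airlines = c("American", "Delta"),
  HeadQ    = c("ATL", "ATL"),
  Date     = as.Date(c("09-30-2016", "03-31-2017"), format = "%m-%d-%Y"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns a copy of df1 with Cost_Index set to 0 wherever
# the row's Airlines/HeadQ pair occurs in df2 and df1$Date < df2$Date.
# Assumes df2 has at most one row per Airlines/HeadQ pair.
zero_before_cutoff <- function(df1, df2) {
  # Build a combined key so each df1 row can be matched to a df2 row.
  key1 <- paste(df1$Airlines, df1$HeadQ, sep = "|")
  key2 <- paste(df2$Airlines, df2$HeadQ, sep = "|")

  # For every df1 row: the Date of its matching df2 row, or NA if none.
  cutoff <- df2$Date[match(key1, key2)]

  # Rows that have a match AND fall strictly before the cutoff date.
  hit <- !is.na(cutoff) & df1$Date < cutoff

  df1$Cost_Index[hit] <- 0L
  df1
}

result <- zero_before_cutoff(DF1, DF2)
print(result)
```

Applied to the tables exactly as printed in the question, this returns Cost_Index values of 220, 0, 150, 0, 200. The Delta row is zeroed as well, because its date 2016-10-31 is earlier than the Delta/ATL date 2017-03-31 in DF2. Unlike the data.table update, the helper returns a modified copy and leaves DF1 itself unchanged.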
Arun answered Nov 26 '22 22:11