Cheers, I have two data frames with the following structure.
DF1:
Airlines HeadQ Date Cost_Index
American PHX 07-31-2016 220
American ATL 08-31-2016 150
American ATL 10-31-2016 150
Delta ATL 10-31-2016 180
American ATL 08-31-2017 200
Second data frame DF2 has the following structure:
DF2:
Airlines HeadQ Date
American ATL 09-30-2016
Delta ATL 03-31-2017
Now looking up with data frames DF1 and DF2, I would like to alter DF1 to the following data frame.
DF1:
Airlines HeadQ Date Cost_Index
American PHX 07-31-2016 220
American ATL 08-31-2016 0
American ATL 10-31-2016 150
Delta ATL 10-31-2016 180
American ATL 08-31-2017 200
The condition is, lookup for Airlines and HeadQ of DF1 from DF2 and if DF1$Date < DF2$Date then make Cost_Index as 0 or else continue with Cost_Index.
I tried, unsuccessfully, with:
DF1$Cost_Index <- ifelse(DF1$Airlines == DF2$Airlines & DF1$HeadQ == DF2$HeadQ
& DF1$Date < DF2$Date, 0, DF1$Cost_Index)
Warning:
1: In DF1$Airlines == DF2$Airlines : longer object
length is not a multiple of shorter object length".
2: In<=.default(DF1$Date, DF2$Date) : longer object length is not a
multiple of shorter object length
DF1:
Airlines HeadQ Date Cost_Index
American PHX 07-31-2016 220
American ATL 08-31-2016 0
American ATL 10-31-2016 0
Delta ATL 10-31-2016 0
American ATL 08-31-2017 200
Can anyone point me to right direction?
Note:
str(DF1$Date): Date, format: "2016-10-31"
str(DF2$Date): Date, format: "2016-08-31"
You can replace values of all or selected columns based on the condition of pandas DataFrame by using DataFrame. loc[ ] property. The loc[] is used to access a group of rows and columns by label(s) or a boolean array. It can access and can also manipulate the values of pandas DataFrame.
Using Loc to Filter With Multiple Conditions The loc function in pandas can be used to access groups of rows or columns by label. Add each condition you want to be included in the filtered result and concatenate them with the & operator. You'll see our code sample will return a pd. dataframe of our filtered rows.
Using the conditional joins feature (since 1.9.8
), I'd do this as follows:
require(data.table) # v1.9.8+
# convert to data.tables, and Date column to Date class.
setDT(df1)[, Date := as.Date(Date, format = "%m-%d-%Y")]
setDT(df2)[, Date := as.Date(Date, format = "%m-%d-%Y")]
df1[df2, on = .(Airlines, HeadQ, Date < Date), # find matching rows based on condition
Cost_Index := 0L] # update column with 0 for those rows
df1
# Airlines HeadQ Date Cost_Index
# 1: American PHX 2016-07-31 220
# 2: American ATL 2016-08-31 0
# 3: American ATL 2016-10-31 150
# 4: Delta ATL 2016-10-31 180
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With