I am trying to work out an effective date for any given date. The dataframe has a column which is populated by the BMonthEnd (last business day of month taking into account holidays - calculated by code not shown here)
the partial dataframe shown below has the EffectiveDate equal to the Date as 1st step
Date BMonthEnd EffectiveDate
2014-08-24 2014-08-24 2014-08-29 2014-08-24
2014-08-25 2014-08-25 2014-08-29 2014-08-25
2014-08-26 2014-08-26 2014-08-29 2014-08-26
2014-08-27 2014-08-27 2014-08-29 2014-08-27
2014-08-28 2014-08-28 2014-08-29 2014-08-28
2014-08-29 2014-08-29 2014-08-29 2014-08-29
2014-08-30 2014-08-30 2014-08-29 2014-08-30
2014-08-31 2014-08-31 2014-08-29 2014-08-31
I now try to select out the data that need to be changed with:
df[~(df.Date<df.BMonthEnd)].EffectiveDate # giving the expected slice
# but
df[~(df.Date<df.BMonthEnd)].EffectiveDate = 1
# gives error
SettingWithCopyWarning: A value is trying to be set on a copy of a slice
from a DataFrame. Try using .loc[row_index,col_indexer] = value instead
self[name] = value
following the warning i tried the alternate method i tried:
df.loc[~(df.Date<df.BMonthEnd)].EffectiveDate = 1
this also gives the same error. (note the 1 used in assignment is just placeholder for another function) and the assignment does not reflect on the original dataframe. I understand that I am effectively assigning to a copy so that it does not change the original dataframe as intended.
How do I however achieve my goal of using the selecting syntax to assign. I really do not want to have to iterate over the dataframe.
To slice the columns, the syntax is df. loc[:,start:stop:step] ; where start is the name of the first column to take, stop is the name of the last column to take, and step as the number of indices to advance after each extraction; for example, you can select alternate columns.
You can slice a MultiIndex by providing multiple indexers. You can provide any of the selectors as if you are indexing by label, see Selection by Label, including slices, lists of labels, labels, and boolean indexers. You can use slice(None) to select all the contents of that level.
The query function seams more efficient than the loc function. DF2: 2K records x 6 columns. The loc function seams much more efficient than the query function.
Figured it out. Selecting out the Series in the Dataframe effectively allows me to assign to it and the original dataframe. this allows me to use the slicing syntac to apply logic influencing the results:
# not all methods, classes shown
def effective_date(dr):
df = pd.DataFrame(dr, index=dr, columns=['Date'])
df['BMonthEnd'] = df.Date.apply(h.last_business_day)
df['MonthEnd'] = df.Date.apply(h.month_end)
df['EffectiveDate'] = df.Date
# df.EffectiveDate[~(df.Date<df.BMonthEnd)] = df.MonthEnd
df.loc[~(df.Date<df.BMonthEnd),'EffectiveDate'] = df.MonthEnd
return df.EffectiveDate
Have Updated it with Jeff's suggestion. See now why chain indexing can get you into trouble. Have done a few timeits and they seem to be faster, but when assigning to the dataframe .loc is the better option.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With