Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Assigning to slices of pandas DataFrames

Tags:

python

pandas

I am trying to work out an effective date for any given date. The dataframe has a column which is populated by the BMonthEnd (last business day of month taking into account holidays - calculated by code not shown here)

the partial dataframe shown below has the EffectiveDate equal to the Date as 1st step

            Date        BMonthEnd   EffectiveDate
2014-08-24  2014-08-24  2014-08-29  2014-08-24
2014-08-25  2014-08-25  2014-08-29  2014-08-25
2014-08-26  2014-08-26  2014-08-29  2014-08-26
2014-08-27  2014-08-27  2014-08-29  2014-08-27
2014-08-28  2014-08-28  2014-08-29  2014-08-28
2014-08-29  2014-08-29  2014-08-29  2014-08-29
2014-08-30  2014-08-30  2014-08-29  2014-08-30
2014-08-31  2014-08-31  2014-08-29  2014-08-31

I now try to select out the data that need to be changed with:

df[~(df.Date<df.BMonthEnd)].EffectiveDate  # giving the expected slice
# but 
df[~(df.Date<df.BMonthEnd)].EffectiveDate = 1
# gives error

SettingWithCopyWarning: A value is trying to be set on a copy of a slice
from a DataFrame. Try using .loc[row_index,col_indexer] = value instead
self[name] = value

following the warning i tried the alternate method i tried:

df.loc[~(df.Date<df.BMonthEnd)].EffectiveDate = 1

this also gives the same error. (note the 1 used in assignment is just placeholder for another function) and the assignment does not reflect on the original dataframe. I understand that I am effectively assigning to a copy so that it does not change the original dataframe as intended.

How do I however achieve my goal of using the selecting syntax to assign. I really do not want to have to iterate over the dataframe.

like image 396
Joop Avatar asked May 09 '14 13:05

Joop


People also ask

How do I take sliced columns in pandas?

To slice the columns, the syntax is df. loc[:,start:stop:step] ; where start is the name of the first column to take, stop is the name of the last column to take, and step as the number of indices to advance after each extraction; for example, you can select alternate columns.

How do you slice multiple index in pandas?

You can slice a MultiIndex by providing multiple indexers. You can provide any of the selectors as if you are indexing by label, see Selection by Label, including slices, lists of labels, labels, and boolean indexers. You can use slice(None) to select all the contents of that level.

Is pandas query faster than LOC?

The query function seams more efficient than the loc function. DF2: 2K records x 6 columns. The loc function seams much more efficient than the query function.


1 Answers

Figured it out. Selecting out the Series in the Dataframe effectively allows me to assign to it and the original dataframe. this allows me to use the slicing syntac to apply logic influencing the results:

# not all methods, classes shown
def effective_date(dr):
    df = pd.DataFrame(dr, index=dr, columns=['Date'])
    df['BMonthEnd'] = df.Date.apply(h.last_business_day)
    df['MonthEnd'] = df.Date.apply(h.month_end)
    df['EffectiveDate'] = df.Date
    # df.EffectiveDate[~(df.Date<df.BMonthEnd)] = df.MonthEnd
    df.loc[~(df.Date<df.BMonthEnd),'EffectiveDate'] = df.MonthEnd
    return df.EffectiveDate

Have Updated it with Jeff's suggestion. See now why chain indexing can get you into trouble. Have done a few timeits and they seem to be faster, but when assigning to the dataframe .loc is the better option.

like image 158
Joop Avatar answered Oct 15 '22 03:10

Joop