"Sandwich" values in a pandas dataframe column?

Question

I have a [1008961 rows x 8 columns] pandas dataframe looking like this:

         Position  Price  Side  Size                 time   init       dt best_pricejump
0               1   3542     1   300  1495087206897454000   True    0.000            NaN
1               2   3541     1   484  1495087206906657000   True    9.203            NaN
2               3   3540     1   423  1495087206914836000   True    8.179            NaN
3               4   3539     1   599  1495087206922854000   True    8.018            NaN
4               5   3539     1   599  1495087206930944000   True    8.018            NaN

and a list containing certain slices I am looking at:

[slice(0, 5, None), slice(9, 35, None), slice(39, 131, None), slice(135, 141, None),...]

How can I "sandwich" the values of column time efficiently, such that every time value within a slice is equal to the last time value of that slice?

The example above would be:

         Position  Price  Side  Size                 time   init       dt best_pricejump
0               1   3542     1   300  1495087206930944000   True    0.000            NaN
1               2   3541     1   484  1495087206930944000   True    9.203            NaN
2               3   3540     1   423  1495087206930944000   True    8.179            NaN
3               4   3539     1   599  1495087206930944000   True    8.018            NaN
4               5   3539     1   599  1495087206930944000   True    8.018            NaN

I have a solution but it's terribly slow (it takes literally 14 minutes). Are there faster ways?

for slc in list_of_slices:
    df["time"][slc] = (df["time"][slc]).iloc[-1]

jezrael · Accepted Answer

You can use iloc for the assignment together with iat, which gets a scalar by position:

#get position of column time
loc = df.columns.get_loc("time")
for slc in list_of_slices:
    df.iloc[slc, loc] = df["time"].iat[slc.stop-1]
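To make the loop above easy to try, here is a small self-contained sketch. The toy frame and the two slices are made up for illustration and are not the asker's data; it also assumes every slice has an explicit stop within the frame and no step.

```python
import pandas as pd

# Toy data standing in for the question's frame (values are made up).
df = pd.DataFrame({
    "Price": [3542, 3541, 3540, 3539, 3539, 3538, 3537, 3536],
    "time": [10, 20, 30, 40, 50, 60, 70, 80],
})
list_of_slices = [slice(0, 3), slice(5, 8)]

# Integer position of the "time" column, looked up once outside the loop.
loc = df.columns.get_loc("time")
for slc in list_of_slices:
    # .iat reads one scalar by position; .iloc writes it to every row of the slice.
    df.iloc[slc, loc] = df["time"].iat[slc.stop - 1]

print(df["time"].tolist())  # [30, 30, 30, 40, 50, 80, 80, 80]
```

Rows 3 and 4 are not covered by any slice, so they keep their original values.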

Simon · Answer

You can try a join operation instead of a loop, although I can't personally see how you would escape an initial loop. I start by looping through the slices and assigning a group number to each slice's range, then build a dataframe (map_df) holding one time value per group, and finally join it back on. I'm actually not sure whether this is faster: it depends on how long slicing in every loop iteration takes compared with assigning a value. Maybe you can try it and let me know one way or another?

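import numpy as np
import pandas as pd

# label every row with the index of its slice (slicr is the list of slices)
df['G'] = np.nan
g_loc = df.columns.get_loc('G')
for n, k in enumerate(slicr):
    df.iloc[k, g_loc] = n

# one row per group, holding the last time value of that slice
map_df = df.iloc[[k.stop - 1 for k in slicr]][['G', 'time']]
new_df = pd.merge(df, map_df, on='G', how='left', suffixes=('', '_last'))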
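If the per-slice assignment turns out to be the bottleneck, one possible alternative is to compute all target positions and fill values with NumPy and write them in a single step. This is only a sketch and not part of the answer above: fill_with_last is a hypothetical helper name, and it assumes the slices do not overlap, have explicit start and stop values inside the frame, and have no step. The two list comprehensions still iterate over the slices, but they only read start and stop and never touch the dataframe.

```python
import numpy as np
import pandas as pd

def fill_with_last(df, slices, column="time"):
    """Set every row of each slice to that slice's last value of `column`.

    Hypothetical helper; assumes non-overlapping slices with no step.
    """
    starts = np.array([s.start for s in slices], dtype=np.intp)
    stops = np.array([s.stop for s in slices], dtype=np.intp)
    lengths = stops - starts
    # Offset of each slice inside the flattened array of positions.
    offsets = np.cumsum(lengths) - lengths
    # All row positions covered by the slices, in order.
    positions = np.repeat(starts - offsets, lengths) + np.arange(lengths.sum())
    values = df[column].to_numpy()
    # Last value of each slice, repeated once per row of that slice.
    fill = np.repeat(values[stops - 1], lengths)
    out = values.copy()
    out[positions] = fill
    df[column] = out
    return df

# Usage on made-up data:
demo = pd.DataFrame({"time": [10, 20, 30, 40, 50, 60, 70, 80]})
fill_with_last(demo, [slice(0, 3), slice(5, 8)])
print(demo["time"].tolist())  # [30, 30, 30, 40, 50, 80, 80, 80]
```

Whether this beats the iloc loop on the full frame would need to be timed; the sketch only shows that the per-slice writes can be replaced by one indexed assignment.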

Tags: python, pandas, dataframe

Hekri
