"Sandwich" values in a pandas dataframe column?

I have a [1008961 rows x 8 columns] pandas dataframe looking like this:

         Position  Price  Side  Size                 time   init       dt best_pricejump
0               1   3542     1   300  1495087206897454000   True    0.000            NaN
1               2   3541     1   484  1495087206906657000   True    9.203            NaN
2               3   3540     1   423  1495087206914836000   True    8.179            NaN
3               4   3539     1   599  1495087206922854000   True    8.018            NaN
4               5   3539     1   599  1495087206930944000   True    8.018            NaN

and a list containing certain slices I am looking at:

[slice(0, 5, None), slice(9, 35, None), slice(39, 131, None), slice(135, 141, None),...]

How can I "sandwich" the values of column time efficiently, such that every time value within a slice is equal to the last time value of that slice?

The example above would be:

         Position  Price  Side  Size                 time   init       dt best_pricejump
0               1   3542     1   300  1495087206930944000   True    0.000            NaN
1               2   3541     1   484  1495087206930944000   True    9.203            NaN
2               3   3540     1   423  1495087206930944000   True    8.179            NaN
3               4   3539     1   599  1495087206930944000   True    8.018            NaN
4               5   3539     1   599  1495087206930944000   True    8.018            NaN

I have a solution but it's terribly slow (it takes literally 14 minutes). Are there faster ways?

for slc in list_of_slices:
    df["time"][slc] = (df["time"][slc]).iloc[-1] 
Hekri asked Jan 28 '26 06:01


2 Answers

You can try iloc together with iat, which gets a scalar by position:

#get position of column time
loc = df.columns.get_loc("time")
for slc in list_of_slices:
    df.iloc[slc, loc] = df["time"].iat[slc.stop-1]
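
To make the positional approach above easy to try, here is a minimal self-contained sketch on a toy frame. The column values and the two slices are made up for illustration and are not the asker's data; only the `iloc`/`iat` pattern comes from the answer.

```python
import pandas as pd

# Toy data (hypothetical values, not the asker's frame)
df = pd.DataFrame({
    "Price": [3542, 3541, 3540, 3539, 3539, 3538, 3537],
    "time": [10, 20, 30, 40, 50, 60, 70],
})
list_of_slices = [slice(0, 3), slice(4, 7)]

# Integer position of the "time" column, needed for purely positional indexing
loc = df.columns.get_loc("time")

for slc in list_of_slices:
    # Read the last value of the slice as a scalar, then write it to every row of the slice
    df.iloc[slc, loc] = df["time"].iat[slc.stop - 1]

print(df["time"].tolist())
```

Row 3 is not covered by any slice, so it keeps its original value. Writing through a single `df.iloc[...]` call also avoids the chained assignment `df["time"][slc] = ...` used in the question.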
jezrael answered Jan 30 '26 21:01


You can try a join operation instead of assigning inside the loop, although I can't see how you would avoid an initial loop altogether. I start by looping through the slices and assigning a group number to the rows of each slice. Then I build a dataframe (map_df) holding just the group number and the last time of each group, and join it back on the group column; the merged column time_last then holds the value you want. I'm not sure this is actually faster: it depends on how long slicing takes in each iteration compared with assigning a value. Maybe you can try it and let me know one way or the other?

import numpy as np
import pandas as pd

df['G'] = np.nan
g_loc = df.columns.get_loc('G')
for n, k in enumerate(slicr):
    df.iloc[k, g_loc] = n  # label every row of slice k with group n

# one row per group: the group number and the time of the slice's last row
map_df = df.iloc[[k.stop - 1 for k in slicr]][['G', 'time']]
new_df = pd.merge(df, map_df, on='G', how='left', suffixes=('', '_last'))
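
As a variation on the grouping idea above, the join can be swapped for `groupby(...).transform("last")`. This is a different technique from the merge in the answer, shown only as a sketch on toy data; the values, the `-1` sentinel for rows outside every slice, and the name `covered` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values, not the asker's frame)
df = pd.DataFrame({"time": [10, 20, 30, 40, 50, 60, 70]})
slicr = [slice(0, 3), slice(4, 7)]

# Label each row with the index of the slice that covers it.
# -1 (an assumed sentinel) marks rows that no slice covers.
labels = np.full(len(df), -1)
for n, k in enumerate(slicr):
    labels[k] = n  # NumPy slice assignment, no pandas indexing inside the loop
df["G"] = labels

# Last time value of each group, broadcast back onto the rows of that group
last_time = df.groupby("G")["time"].transform("last")

# Keep the original time where no slice applies, otherwise take the group's last time
covered = labels >= 0
df["time"] = df["time"].where(~covered, last_time)

print(df["time"].tolist())
```

Using an integer sentinel instead of NaN keeps the time column as integers; a float column could lose precision on nanosecond timestamps as large as those in the question. Whether this beats the loop on a million rows would need to be measured.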
Simon answered Jan 30 '26 22:01