I have some financial data and want to keep only the last transaction in each period of time (hour, day, month...).
Example:
>>> df
time price_BRL qt time_dt
1312001297 23.49 1.00 2011-07-30 04:48:17
1312049148 23.40 1.00 2011-07-30 18:05:48
1312121523 23.49 2.00 2011-07-31 14:12:03
1312121523 23.50 6.50 2011-07-31 14:12:03
1312177622 23.40 2.00 2011-08-01 05:47:02
1312206416 23.25 1.00 2011-08-01 13:46:56
1312637929 18.95 1.50 2011-08-06 13:38:49
1312637929 18.95 4.00 2011-08-06 13:38:49
1312817114 0.80 0.01 2011-08-08 15:25:14
1312818289 0.10 0.01 2011-08-08 15:44:49
1312819795 6.00 0.09 2011-08-08 16:09:55
1312847064 16.00 0.86 2011-08-08 23:44:24
1312849282 16.00 6.14 2011-08-09 00:21:22
1312898146 19.90 1.00 2011-08-09 13:55:46
1312915666 6.00 0.01 2011-08-09 18:47:46
1312934897 19.90 1.00 2011-08-10 00:08:17
>>> filter_by_last_day(df)
time price_BRL qt time_dt
1312049148 23.40 1.00 2011-07-30 18:05:48
1312121523 23.50 6.50 2011-07-31 14:12:03
1312206416 23.25 1.00 2011-08-01 13:46:56
1312637929 18.95 4.00 2011-08-06 13:38:49
1312847064 16.00 0.86 2011-08-08 23:44:24
1312915666 6.00 0.01 2011-08-09 18:47:46
1312934897 19.90 1.00 2011-08-10 00:08:17
I was thinking of using groupby() and taking the mean() for each day (that would also be acceptable for my problem, though it is not exactly what I want), but I don't know how to select the day, something like df.groupby('time.day').last().
You can group by dt.date and aggregate with last:
# convert to datetime if necessary
df.time_dt = pd.to_datetime(df.time_dt)
df = df.groupby(df.time_dt.dt.date).last().reset_index(drop=True)
print (df)
time price_BRL qt time_dt
0 1312049148 23.40 1.00 2011-07-30 18:05:48
1 1312121523 23.50 6.50 2011-07-31 14:12:03
2 1312206416 23.25 1.00 2011-08-01 13:46:56
3 1312637929 18.95 4.00 2011-08-06 13:38:49
4 1312847064 16.00 0.86 2011-08-08 23:44:24
5 1312915666 6.00 0.01 2011-08-09 18:47:46
6 1312934897 19.90 1.00 2011-08-10 00:08:17
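If you want the exact filter_by_last_day interface from the question, here is a minimal sketch wrapping the same groupby (the function name comes from the question; copy() just avoids mutating the caller's frame):

import pandas as pd

def filter_by_last_day(df):
    # last transaction per calendar day, as in the expected output above
    df = df.copy()
    df.time_dt = pd.to_datetime(df.time_dt)  # convert if necessary
    return df.groupby(df.time_dt.dt.date).last().reset_index(drop=True)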
Thank you MaxU for another solution: add the parameter as_index=False to return a DataFrame:
df = df.groupby(df.time_dt.dt.date, as_index=False).last()
print (df)
time price_BRL qt time_dt
0 1312049148 23.40 1.00 2011-07-30 18:05:48
1 1312121523 23.50 6.50 2011-07-31 14:12:03
2 1312206416 23.25 1.00 2011-08-01 13:46:56
3 1312637929 18.95 4.00 2011-08-06 13:38:49
4 1312847064 16.00 0.86 2011-08-08 23:44:24
5 1312915666 6.00 0.01 2011-08-09 18:47:46
6 1312934897 19.90 1.00 2011-08-10 00:08:17
A solution with resample also works, but it is necessary to remove the all-NaN rows (days without any transaction) with dropna:
df = df.resample('d', on='time_dt').last().dropna(how='all').reset_index(drop=True)
# days without trades produce NaN, which upcasts time to float; cast back to int
df.time = df.time.astype(int)
print (df)
time price_BRL qt time_dt
0 1312049148 23.40 1.00 2011-07-30 18:05:48
1 1312121523 23.50 6.50 2011-07-31 14:12:03
2 1312206416 23.25 1.00 2011-08-01 13:46:56
3 1312637929 18.95 4.00 2011-08-06 13:38:49
4 1312847064 16.00 0.86 2011-08-08 23:44:24
5 1312915666 6.00 0.01 2011-08-09 18:47:46
6 1312934897 19.90 1.00 2011-08-10 00:08:17
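For reference, pd.Grouper gives a roughly equivalent daily grouping (a sketch, not part of the original answer). Note that here time_dt ends up as the daily bin label, midnight of each day, rather than the trade's own timestamp:

import pandas as pd

# bins by calendar day like resample, so empty days again
# appear as all-NaN rows and need dropna
df = (df.groupby(pd.Grouper(key='time_dt', freq='D'))
        .last()
        .dropna(how='all')
        .reset_index())  # time_dt is now the bin start (midnight)
df.time = df.time.astype(int)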
You can also use dt.month:
df = df.groupby(df.time_dt.dt.month).last().reset_index(drop=True)
print (df)
time price_BRL qt time_dt
0 1312121523 23.5 6.5 2011-07-31 14:12:03
1 1312934897 19.9 1.0 2011-08-10 00:08:17
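Be aware that dt.month alone would merge the same month from different years (e.g. 2011-08 with 2012-08). If that matters, dt.to_period('M') keeps the year (a sketch):

# group by year+month instead of by month number only
df = df.groupby(df.time_dt.dt.to_period('M')).last().reset_index(drop=True)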
Hours are a bit more complicated. If you need to group by date and hour together, one solution is to truncate the minutes and seconds to 0 by casting with astype:
hours = df.time_dt.values.astype('<M8[h]')
print (hours)
['2011-07-30T04' '2011-07-30T18' '2011-07-31T14' '2011-07-31T14'
'2011-08-01T05' '2011-08-01T13' '2011-08-06T13' '2011-08-06T13'
'2011-08-08T15' '2011-08-08T15' '2011-08-08T16' '2011-08-08T23'
'2011-08-09T00' '2011-08-09T13' '2011-08-09T18' '2011-08-10T00']
df = df.groupby(hours).last().reset_index(drop=True)
print (df)
time price_BRL qt time_dt
0 1312001297 23.49 1.00 2011-07-30 04:48:17
1 1312049148 23.40 1.00 2011-07-30 18:05:48
2 1312121523 23.50 6.50 2011-07-31 14:12:03
3 1312177622 23.40 2.00 2011-08-01 05:47:02
4 1312206416 23.25 1.00 2011-08-01 13:46:56
5 1312637929 18.95 4.00 2011-08-06 13:38:49
6 1312818289 0.10 0.01 2011-08-08 15:44:49
7 1312819795 6.00 0.09 2011-08-08 16:09:55
8 1312847064 16.00 0.86 2011-08-08 23:44:24
9 1312849282 16.00 6.14 2011-08-09 00:21:22
10 1312898146 19.90 1.00 2011-08-09 13:55:46
11 1312915666 6.00 0.01 2011-08-09 18:47:46
12 1312934897 19.90 1.00 2011-08-10 00:08:17
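The same hourly truncation can also be written with dt.floor, which stays within pandas and skips the intermediate numpy array (a sketch):

# dt.floor('h') zeroes out minutes and seconds, like the '<M8[h]' cast above
df = df.groupby(df.time_dt.dt.floor('h')).last().reset_index(drop=True)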