Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to deal with this complex logic in python pandas?

I have some data like follow structure. It used in python pandas Data Frame and I named it df.

Data1,Data2,Flag
2016-04-29,00:40:15,1
2016-04-29,00:40:24,2
2016-04-29,00:40:35,2
2015-04-29,00:40:36,2
2015-04-29,00:40:43,2
2015-04-29,00:40:45,2
2015-04-29,00:40:55,1
2015-04-29,00:41:05,1
2015-04-29,00:41:16,1
2015-04-29,00:41:17,2
.....................
.....................
2016-11-29,11:52:36,2
2016-11-29,11:52:43,2
2016-11-29,11:52:45,2
2016-11-29,11:52:55,1

I want to get the data meet the following requirements.

  1. As you know the first data's timeseries is 2016-04-29,00:40:15. I want to get the next data in this dataframe larger than primer's data 18 secs. I'll get the second data : 2016-04-29,00:40:35,2 The third data is: 2015-04-29,00:40:55,1
  2. If the next data's flag is different from the primer's data.I will get this data regardless of whether it has passed 18 secs.

For the above two requirements, I 'll get the data as following:

Data1,Data2,Flag
2016-04-29,00:40:15,1
2016-04-29,00:40:24,2
2015-04-29,00:40:43,2
2015-04-29,00:40:55,1
2015-04-29,00:41:16,1
2015-04-29,00:41:17,2
.....................
like image 351
matcha latte Avatar asked Feb 25 '26 18:02

matcha latte


1 Answers

Here, try this:

df['Data2'] = pd.to_timedelta(df['Data2'])

tdf = df.copy()
sel_idx = []
while len(tdf) > 0:
    sel_idx.extend([tdf.index[0]])
    cond1 = tdf['Data2'] > tdf.loc[sel_idx[-1], 'Data2'] + pd.to_timedelta(18, 's')
    cond2 = (tdf['Flag'] != tdf.loc[sel_idx[-1], 'Flag']) & (tdf['Data2'] > tdf.loc[sel_idx[-1], 'Data2'])
    tdf = tdf[cond1 | cond2]
df.loc[sel_idx, :]

Test

Code:

import pandas as pd
from io import StringIO

data = StringIO("""Data1,Data2,Flag
2016-04-29,00:40:15,1
2016-04-29,00:40:24,2
2016-04-29,00:40:35,2
2015-04-29,00:40:36,2
2015-04-29,00:40:43,2
2015-04-29,00:40:45,2
2015-04-29,00:40:55,1
2015-04-29,00:41:05,1
2015-04-29,00:41:16,1
2015-04-29,00:41:17,2
2016-11-29,11:52:36,2
2016-11-29,11:52:43,2
2016-11-29,11:52:45,2
2016-11-29,11:52:55,1""")

df = pd.read_csv(data)
df['Data2'] = pd.to_timedelta(df['Data2'])
print("Input\n", df)

tdf = df.copy()
sel_idx = []
while len(tdf) > 0:
    sel_idx.extend([tdf.index[0]])
    cond1 = tdf['Data2'] > tdf.loc[sel_idx[-1], 'Data2'] + pd.to_timedelta(18, 's')
    cond2 = (tdf['Flag'] != tdf.loc[sel_idx[-1], 'Flag']) & (tdf['Data2'] > tdf.loc[sel_idx[-1], 'Data2'])
    tdf = tdf[cond1 | cond2]
print("Ouput\n", df.loc[sel_idx, :])

Output:

Input
    Data1       Data2       Flag
0   2016-04-29  00:40:15    1
1   2016-04-29  00:40:24    2
2   2016-04-29  00:40:35    2
3   2015-04-29  00:40:36    2
4   2015-04-29  00:40:43    2
5   2015-04-29  00:40:45    2
6   2015-04-29  00:40:55    1
7   2015-04-29  00:41:05    1
8   2015-04-29  00:41:16    1
9   2015-04-29  00:41:17    2
10  2016-11-29  11:52:36    2
11  2016-11-29  11:52:43    2
12  2016-11-29  11:52:45    2
13  2016-11-29  11:52:55    1

Output
    Data1       Data2       Flag
0   2016-04-29  00:40:15    1
1   2016-04-29  00:40:24    2
4   2015-04-29  00:40:43    2
6   2015-04-29  00:40:55    1
8   2015-04-29  00:41:16    1
9   2015-04-29  00:41:17    2
10  2016-11-29  11:52:36    2
13  2016-11-29  11:52:55    1
like image 50
Kartik Avatar answered Feb 27 '26 07:02

Kartik



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!