I have multiple dataframes whose timestamps include milliseconds. The same event can appear in several of them, with timestamps that are identical or up to 1 second apart. Once they are all concatenated into a new dataframe, I want to keep only the rows that are more than 1 second apart from each other.
Is there a function similar to dftogether['unique'] = np.ediff1d(dftogether['DateTime']) that works with timestamps?
My current solution works, but I am looking for a proper way to do it.
Let's say I have 3 dataframes: df1, df2 and df3. For each dataframe I do this:
df1['DateTime'] = df1['DateTime'].apply(lambda x: x.strftime('%Y%d%m%H%M%S'))
df1['DateTime'] = df1['DateTime'].astype(np.int64)
This turns my DateTime into an int, so I can do this:
dftogether = pd.concat([df1, df2, df3], sort=True)
dftogether = dftogether.sort_values('DateTime')
dftogether['unique'] = np.ediff1d(dftogether['DateTime'], to_begin=20181211150613411) < 1
dftogether = dftogether[dftogether['unique'] == False]
And then I convert the int back to a datetime:
dftogether['DateTime'] = dftogether['DateTime'].apply(lambda x: pd.to_datetime(str(x), format='%Y%d%m%H%M%S'))
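For comparison, the same truncate-and-compare idea can be sketched without the int round trip. This is only a sketch, assuming the column already has a datetime64 dtype; it uses `Series.dt.floor` and `Series.diff` from pandas, and the four sample timestamps are taken from the data below.

```python
import pandas as pd

# Four timestamps from the sample data, already sorted.
stamps = pd.Series(
    pd.to_datetime([
        "2018-12-18 12:37:22.841",
        "2018-12-18 12:37:23.144",
        "2018-12-18 12:37:23.448",
        "2018-12-18 12:38:33.248",
    ]),
    index=[739, 740, 741, 788],
)

# Truncate to whole seconds, as strftime('%Y%d%m%H%M%S') does,
# but stay in datetime64 instead of converting to int.
floored = stamps.dt.floor("s")

# True where a row falls in the same whole second as the previous row.
# The first diff is NaT, and NaT < Timedelta is False, so the first row is kept.
same_second = floored.diff() < pd.Timedelta(seconds=1)

kept = floored[~same_second]
print(kept)
```

Unlike the int version, the differences here are real time spans, so a step from 12:37:59 to 12:38:00 counts as 1 second rather than 41.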
I couldn't figure out how to create sample data for the timestamps, so I will just copy-paste parts of the dataframes.
df1
737 2018-12-18 12:37:19.717
738 2018-12-18 12:37:21.936
739 2018-12-18 12:37:22.841
740 2018-12-18 12:37:23.144
877 2018-12-18 12:40:53.268
878 2018-12-18 12:40:56.597
879 2018-12-18 12:40:56.899
880 2018-12-18 12:40:57.300
968 2018-12-18 12:43:31.411
969 2018-12-18 12:43:36.150
970 2018-12-18 12:43:36.452
df2
691 2018-12-18 12:35:23.612
692 2018-12-18 12:35:25.627
788 2018-12-18 12:38:33.248
789 2018-12-18 12:38:33.553
790 2018-12-18 12:38:34.759
866 2018-12-18 12:40:29.487
867 2018-12-18 12:40:31.199
868 2018-12-18 12:40:32.206
df3
699 2018-12-18 12:35:42.452
701 2018-12-18 12:35:45.081
727 2018-12-18 12:36:47.466
730 2018-12-18 12:36:51.796
741 2018-12-18 12:37:23.448
881 2018-12-18 12:40:57.603
910 2018-12-18 12:42:02.904
971 2018-12-18 12:43:37.361
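A reproducible version of these frames could be built roughly like this. It is a minimal sketch: `make_frame` is a hypothetical helper, only a few of the pasted rows are used per frame, and the column name DateTime is taken from the code earlier in the question.

```python
import pandas as pd

def make_frame(index, stamps):
    """Hypothetical helper: one datetime64 'DateTime' column with the given index."""
    return pd.DataFrame({"DateTime": pd.to_datetime(stamps)}, index=index)

df1 = make_frame(
    [737, 738, 739, 740],
    ["2018-12-18 12:37:19.717", "2018-12-18 12:37:21.936",
     "2018-12-18 12:37:22.841", "2018-12-18 12:37:23.144"],
)
df2 = make_frame(
    [691, 692, 788],
    ["2018-12-18 12:35:23.612", "2018-12-18 12:35:25.627",
     "2018-12-18 12:38:33.248"],
)
df3 = make_frame(
    [699, 701, 741],
    ["2018-12-18 12:35:42.452", "2018-12-18 12:35:45.081",
     "2018-12-18 12:37:23.448"],
)

print(df1.dtypes)
```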
I want dftogether to look like this, but with timestamps instead of ints:
Unique DateTime
737 False 20181812123719
738 False 20181812123721
739 False 20181812123722
741 False 20181812123723
742 True 20181812123723
740 True 20181812123723
785 False 20181812123830
786 False 20181812123831
787 False 20181812123832
787 True 20181812123832
788 False 20181812123833
so that I can drop the rows where Unique == True and end up with something like this:
785 False 2018-12-18 12:38:30
786 False 2018-12-18 12:38:31
787 False 2018-12-18 12:38:32
788 False 2018-12-18 12:38:33
790 False 2018-12-18 12:38:34
812 False 2018-12-18 12:39:10
813 False 2018-12-18 12:39:11
Something else: where can I voice my opinion on the new Stack Overflow "Ask a question" page? IMO it is really awful: it keeps scrolling up, entering or copy-pasting code is really confusing now, and all the "For Example" text is really distracting. It took me more than 30 minutes to write this question.
I joined your df1 and df2 into a single df and created a list of dates like this:
from datetime import datetime
import pandas as pd
df = pd.concat([df1, df2]).sort_values('DateTime').reset_index(drop=True)
date_list = [datetime.strptime(i, '%Y-%m-%d %H:%M:%S.%f') for i in df.DateTime.tolist()]
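Then I get the desired output with a one-liner. The leading float('inf') stands in for the first row, which has no predecessor, so that it is kept rather than dropped:
df[[x > 1 for x in [float('inf')] + [(j - i).total_seconds() for i, j in zip(date_list, date_list[1:])]]]
To understand how it works, first check the output of the boolean mask on its own:
[x > 1 for x in [float('inf')] + [(j - i).total_seconds() for i, j in zip(date_list, date_list[1:])]]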
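The same filter can also be written without the Python-level loop, using `Series.diff`, which is probably the closest pandas counterpart to `np.ediff1d` for timestamps. This is a sketch rather than a drop-in replacement: `drop_close_rows` and its parameters are names made up for illustration, and the four sample rows are copied from the question.

```python
import pandas as pd

def drop_close_rows(frame, column="DateTime", min_gap="1s"):
    """Keep rows whose timestamp is more than `min_gap` after the previous row.

    The first row has no predecessor (its diff is NaT) and is always kept.
    """
    # Parse the column to datetime64 (a no-op if it already is) and sort by it.
    ordered = frame.assign(**{column: pd.to_datetime(frame[column])})
    ordered = ordered.sort_values(column)
    gaps = ordered[column].diff()  # Timedelta to the previous row
    keep = gaps.isna() | (gaps > pd.Timedelta(min_gap))
    return ordered[keep]

demo = pd.DataFrame(
    {"DateTime": ["2018-12-18 12:37:22.841", "2018-12-18 12:37:23.144",
                  "2018-12-18 12:37:23.448", "2018-12-18 12:38:33.248"]},
    index=[739, 740, 741, 788],
)
result = drop_close_rows(demo)
print(result)
```

Note that each row is compared with the row directly before it in the sorted frame, whether or not that row is kept, which matches the list-comprehension version.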
Hope this helps. Cheers.