I have time series data with duplicate timestamps in the index, and I would like to drop a single row based on its integer location. For example, suppose I have the following:
import numpy as np
import pandas as pd
dates = pd.to_datetime(["2015-10-22 09:40:00","2015-10-22 09:40:00","2015-10-22 09:40:00","2015-10-22 09:50:00","2015-10-22 10:00:00"])
data_rand = np.random.rand(len(dates),3)
col_head = ['A','B','C']
df = pd.DataFrame(data=data_rand, index=dates, columns=col_head)
print(df)
rowindex = 1
df.drop(df.index[rowindex], inplace=True)  # drops all three 09:40:00 rows, not just the second one
#df.drop(df.index.iloc[[rowindex]], inplace=True)  # doesn't work: an Index has no .iloc
print(df)
This outputs a DataFrame that looks like:
                            A         B         C
2015-10-22 09:40:00  0.755642  0.797471  0.366410
2015-10-22 09:40:00  0.475411  0.629229  0.733368
2015-10-22 09:40:00  0.003278  0.461901  0.184833
2015-10-22 09:50:00  0.803465  0.218510  0.864337
2015-10-22 10:00:00  0.153356  0.950724  0.249950
Now, if I want to remove the second row, I would use drop, but because the other two rows have exactly the same index label, all three would be dropped. Is there a way to drop only the middle one of the three duplicate timestamps? I would prefer to do this without resetting the index.
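The reason is that drop works on labels, not positions: df.index[1] is just the Timestamp 2015-10-22 09:40:00, and drop removes every row carrying that label. A minimal sketch reproducing this, assuming fixed integer values in place of np.random.rand so the result is repeatable:

```python
import numpy as np
import pandas as pd

# Same index as in the question; fixed values (an assumption) instead of random data
dates = pd.to_datetime(["2015-10-22 09:40:00", "2015-10-22 09:40:00", "2015-10-22 09:40:00",
                        "2015-10-22 09:50:00", "2015-10-22 10:00:00"])
df = pd.DataFrame(np.arange(15).reshape(5, 3), index=dates, columns=['A', 'B', 'C'])

label = df.index[1]        # a Timestamp label, not an integer position
dropped = df.drop(label)   # removes every row whose label equals that Timestamp
print(label)
print(len(dropped))        # only the 09:50 and 10:00 rows remain
```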
This is what I want the data to look like:
                            A         B         C
2015-10-22 09:40:00  0.755642  0.797471  0.366410
2015-10-22 09:40:00  0.003278  0.461901  0.184833
2015-10-22 09:50:00  0.803465  0.218510  0.864337
2015-10-22 10:00:00  0.153356  0.950724  0.249950
You could build a boolean mask that is False only at rowindex and pass it to iloc (or loc), like:
idx = np.ones(len(df.index), dtype=bool)  # True for every row
idx[rowindex] = False                     # False only at the position to remove
df.iloc[idx]  # or df.loc[idx]
Output:
                            A         B         C
2015-10-22 09:40:00  0.704959  0.995358  0.355915
2015-10-22 09:40:00  0.151127  0.398876  0.240856
2015-10-22 09:50:00  0.343456  0.513128  0.666625
2015-10-22 10:00:00  0.105908  0.130895  0.321981
Details: the original df was
                            A         B         C
2015-10-22 09:40:00  0.704959  0.995358  0.355915
2015-10-22 09:40:00  0.762548  0.593177  0.691702
2015-10-22 09:40:00  0.151127  0.398876  0.240856
2015-10-22 09:50:00  0.343456  0.513128  0.666625
2015-10-22 10:00:00  0.105908  0.130895  0.321981
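The mask approach can be wrapped in a small reusable function. This is a sketch only: drop_position is a hypothetical name, not a pandas method, and the frame uses fixed integer values instead of random data so the result is checkable:

```python
import numpy as np
import pandas as pd

def drop_position(frame, pos):
    """Hypothetical helper: return a new DataFrame without the row at integer position pos.

    Selection is by position via a boolean mask, so duplicate index labels don't matter.
    """
    mask = np.ones(len(frame.index), dtype=bool)  # keep every row by default
    mask[pos] = False                              # except the one at position pos
    return frame.iloc[mask]

dates = pd.to_datetime(["2015-10-22 09:40:00", "2015-10-22 09:40:00", "2015-10-22 09:40:00",
                        "2015-10-22 09:50:00", "2015-10-22 10:00:00"])
df = pd.DataFrame(np.arange(15).reshape(5, 3), index=dates, columns=['A', 'B', 'C'])

result = drop_position(df, 1)  # removes only the middle 09:40:00 row
print(result)
```

Since the function returns a new object rather than modifying df, reassign (df = drop_position(df, 1)) if you want the effect of drop(..., inplace=True).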
Another option is to use np.arange with iloc to select every row except the one at rowindex, which has the same effect as dropping it. (If you need to drop several row positions at once, I suggest @Zero's answer.)
rowindex = 2
ndf = df.iloc[np.arange(df.shape[0]) != rowindex]  # keep every position except rowindex
Output:
                            A         B         C
2015-10-22 09:40:00  0.568431  0.302549  0.497309
2015-10-22 09:40:00  0.683263  0.916699  0.108929
2015-10-22 09:50:00  0.751543  0.480892  0.797728
2015-10-22 10:00:00  0.282703  0.433418  0.009757
For comparison, the original df:
                            A         B         C
2015-10-22 09:40:00  0.568431  0.302549  0.497309
2015-10-22 09:40:00  0.683263  0.916699  0.108929
2015-10-22 09:40:00  0.495492  0.232836  0.436861
2015-10-22 09:50:00  0.751543  0.480892  0.797728
2015-10-22 10:00:00  0.282703  0.433418  0.009757
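The same positional comparison extends to several rows with np.isin. This is my own sketch, not necessarily what @Zero's answer does; the positions in to_drop and the fixed integer data are illustrative assumptions:

```python
import numpy as np
import pandas as pd

dates = pd.to_datetime(["2015-10-22 09:40:00", "2015-10-22 09:40:00", "2015-10-22 09:40:00",
                        "2015-10-22 09:50:00", "2015-10-22 10:00:00"])
df = pd.DataFrame(np.arange(15).reshape(5, 3), index=dates, columns=['A', 'B', 'C'])

positions = np.arange(df.shape[0])  # 0, 1, 2, ... one integer per row

# Single position, as in the answer above
rowindex = 2
single = df.iloc[positions != rowindex]

# Several positions: keep rows whose position is not in the (hypothetical) list
to_drop = [1, 3]
ndf = df.iloc[~np.isin(positions, to_drop)]
print(single)
print(ndf)
```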