I have a Pandas DataFrame with nearly 3,000,000 rows. One of the columns is called TIMESTAMP and is of the datetime64 type. The timestamp format is given below:
2015-03-31 22:56:45.510
My goal is to calculate the number of days over which data were collected. My initial approach was simple:
(df.TIMESTAMP.max() - df.TIMESTAMP.min()).days
However, it occurred to me that this may not always be correct, since there is no guarantee that data were collected every day. Instead, I tried counting the unique days in the timestamp series using map and apply, and both take a considerable amount of time for 3,000,000 rows:
%timeit len(df['TIMESTAMP'].map(lambda t: t.date()).unique())
1 loops, best of 3: 41.3 s per loop
%timeit len(df['TIMESTAMP'].apply(lambda t: t.date()).unique())
1 loops, best of 3: 42.3 s per loop
Is there a way to speed up this computation, or an entirely different but better approach?
Thanks!
To get the unique dates you should first normalize (which sets each timestamp to midnight of its day; note this is fast), then use unique:
In [31]: df["Time"].dt.normalize().unique()
Out[31]:
array(['2014-12-31T16:00:00.000000000-0800',
'2015-01-01T16:00:00.000000000-0800',
'2015-01-02T16:00:00.000000000-0800',
'2015-01-04T16:00:00.000000000-0800',
'2015-01-05T16:00:00.000000000-0800'], dtype='datetime64[ns]')
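Counting those distinct normalized values then answers the original question directly. A minimal sketch, assuming df and the TIMESTAMP column from the question (num_days is just an illustrative name):
# Normalize each timestamp to midnight of its day, then count distinct values.
# This is a vectorized datetime operation, so it should be far faster than a
# Python-level map/apply over ~3,000,000 rows.
num_days = df['TIMESTAMP'].dt.normalize().nunique()
print(num_days)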
Original answer (I misread the question):
To get the counts you could use normalize and then value_counts:
In [11]: df
Out[11]:
Time
0 2015-01-01
1 2015-01-02
2 2015-01-03
3 2015-01-03
4 2015-01-05
5 2015-01-06
In [12]: df['Time'].dt.normalize().value_counts()
Out[12]:
2015-01-03 2
2015-01-06 1
2015-01-02 1
2015-01-05 1
2015-01-01 1
Name: Time, dtype: int64
but perhaps the cleaner option is to resample (though I'm not sure if this is less efficient). Note that the old resample("D", how="sum") form has been removed from pandas; the modern spelling is .resample("D").sum(), where empty bins sum to 0 rather than NaN:
In [21]: pd.Series(1, index=df['Time']).resample("D").sum()
Out[21]:
Time
2015-01-01    1
2015-01-02    1
2015-01-03    2
2015-01-04    0
2015-01-05    1
2015-01-06    1
Freq: D, dtype: int64
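To get the day count from that resampled series, count the non-empty bins. A small follow-up sketch (counts is an illustrative name; with the toy data above it prints 5):
counts = pd.Series(1, index=df['Time']).resample("D").sum()
# Days on which at least one row was collected; empty days sum to 0.
print((counts > 0).sum())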
If your index is a DatetimeIndex, you can group by the calendar date and count the groups, something like this:
print(df.groupby(df.index.date).ngroups)
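A small self-contained sketch of that grouping approach (the data and column name here are illustrative):
import pandas as pd

# Toy frame indexed by a DatetimeIndex, with one day appearing twice.
idx = pd.to_datetime(['2015-01-01 08:00', '2015-01-01 17:30', '2015-01-02 09:15'])
df = pd.DataFrame({'value': [1, 2, 3]}, index=idx)

# df.index.date yields one datetime.date per row; grouping on it and asking
# for ngroups gives the number of distinct days.
print(df.groupby(df.index.date).ngroups)  # 2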