Timegrouper part of pandas [duplicate]

I have a use case where:

Data is of the form: Col1, Col2, Col3 and Timestamp.

I want the number of rows in each Timestamp bin.

That is, for every half-hour bucket (including the ones that have no corresponding rows), I need the count of rows that fall into it.

The timestamps are spread over a one-year period, so I can't simply divide them into 24 buckets.

They have to be binned at 30-minute intervals.

asked Mar 26 '18 04:03 by david nadal



1 Answer

Use groupby with pd.Grouper:

# optionally, if needed
# df['Timestamp'] = pd.to_datetime(df['Timestamp'], errors='coerce')  
df.groupby(pd.Grouper(key='Timestamp', freq='30min')).count()

Or use resample:

df.set_index('Timestamp').resample('30min').count()
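
To see how the empty half-hour buckets come out, here is a minimal sketch on made-up data. The sample rows and the date range at the end are assumptions for illustration, not from the question. It uses size() rather than count(), since size() returns one row count per bin while count() returns non-null counts per column. Note that the bins only run from the first to the last timestamp present, so if the buckets should cover a fixed period (for example a whole year), one option is to reindex against an explicit 30-minute range.

```python
import pandas as pd

# Hypothetical sample data; only the Timestamp column name comes from the question.
df = pd.DataFrame({
    "Col1": [1, 2, 3, 4],
    "Timestamp": pd.to_datetime([
        "2018-03-26 00:05",
        "2018-03-26 00:20",
        "2018-03-26 01:40",
        "2018-03-26 01:55",
    ]),
})

# One row count per 30-minute bin; bins with no rows appear with a count of 0.
counts = df.groupby(pd.Grouper(key="Timestamp", freq="30min")).size()
print(counts)

# Assumed reporting window: extend the bins beyond the observed data
# by reindexing against an explicit range and filling new bins with 0.
full_index = pd.date_range("2018-03-26 00:00", "2018-03-26 02:30", freq="30min")
full_counts = counts.reindex(full_index, fill_value=0)
print(full_counts)
```

The same reindex step should work on the resample result as well, since both approaches produce a series indexed by the bin start times.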
answered Sep 26 '22 23:09 by cs95