Python & Pandas - Group by day and count for each day

Tags:

python

python-3.x

pandas

time-series

I am new to pandas and so far I don't see how to arrange my time series. Take a look at it:

    date & time of connection
    19/06/2017 12:39
    19/06/2017 12:40
    19/06/2017 13:11
    20/06/2017 12:02
    20/06/2017 12:04
    21/06/2017 09:32
    21/06/2017 18:23
    21/06/2017 18:51
    21/06/2017 19:08
    21/06/2017 19:50
    22/06/2017 13:22
    22/06/2017 13:41
    22/06/2017 18:01
    23/06/2017 16:18
    23/06/2017 17:00
    23/06/2017 19:25
    23/06/2017 20:58
    23/06/2017 21:03
    23/06/2017 21:05

This is a sample from a dataset of 130k rows. I tried:

    df.groupby('date & time of connection')['date & time of connection'].apply(list)

That is not enough, I guess.

I think I should:

Create a dictionary with an index from dd/mm/yyyy to dd/mm/yyyy
Convert "date & time of connection" from datetime to date
Group and count the dates of "date & time of connection"
Put the counts into the dictionary?

What do you think about my logic? Do you know any tutorials? Thank you very much.

777

asked Feb 24 '18 10:02

Erwan Pesle

2 Answers

You can use dt.floor to truncate each timestamp to its date, and then count with value_counts or with groupby and size:

    df = (pd.to_datetime(df['date & time of connection'])
           .dt.floor('d')
           .value_counts()
           .rename_axis('date')
           .reset_index(name='count'))
    print (df)
            date  count
    0 2017-06-23      6
    1 2017-06-21      5
    2 2017-06-19      3
    3 2017-06-22      3
    4 2017-06-20      2
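For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample from the question and applies the value_counts approach. The explicit format string and the sort_index() call are additions for illustration, not part of the snippet above; the format assumes every row looks like dd/mm/yyyy HH:MM.

```python
import pandas as pd

# Sample rows copied from the question.
raw = [
    "19/06/2017 12:39", "19/06/2017 12:40", "19/06/2017 13:11",
    "20/06/2017 12:02", "20/06/2017 12:04",
    "21/06/2017 09:32", "21/06/2017 18:23", "21/06/2017 18:51",
    "21/06/2017 19:08", "21/06/2017 19:50",
    "22/06/2017 13:22", "22/06/2017 13:41", "22/06/2017 18:01",
    "23/06/2017 16:18", "23/06/2017 17:00", "23/06/2017 19:25",
    "23/06/2017 20:58", "23/06/2017 21:03", "23/06/2017 21:05",
]
df = pd.DataFrame({"date & time of connection": raw})

# Assumption: all values are day-first with minute precision.
ts = pd.to_datetime(df["date & time of connection"], format="%d/%m/%Y %H:%M")

daily = (
    ts.dt.floor("D")            # drop the time part, keep a datetime dtype
      .value_counts()           # count rows per day (sorted by count)
      .sort_index()             # optional: put the days in chronological order
      .rename_axis("date")
      .reset_index(name="count")
)
print(daily)
```

Without sort_index() the rows come back ordered by count, as in the output shown above.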

Or:

    s = pd.to_datetime(df['date & time of connection'])
    df = s.groupby(s.dt.floor('d')).size().reset_index(name='count')
    print (df)
      date & time of connection  count
    0                2017-06-19      3
    1                2017-06-20      2
    2                2017-06-21      5
    3                2017-06-22      3
    4                2017-06-23      6
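The question also mentions a dictionary indexed from one date to another. If that means days with no connections should appear with a count of 0, one possible sketch is to reindex the daily counts over a full date range and then build the dictionary. The shortened sample, the variable names, and the dd/mm/yyyy key format are all assumptions made for illustration.

```python
import pandas as pd

# Shortened, hypothetical sample with gaps on 20/06 and 22/06.
raw = ["19/06/2017 12:39", "19/06/2017 12:40",
       "21/06/2017 09:32", "23/06/2017 21:05"]
ts = pd.to_datetime(pd.Series(raw), format="%d/%m/%Y %H:%M")

# Count connections per day; the index holds one timestamp per observed day.
per_day = ts.dt.floor("D").value_counts()

# Build every calendar day between the first and last observed day,
# and fill the days that never occurred with 0.
full_range = pd.date_range(per_day.index.min(), per_day.index.max(), freq="D")
per_day = per_day.reindex(full_range, fill_value=0)

# Plain dictionary keyed by dd/mm/yyyy strings.
counts_by_day = {day.strftime("%d/%m/%Y"): int(n) for day, n in per_day.items()}
print(counts_by_day)
```

In many cases the Series itself is enough and the dictionary step can be skipped, since a Series already behaves like a date-to-count mapping.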

Timings:

    import numpy as np
    import pandas as pd

    np.random.seed(1542)

    N = 220000
    a = np.unique(np.random.randint(N, size=int(N/2)))
    df = pd.DataFrame(pd.date_range('2000-01-01', freq='37T', periods=N)).drop(a)
    df.columns = ['date & time of connection']
    df['date & time of connection'] = df['date & time of connection'].dt.strftime('%d/%m/%Y %H:%M:%S')
    print (df.head())

    In [193]: %%timeit
         ...: df['date & time of connection']=pd.to_datetime(df['date & time of connection'])
         ...: df1 = df.groupby(by=df['date & time of connection'].dt.date).count()
         ...:
    539 ms ± 45.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

    In [194]: %%timeit
         ...: df1 = (pd.to_datetime(df['date & time of connection'])
         ...:        .dt.floor('d')
         ...:        .value_counts()
         ...:        .rename_axis('date')
         ...:        .reset_index(name='count'))
         ...:
    12.4 ms ± 350 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

    In [195]: %%timeit
         ...: s = pd.to_datetime(df['date & time of connection'])
         ...: df2 = s.groupby(s.dt.floor('d')).size().reset_index(name='count')
         ...:
    17.7 ms ± 140 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
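One plausible reading of the gap in these timings, offered as an interpretation rather than something measured here: .dt.date returns Python date objects in an object-dtype Series, while .dt.floor keeps a native datetime64 dtype that pandas can group without per-element Python work. The small sketch below only shows the dtype difference and that both keys give the same counts; it does not reproduce the benchmark.

```python
import pandas as pd

ts = pd.Series(pd.to_datetime([
    "2017-06-19 12:39", "2017-06-19 12:40", "2017-06-20 12:02",
]))

as_dates = ts.dt.date         # Python datetime.date objects
as_floor = ts.dt.floor("D")   # timestamps truncated to midnight

print(as_dates.dtype)  # object dtype
print(as_floor.dtype)  # a datetime64 dtype

# Both keys produce the same per-day counts.
counts_dates = ts.groupby(as_dates).size()
counts_floor = ts.groupby(as_floor).size()
print(counts_dates.tolist(), counts_floor.tolist())
```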

117

answered Oct 09 '22 05:10

jezrael

First, make sure your column is in datetime format:

    df['date & time of connection'] = pd.to_datetime(df['date & time of connection'])

Then you can group the data by date and do a count:

    df.groupby(by=df['date & time of connection'].dt.date).count()
    Out[10]:
                               date & time of connection
    date & time of connection
    2017-06-19                                         3
    2017-06-20                                         2
    2017-06-21                                         5
    2017-06-22                                         3
    2017-06-23                                         6
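One caveat worth checking on the full dataset: the timestamps in the question are day-first (dd/mm/yyyy), and pd.to_datetime treats ambiguous values such as 01/02/2017 as month-first by default. A hedged sketch of the same parse-then-group steps with an explicit format follows; the three rows are invented for illustration, and size() is used here in place of count().

```python
import pandas as pd

col = "date & time of connection"

# Hypothetical rows where day and month are ambiguous.
df = pd.DataFrame({col: ["01/02/2017 10:00", "01/02/2017 11:30",
                         "02/02/2017 09:15"]})

# Assumption: every row follows dd/mm/yyyy HH:MM, so state the format explicitly.
df[col] = pd.to_datetime(df[col], format="%d/%m/%Y %H:%M")

# Group by calendar date and count the rows in each group.
per_day = df.groupby(df[col].dt.date).size()
print(per_day)
```

Passing dayfirst=True instead of a format string is another option when the exact layout of the values varies.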

answered Oct 09 '22 06:10

Allen
