How to group dataframe by hour using timestamp with Pandas

Tags:

python

timestamp

pandas

dataframe

pandas-groupby

I have the following dataframe structure that is indexed with a timestamp:

              neg    neu     norm       pol    pos  date
time
1520353341  0.000  1.000   0.0000  0.000000  0.000
1520353342  0.121  0.879  -0.2960  0.347851  0.000
1520353342  0.217  0.783  -0.6124  0.465833  0.000

I create a date from the timestamp:

data_frame['date'] = [datetime.datetime.fromtimestamp(d) for d in data_frame.time]

Result:

              neg    neu     norm       pol    pos                 date
time
1520353341  0.000  1.000   0.0000  0.000000  0.000  2018-03-06 10:22:21
1520353342  0.121  0.879  -0.2960  0.347851  0.000  2018-03-06 10:22:22
1520353342  0.217  0.783  -0.6124  0.465833  0.000  2018-03-06 10:22:22

I want to group by hour and take the mean of every column. The timestamp should not be averaged; it should instead be the start of the hour that each group covers. This is the result I want to achieve:

                 neg       neu      norm       pol       pos
time
1520352000  0.027989  0.893233  0.122535  0.221079  0.078779
1520355600  0.028861  0.899321  0.103698  0.209353  0.071811

The closest I have gotten so far has been with this answer:

data = data.groupby(data.date.dt.hour).mean()

Results:

           neg       neu      norm       pol       pos
date
0     0.027989  0.893233  0.122535  0.221079  0.078779
1     0.028861  0.899321  0.103698  0.209353  0.071811

But I can't figure out how to keep a timestamp that reflects the hour where each group started.

asked Mar 07 '18 16:03

Franco

2 Answers

I came across this gem, pd.DataFrame.resample, after I posted my round-to-hour solution.

import pandas as pd

# Construct example dataframe
times = pd.date_range('1/1/2018', periods=5, freq='25min')
values = [4,8,3,4,1]
df = pd.DataFrame({'val':values}, index=times)

# Resample by hour and calculate medians
# ('h' is the hourly alias; uppercase 'H' is deprecated in recent pandas)
df.resample('h').median()

Or you can use groupby with Grouper if you don't want times as index:

# 'times' is a regular column here, so Grouper needs key=, not level=
df = pd.DataFrame({'val':values, 'times':times})
df.groupby(pd.Grouper(key='times', freq='h')).median()
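
Applied to a frame like the one in the question, the same idea might look like the sketch below. The sample values and the variable names (df, hourly) are made up for illustration, and it assumes the index holds epoch seconds that should be read as UTC. It converts the index to datetimes, resamples by hour with mean(), then turns the bin labels back into epoch seconds.

```python
import pandas as pd

# Hypothetical sample data shaped like the question's frame:
# epoch seconds as the index, a couple of numeric columns.
df = pd.DataFrame(
    {"neg": [0.000, 0.121, 0.217, 0.100], "pos": [0.0, 0.0, 0.0, 0.3]},
    index=pd.Index([1520353341, 1520353342, 1520353342, 1520356000], name="time"),
)

# Interpret the integer index as seconds since the epoch (timezone-naive, UTC-based).
df.index = pd.to_datetime(df.index, unit="s")

# One row per hourly bin, labelled with the start of the hour.
hourly = df.resample("h").mean()

# Convert the bin labels back to epoch seconds.
hourly.index = (hourly.index - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")
hourly.index.name = "time"

print(hourly)
```

Note that resample emits a row for every hour between the first and last bin, so hours with no data show up as NaN rows; call dropna() afterwards if that is not wanted.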

answered Nov 15 '22 09:11

Jordi

Did you try creating an hour column by:

data_frame['hour'] = data_frame.date.dt.hour

Then grouping by hour like:

data = data.groupby(data.hour).mean()
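
One caveat with grouping on dt.hour: it yields the hour of day (0 to 23), so the same hour on different days lands in one group and the original timestamp is lost. If the index is epoch seconds, as in the question, a possible alternative is to floor each timestamp to the start of its hour with integer arithmetic and group on that. This is only a sketch; the sample data and the names hour_start and hourly_means are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data: epoch seconds as the index.
# The last row is exactly one day after the first, so it shares the same
# hour of day but belongs to a different hourly bucket.
df = pd.DataFrame(
    {"neg": [0.000, 0.121, 0.217, 0.500]},
    index=pd.Index([1520353341, 1520353342, 1520353342, 1520439741], name="time"),
)

# Floor each epoch second to the start of its hour (3600 seconds per hour).
hour_start = (df.index // 3600) * 3600

# Group on the floored timestamps and average every column.
hourly_means = df.groupby(hour_start).mean()
hourly_means.index.name = "time"

print(hourly_means)
```

Because the grouping key is the floored epoch value itself, the result is indexed by the timestamp at which each hour began, and only hours that actually contain data appear.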

answered Nov 15 '22 10:11

Connor John
