
Pandas: how to split a dataframe by column into intervals

I have a gigantic dataframe with a datetime column called dt; the dataframe is already sorted by dt. I want to split it into several dataframes based on dt, so that each dataframe contains the rows falling within a 1-hour range.

Split

   dt                    text
0  20160811 11:05        a
1  20160811 11:35        b
2  20160811 12:03        c
3  20160811 12:36        d
4  20160811 12:52        e
5  20160811 14:32        f

into

   dt                    text
0  20160811 11:05        a
1  20160811 11:35        b
2  20160811 12:03        c

   dt                    text
0  20160811 12:36        d
1  20160811 12:52        e

   dt                    text 
0  20160811 14:32        f
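
For anyone who wants to reproduce this, the sample frame could be built roughly as follows. This is only a sketch: it assumes dt holds strings in the printed format, which is what the output suggests.

```python
import pandas as pd

# Assumption: dt is stored as strings shaped like '20160811 11:05'
df = pd.DataFrame({
    "dt": ["20160811 11:05", "20160811 11:35", "20160811 12:03",
           "20160811 12:36", "20160811 12:52", "20160811 14:32"],
    "text": list("abcdef"),
})
print(df)
```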

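Group by the time elapsed since the first value of column dt, floored to whole hours, so each group covers a 1-hour window counted from the first row. Older pandas versions could get the same key with astype('timedelta64[h]'), but pandas 2.x no longer accepts hour resolution there:

import pandas as pd

S = pd.to_datetime(df['dt'], format='%Y%m%d %H:%M')
# key: whole hours elapsed since the first row
for i, g in df.groupby((S - S.iloc[0]).dt.total_seconds() // 3600):
    print (g.reset_index(drop=True))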

               dt text
0  20160811 11:05    a
1  20160811 11:35    b
2  20160811 12:03    c
               dt text
0  20160811 12:36    d
1  20160811 12:52    e
               dt text
0  20160811 14:32    f

List comprehension solution, which collects the pieces in a list:

S = pd.to_datetime(df['dt'], format='%Y%m%d %H:%M')

# whole hours elapsed since the first row
print ((S - S.iloc[0]).dt.total_seconds() // 3600)
0    0.0
1    0.0
2    0.0
3    1.0
4    1.0
5    3.0
Name: dt, dtype: float64

L = [g.reset_index(drop=True) for i, g in df.groupby((S - S.iloc[0]).dt.total_seconds() // 3600)]

print (L[0])
               dt text
0  20160811 11:05    a
1  20160811 11:35    b
2  20160811 12:03    c

print (L[1])
               dt text
0  20160811 12:36    d
1  20160811 12:52    e

print (L[2])
               dt text
0  20160811 14:32    f
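
The grouping above can also be wrapped in a small helper. This is only a sketch: the name split_by_elapsed and its parameters are made up for illustration, and it assumes a non-empty frame whose column parses with the format '%Y%m%d %H:%M'.

```python
import pandas as pd

def split_by_elapsed(df, column="dt", width=pd.Timedelta(hours=1),
                     fmt="%Y%m%d %H:%M"):
    """Split df into frames whose rows share the same `width`-long window,
    with windows counted from the first row's timestamp."""
    stamps = pd.to_datetime(df[column], format=fmt)
    # integer index of the window each row falls into
    window = (stamps - stamps.iloc[0]) // width
    return [g.reset_index(drop=True) for _, g in df.groupby(window)]

# sample data from the question (dt assumed to be strings)
df = pd.DataFrame({
    "dt": ["20160811 11:05", "20160811 11:35", "20160811 12:03",
           "20160811 12:36", "20160811 12:52", "20160811 14:32"],
    "text": list("abcdef"),
})
parts = split_by_elapsed(df)
print([p["text"].tolist() for p in parts])
```

Passing a different width, for example pd.Timedelta(minutes=30), changes the window length without touching the rest of the code.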

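Old solution, which splits by clock hour:

You can group by dt.hour, but you first need to convert dt with to_datetime. Note that this groups by hour of day (11:00-11:59, 12:00-12:59, ...) rather than by 1-hour windows counted from the first row, so 12:03 ends up in the same group as 12:36: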

for i, g in df.groupby([pd.to_datetime(df.dt).dt.hour]):
    print (g.reset_index(drop=True))

               dt text
0  20160811 11:05    a
1  20160811 11:35    b
               dt text
0  20160811 12:03    c
1  20160811 12:36    d
2  20160811 12:52    e
               dt text
0  20160811 14:32    f

List comprehension solution:

L = [g.reset_index(drop=True) for i, g in df.groupby([pd.to_datetime(df.dt).dt.hour])]

print (L[0])
               dt text
0  20160811 11:05    a
1  20160811 11:35    b

print (L[1])
               dt text
0  20160811 12:03    c
1  20160811 12:36    d
2  20160811 12:52    e

print (L[2])
               dt text
0  20160811 14:32    f

Or convert column dt to datetime in place first and then use the list comprehension (the dt column then prints as full timestamps):

df['dt'] = pd.to_datetime(df['dt'])
L = [g.reset_index(drop=True) for i, g in df.groupby([df['dt'].dt.hour])]

print (L[1])
                   dt text
0 2016-08-11 12:03:00    c
1 2016-08-11 12:36:00    d
2 2016-08-11 12:52:00    e

print (L[2])
                   dt text
0 2016-08-11 14:32:00    f
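
To see why grouping by dt.hour gives different groups than the elapsed-time approach, the two keys can be put side by side. A sketch on the sample data, assuming the strings parse with the format '%Y%m%d %H:%M'; the variable and column names are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    "dt": ["20160811 11:05", "20160811 11:35", "20160811 12:03",
           "20160811 12:36", "20160811 12:52", "20160811 14:32"],
    "text": list("abcdef"),
})
S = pd.to_datetime(df["dt"], format="%Y%m%d %H:%M")

clock_hour = S.dt.hour                                      # hour of day
elapsed_hour = (S - S.iloc[0]).dt.total_seconds() // 3600   # hours since first row

keys = pd.DataFrame({
    "text": df["text"],
    "clock_hour": clock_hour,
    "elapsed_hour": elapsed_hour.astype(int),
})
print(keys)
```

Row c (12:03) has clock hour 12 but is still within the first hour after 11:05, which is why the two approaches place it in different groups.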

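If you need to split by both date and hour, so that the same hour on different days ends up in separate groups:

# dataframe changed for testing: rows 1 and 2 now fall on other days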
print (df)
               dt text
0  20160811 11:05    a
1  20160812 11:35    b
2  20160813 12:03    c
3  20160811 12:36    d
4  20160811 12:52    e
5  20160811 14:32    f

serie = pd.to_datetime(df.dt)
for i, g in df.groupby([serie.dt.date, serie.dt.hour]):
    print (g.reset_index(drop=True))
               dt text
0  20160811 11:05    a
               dt text
0  20160811 12:36    d
1  20160811 12:52    e
               dt text
0  20160811 14:32    f
               dt text
0  20160812 11:35    b
               dt text
0  20160813 12:03    c
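
A single key can produce the same date-and-hour split: floor each timestamp to the start of its hour. A sketch on the changed test frame, assuming the strings parse with the format '%Y%m%d %H:%M'; hour_start and groups are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    "dt": ["20160811 11:05", "20160812 11:35", "20160813 12:03",
           "20160811 12:36", "20160811 12:52", "20160811 14:32"],
    "text": list("abcdef"),
})
serie = pd.to_datetime(df["dt"], format="%Y%m%d %H:%M")

# floor("h") truncates each timestamp to the start of its hour,
# so the key carries both the date and the hour
hour_start = serie.dt.floor("h")
groups = {key: g.reset_index(drop=True) for key, g in df.groupby(hour_start)}

for key, g in groups.items():
    print(key, g["text"].tolist())
```

The dictionary keys are the hour-start timestamps, which makes it easy to look up the piece for a particular hour afterwards.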

Tags: python, pandas, numpy, python-2.7, scipy

Question by 9blue, answer by jezrael
