I have a gigantic dataframe with a datetime-type column called dt; the dataframe is already sorted by dt. I want to split it into several dataframes based on dt, so that each one contains the rows that fall within a 1-hour range.
Split
dt text
0 20160811 11:05 a
1 20160811 11:35 b
2 20160811 12:03 c
3 20160811 12:36 d
4 20160811 12:52 e
5 20160811 14:32 f
into
dt text
0 20160811 11:05 a
1 20160811 11:35 b
2 20160811 12:03 c
dt text
0 20160811 12:36 d
1 20160811 12:52 e
dt text
0 20160811 14:32 f
You need groupby on the difference between each value of column dt and its first value, converted to whole hours with astype:
S = pd.to_datetime(df.dt)
for i, g in df.groupby([(S - S[0]).astype('timedelta64[h]')]):
    print (g.reset_index(drop=True))
dt text
0 20160811 11:05 a
1 20160811 11:35 b
2 20160811 12:03 c
dt text
0 20160811 12:36 d
1 20160811 12:52 e
dt text
0 20160811 14:32 f
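On newer pandas releases the astype('timedelta64[h]') cast may be rejected (as far as I know, only second and finer resolutions are accepted there). A minimal sketch of the same idea that avoids the cast uses floor division by a one-hour Timedelta. The explicit format string and the names s, bucket and parts are my own choices for illustration, not part of the original answer:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "dt": ["20160811 11:05", "20160811 11:35", "20160811 12:03",
           "20160811 12:36", "20160811 12:52", "20160811 14:32"],
    "text": list("abcdef"),
})

# Parse with an explicit format so the strings are not guessed
s = pd.to_datetime(df["dt"], format="%Y%m%d %H:%M")

# Whole hours elapsed since the first row, as integers
bucket = (s - s.iloc[0]) // pd.Timedelta(hours=1)

# One dataframe per elapsed-hour bucket, in chronological order
parts = [g.reset_index(drop=True) for _, g in df.groupby(bucket)]
for part in parts:
    print(part)
```

The keys are integers rather than floats, but the resulting groups should be the same three frames as above.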
List comprehension solution:
S = pd.to_datetime(df.dt)
print ((S - S[0]).astype('timedelta64[h]'))
0 0.0
1 0.0
2 0.0
3 1.0
4 1.0
5 3.0
Name: dt, dtype: float64
L = [g.reset_index(drop=True) for i, g in df.groupby([(S - S[0]).astype('timedelta64[h]')])]
print (L[0])
dt text
0 20160811 11:05 a
1 20160811 11:35 b
2 20160811 12:03 c
print (L[1])
dt text
0 20160811 12:36 d
1 20160811 12:52 e
print (L[2])
dt text
0 20160811 14:32 f
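A different way to get the same 1-hour windows, not part of the original answer, is pd.Grouper with origin='start', which anchors the bins at the first timestamp instead of at the top of the hour. This is only a sketch: it assumes a pandas version that supports the origin argument (1.1 or later, I believe) and that the dt column has already been converted to datetime. The names grouper and windows are hypothetical:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "dt": ["20160811 11:05", "20160811 11:35", "20160811 12:03",
           "20160811 12:36", "20160811 12:52", "20160811 14:32"],
    "text": list("abcdef"),
})
# Grouper needs a real datetime column
df["dt"] = pd.to_datetime(df["dt"], format="%Y%m%d %H:%M")

# origin="start" makes the 1-hour bins begin at the first value (11:05)
grouper = pd.Grouper(key="dt", freq="1h", origin="start")

# Bins without rows (13:05 to 14:05 here) can come back empty, so skip them
windows = [g.reset_index(drop=True)
           for _, g in df.groupby(grouper) if not g.empty]

for w in windows:
    print(w)
```

Unlike the timedelta approach, the group keys here are the bin start timestamps, which can be handy if you want to label each piece.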
Old solution, which splits by clock hour:
You can use groupby by dt.hour, but first you need to convert dt with to_datetime:
for i, g in df.groupby([pd.to_datetime(df.dt).dt.hour]):
    print (g.reset_index(drop=True))
dt text
0 20160811 11:05 a
1 20160811 11:35 b
dt text
0 20160811 12:03 c
1 20160811 12:36 d
2 20160811 12:52 e
dt text
0 20160811 14:32 f
List comprehension solution:
L = [g.reset_index(drop=True) for i, g in df.groupby([pd.to_datetime(df.dt).dt.hour])]
print (L[0])
dt text
0 20160811 11:05 a
1 20160811 11:35 b
print (L[1])
dt text
0 20160811 12:03 c
1 20160811 12:36 d
2 20160811 12:52 e
print (L[2])
dt text
0 20160811 14:32 f
Or use a list comprehension after converting column dt to datetime:
df['dt'] = pd.to_datetime(df['dt'])
L = [g.reset_index(drop=True) for i, g in df.groupby([df['dt'].dt.hour])]
print (L[1])
dt text
0 2016-08-11 12:03:00 c
1 2016-08-11 12:36:00 d
2 2016-08-11 12:52:00 e
print (L[2])
dt text
0 2016-08-11 14:32:00 f
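To see why the clock-hour split differs from the elapsed-hour split, it can help to put the two grouping keys side by side. The sketch below is illustrative only; the column names clock_hour and elapsed_hour are made up here, and floor division stands in for the astype cast:

```python
import pandas as pd

# Timestamps from the question
s = pd.to_datetime(
    pd.Series(["20160811 11:05", "20160811 11:35", "20160811 12:03",
               "20160811 12:36", "20160811 12:52", "20160811 14:32"]),
    format="%Y%m%d %H:%M",
)

# Key used by the clock-hour split: the hour field of each timestamp
clock_hour = s.dt.hour

# Key used by the elapsed-hour split: whole hours since the first row
elapsed_hour = (s - s.iloc[0]) // pd.Timedelta(hours=1)

print(pd.DataFrame({"dt": s,
                    "clock_hour": clock_hour,
                    "elapsed_hour": elapsed_hour}))
```

Row 2 (12:03) is the one that moves: it shares clock hour 12 with the next two rows, but it is still within the first 60 minutes after 11:05.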
If you need to split by dates and hours:
#changed dataframe for testing
print (df)
dt text
0 20160811 11:05 a
1 20160812 11:35 b
2 20160813 12:03 c
3 20160811 12:36 d
4 20160811 12:52 e
5 20160811 14:32 f
serie = pd.to_datetime(df.dt)
for i, g in df.groupby([serie.dt.date, serie.dt.hour]):
    print (g.reset_index(drop=True))
dt text
0 20160811 11:05 a
dt text
0 20160811 12:36 d
1 20160811 12:52 e
dt text
0 20160811 14:32 f
dt text
0 20160812 11:35 b
dt text
0 20160813 12:03 c
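As an alternative to grouping on two keys, one could floor each timestamp to the start of its hour and group on that single key, which carries both the date and the hour. This is a sketch under the assumption that dt.floor('h') is available in your pandas version; hour_start and by_hour are hypothetical names:

```python
import pandas as pd

# The changed test dataframe from above (dates differ between rows)
df = pd.DataFrame({
    "dt": ["20160811 11:05", "20160812 11:35", "20160813 12:03",
           "20160811 12:36", "20160811 12:52", "20160811 14:32"],
    "text": list("abcdef"),
})

s = pd.to_datetime(df["dt"], format="%Y%m%d %H:%M")

# Round each timestamp down to the top of its hour
hour_start = s.dt.floor("h")

# Map each hour start to the rows that fall inside that hour
by_hour = {key: g.reset_index(drop=True) for key, g in df.groupby(hour_start)}

for key, g in by_hour.items():
    print(key)
    print(g)
```

The dictionary keys are Timestamp objects, so a single piece can be looked up directly, for example by_hour[pd.Timestamp("2016-08-11 12:00")].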