How to split a pandas dataframe or series by day (possibly using an iterator)

I have a long time series, e.g.:

import pandas as pd
index=pd.date_range(start='2012-11-05', end='2012-11-10', freq='1S').tz_localize('Europe/Berlin')
df=pd.DataFrame(range(len(index)), index=index, columns=['Number'])

Now I want to extract all sub-DataFrames for each day, to get the following output:

df_2012-11-05: data frame with all data referring to day 2012-11-05
df_2012-11-06: etc.
df_2012-11-07
df_2012-11-08
df_2012-11-09
df_2012-11-10

What is the most efficient way to do this while avoiding a check like index.date == given_date, which is very slow? Also, the range of days in the frame is not known a priori.

Any hints on how to do this with an iterator?

My current solution is below, but it is not very elegant and has two issues, described after the code:

import numpy as np

time_zone='Europe/Berlin'
# find all days
a=np.unique(df.index.date) # this can take a lot of time
a.sort()
results=[]
for i in range(len(a)-1):
    day_now=pd.Timestamp(a[i]).tz_localize(time_zone)
    day_next=pd.Timestamp(a[i+1]).tz_localize(time_zone)
    results.append(df[day_now:day_next]) # how to select if I do not want day_next included?

# last day
results.append(df[day_next:])

This approach has the following problems:

- a=np.unique(df.index.date) can take a lot of time
- df[day_now:day_next] includes day_next, but I need to exclude it from the range
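On the second point, a minimal sketch of a half-open selection (start included, end excluded) using a boolean mask instead of a label slice. The small 6-hourly frame is a stand-in for the per-second data above, and the names one_day and inclusive are only illustrative:

```python
import pandas as pd

# Small stand-in for the frame in the question (one row every 6 hours).
index = pd.date_range(
    "2012-11-05", "2012-11-07", freq=pd.Timedelta(hours=6), tz="Europe/Berlin"
)
df = pd.DataFrame({"Number": range(len(index))}, index=index)

day_now = pd.Timestamp("2012-11-05", tz="Europe/Berlin")
day_next = pd.Timestamp("2012-11-06", tz="Europe/Berlin")

# Label slicing is inclusive on both ends, so midnight of day_next is kept.
inclusive = df.loc[day_now:day_next]

# A boolean mask gives a half-open interval: day_now <= t < day_next.
one_day = df[(df.index >= day_now) & (df.index < day_next)]

print(len(inclusive), len(one_day))
```

The mask still scans the whole index once per day, so it addresses the end-point problem but not the speed concern.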

If you want to group by date (i.e. year + month + day), use df.index.date:

result = [group[1] for group in df.groupby(df.index.date)]

Avoid df.index.day for this: it groups by the day of the month (i.e. from 1 to 31), which could result in undesirable behavior if the dates in the input DataFrame span multiple months.
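To make the difference concrete, here is a small sketch with a made-up three-row frame that crosses a month boundary. The frame and the names by_date and by_day_of_month are assumptions for illustration only; building a dict also gives you the per-day lookup the question asks for:

```python
import pandas as pd

# Hypothetical frame: 2012-11-05 and 2012-12-05 share the same day of month.
index = pd.DatetimeIndex(
    ["2012-11-05 08:00", "2012-11-05 20:00", "2012-12-05 08:00"]
).tz_localize("Europe/Berlin")
df = pd.DataFrame({"Number": range(len(index))}, index=index)

# Keys are datetime.date objects: one entry per calendar date.
by_date = {day: frame for day, frame in df.groupby(df.index.date)}

# Keys are integers 1..31: both months collapse into the single key 5.
by_day_of_month = {day: frame for day, frame in df.groupby(df.index.day)}

print(len(by_date))          # 2
print(len(by_day_of_month))  # 1
```

With the dict you can then pick a day directly, e.g. by_date[datetime.date(2012, 11, 5)] after importing datetime.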

Perhaps groupby?

DFList = []
for group in df.groupby(df.index.day):
    DFList.append(group[1])

This should give you a list of DataFrames, where each DataFrame holds one day of data.

Or in one line:

DFList = [group[1] for group in df.groupby(df.index.day)]

Gotta love Python!
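Since the question asks about an iterator: a groupby object can be iterated directly, so a thin generator around it may be enough. This is only a sketch; iter_days is a hypothetical helper name, and it groups on df.index.normalize() (timestamps floored to local midnight) rather than on .day, so days from different months stay separate. The small 6-hourly frame stands in for the per-second data in the question:

```python
import pandas as pd


def iter_days(frame):
    """Yield (day, sub_frame) pairs, one per calendar day in the index."""
    # normalize() floors each timestamp to midnight and keeps the timezone,
    # so the group keys are tz-aware Timestamps, not datetime.date objects.
    for day, sub_frame in frame.groupby(frame.index.normalize()):
        yield day, sub_frame


index = pd.date_range(
    "2012-11-05", "2012-11-07", freq=pd.Timedelta(hours=6), tz="Europe/Berlin"
)
df = pd.DataFrame({"Number": range(len(index))}, index=index)

for day, sub_frame in iter_days(df):
    print(day.date(), len(sub_frame))
```

Note that groupby works out the group membership up front; only the sub-frames are built one at a time as you iterate. Whether normalize() is quicker than .date on a large index is worth timing on your own data.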

Tags:

python

indexing

pandas

time-series

Mannaggia

2 Answers

Peque

Woody Pride
