I have a dataframe like the following, and I want to split it into windows of size 30,
then write a for loop over the blocks of data and call other functions on each one.
import numpy as np
import pandas as pd

index = pd.date_range(start='2016-01-01', end='2016-04-01', freq='D')
data = pd.DataFrame(np.random.rand(len(index)), index=index, columns=['random'])
I found the following function, but I wonder if there is a more efficient way to do this.
def split(df, chunkSize=30):
    listOfDf = list()
    numberChunks = len(df) // chunkSize + 1
    for i in range(numberChunks):
        listOfDf.append(df[i*chunkSize:(i+1)*chunkSize])
    return listOfDf
To split a column into two columns, use the underscore as the delimiter. The two resulting columns are then added to the existing dataframe as new columns.
Let us first create a simple Pandas data frame using Pandas' DataFrame function. We can then use Pandas' str.split function to split the column of interest. Here we want to split the column "Name": we select the column, chain str.split onto it, and pass expand=True so that the parts come back as separate columns.
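As a minimal sketch of the column split described above: the sample names and the new column names "First" and "Last" are made up for illustration, and the underscore delimiter is assumed from the preceding snippet.

```python
import pandas as pd

# Hypothetical sample data; only the "Name" column comes from the text above.
df = pd.DataFrame({"Name": ["John_Smith", "Jane_Doe"]})

# Split on the first underscore only (n=1) and expand the parts into columns.
parts = df["Name"].str.split("_", n=1, expand=True)

# Attach the two parts as new columns on the existing dataframe.
df[["First", "Last"]] = parts
print(df)
```

With expand=True the result is a DataFrame rather than a Series of lists, which is what makes the two-column assignment possible.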
Here, we use the DataFrame.groupby() method to split the dataset by rows. Rows that fall into the same group are collected into one smaller DataFrame, and these are stored in a list. This list of small DataFrames is the required output.
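One way the groupby approach could look for fixed-size row blocks is sketched below. The grouping key (row position integer-divided by the block size) is an assumption, since the snippet above does not say what the rows are grouped on, and the toy dataframe is made up.

```python
import numpy as np
import pandas as pd

# Toy dataframe with 7 rows, for illustration only.
df = pd.DataFrame({"value": range(7)})

block_size = 3
# Rows 0-2 get id 0, rows 3-5 get id 1, row 6 gets id 2.
block_ids = np.arange(len(df)) // block_size

# Each group is a small DataFrame; collect them in a list.
pieces = [group for _, group in df.groupby(block_ids)]
print([len(p) for p in pieces])
```

Each piece keeps the original index and columns, so it can be passed on to other functions unchanged.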
You can use a list comprehension. See this SO post on how to access the resulting dataframes and for another way to break up a dataframe.
n = 200000  # chunk row size
list_df = [df[i:i+n] for i in range(0, df.shape[0], n)]
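Applied to the question's data, the same slicing idea can be written as a generator, so the blocks are produced one at a time inside the for loop. This is only a sketch: iter_chunks and process_block are hypothetical names, and process_block stands in for whatever function is called on each block.

```python
import numpy as np
import pandas as pd

def iter_chunks(df, chunk_size=30):
    """Yield consecutive row blocks of at most chunk_size rows."""
    for start in range(0, len(df), chunk_size):
        yield df.iloc[start:start + chunk_size]

def process_block(block):
    # Hypothetical per-block work; here just the mean of the column.
    return block["random"].mean()

index = pd.date_range(start="2016-01-01", end="2016-04-01", freq="D")
data = pd.DataFrame(np.random.rand(len(index)), index=index, columns=["random"])

results = [process_block(block) for block in iter_chunks(data, 30)]
chunk_lengths = [len(block) for block in iter_chunks(data, 30)]
print(chunk_lengths)
```

Stepping through range(0, len(df), chunk_size) also avoids an empty trailing block when the number of rows is an exact multiple of the chunk size.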
You can do it efficiently with NumPy's array_split, like this:
import numpy as np

def split(df, chunkSize=30):
    numberChunks = len(df) // chunkSize + 1
    return np.array_split(df, numberChunks, axis=0)
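Even though it is a NumPy function, it will return the split data frames with the correct indices and columns. Note that array_split divides the rows into numberChunks pieces of nearly equal size, so the pieces are not necessarily chunkSize rows long.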
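To see what sizes array_split produces for the question's data, here is a small sketch. It splits an array of row positions and then selects rows with iloc, rather than passing the dataframe to NumPy directly. That is a deliberate variation on the function above, chosen so the pieces are plain DataFrame selections regardless of the pandas version.

```python
import numpy as np
import pandas as pd

index = pd.date_range(start="2016-01-01", end="2016-04-01", freq="D")
data = pd.DataFrame(np.random.rand(len(index)), index=index, columns=["random"])

chunk_size = 30
n_chunks = len(data) // chunk_size + 1  # same formula as in the answer

# Split the row positions into n_chunks nearly equal groups.
positions = np.array_split(np.arange(len(data)), n_chunks)

# Select each group of rows; index and columns are preserved.
pieces = [data.iloc[p] for p in positions]
sizes = [len(p) for p in pieces]
print(sizes)
```

The sizes come out nearly equal instead of 30, 30, 30 and a short remainder, so this suits cases where balanced pieces matter more than an exact window length.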