
Splitting dataframe column into equal windows in Pandas

I have a DataFrame like the following. I want to split it into windows of 30 rows, then loop over the blocks and call other functions on each one.

import numpy as np
import pandas as pd

index = pd.date_range(start='2016-01-01', end='2016-04-01', freq='D')
data = pd.DataFrame(np.random.rand(len(index)), index=index, columns=['random'])

I found the following function, but I wonder if there is a more efficient way to do this.

def split(df, chunkSize=30):
    listOfDf = list()
    # Ceiling division, so an exact multiple of chunkSize yields no empty trailing chunk
    numberChunks = -(-len(df) // chunkSize)
    for i in range(numberChunks):
        listOfDf.append(df[i*chunkSize:(i+1)*chunkSize])
    return listOfDf
mk_sch asked Jul 25 '17 12:07


2 Answers

You can use a list comprehension. See this SO post on how to access the resulting DataFrames and on another way to break up a DataFrame.

n = 200000  # chunk row size (use n = 30 for the windows in the question)
list_df = [df[i:i+n] for i in range(0, df.shape[0], n)]
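
As a rough sketch, here is the same list comprehension applied to the sample data from the question with n = 30, followed by the loop over blocks that the question describes. The summarize function is a hypothetical stand-in for the "other functions" mentioned there, and .iloc is used only to make the positional slicing explicit.

```python
import numpy as np
import pandas as pd

# Sample data from the question: 92 daily rows (2016 is a leap year)
index = pd.date_range(start='2016-01-01', end='2016-04-01', freq='D')
data = pd.DataFrame(np.random.rand(len(index)), index=index, columns=['random'])

n = 30  # chunk row size
list_df = [data.iloc[i:i + n] for i in range(0, len(data), n)]

def summarize(block):
    # Hypothetical placeholder for whatever is called on each block
    return block['random'].mean()

# Loop over the blocks and call the function on each one
results = [summarize(block) for block in list_df]

sizes = [len(block) for block in list_df]
print(sizes)  # [30, 30, 30, 2]
```

Every chunk has exactly n rows except the last, which holds the remainder. Because range steps by n, no empty chunk is produced when the row count is an exact multiple of n.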
Scott Boston answered Nov 04 '22 00:11



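You can do it efficiently with NumPy's array_split. Note that array_split divides the rows into numberChunks pieces of nearly equal length, so chunkSize acts as an upper bound on the chunk length rather than an exact size: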

import numpy as np

def split(df, chunkSize = 30):
    numberChunks = len(df) // chunkSize + 1
    return np.array_split(df, numberChunks, axis=0)

Even though it is a NumPy function, it will return the split data frames with the correct indices and columns.
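
To make the chunk sizes concrete, here is a minimal sketch on the question's sample data. It is a variant of the function above, not the answer's exact code: it splits an array of row positions with np.array_split and then selects rows with .iloc. The assumption behind this variant is that some recent pandas releases reportedly emit a deprecation warning when np.array_split is called directly on a DataFrame, which splitting positions sidesteps. The name split_positions is made up for this illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question: 92 daily rows
index = pd.date_range(start='2016-01-01', end='2016-04-01', freq='D')
data = pd.DataFrame(np.random.rand(len(index)), index=index, columns=['random'])

def split_positions(df, chunkSize=30):
    # Same chunk count as split() above
    numberChunks = len(df) // chunkSize + 1
    # Split the row positions, not the DataFrame itself (illustrative variant)
    positions = np.array_split(np.arange(len(df)), numberChunks)
    # .iloc keeps the original index labels and columns in each piece
    return [df.iloc[p] for p in positions]

chunks = split_positions(data)
chunk_sizes = [len(c) for c in chunks]
print(chunk_sizes)  # [23, 23, 23, 23]
```

With 92 rows and chunkSize = 30 this gives four chunks of 23 rows each, not three of 30 plus a remainder of 2. That suits the "equal windows" in the title. If each window must contain exactly 30 rows, slicing in steps of 30 is the better fit.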

jdehesa answered Nov 03 '22 22:11
