Splitting dataframe into multiple dataframes

I have a very large dataframe (around 1 million rows) with data from an experiment (60 respondents).

I would like to split the dataframe into 60 dataframes (a dataframe for each participant).

In the dataframe, data, there is a variable called 'name', which is the unique code for each participant.

I have tried the following, but nothing happens (or execution does not stop within an hour). What I intend to do is to split the data into smaller dataframes, and append these to a list (datalist):

    import pandas as pd

    def splitframe(data, name='name'):

        n = data[name][0]

        df = pd.DataFrame(columns=data.columns)

        datalist = []

        for i in range(len(data)):
            if data[name][i] == n:
                df = df.append(data.iloc[i])
            else:
                datalist.append(df)
                df = pd.DataFrame(columns=data.columns)
                n = data[name][i]
                df = df.append(data.iloc[i])

        return datalist

I do not get an error message; the script just seems to run forever!

Is there a smart way to do it?

567

asked Nov 05 '13 14:11

Martin Petri Bagger

2 Answers

Can I ask why not just do it by slicing the data frame? Something like:

    import numpy as np
    import pandas as pd

    # create some data with a Names column
    data = pd.DataFrame({'Names': ['Joe', 'John', 'Jasper', 'Jez'] * 4,
                         'Ob1': np.random.rand(16),
                         'Ob2': np.random.rand(16)})

    # create unique list of names
    UniqueNames = data.Names.unique()

    # create a data frame dictionary to store your data frames
    DataFrameDict = {elem: pd.DataFrame() for elem in UniqueNames}

    # fill each entry with the rows that belong to that name
    for key in DataFrameDict.keys():
        DataFrameDict[key] = data[data.Names == key]

Hey presto, you have a dictionary of data frames, just as (I think) you want them. Need to access one? Just enter:

    DataFrameDict['Joe']

Hope that helps.
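For what it's worth, the same kind of dictionary can probably be built in a single pass with `groupby` instead of filtering the frame once per name. This is only a minimal sketch, assuming the same made-up `data` frame with a `Names` column as above; `frames_by_name` is a name invented for the illustration.

```python
import numpy as np
import pandas as pd

# same illustrative data as in the answer above
data = pd.DataFrame({'Names': ['Joe', 'John', 'Jasper', 'Jez'] * 4,
                     'Ob1': np.random.rand(16),
                     'Ob2': np.random.rand(16)})

# iterating a groupby yields (key, sub-frame) pairs,
# so one pass over the data builds the whole dictionary
frames_by_name = {name: group for name, group in data.groupby('Names')}

print(frames_by_name['Joe'].shape)  # (4, 3)
```

With 60 participants this scans the data once rather than 60 times, which should matter more as the frame grows.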

194

answered Sep 21 '22 09:09

Woody Pride

Firstly, your approach is inefficient because appending to the list on a row-by-row basis will be slow: the list has to periodically grow when there is insufficient space for the new entry. List comprehensions are better in this respect, as the size is determined up front and allocated once.
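As a rough sketch of what avoiding the row-by-row appends could look like, here is `splitframe` rewritten with a single list comprehension. It assumes the intent of the original loop was one frame per contiguous run of the same `name`; `run_id` is a helper name made up for this example. It also sidesteps `DataFrame.append`, which was removed in pandas 2.0.

```python
import pandas as pd

def splitframe(data, name='name'):
    # give every contiguous run of identical names its own integer label:
    # a new run starts wherever the value differs from the previous row
    run_id = (data[name] != data[name].shift()).cumsum()

    # one comprehension builds all sub-frames; no per-row append
    return [group for _, group in data.groupby(run_id, sort=False)]

# tiny illustrative frame; 'value' is an invented column
example = pd.DataFrame({'name': ['a', 'a', 'b', 'b', 'a'],
                        'value': [1, 2, 3, 4, 5]})
datalist = splitframe(example)
print([len(part) for part in datalist])  # [2, 2, 1]
```

Unlike the loop in the question, this also keeps the final block, which the original never appended to `datalist`.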

However, I think your approach is fundamentally a little wasteful: you already have a dataframe, so why create a new one for each of these users?

I would sort the dataframe by the column 'name', set it as the index and, if required, keep the column rather than dropping it.

Then generate a list of all the unique entries, which you can use to perform lookups. Crucially, if you are only querying the data, use the selection criteria to return a view on the dataframe without incurring a costly data copy.

Use pandas.DataFrame.sort_values and pandas.DataFrame.set_index:

    # sort the rows of the dataframe by 'name' (axis=0 sorts rows)
    df.sort_values(by='name', axis=0, inplace=True)

    # set the index to be this and don't drop
    df.set_index(keys=['name'], drop=False, inplace=True)

    # get a list of names
    names = df['name'].unique().tolist()

    # now we can perform a lookup on a 'view' of the dataframe
    joe = df.loc[df.name == 'joe']

    # now you can query all 'joes'
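To make that concrete, here is a small self-contained sketch of the sort, index and lookup steps on made-up data. The `score` column and the `per_participant_mean` dictionary are assumptions added purely for illustration; only 'name' comes from the question.

```python
import pandas as pd

# hypothetical sample data; 'score' is an invented column
df = pd.DataFrame({'name': ['joe', 'ann', 'joe', 'bob', 'ann'],
                   'score': [1, 2, 3, 4, 5]})

# sort rows by participant, then use the participant code as the index
# while keeping 'name' available as an ordinary column
df = df.sort_values(by='name').set_index('name', drop=False)

# list of unique participant codes
names = df['name'].unique().tolist()

# query each participant by index label without storing 60 separate frames;
# the list form [n] returns every row carrying that label
per_participant_mean = {n: df.loc[[n], 'score'].mean() for n in names}

print(per_participant_mean['joe'])  # 2.0
```

If you do need the separate pieces later, you can still pull them out one at a time with `df.loc[[n]]` rather than building them all up front.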

answered Sep 20 '22 09:09

EdChum

Tags: python, split, pandas, dataframe