DataFrame of DataFrames in Python (Pandas)

Tags:

The idea here is that for every year, I am able to create three dataframes(df1, df2, df3), each containing different firms and stock prices('firm' and 'price' are the two columns in df1~df3). I would like to use another dataframe (named 'store' below) to store the three dataframes every year.

Here is what I code:

store = pd.DataFrame(list(range(1967,2014)), columns=['year'])
for year in range(1967,2014):
    ....some codes that allow me to generate df1, df2 and df3 correctly...
    store.loc[store['year']==year, 'df1']=df1
    store.loc[store['year']==year, 'df2']=df2
    store.loc[store['year']==year, 'df3']=df3

I am not getting error warning or anything after this code. But in the "store" dataframe, columns 'df1', 'df2' and 'df3' are all 'NAN' values.

645

asked Mar 11 '16 04:03

Stephen

1 Answers

I think that pandas offers better alternatives to what you're suggesting (rationale below).

For one, there's the pandas.Panel data structure, which was meant for things like you're doing here.

However, as Wes McKinney (the Pandas author) noted in his book Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, multi-dimensional indices, to a large extent, offer a better alternative.

Consider the following alternative to your code:

dfs = []
for year in range(1967,2014):
    ....some codes that allow me to generate df1, df2 and df3 
    df1['year'] = year
    df1['origin'] = 'df1'
    df2['year'] = year
    df2['origin'] = 'df2'
    df3['year'] = year
    df3['origin'] = 'df3'
    dfs.extend([df1, df2, df3])
df = pd.concat(dfs)

This gives you a DataFrame with 4 columns: 'firm', 'price', 'year', and 'origin'.

This gives you the flexibility to:

Organize hierarchically by, say, 'year' and 'origin': df.set_index(['year', 'origin']), by, say, 'origin' and 'price': df.set_index(['origin', 'price'])
Do groupbys according to different levels
In general, slice and dice the data along many different ways.

What you're suggesting in the question makes one dimension (origin) arbitrarily different, and it's hard to think of an advantage to this. If a split along some dimension is necessary due, to, e.g., performance, you can combine DataFrames better with standard Python data structures:

A dictionary mapping each year to a Dataframe with the other three dimensions.
Three DataFrames, one for each origin, each having three dimensions.

100

answered Sep 28 '22 09:09

Ami Tavory

Related questions
                            
                                Python threading/multiprocessing don't need Mutex?
                            
                                Dockable window in Maya with PySide clean up
                            
                                Attaching intensity to 3D plot
                            
                                How to test the validity of your Sphinx documentation?
                            
                                playing video file using python opencv
                            
                                Why do I get a spurious ']' character in syslog messages with Python's SysLogHandler on OS X?
                            
                                What is time complexity of divison?
                            
                                How to prevent re-definition of keys in YAML?
                            
                                Difference between __getattribute__ and obj.__dict__['x'] in python?
                            
                                Implement realtime signal processing in Python - how to capture audio continuously?
                            
                                django.db.utils.IntegrityError: duplicate key value violates unique constraint "spirit_category_category_pkey"
                            
                                How to get the names of the named variables from the python string
                            
                                List all third party packages and theirselves functions used in a Python file
                            
                                How to share logger instance among Flask app and other classes
                            
                                Django datetimefield timezone aware CET
                            
                                change orientation of contour clabel text objects
                            
                                How should I structure and access a table of data so that I can compare subsets easily in Python 3.5?
                            
                                Is the interaction between python unittest subTest and skipTest defined?
                            
                                Doctest not recognizing __future__.division
                            
                                Explain the difference between these Midpoint Algorithms

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

DataFrame of DataFrames in Python (Pandas)

Tags:

python

pandas

input

Stephen

People also ask

1 Answers

Ami Tavory

Recent Activity

Donate For Us