The idea here is that for every year, I am able to create three dataframes (df1, df2, df3), each containing different firms and stock prices ('firm' and 'price' are the two columns in df1 to df3). I would like to use another dataframe (named 'store' below) to store the three dataframes for every year.
Here is my code:
store = pd.DataFrame(list(range(1967,2014)), columns=['year'])
for year in range(1967,2014):
    # ...some code that generates df1, df2 and df3 correctly...
    store.loc[store['year']==year, 'df1']=df1
    store.loc[store['year']==year, 'df2']=df2
    store.loc[store['year']==year, 'df3']=df3
I don't get any error or warning when running this code. But in the 'store' dataframe, the columns 'df1', 'df2' and 'df3' are all NaN.
I think that pandas offers better alternatives to what you're suggesting (rationale below).
For one, there's the pandas.Panel data structure, which was meant for things like what you're doing here (note that Panel was deprecated in pandas 0.20 and removed in 0.25, so it only exists in older versions).
However, as Wes McKinney (the pandas author) noted in his book Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, multi-dimensional (hierarchical) indexes offer, to a large extent, a better alternative.
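As a small illustration of what a hierarchical index looks like, here is a sketch that uses `pd.concat` with `keys=` to label each frame with an outer index level. The toy frames and the level names "origin" and "row" are hypothetical stand-ins for one year's df1, df2 and df3.

```python
import pandas as pd

# Hypothetical toy frames standing in for one year's df1/df2/df3
# (the real firms and prices are not given in the question).
df1 = pd.DataFrame({"firm": ["A", "B"], "price": [10.0, 20.0]})
df2 = pd.DataFrame({"firm": ["C"], "price": [30.0]})
df3 = pd.DataFrame({"firm": ["D", "E"], "price": [40.0, 50.0]})

# keys= adds an outer index level recording which frame each row came from;
# names= labels the two resulting index levels.
combined = pd.concat(
    [df1, df2, df3],
    keys=["df1", "df2", "df3"],
    names=["origin", "row"],
)

print(combined.index.nlevels)  # 2
# Selecting on the outer level returns only the rows that came from df2.
print(combined.loc["df2"])
```

The answer below takes a slightly different route (plain columns instead of index levels), which keeps the data easy to re-index later.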
Consider the following alternative to your code:
dfs = []
for year in range(1967,2014):
    # ...some code that generates df1, df2 and df3...
    df1['year'] = year
    df1['origin'] = 'df1'
    df2['year'] = year
    df2['origin'] = 'df2'
    df3['year'] = year
    df3['origin'] = 'df3'
    dfs.extend([df1, df2, df3])
df = pd.concat(dfs)
This gives you a DataFrame with 4 columns: 'firm', 'price', 'year', and 'origin'.
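To check the approach end to end, here is a runnable sketch of the loop above. The function `make_frames` is hypothetical: it only fabricates tiny placeholder frames in place of the question's "...some code..." step. This version uses `DataFrame.assign` so the original frames are not mutated, and passes `ignore_index=True` to `pd.concat` to get a clean 0..n-1 index; both are design choices, not requirements.

```python
import pandas as pd


def make_frames(year):
    """Hypothetical stand-in for the question's '...some code...' step.

    Returns three small frames with 'firm' and 'price' columns.
    """
    df1 = pd.DataFrame({"firm": ["A", "B"], "price": [1.0 * year, 2.0 * year]})
    df2 = pd.DataFrame({"firm": ["C"], "price": [3.0 * year]})
    df3 = pd.DataFrame({"firm": ["D"], "price": [4.0 * year]})
    return df1, df2, df3


dfs = []
for year in range(1967, 2014):
    df1, df2, df3 = make_frames(year)
    # Tag each frame (as a new copy) with the year and the frame it came from.
    for name, frame in (("df1", df1), ("df2", df2), ("df3", df3)):
        dfs.append(frame.assign(year=year, origin=name))

# ignore_index=True replaces the repeated per-frame row labels with 0..n-1.
df = pd.concat(dfs, ignore_index=True)

print(df.columns.tolist())  # ['firm', 'price', 'year', 'origin']
print(len(df))  # 47 years * 4 rows per year = 188
```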
Having everything in one DataFrame gives you the flexibility to:
Organize it hierarchically by, say, 'year' and 'origin' (df.set_index(['year', 'origin'])), or by, say, 'origin' and 'price' (df.set_index(['origin', 'price'])).
Do groupbys according to different levels.
In general, slice and dice the data in many different ways.
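A minimal sketch of those operations, assuming a small hypothetical frame with the same four columns as the concatenated result. It builds a ('year', 'origin') index, selects one year's df1 rows (what the question's 'store' cell was meant to hold), takes a cross-section across all years with `xs`, and groups by two columns.

```python
import pandas as pd

# Hypothetical long-format data with the same four columns as the answer's df.
df = pd.DataFrame({
    "firm":   ["A", "B", "C", "A", "B", "C"],
    "price":  [10.0, 20.0, 30.0, 11.0, 21.0, 31.0],
    "year":   [1967, 1967, 1967, 1968, 1968, 1968],
    "origin": ["df1", "df1", "df2", "df1", "df2", "df3"],
})

# Hierarchical index on year and origin; sorting keeps label lookups efficient.
by_year_origin = df.set_index(["year", "origin"]).sort_index()

# All rows that came from df1 in 1967 (the old store's 'df1' cell for 1967).
df1_1967 = by_year_origin.loc[(1967, "df1"), :]
print(df1_1967)

# Cross-section on the inner level: every year's df1 rows.
all_df1 = by_year_origin.xs("df1", level="origin")
print(all_df1)

# Group by several columns, e.g. mean price per year and origin.
means = df.groupby(["year", "origin"])["price"].mean()
print(means)
```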
What you're suggesting in the question makes one dimension (origin) arbitrarily different from the others, and it's hard to think of an advantage to this. If a split along some dimension is necessary due to, e.g., performance, it is better to combine DataFrames with standard Python data structures, for example:
A dictionary mapping each year to a DataFrame with the other three dimensions.
Three DataFrames, one for each origin, each having three dimensions.
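Both splits can be derived from the single long DataFrame with `groupby`. Here is a sketch, again on a small hypothetical frame with the columns 'firm', 'price', 'year' and 'origin'; the dictionary names `by_year` and `by_origin` are illustrative only.

```python
import pandas as pd

# Hypothetical long-format frame with the same four columns as the answer's df.
df = pd.DataFrame({
    "firm":   ["A", "B", "C", "A", "B", "C"],
    "price":  [10.0, 20.0, 30.0, 11.0, 21.0, 31.0],
    "year":   [1967, 1967, 1967, 1968, 1968, 1968],
    "origin": ["df1", "df1", "df2", "df1", "df2", "df3"],
})

# Option 1: dict of year -> DataFrame keeping firm, price and origin.
by_year = {year: group.drop(columns="year") for year, group in df.groupby("year")}

# Option 2: dict of origin -> DataFrame keeping firm, price and year.
by_origin = {origin: group.drop(columns="origin") for origin, group in df.groupby("origin")}

print(by_year[1967])
print(by_origin["df2"])
```

Because each dictionary value is an ordinary DataFrame, it can be concatenated back into the long form with `pd.concat` whenever the full cross-year view is needed.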