
What is the pandas.Panel deprecation warning actually recommending?

I have a package that uses pandas Panels to generate MultiIndex pandas DataFrames. However, whenever I use pandas.Panel, I get the following DeprecationWarning:

DeprecationWarning: Panel is deprecated and will be removed in a future version. The recommended way to represent these types of 3-dimensional data are with a MultiIndex on a DataFrame, via the Panel.to_frame() method. Alternatively, you can use the xarray package http://xarray.pydata.org/en/stable/. Pandas provides a .to_xarray() method to help automate this conversion.

However, I can't understand what the first recommendation here is actually recommending in order to create MultiIndex DataFrames. If Panel is going to be removed, how am I going to be able to use Panel.to_frame?


To clarify: I am not asking what deprecation is, or how to convert my Panels to DataFrames. What I am asking is this: if I am using pandas.Panel and then pandas.Panel.to_frame in a library to create MultiIndex DataFrames from 3D ndarrays, and Panel is going to be removed, what is the best option for making those DataFrames without using the Panel API?

E.g., if I'm doing the following, with X as an ndarray with shape (N, J, K):

p = pd.Panel(X, items=item_names, major_axis=names0, minor_axis=names1)
df = p.to_frame()

this is clearly no longer a viable future-proof option for DataFrame construction, though it was the recommended method in this question.

cge asked Jan 28 '18 01:01




1 Answer

Consider the following panel:

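import numpy as np
import pandas as pd

# Note: pd.Panel only exists in older pandas releases (it was removed in 0.25).
data = np.random.randint(1, 10, (5, 3, 2))  # (items, major, minor)
pnl = pd.Panel(
    data,
    items=['item {}'.format(i) for i in range(1, 6)],
    major_axis=[2015, 2016, 2017],
    minor_axis=['US', 'UK']
)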

If you convert this to a DataFrame with pnl.to_frame(), it becomes:

             item 1  item 2  item 3  item 4  item 5
major minor                                        
2015  US          9       6       3       2       5
      UK          8       3       7       7       9
2016  US          7       7       8       7       5
      UK          9       1       9       9       1
2017  US          1       8       1       3       1
      UK          6       8       8       1       6

So to_frame() uses the major and minor axes as the row MultiIndex and the items as columns. The original (5, 3, 2) array has become a (6, 5) DataFrame. Where you put the MultiIndex is up to you, but if you want exactly the same layout without going through Panel, you can do the following:

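# (5, 3, 2) -> (5, 6) flattens each item's (major, minor) grid row by row;
# transposing gives (6, 5): one row per (major, minor) pair, one column per item.
data = data.reshape(5, 6).T
df = pd.DataFrame(
    data=data,
    # from_product yields pairs in the same major-then-minor order as the rows
    index=pd.MultiIndex.from_product([[2015, 2016, 2017], ['US', 'UK']]),
    columns=['item {}'.format(i) for i in range(1, 6)]
)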

which yields the same DataFrame (use the names parameter of pd.MultiIndex.from_product if you want to name your indices):

         item 1  item 2  item 3  item 4  item 5
2015 US       9       6       3       2       5
     UK       8       3       7       7       9
2016 US       7       7       8       7       5
     UK       9       1       9       9       1
2017 US       1       8       1       3       1
     UK       6       8       8       1       6

Now, instead of pnl['item 1'], you use df['item 1'] (optionally df['item 1'].unstack()); instead of pnl.xs(2015), you use df.xs(2015); and instead of pnl.xs('US', axis='minor'), you use df.xs('US', level=1).
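As a rough sketch of those equivalents, the selections could look like the following. The deterministic np.arange data and the level names 'major' and 'minor' are assumptions made here so that every value can be traced back to the array; they are not part of the example above.

```python
import numpy as np
import pandas as pd

# Assumed stand-in data: shape (items, major, minor) = (5, 3, 2)
data = np.arange(30).reshape(5, 3, 2)

items = ['item {}'.format(i) for i in range(1, 6)]
index = pd.MultiIndex.from_product(
    [[2015, 2016, 2017], ['US', 'UK']],
    names=['major', 'minor'],  # assumed level names
)
df = pd.DataFrame(data.reshape(5, 6).T, index=index, columns=items)

# Replacement for pnl['item 1']: one item as a major x minor table
item1 = df['item 1'].unstack()    # shape (3, 2)

# Replacement for pnl.xs(2015): one major label, rows are minor labels
year = df.xs(2015)                # shape (2, 5)

# Replacement for pnl.xs('US', axis='minor'); level=1 would work as well
us = df.xs('US', level='minor')   # shape (3, 5)

# Label-based lookups map straight back to the 3D array
print(item1.loc[2016, 'UK'] == data[0, 1, 1])
print(year.loc['US', 'item 3'] == data[2, 0, 0])
print(us.loc[2017, 'item 5'] == data[4, 2, 0])
```

Note that unstack() may return the minor labels in sorted order rather than in their original order, so looking values up by label is safer than relying on column position.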

As you see, this is just a matter of reshaping your initial 3D numpy array to 2D. You add the other (artificial) dimension with the help of MultiIndex.
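For the general case in the question (an array X of shape (N, J, K) with item_names, names0 and names1), the same idea can be wrapped in a small function. This is only a sketch: frame_from_3d is a hypothetical helper name, not a pandas API, and the default level names 'major' and 'minor' are assumptions that mimic the Panel output shown above.

```python
import numpy as np
import pandas as pd


def frame_from_3d(X, items, major_axis, minor_axis, names=('major', 'minor')):
    """Hypothetical helper: build a MultiIndex DataFrame from a 3D array.

    X has shape (len(items), len(major_axis), len(minor_axis)).
    Rows are (major, minor) pairs; columns are items.
    """
    X = np.asarray(X)
    n_items, n_major, n_minor = X.shape
    if (n_items, n_major, n_minor) != (len(items), len(major_axis), len(minor_axis)):
        raise ValueError('axis labels do not match the shape of X')

    # Row labels: every (major, minor) pair, major varying slowest
    index = pd.MultiIndex.from_product([major_axis, minor_axis], names=list(names))
    # Flatten each item's (major, minor) grid, then put items on the columns
    values = X.reshape(n_items, n_major * n_minor).T
    return pd.DataFrame(values, index=index, columns=list(items))


# Usage with made-up labels
X = np.arange(24).reshape(2, 3, 4)
df = frame_from_3d(X, ['a', 'b'], [10, 20, 30], list('wxyz'))
print(df.shape)                              # (12, 2)
print(df.loc[(20, 'y'), 'b'] == X[1, 1, 2])  # True
```

One difference to keep in mind: Panel.to_frame() by default dropped (major, minor) rows that had missing values, whereas this sketch keeps every row. If that filtering matters, calling dropna() on the result should give a comparable effect.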

ayhan answered Sep 28 '22 04:09
