
pivoting pandas dataframe into prefixed cols, not a MultiIndex

Tags:

python

pandas

I have a timeseries dataframe that is similar to:

import pandas as pd

ts = pd.DataFrame([['Jan 2000','WidgetCo',0.5, 2], ['Jan 2000','GadgetCo',0.3, 3], ['Jan 2000','SnazzyCo',0.2, 4],
          ['Feb 2000','WidgetCo',0.4, 2], ['Feb 2000','GadgetCo',0.5, 2.5], ['Feb 2000','SnazzyCo',0.1, 4],
          ], columns=['month','company','share','price'])

Which looks like:

      month   company  share  price
0  Jan 2000  WidgetCo    0.5    2.0
1  Jan 2000  GadgetCo    0.3    3.0
2  Jan 2000  SnazzyCo    0.2    4.0
3  Feb 2000  WidgetCo    0.4    2.0
4  Feb 2000  GadgetCo    0.5    2.5
5  Feb 2000  SnazzyCo    0.1    4.0

I can pivot this table like so:

pd.pivot_table(ts,index='month', columns='company')

Which gets me:

            share                      price                  
company  GadgetCo SnazzyCo WidgetCo GadgetCo SnazzyCo WidgetCo
month                                                         
Feb 2000      0.5      0.1      0.4      2.5        4        2
Jan 2000      0.3      0.2      0.5      3.0        4        2

This is what I want except that I need to collapse the MultiIndex so that the company is used as a prefix for share and price like so:

          WidgetCo_share  WidgetCo_price  GadgetCo_share  GadgetCo_price   ...
month                                                                      
Jan 2000             0.5               2             0.3             3.0   
Feb 2000             0.4               2             0.5             2.5   

I came up with this function to do just that but it seems like a poor solution:

def pivot_table_to_flat(df, column, index):
    res = df.set_index(index)
    cols = res.drop(column, axis=1).columns.values
    resulting_cols = []
    for prefix in res[column].unique():
        for col in cols:
            new_col_name = prefix + '_' + col
            res[new_col_name] = res[res[column] == prefix][col]
            resulting_cols.append(new_col_name)

    return res[resulting_cols]

pivot_table_to_flat(ts, index='month', column='company')

What is a better way of accomplishing a pivot that results in columns with prefixes, as opposed to a MultiIndex?

Ben Mabey asked Nov 21 '14 22:11



1 Answer

This seems even simpler:

df.columns = [' '.join(col).strip() for col in df.columns.values]

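It takes a DataFrame whose columns are a MultiIndex and flattens the column labels into single strings, modifying the DataFrame in place. For the pivot in the question, the labels come out with the value name first and a space as the separator, e.g. 'share GadgetCo'.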

(ref: @andy-haden Python Pandas - How to flatten a hierarchical index in columns )
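To get the company-first, underscore-joined labels the question asks for, the same idea can be adapted slightly. The following is a minimal sketch, assuming the ts frame from the question; the name flat is only illustrative. Each label in the pivoted columns is a (value, company) tuple, so the list comprehension unpacks it and swaps the order.

```python
import pandas as pd

# Sample data copied from the question.
ts = pd.DataFrame(
    [['Jan 2000', 'WidgetCo', 0.5, 2], ['Jan 2000', 'GadgetCo', 0.3, 3],
     ['Jan 2000', 'SnazzyCo', 0.2, 4], ['Feb 2000', 'WidgetCo', 0.4, 2],
     ['Feb 2000', 'GadgetCo', 0.5, 2.5], ['Feb 2000', 'SnazzyCo', 0.1, 4]],
    columns=['month', 'company', 'share', 'price'])

# 'flat' is an illustrative name, not from the original post.
flat = pd.pivot_table(ts, index='month', columns='company')

# Each column label is a (value, company) tuple such as ('share', 'WidgetCo');
# rebuild it as 'WidgetCo_share'.
flat.columns = [f'{company}_{value}' for value, company in flat.columns]

print(flat)
```

The column order follows whatever order pivot_table produces, so reorder with flat[[...]] if a specific layout matters.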

CPBL answered Nov 03 '22 04:11