Reshaping pandas data frame into as many columns as there are repeating rows

Tags: python, pandas

I have this data frame:

>> df = pd.DataFrame({'Place' : ['A', 'A', 'B', 'B', 'C', 'C'], 'Var' : ['All', 'French', 'All', 'German', 'All', 'Spanish'], 'Values' : [250, 30, 120, 12, 200, 112]})

>> df
  Place  Values      Var
0     A     250      All
1     A      30   French
2     B     120      All
3     B      12   German
4     C     200      All
5     C     112  Spanish

It has a repeating pattern of two rows for every Place. I want to reshape it so it's one row per Place and the Var column becomes two columns, one for "All" and one for the other value.

Like so:

Place   All   Language   Value
    A   250     French      30
    B   120     German      12
    C   200    Spanish     112

A pivot table would make a column for each unique value, and I don't want that.

What's the reshaping method for this?

asked Apr 01 '16 14:04 by robroc


1 Answer

Because the data appears in an alternating pattern, we can think of the transformation as two steps.

Step 1:

Go from

a,a,a
b,b,b

To

a,a,a,b,b,b

Step 2: drop redundant columns.

The following solution applies reshape to the values of the DataFrame (a numpy array); the arguments to reshape are (-1, df.shape[1] * 2), which says 'give me an array that has twice as many columns and as many rows as that requires'.
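To make the reshape step concrete, here is a minimal sketch on a toy array (the name `arr` and its contents are made up for illustration). Since numpy fills the new shape in row-major order, each pair of consecutive rows ends up side by side in a single row:

```python
import numpy as np

# Toy stand-in for df.values: 4 rows, 3 columns (illustrative data only).
arr = np.array([['a1', 'a2', 'a3'],
                ['b1', 'b2', 'b3'],
                ['c1', 'c2', 'c3'],
                ['d1', 'd2', 'd3']])

# -1 lets numpy infer the row count; the column count is doubled.
paired = arr.reshape(-1, arr.shape[1] * 2)

print(paired.shape)  # (2, 6)
print(paired[0])     # ['a1' 'a2' 'a3' 'b1' 'b2' 'b3']
```

This only works when the number of rows is even, which holds here because every Place has exactly two rows.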

Then, I hardwired the column indexes for the filter, [0, 1, 4, 5], based on your data layout (columns in the order Place, Values, Var, as printed in your question). The resulting numpy array has 4 columns, so we pass it into a DataFrame constructor along with the matching column names.

It is not a very readable solution: it depends on the df layout, and it produces the columns in a different order than you asked for (Value before Language):

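import pandas as pd

df = pd.DataFrame({'Place' : ['A', 'A', 'B', 'B', 'C', 'C'], 'Var' : ['All', 'French', 'All', 'German', 'All', 'Spanish'], 'Values' : [250, 30, 120, 12, 200, 112]})

# Pin the column order explicitly: depending on the pandas version, a dict
# yields Place, Var, Values (insertion order) or Place, Values, Var (sorted),
# and the hardwired indexes below assume Place, Values, Var.
wide = df[['Place', 'Values', 'Var']].values.reshape(-1, df.shape[1] * 2)

# Keep Place, the 'All' value, the language value and the language name.
df = pd.DataFrame(wide[:, [0, 1, 4, 5]],
    columns = ['Place', 'All', 'Value', 'Language'])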
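
For comparison, here is a sketch of an alternative that relies on column names rather than column positions. It assumes every Place has exactly one 'All' row and exactly one other row; the names `is_all`, `totals`, `langs` and `result` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Place': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Var': ['All', 'French', 'All', 'German', 'All', 'Spanish'],
    'Values': [250, 30, 120, 12, 200, 112],
})

# Boolean mask marking the rows that hold the total for each place.
is_all = df['Var'] == 'All'

# Totals: one row per Place, with Values renamed to All.
totals = df.loc[is_all, ['Place', 'Values']].rename(columns={'Values': 'All'})

# Language rows: one row per Place, with descriptive column names.
langs = df.loc[~is_all].rename(columns={'Var': 'Language', 'Values': 'Value'})

# Join the two halves on Place and put the columns in the requested order.
result = totals.merge(langs, on='Place')[['Place', 'All', 'Language', 'Value']]
print(result)
```

The merge keeps the numeric columns as integers, whereas going through a reshaped object array leaves every column with object dtype.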

answered Sep 28 '22 07:09 by hilberts_drinking_problem