I have this data frame:
>> df = pd.DataFrame({'Place' : ['A', 'A', 'B', 'B', 'C', 'C'], 'Var' : ['All', 'French', 'All', 'German', 'All', 'Spanish'], 'Values' : [250, 30, 120, 12, 200, 112]})
>> df
  Place  Values      Var
0     A     250      All
1     A      30   French
2     B     120      All
3     B      12   German
4     C     200      All
5     C     112  Spanish
It has a repeating pattern of two rows for every Place. I want to reshape it so it's one row per Place and the Var column becomes two columns, one for "All" and one for the other value.
Like so:
Place  All  Language  Value
A      250  French    30
B      120  German    12
C      200  Spanish   112
A pivot table would make a column for each unique value, and I don't want that.
What's the reshaping method for this?
Because the data appears in alternating pattern, we can conceptualize the transformation in 2 steps.
Step 1:
Go from
a,a,a
b,b,b
To
a,a,a,b,b,b
Step 2: drop redundant columns.
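The two steps can be sketched on a toy NumPy array first. This is only an illustration under assumed data: the array `pairs` and its labels are made up, and the column positions kept in step 2 are chosen to mirror the layout discussed here.

```python
import numpy as np

# Toy array with the alternating pattern: rows 0 and 1 belong together,
# as do rows 2 and 3 (labels are hypothetical, for illustration only).
pairs = np.array([["a1", "a2", "a3"],
                  ["b1", "b2", "b3"],
                  ["c1", "c2", "c3"],
                  ["d1", "d2", "d3"]])

# Step 1: fold every two consecutive rows into one row of twice the width.
# -1 lets NumPy infer the number of rows.
wide = pairs.reshape(-1, pairs.shape[1] * 2)

# Step 2: keep only the columns that are needed (positions 0, 1, 4, 5 here).
trimmed = wide[:, [0, 1, 4, 5]]

print(wide)
print(trimmed)
```

This only works when the row count is a multiple of two and the paired rows are adjacent; otherwise reshape either fails or silently pairs the wrong rows.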
The following solution applies reshape to the values of the DataFrame. The arguments to reshape are (-1, df.shape[1] * 2), which says 'give me an array that has twice as many columns and as many rows as you can manage'.
Then, I hardwired the column indexes for the filter, [0, 1, 4, 5], based on your data layout (columns ordered Place, Values, Var, as in your printed output). The resulting numpy array has 4 columns, so we pass it into a DataFrame constructor along with the correct column names.
It is an unreadable solution that depends on the df layout and produces the last two columns in a different order than you asked for (Value before Language):
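import pandas as pd
df = pd.DataFrame({'Place' : ['A', 'A', 'B', 'B', 'C', 'C'], 'Var' : ['All', 'French', 'All', 'German', 'All', 'Spanish'], 'Values' : [250, 30, 120, 12, 200, 112]})
# Pin the column order (Place, Values, Var) so the positional indexes below
# pick the right fields; recent pandas keeps dict insertion order instead.
cols = ['Place', 'Values', 'Var']
df = pd.DataFrame(df[cols].values.reshape(-1, df.shape[1] * 2)[:, [0, 1, 4, 5]],
                  columns=['Place', 'All', 'Value', 'Language'])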
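If readability matters more than brevity, a different technique (not the reshape approach) is to split the frame into the "All" rows and the remaining rows, then merge them on Place. This is a minimal sketch that assumes every Place has exactly one "All" row and exactly one other row; the names `totals`, `languages`, and `result` are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Place': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Var': ['All', 'French', 'All', 'German', 'All', 'Spanish'],
    'Values': [250, 30, 120, 12, 200, 112],
})

# Boolean mask marking the rows that hold the per-place total.
is_all = df['Var'] == 'All'

# "All" rows: keep Place and the total, renamed to the target column name.
totals = df.loc[is_all, ['Place', 'Values']].rename(columns={'Values': 'All'})

# Remaining rows: the language name and its value.
languages = df.loc[~is_all, ['Place', 'Var', 'Values']].rename(
    columns={'Var': 'Language', 'Values': 'Value'})

# One row per Place, with columns Place, All, Language, Value.
result = totals.merge(languages, on='Place')
print(result)
```

Because the columns are selected by name, this version does not depend on the column order of df or on the paired rows being adjacent, and the numeric columns keep their integer dtype instead of becoming object.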