Reshaping a pandas data frame into as many columns as there are repeating rows

Tags:

python

pandas

I have this data frame:

>>> df = pd.DataFrame({'Place' : ['A', 'A', 'B', 'B', 'C', 'C'], 'Var' : ['All', 'French', 'All', 'German', 'All', 'Spanish'], 'Values' : [250, 30, 120, 12, 200, 112]})

>>> df
  Place  Values      Var
0     A     250      All
1     A      30   French
2     B     120      All
3     B      12   German
4     C     200      All
5     C     112  Spanish

It has a repeating pattern of two rows for every Place. I want to reshape it so it's one row per Place and the Var column becomes two columns, one for "All" and one for the other value.

Like so:

Place   All   Language   Value
    A   250     French      30
    B   120     German      12
    C   200     Spanish    112

A pivot table would make a column for each unique value, and I don't want that.
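For illustration, here is a minimal sketch of what a plain `DataFrame.pivot` does with this data, assuming a current pandas where `pivot` is called with keyword arguments: every distinct `Var` label becomes its own column, and cells with no matching row are filled with NaN.

```python
import pandas as pd

df = pd.DataFrame({'Place': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'Var': ['All', 'French', 'All', 'German', 'All', 'Spanish'],
                   'Values': [250, 30, 120, 12, 200, 112]})

# A plain pivot spreads every distinct Var label into its own column,
# leaving NaN wherever a Place has no row for that label.
wide = df.pivot(index='Place', columns='Var', values='Values')
print(wide)
```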

What's the reshaping method for this?


asked Apr 01 '16 14:04

robroc

1 Answer

Because the data appears in an alternating pattern, we can conceptualize the transformation in two steps.

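Step 1:

Go from

a,a,a
b,b,b

to

a,a,a,b,b,b

That is, each pair of rows is placed side by side as a single row.

Step 2: drop the redundant columns (the repeated Place and the "All" label from Var).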
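The two steps can be tried on a small NumPy array first. The string labels a1 to d3 are placeholders made up for this sketch; rows "a" and "b" play the role of one Place's pair of rows, and rows "c" and "d" the next pair.

```python
import numpy as np

# Two rows per group, mirroring the a,a,a / b,b,b picture above.
rows = np.array([['a1', 'a2', 'a3'],
                 ['b1', 'b2', 'b3'],
                 ['c1', 'c2', 'c3'],
                 ['d1', 'd2', 'd3']])

# Step 1: glue each consecutive pair of rows into one row of twice the width.
paired = rows.reshape(-1, rows.shape[1] * 2)
# paired[0] is ['a1', 'a2', 'a3', 'b1', 'b2', 'b3']

# Step 2: keep only the columns needed (here: drop the second copy of column 0).
trimmed = paired[:, [0, 1, 2, 4, 5]]
print(trimmed)
```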

The following solution applies reshape to the values of the DataFrame. The arguments to reshape are (-1, df.shape[1] * 2), which says 'give me an array that has twice as many columns, and as many rows as that requires' (the -1 tells NumPy to infer the row count).
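A quick check of what the -1 does, using a made-up 6 by 2 integer array in place of the real data: NumPy divides the total number of elements by the requested column count to get the row count, and raises ValueError when that does not divide evenly (for example, an odd number of rows).

```python
import numpy as np

values = np.arange(12).reshape(6, 2)   # 6 rows, 2 columns, like a 6-row frame

# -1 asks NumPy to infer that dimension: 12 elements / 4 columns = 3 rows.
wide = values.reshape(-1, values.shape[1] * 2)
print(wide.shape)   # (3, 4)

# With an odd number of rows the pairs do not line up, and reshape refuses.
try:
    values[:5].reshape(-1, values.shape[1] * 2)
    odd_ok = True
except ValueError as err:
    odd_ok = False
    print('cannot pair rows:', err)
```

Note that reshape only checks the total size. If one Place had three rows and another only one, the total would still be even and the pairs would shift silently, so it is worth confirming something like `df['Place'].value_counts().eq(2).all()` before relying on this trick.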

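Then I hardwired the column indexes for the filter, [0, 1, 4, 5], based on your data layout: with the columns ordered Place, Values, Var, each paired row reads Place, Values ("All"), Var ("All"), Place, Values (language), Var (language), so those four positions hold the place, the "All" value, the language value and the language name. The resulting NumPy array has 4 columns, so we pass it to the DataFrame constructor along with the correct column names.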
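One side effect worth knowing: because the frame mixes strings and integers, the NumPy array has dtype object, and so do the All and Value columns of the rebuilt frame. A short sketch, assuming a reasonably recent pandas (for `to_numpy()` and `infer_objects()`); the explicit `df[['Place', 'Values', 'Var']]` selection is an addition here so the positions [0, 1, 4, 5] line up regardless of column order.

```python
import pandas as pd

df = pd.DataFrame({'Place': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'Var': ['All', 'French', 'All', 'German', 'All', 'Spanish'],
                   'Values': [250, 30, 120, 12, 200, 112]})

# Mixing strings and integers forces NumPy to fall back to dtype=object.
arr = df[['Place', 'Values', 'Var']].to_numpy()
print(arr.dtype)   # object

result = pd.DataFrame(arr.reshape(-1, 6)[:, [0, 1, 4, 5]],
                      columns=['Place', 'All', 'Value', 'Language'])
# infer_objects() converts object columns that hold only ints back to int64.
result = result.infer_objects()
print(result.dtypes)
```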

This solution is hard to read, depends on the layout of df, and produces the columns in a different order from the one you asked for (Value before Language):

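import pandas as pd

df = pd.DataFrame({'Place' : ['A', 'A', 'B', 'B', 'C', 'C'], 'Var' : ['All', 'French', 'All', 'German', 'All', 'Spanish'], 'Values' : [250, 30, 120, 12, 200, 112]})

# Select the columns explicitly so the positions [0, 1, 4, 5] below line up
# no matter how pandas orders the dict keys (newer versions keep insertion order).
# Each pair of rows becomes one row: Place, Values, Var, Place, Values, Var.
paired = df[['Place', 'Values', 'Var']].values.reshape(-1, df.shape[1] * 2)

# Keep Place, the "All" value, the language value and the language name.
df = pd.DataFrame(paired[:, [0, 1, 4, 5]],
    columns = ['Place', 'All', 'Value', 'Language'])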
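If readability or a layout-independent result matters more than brevity, a different technique (not the reshape approach above) is to split the rows on whether Var equals "All" and align the two halves on Place. This sketch assumes each Place has exactly one "All" row and exactly one other row; the names is_all, totals and langs are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Place': ['A', 'A', 'B', 'B', 'C', 'C'],
                   'Var': ['All', 'French', 'All', 'German', 'All', 'Spanish'],
                   'Values': [250, 30, 120, 12, 200, 112]})

is_all = df['Var'] == 'All'

# The "All" rows give one number per Place.
totals = df.loc[is_all].set_index('Place')['Values'].rename('All')

# The remaining rows give the language name and its value.
langs = (df.loc[~is_all]
           .set_index('Place')
           .rename(columns={'Var': 'Language', 'Values': 'Value'}))

# Align both pieces on Place; column order follows the question's layout.
result = pd.concat([totals, langs[['Language', 'Value']]], axis=1).reset_index()
print(result)
```

Because alignment is by the Place label rather than by position, the rows do not need to be sorted or strictly alternating, and a Place missing one of its rows shows up as NaN instead of silently shifting data.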

answered Sep 28 '22 07:09

hilberts_drinking_problem

