Create new variables from row for each existing variable in pandas dataframe

Tags: python, pandas, dataframe

I have a dataframe which looks like this:

0  target_year ID   v1  v2  
1  2000         1  0.3   1
2  2000         2  1.2   4
...
10 2001         1    3   2
11 2001         2    2   2

And I would like the following output:

0   ID   v1_1  v2_1  v1_2  v2_2  
1    1    0.3     1     3     2 
2    2    1.2     4     2     2

Do you have any idea how to do that?

asked May 15 '19 09:05

Thomuf

2 Answers

You could use pd.pivot_table, using the GroupBy.cumcount of ID as columns.

Then we can use a list comprehension with f-strings to merge the MultiIndex header into a single level:

import pandas as pd

# Occurrence number of each row within its ID group (1, 2, ...)
cols = df.groupby('ID').ID.cumcount() + 1
df_piv = pd.pivot_table(data=df[['v1', 'v2']],
                        index=df.ID,
                        columns=cols)
# Flatten the (variable, occurrence) MultiIndex into names like v1_1
df_piv.columns = [f'{i}_{j}' for i, j in df_piv.columns]


     v1_1  v1_2  v2_1  v2_2
ID                        
1    0.3   3.0     1     2
2    1.2   2.0     4     2
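
For anyone who wants to run this end to end, here is a minimal sketch of the same idea. It assumes the data consists only of the four rows visible in the question (the rows hidden behind "..." are left out). The helper column name n and the choice of aggfunc='first' are my own assumptions, not part of the answer above: 'first' is used only because each (ID, n) pair is unique here, so no real aggregation is needed.

```python
import pandas as pd

# Assumed sample data: only the four rows shown in the question.
df = pd.DataFrame({
    'target_year': [2000, 2000, 2001, 2001],
    'ID': [1, 2, 1, 2],
    'v1': [0.3, 1.2, 3.0, 2.0],
    'v2': [1, 4, 2, 2],
})

# n (hypothetical name) is 1 for the first row of each ID, 2 for the second, ...
df = df.assign(n=df.groupby('ID').cumcount() + 1)

# One row per ID, one column per (variable, occurrence) pair
wide = pd.pivot_table(df, values=['v1', 'v2'], index='ID',
                      columns='n', aggfunc='first')

# Flatten the two-level header into names like v1_1
wide.columns = [f'{var}_{n}' for var, n in wide.columns]
wide = wide.reset_index()
print(wide)
```

The cumcount step relies on the rows of each ID appearing in year order; if that is not guaranteed, sorting by target_year first would be a reasonable precaution.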

answered Oct 19 '22 04:10

yatu

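Use GroupBy.cumcount to build a counter column, reshape with DataFrame.set_index and DataFrame.unstack, and finally flatten the resulting MultiIndex columns with a list comprehension and f-strings:

# Occurrence number of each row within its ID group (1, 2, ...)
g = df.groupby('ID').ID.cumcount() + 1

# Move ID and the counter into the index, then pivot the counter into columns
df = df.drop('target_year', axis=1).set_index(['ID', g]).unstack()
# Flatten (variable, occurrence) pairs into names like v1_1
df.columns = [f'{a}_{b}' for a, b in df.columns]
df = df.reset_index()
print(df)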
   ID  v1_1  v1_2  v2_1  v2_2
0   1   0.3   3.0     1     2
1   2   1.2   2.0     4     2
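
Note that the question asks for the columns grouped by occurrence (v1_1, v2_1, v1_2, v2_2), while the output above groups them by variable. A possible way to get the requested order is to sort the column MultiIndex by its second level before flattening. This is only a sketch, again assuming the four visible rows are the whole dataset; the name wide is mine.

```python
import pandas as pd

# Assumed sample data: only the four rows shown in the question.
df = pd.DataFrame({
    'target_year': [2000, 2000, 2001, 2001],
    'ID': [1, 2, 1, 2],
    'v1': [0.3, 1.2, 3.0, 2.0],
    'v2': [1, 4, 2, 2],
})

# Occurrence number of each row within its ID group
g = df.groupby('ID').cumcount() + 1

wide = df.drop(columns='target_year').set_index(['ID', g]).unstack()

# Sort columns by occurrence (level 1) first, then by variable name (level 0)
wide = wide.sort_index(axis=1, level=[1, 0])

wide.columns = [f'{a}_{b}' for a, b in wide.columns]
wide = wide.reset_index()
print(wide)
```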

answered Oct 19 '22 05:10

jezrael
