 

Create new variables from row for each existing variable in pandas dataframe

I have a dataframe which looks like:

0  target_year ID   v1  v2  
1  2000         1  0.3   1
2  2000         2  1.2   4
...
10 2001         1    3   2
11 2001         2    2   2

And I would like the following output:

0   ID   v1_1  v2_1  v1_2  v2_2  
1    1    0.3     1     3     2 
2    2    1.2     4     2     2

Do you have any idea how to do that?

Thomuf asked May 15 '19 09:05


2 Answers

You could use pd.pivot_table, with the GroupBy.cumcount of ID (a per-ID counter: 1 for an ID's first row, 2 for its second, ...) as the columns.

Then we can use a list comprehension with f-strings to merge the MultiIndex header into a single level:

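import pandas as pd

# Per-ID counter: 1 for an ID's first row, 2 for its second, ...
cols = df.groupby('ID').cumcount() + 1
# One column per (variable, counter) pair; each cell holds a single value
df_piv = pd.pivot_table(data=df[['v1', 'v2']], index=df['ID'], columns=cols)
# Flatten the MultiIndex header, e.g. ('v1', 1) -> 'v1_1'
df_piv.columns = [f'{i}_{j}' for i, j in df_piv.columns]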


     v1_1  v1_2  v2_1  v2_2
ID                        
1    0.3   3.0     1     2
2    1.2   2.0     4     2
yatu answered Oct 19 '22 04:10
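A self-contained sketch of the pivot_table approach above, run on only the four rows shown in the question (the rows elided by "..." are omitted, and a default 0..3 row index is assumed instead of the original labels). Note that cumcount numbers rows in their existing order, so this assumes each ID's rows already appear in target_year order, as they do in the sample.

```python
import pandas as pd

# Sample input: only the four rows shown in the question (elided rows omitted)
df = pd.DataFrame({
    'target_year': [2000, 2000, 2001, 2001],
    'ID': [1, 2, 1, 2],
    'v1': [0.3, 1.2, 3, 2],
    'v2': [1, 4, 2, 2],
})

# Per-ID counter in row order: 1 for an ID's first row, 2 for its second, ...
cols = df.groupby('ID').cumcount() + 1

# One row per ID, one column per (variable, counter) pair.
# Each (ID, counter) pair is unique, so the default 'mean' aggregation
# just returns the single value in each cell.
df_piv = pd.pivot_table(data=df[['v1', 'v2']], index=df['ID'], columns=cols)

# Flatten the two-level header: ('v1', 1) -> 'v1_1'
df_piv.columns = [f'{i}_{j}' for i, j in df_piv.columns]
print(df_piv)
```

pivot_table sorts the resulting columns, so they come out grouped by variable (v1_1, v1_2, v2_1, v2_2) rather than in the interleaved order shown in the question.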


Use GroupBy.cumcount to build a counter column, reshape with DataFrame.set_index and DataFrame.unstack, and finally flatten the MultiIndex columns with a list comprehension and f-strings:

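import pandas as pd

# Per-ID counter: 1, 2, ...
g = df.groupby('ID').ID.cumcount() + 1
# Put ID and the counter in the index, then unstack the counter into columns
df = df.drop('target_year', axis=1).set_index(['ID', g]).unstack()
# Flatten the MultiIndex header, e.g. ('v1', 1) -> 'v1_1'
df.columns = [f'{a}_{b}' for a, b in df.columns]
df = df.reset_index()
print(df)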
   ID  v1_1  v1_2  v2_1  v2_2
0   1   0.3   3.0     1     2
1   2   1.2   2.0     4     2
jezrael answered Oct 19 '22 05:10
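Both answers produce the columns in the order v1_1, v1_2, v2_1, v2_2, while the question shows v1_1, v2_1, v1_2, v2_2. Below is a minimal sketch of the set_index/unstack approach with two small additions: it sorts by ID and target_year before counting, so the suffix follows the year even if the rows arrive in a different order, and it sorts the columns by the counter level before flattening. The shuffled sample input (the four rows shown in the question, elided rows omitted) and the name `wide` are illustrative assumptions.

```python
import pandas as pd

# Sample input: the four rows shown in the question (elided rows omitted),
# deliberately shuffled so that row order no longer follows target_year
df = pd.DataFrame({
    'target_year': [2001, 2000, 2000, 2001],
    'ID': [1, 2, 1, 2],
    'v1': [3, 1.2, 0.3, 2],
    'v2': [2, 4, 1, 2],
})

# Sort first so the per-ID counter follows target_year, not the original row order
df = df.sort_values(['ID', 'target_year'])
g = df.groupby('ID').cumcount() + 1

# Index by (ID, counter), then move the counter level into the columns
wide = df.drop(columns='target_year').set_index(['ID', g]).unstack()

# Sort columns by counter first, then variable: v1_1, v2_1, v1_2, v2_2
wide = wide.sort_index(axis=1, level=1)
wide.columns = [f'{a}_{b}' for a, b in wide.columns]
wide = wide.reset_index()
print(wide)
```

If the column order does not matter, the `sort_index` line can simply be dropped; the data in each column is the same either way.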