I have a dataframe that looks like this:
0 target_year ID v1 v2
1 2000 1 0.3 1
2 2000 2 1.2 4
...
10 2001 1 3 2
11 2001 2 2 2
And I would like the following output:
0 ID v1_1 v2_1 v1_2 v2_2
1 1 0.3 1 3 2
2 2 1.2 4 2 2
Do you have any idea how to do that?
You could use pd.pivot_table, using GroupBy.cumcount of ID as the columns. Then use a list comprehension with f-strings to merge the MultiIndex header into a single level:
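import pandas as pd
# Counter within each ID group: 1, 2, ... (one per target_year)
cols = df.groupby('ID').cumcount() + 1
df_piv = pd.pivot_table(data=df[['v1', 'v2']],
                        index=df.ID,
                        columns=cols)
# Flatten the (variable, counter) column MultiIndex into names like 'v1_1'
df_piv.columns = [f'{i}_{j}' for i, j in df_piv.columns]
print(df_piv)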
v1_1 v1_2 v2_1 v2_2
ID
1 0.3 3.0 1 2
2 1.2 2.0 4 2
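The key step in both answers is the counter that becomes the `_1` / `_2` suffix. A minimal sketch, assuming a hypothetical sample frame built from only the four rows shown in the question (the elided rows are not reproduced), prints what `groupby('ID').cumcount() + 1` produces:

```python
import pandas as pd

# Hypothetical sample: only the four rows visible in the question.
df = pd.DataFrame({
    "target_year": [2000, 2000, 2001, 2001],
    "ID": [1, 2, 1, 2],
    "v1": [0.3, 1.2, 3.0, 2.0],
    "v2": [1, 4, 2, 2],
})

# cumcount numbers the rows of each ID group 0, 1, 2, ... in the order they
# appear; adding 1 gives the 1-based suffix used in v1_1, v1_2, ...
counter = df.groupby("ID").cumcount() + 1
print(counter.tolist())  # [1, 1, 2, 2]
```

Because cumcount follows row order, the suffixes only line up with years if the rows are already ordered by target_year within each ID; sort the frame first if that is not guaranteed. Since each (ID, counter) pair occurs once, pivot_table's default mean aggregation simply returns the single value.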
Alternatively, use GroupBy.cumcount for a counter column, reshape with DataFrame.set_index and DataFrame.unstack, and finally flatten the column MultiIndex with a list comprehension and f-strings:
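import pandas as pd
# Counter within each ID group: 1, 2, ... (one per target_year)
g = df.groupby('ID').cumcount() + 1
# Move ID and the counter into the index, then pivot the counter level into columns
df = df.drop('target_year', axis=1).set_index(['ID', g]).unstack()
# Flatten the column MultiIndex into names like 'v1_1'
df.columns = [f'{a}_{b}' for a, b in df.columns]
df = df.reset_index()
print(df)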
ID v1_1 v1_2 v2_1 v2_2
0 1 0.3 3.0 1 2
1 2 1.2 2.0 4 2
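Both answers produce the column order v1_1, v1_2, v2_1, v2_2, while the question asks for v1_1, v2_1, v1_2, v2_2. A sketch of one way to get the requested order, assuming the same hypothetical four-row sample: sort the column MultiIndex by the counter level before flattening it.

```python
import pandas as pd

# Hypothetical sample: only the four rows visible in the question.
df = pd.DataFrame({
    "target_year": [2000, 2000, 2001, 2001],
    "ID": [1, 2, 1, 2],
    "v1": [0.3, 1.2, 3.0, 2.0],
    "v2": [1, 4, 2, 2],
})

# Counter within each ID group: 1, 2, ...
g = df.groupby("ID").cumcount() + 1

# Columns become a MultiIndex: (v1, 1), (v1, 2), (v2, 1), (v2, 2)
wide = df.drop(columns="target_year").set_index(["ID", g]).unstack()

# Sort by the counter level (level 1) first, then by variable name,
# giving (v1, 1), (v2, 1), (v1, 2), (v2, 2)
wide = wide.sort_index(axis=1, level=1)

wide.columns = [f"{a}_{b}" for a, b in wide.columns]
wide = wide.reset_index()
print(wide)
```

sort_index with level=1 keeps its default sort_remaining=True, so ties on the counter are broken by the variable name, which makes the order deterministic.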