I have a dataframe with columns 'NAME' and 'VALUE':
NAME VALUE
apple 6
banana 3
cherry 7
apple 2
banana 5
cherry 3
I would like to move the duplicate instances' VALUE to a new column as such:
NAME VALUE VALUE2
apple 6 2
banana 3 5
cherry 7 3
I have done this in Python previously by iterating through CSV rows and maintaining a unique list of NAME's read so far, checking for the existence of NAME in the list...if it is, then it's now the 2nd time I've seen it so I add its VALUE to the new column VALUE2. By the way, the rows are not assumed to be in any particular order.
Is there a quick way of doing this in Pandas?
We'll need two set_index
calls plus an unstack
:
df = df.set_index('NAME').set_index(
df.groupby('NAME').cumcount(), append=True
).unstack()
df.columns = df.columns.map('{0[0]}{0[1]}'.format)
df
VALUE0 VALUE1
NAME
apple 6 2
banana 3 5
cherry 7 3
By using pivot
df.assign(key=df.groupby('NAME').cumcount()+1).pivot('NAME','key','VALUE').add_prefix('VALUE_')
Out[397]:
key VALUE_1 VALUE_2
NAME
apple 6 2
banana 3 5
cherry 7 3
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With