I have the following Python pandas dataframe:
id| country | 2016 | 2017 | 2018
--+----------+------+------+------
0 | saudi | A | null | B
1 | china | C | A | B
2 | pakistan | null | null | C
I want:
id| country | year | value
--+----------+------+------
0 | saudi | 2016 | A
1 | saudi | 2018 | B
2 | china | 2016 | C
3 | china | 2017 | A
4 | china | 2018 | B
5 | pakistan | 2018 | C
You can use df.melt
df.melt(['id','country'], df.columns[2:]).dropna()
id country variable value
0 0 saudi 2016 A
6 0 saudi 2018 B
1 1 china 2016 C
4 1 china 2017 A
7 1 china 2018 B
8 2 pakistan 2018 C
Then just add .sort_values('id')
if you want
Or you can use stack
df.set_index(['id','country']).stack().to_frame()
and then .reset_index()
if you want
It looks like using stack
will be a little more efficient:
melt: 11.3 ms ± 798 µs per loop (mean ± std. dev. of 3 runs, 1000 loops each)
stack: 9.18 ms ± 594 µs per loop (mean ± std. dev. of 3 runs, 1000 loops each)
yes you may also use unstack df2=df.unstack()
and then df2.reset_index()
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With