Long story short: I have a nested dictionary. When I turn it into a DataFrame, I get a wide table like this:
import pandas
pdf = pandas.DataFrame(nested_dict)
95 96 97 98 99 100 101 102 103 104 105 \
A 70019 102 4243 3083 3540 6311 4851 5938 4140 4659 3100
C 0 185 427 433 1190 910 3898 3869 2861 2149 3065
D 8 9 23463 1237 2574 4174 3640 4747 3557 4582 5934
E 141 89 5034 1576 2303 3416 2377 1252 1204 1703 718
F 7 12 1937 2246 1687 1154 1317 3473 1881 2221 3060
G 343 1550 13497 10659 12343 8213 9251 7341 6354 9058 9022
H 1 1978 1829 1394 1945 1003 1382 1489 4182 932 556
I 5 772 1361 3914 3255 3242 2808 3765 3284 2127 3120
K 3 10353 540 2364 1196 882 3439 2107 803 743 621
L 6 14 1599 11759 4571 4821 3450 5071 4364 1891 3677
M 1 6 158 211 524 2738 686 443 612 509 1721
N 6 186 299 2971 791 1440 2028 1163 1689 4296 1535
P 54 31 726 6208 7160 5494 6184 4282 3587 3727 3821
Q 10 87 1228 2233 1016 1801 1768 1693 3414 515 563
R 7 53939 3030 8904 6712 6134 5127 3223 4764 3768 6429
S 76 5213 3676 7480 9831 7666 5410 8185 7508 11237 8298
T 4369 1253 3087 2487 6559 4572 6863 3184 7352 6068 4756
V 732 5 7595 4331 5216 5444 5187 6013 4245 4545 4761
W 0 6 103 1225 598 888 601 713 1298 1323 908
Y 12 9 1968 1085 2787 5489 5529 7840 8691 9745 10136
Eventually I want to melt down this data frame to look like the following.
residue residue_num count
A 95 70019
A 96 102
A 97 4243
....
The residue letters end up as the index, so I don't know how to replace them with a default integer index (0, 1, 2, 3, ...) and move "A C D E F ..." into a regular column with another name.
EDIT: Answered it myself, as per a suggestion.
DataFrame.reindex() conforms the rows or columns to a new set of labels; any label that was not in the old index gets NaN values.
DataFrame.reset_index() replaces the index with the default integer index (0 to number of rows minus 1), or flattens the levels of a MultiIndex. By default the old index is kept as a new column, which is called 'index' when the index has no name.
Answered from here and here
import pandas
pdf = pandas.DataFrame(the_matrix)  # the_matrix is the nested dictionary
pdf = pdf.reset_index()  # the residue letters move into a column named 'index'
pdf.rename(columns={'index': 'aa'}, inplace=True)
pandas.melt(pdf, id_vars='aa', var_name="position", value_name="counts")
aa position counts
0 A 95 70019
1 C 95 0
2 D 95 8
3 E 95 141
4 F 95 7
5 G 95 343
6 H 95 1
7 I 95 5
8 K 95 3
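For anyone who wants to try this without the full data, here is a minimal sketch of the same idea. The two-position dictionary is a made-up miniature of the real one, and naming the index with rename_axis before reset_index is just one way to skip the separate rename step. The final sort is only there to get the residue-first row order asked for in the question.

```python
import pandas as pd

# Hypothetical miniature of the nested dict: outer keys are positions,
# inner keys are residues, values are counts.
nested_dict = {
    95: {"A": 70019, "C": 0, "D": 8},
    96: {"A": 102, "C": 185, "D": 9},
}

# Outer keys become columns, inner keys become the row index.
pdf = pd.DataFrame(nested_dict)

long_df = (
    pdf.rename_axis("residue")   # name the index so reset_index yields a 'residue' column
       .reset_index()            # residues become a column, index becomes 0..n-1
       .melt(id_vars="residue", var_name="residue_num", value_name="count")
       .sort_values(["residue", "residue_num"], ignore_index=True)
)
print(long_df)
```

Note that melt stacks the data one position column at a time, so without the sort the rows come out grouped by position rather than by residue.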
Your pdf looks like a pivot table. Let's assume we have a dataframe with three columns. We can pivot it with a single function like this:
pivoted = df.pivot(index='col1',columns='col2',values='col3')
Unpivoting it back without losing the index requires a reset_index dance:
pivoted.reset_index().melt(id_vars=pivoted.index.name)
To get the exact original df:
pivoted.reset_index().melt(id_vars=pivoted.index.name, var_name='col2', value_name='col3')
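A small round trip may make this concrete. The frame below is invented for illustration and reuses the placeholder names col1, col2 and col3. One caveat I would expect: melt returns the rows column by column, so the row order (and possibly the dtype of col2) can differ from the original until you sort.

```python
import pandas as pd

# Hypothetical long-format frame using the placeholder column names.
df = pd.DataFrame({
    "col1": ["A", "A", "C", "C"],
    "col2": [95, 96, 95, 96],
    "col3": [70019, 102, 0, 185],
})

# Long -> wide: col1 values become the index, col2 values become the columns.
pivoted = df.pivot(index="col1", columns="col2", values="col3")

# Wide -> long: move the index back into a column, then melt around it.
restored = pivoted.reset_index().melt(
    id_vars=pivoted.index.name, var_name="col2", value_name="col3"
)

# Sort so the rows line up with the original frame again.
restored = restored.sort_values(["col1", "col2"], ignore_index=True)
print(restored)
```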
PS. To my surprise, melt does not get a kwarg like keep_index=True. Enhancement suggestion is still open: https://github.com/pandas-dev/pandas/issues/17440
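As far as I know, newer pandas releases (1.1 and later) accept an ignore_index argument in melt, which covers much of this use case. The sketch below assumes such a version is installed; the data is a made-up two-residue sample. With ignore_index=False the original row labels are repeated in the result's index instead of being dropped, and a reset_index afterwards turns them into a column.

```python
import pandas as pd

# Hypothetical sample: positions as columns, residues as the index.
pdf = pd.DataFrame({95: {"A": 70019, "C": 0}, 96: {"A": 102, "C": 185}})

# Assumes pandas >= 1.1, where melt accepts ignore_index.
melted = pdf.melt(var_name="position", value_name="counts", ignore_index=False)

# The index now repeats the residue labels; turn it into a column named 'aa'.
result = melted.rename_axis("aa").reset_index()
print(result)
```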