I would like to have a unique index value, instead of having the same one repeated many times.
Example: I have this dataframe:
import pandas as pd
test = pd.DataFrame({'id': ['a', 'a', 'a', 'a', 'b'],
                     'col_1': [1, 2, 3, 4, 5],
                     'col_2': [6, 7, 8, 9, 10]})
id col_1 col_2
0 a 1 6
1 a 2 7
2 a 3 8
3 a 4 9
4 b 5 10
What I want to achieve is to have the id column as the index, without the values being repeated. I tried this, but as you can see, the index value is repeated in every row:
test.set_index('id')
col_1 col_2
id
a 1 6
a 2 7
a 3 8
a 4 9
b 5 10
What I would like to get is this (the index 'a' shown once for the whole group of four rows, and so on):
col_1 col_2
id
a 1 6
2 7
3 8
4 9
b 5 10
Any ideas how to do it? Thanks in advance.
You can set the id column as the index. To avoid duplicate index entries, also add the original (unique) row index as the second level of the resulting MultiIndex. pandas then displays each id label only once per group:
test.set_index(['id', test.index])
# Out:
col_1 col_2
id
a 0 1 6
1 2 7
2 3 8
3 4 9
b 4 5 10
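A minimal sketch of the MultiIndex approach, assuming pandas is installed. It rebuilds the sample frame, checks that the two-level index is unique, and shows that selecting on the id level returns the whole group. The second variant, which numbers rows within each id group with groupby().cumcount() instead of reusing the original row labels, is a swapped-in alternative and not part of the answer above.

```python
import pandas as pd

test = pd.DataFrame({'id': ['a', 'a', 'a', 'a', 'b'],
                     'col_1': [1, 2, 3, 4, 5],
                     'col_2': [6, 7, 8, 9, 10]})

# 'id' becomes the outer level, the original row labels the inner level.
df = test.set_index(['id', test.index])

# Every (id, row) pair occurs once, so the combined index has no duplicates.
print(df.index.is_unique)         # True

# Selecting on the outer level returns the whole group; the id level is dropped.
group_a = df.loc['a']
print(group_a.index.tolist())     # [0, 1, 2, 3]

# Alternative (not from the answer above): number the rows within each id
# group, so the inner level restarts at 0 for every id.
df_cc = test.set_index(['id', test.groupby('id').cumcount()])
print(df_cc)
```

With the cumcount variant the inner level reads 0, 1, 2, 3 for 'a' and 0 for 'b', which can be easier to read than the original row positions, at the cost of no longer matching the labels of test.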
If you really don't want the extra, non-duplicated index level, simply set id as the index. Note, however, that in this case pandas displays the duplicates:
test.set_index('id')
# Out:
col_1 col_2
id
a 1 6
a 2 7
a 3 8
a 4 9
b 5 10
Also, test.set_index('id').index.duplicated().any() will return True, with the usual drawbacks of an index that contains duplicates: for example, test.set_index('id').loc['a'] returns four rows instead of one, and reindexing raises an error.
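A short sketch, assuming the same sample frame, that makes those drawbacks concrete: which rows duplicated() flags, what a label lookup returns, and that reindex refuses an index with duplicate labels (pandas raises ValueError; the exact message differs between versions, so the sketch only catches the exception type).

```python
import pandas as pd

test = pd.DataFrame({'id': ['a', 'a', 'a', 'a', 'b'],
                     'col_1': [1, 2, 3, 4, 5],
                     'col_2': [6, 7, 8, 9, 10]})

df = test.set_index('id')

# duplicated() flags every occurrence of a label after the first one.
mask = df.index.duplicated()
print(mask.tolist())          # [False, True, True, True, False]
print(df.index.is_unique)     # False

# Looking up a duplicated label returns several rows, not a single one.
print(df.loc['a'].shape)      # (4, 2)

# reindex needs unique labels, so it raises on this index.
try:
    df.reindex(['b', 'a'])
    reindex_failed = False
except ValueError:
    reindex_failed = True
print(reindex_failed)         # True
```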
If you want to replace the duplicated values with '' for display only, you can do the following; but if you need the data for later processing, it is better to keep the duplicated index values:
df = test.set_index('id')
# keep the first occurrence of each label, replace later repeats with ''
df1 = df.set_index(df.index.where(~df.index.duplicated(), ''))
print(df1)
col_1 col_2
id
a 1 6
2 7
3 8
4 9
b 5 10
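A small sketch, assuming the same sample frame, of why the blanked labels are only good for display: the group name survives only on the first row of each group, and '' itself now repeats, so the index is still not unique. The forward fill at the end, which restores the original labels by treating '' as missing, is an illustrative extra step and not part of the answer above.

```python
import pandas as pd

test = pd.DataFrame({'id': ['a', 'a', 'a', 'a', 'b'],
                     'col_1': [1, 2, 3, 4, 5],
                     'col_2': [6, 7, 8, 9, 10]})

df = test.set_index('id')
# Keep the first occurrence of each label, replace later repeats with ''.
df1 = df.set_index(df.index.where(~df.index.duplicated(), ''))
print(df1.index.tolist())     # ['a', '', '', '', 'b']

# The blank label is itself repeated, so the index is still not unique.
print(df1.index.is_unique)    # False

# Illustrative extra step: treat '' as missing and forward-fill the
# previous label to recover the original grouping.
labels = df1.index.to_series()
restored = labels.where(labels != '').ffill()
print(restored.tolist())      # ['a', 'a', 'a', 'a', 'b']
```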