
Have unique index value in Pandas DataFrame

I would like to have a unique index value, instead of having the same one repeated many times.

Example: I have this dataframe:

import pandas as pd

test = pd.DataFrame({'id': ['a','a','a','a','b'],
                     'col_1': [1,2,3,4,5],
                     'col_2': [6,7,8,9,10]
                     })

  id  col_1  col_2
0  a  1      6    
1  a  2      7    
2  a  3      8    
3  a  4      9    
4  b  5      10  

What I want is to have the id column as the index, without repeated values. I tried this, but as you can see, the index is repeated in every row:

test.set_index('id')

    col_1  col_2
id              
a   1      6    
a   2      7    
a   3      8    
a   4      9    
b   5      10  

What I would like to get is this (the index 'a' shown once for its whole group of 4 values, and so on):

    col_1  col_2
id              
a   1      6    
    2      7    
    3      8    
    4      9    
b   5      10  

Any ideas how to do it? Thanks in advance.

Matias Eiletz asked Sep 08 '20 09:09


2 Answers

You can set the id column as the index. To avoid duplicate index entries, also pass the existing index as the second level, which results in a MultiIndex:

test.set_index(['id', test.index])

# Out:
      col_1  col_2
id                
a  0      1      6
   1      2      7
   2      3      8
   3      4      9
b  4      5     10
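As a minimal sketch of why this helps, the snippet below rebuilds the question's DataFrame and checks that the two-level index is unique while the group 'a' can still be selected as a whole. The variable names (indexed, group_a and so on) are only illustrative and not part of the original answer.

```python
import pandas as pd

test = pd.DataFrame({'id': ['a', 'a', 'a', 'a', 'b'],
                     'col_1': [1, 2, 3, 4, 5],
                     'col_2': [6, 7, 8, 9, 10]})

# Use 'id' as the first level and the original row labels as the second level.
indexed = test.set_index(['id', test.index])

# Every (id, row label) pair occurs only once, so the index is unique.
index_is_unique = indexed.index.is_unique

# Selecting on the first level returns all rows of that group.
group_a = indexed.loc['a']
col_1_of_a = group_a['col_1'].tolist()

print(index_is_unique)
print(col_1_of_a)
```

pandas only prints the first-level label once per group, which gives the display asked for in the question, at the cost of an extra index level.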

If you really don't want the extra unique index level, simply set id as the index. Note that in this case pandas will display the duplicates:

test.set_index('id')
# Out: 
    col_1  col_2
id              
a       1      6
a       2      7
a       3      8
a       4      9
b       5     10

Also, test.set_index('id').index.duplicated().any() will return True, with the usual drawbacks of an index containing duplicates: a label lookup returns several rows, and operations that require a unique index, such as reindex, raise an error.
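The following sketch illustrates what a duplicated index means in practice. It assumes the DataFrame from the question; the variable names are made up for this example, and the exact error message of reindex depends on the pandas version.

```python
import pandas as pd

test = pd.DataFrame({'id': ['a', 'a', 'a', 'a', 'b'],
                     'col_1': [1, 2, 3, 4, 5],
                     'col_2': [6, 7, 8, 9, 10]})

df = test.set_index('id')

# The index contains repeated labels, so it is not unique.
has_duplicates = bool(df.index.duplicated().any())
is_unique = df.index.is_unique

# A label lookup returns every matching row, not a single one.
rows_a = df.loc['a']   # DataFrame with all 'a' rows
row_b = df.loc['b']    # Series, because 'b' occurs only once

# reindex refuses to work on an index with duplicate labels.
try:
    df.reindex(['a', 'b'])
    reindex_failed = False
except ValueError:
    reindex_failed = True

print(has_duplicates, is_unique, len(rows_a), reindex_failed)
```

Note that the return type of .loc changes with the number of matches, which is one reason code written against a duplicated index tends to need extra care.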

JE_Muc answered Oct 15 '22 06:10


If you only want to replace the duplicated values with '' for display, you can do the following. If you need to process the data later, it is better to keep the duplicated index values:

df = test.set_index('id')

df1 = df.set_index(df.index.where(~df.index.duplicated(), ''))
print(df1)
    col_1  col_2
id              
a       1      6
        2      7
        3      8
        4      9
b       5     10
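
To make the caveat about later processing concrete, here is a small sketch, again assuming the question's DataFrame. It shows that after blanking, the label 'a' only reaches the first row of its group, while the original df still reaches all four. The names blank_labels, rows_a_display and rows_a_original are illustrative only.

```python
import pandas as pd

test = pd.DataFrame({'id': ['a', 'a', 'a', 'a', 'b'],
                     'col_1': [1, 2, 3, 4, 5],
                     'col_2': [6, 7, 8, 9, 10]})

df = test.set_index('id')

# Keep the first occurrence of each label and blank out the repeats.
df1 = df.set_index(df.index.where(~df.index.duplicated(), ''))

blank_labels = df1.index.tolist()

# In the display copy, 'a' now matches a single row only.
rows_a_display = df1.loc[['a']]

# In the original, 'a' still matches the whole group.
rows_a_original = df.loc[['a']]

print(blank_labels)
print(len(rows_a_display), len(rows_a_original))
```

One way to work with this is to keep df for computations and build df1 only at the point where the table is printed.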

jezrael answered Oct 15 '22 04:10