Drop duplicates from level in hierarchical index pandas

Question

I want to de-duplicate the following hierarchically indexed DataFrame based on the values of its second index level, keeping only the first row for each second-level value. I haven't been able to find a way of doing this. There is pandas.MultiIndex.drop_duplicates(), but it doesn't let you specify a level.
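The limitation can be seen with a tiny index. This sketch borrows a few index values from the example below. Index.drop_duplicates compares complete (outer, inner) tuples, so two rows that share only their second-level value are both kept:

```python
import pandas as pd

# A two-level index where the inner value 122864 appears under two outer keys
idx = pd.MultiIndex.from_tuples([(985, 2414), (985, 122864), (2414, 122864)])

# drop_duplicates compares whole tuples: (985, 122864) and (2414, 122864)
# differ in the first level, so nothing is removed
deduped = idx.drop_duplicates()
print(len(deduped))  # 3

# A single level can be pulled out as a flat Index with get_level_values
inner = idx.get_level_values(1)
print(list(inner))  # [2414, 122864, 122864]
```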

An example dataframe is:

In [5]: df
Out[5]:
               given_name  surname  dob  phone_number_1_clean 
985    2414           1.0      1.0  0.0                   1.0
       122864         1.0      1.0  0.0                   0.0
       167863         1.0      1.0  0.0                   0.0
       418911         1.0      1.0  0.0                   0.0
       516362         1.0      1.0  0.0                   0.0
2414   122864         1.0      1.0  0.0                   0.0
       167863         1.0      1.0  1.0                   0.0
       418911         1.0      1.0  1.0                   0.0
       516362         1.0      1.0  0.0                   0.0
122864 167863         1.0      1.0  0.0                   1.0
       418911         1.0      1.0  0.0                   1.0
       516362         1.0      1.0  0.0                   1.0
167863 418911         1.0      1.0  1.0                   1.0
       516362         1.0      1.0  0.0                   1.0
418911 516362         1.0      1.0  0.0                   1.0
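To experiment with the answer, the example frame can be rebuilt from the printout. This is a reconstruction, not the asker's code. It relies on an observation about the printed index: it lists every pair (a, b) of the six IDs where a appears before b, in that order.

```python
import pandas as pd

# IDs in the order they appear in the printout
ids = [985, 2414, 122864, 167863, 418911, 516362]
# Every pair (a, b) with a listed before b, matching the printed row order
pairs = [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]]

df = pd.DataFrame(
    {
        "given_name": [1.0] * 15,
        "surname": [1.0] * 15,
        # Values copied from the printout, grouped by outer key
        "dob": [0.0, 0.0, 0.0, 0.0, 0.0,
                0.0, 1.0, 1.0, 0.0,
                0.0, 0.0, 0.0,
                1.0, 0.0,
                0.0],
        "phone_number_1_clean": [1.0, 0.0, 0.0, 0.0, 0.0,
                                 0.0, 0.0, 0.0, 0.0,
                                 1.0, 1.0, 1.0,
                                 1.0, 1.0,
                                 1.0],
    },
    index=pd.MultiIndex.from_tuples(pairs),
)
print(df)
```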

The output should look like this:

               given_name  surname  dob  phone_number_1_clean 
985    2414           1.0      1.0  0.0                   1.0
       122864         1.0      1.0  0.0                   0.0
       167863         1.0      1.0  0.0                   0.0
       418911         1.0      1.0  0.0                   0.0
       516362         1.0      1.0  0.0                   0.0

jezrael · Accepted Answer

Use get_level_values to select the second level of the MultiIndex, call duplicated on it to get a boolean mask, invert the mask with ~, and filter the rows with boolean indexing:

df = df[~df.index.get_level_values(1).duplicated()]
print (df)
            given_name  surname  dob  phone_number_1_clean
985 2414           1.0      1.0  0.0                   1.0
    122864         1.0      1.0  0.0                   0.0
    167863         1.0      1.0  0.0                   0.0
    418911         1.0      1.0  0.0                   0.0
    516362         1.0      1.0  0.0                   0.0
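For comparison, here is a sketch of an alternative that is not part of the answer. Grouping on the second level and taking the first row of each group gives the same rows, because groupby(...).head() keeps the original index and row order. The small frame below is illustrative only.

```python
import pandas as pd

# Illustrative frame: inner value 122864 appears under two outer keys
df = pd.DataFrame(
    {"dob": [0.0, 0.0, 0.0, 1.0]},
    index=pd.MultiIndex.from_tuples(
        [(985, 2414), (985, 122864), (2414, 122864), (2414, 167863)]
    ),
)

# Boolean-mask approach from the answer: first row per level-1 value
mask_result = df[~df.index.get_level_values(1).duplicated()]

# Alternative: group on level 1 and keep the first row of each group
group_result = df.groupby(level=1).head(1)

print(mask_result.equals(group_result))  # True
```

The mask version is the more direct choice for keeping a single row. The groupby version generalizes to keeping the first n rows per value with head(n).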

Detail:

print (df.index.get_level_values(1))
Int64Index([  2414, 122864, 167863, 418911, 516362, 122864, 167863, 418911,
            516362, 167863, 418911, 516362, 418911, 516362, 516362],
           dtype='int64')

print (df.index.get_level_values(1).duplicated())
[False False False False False  True  True  True  True  True  True  True
  True  True  True]

print (~df.index.get_level_values(1).duplicated())
[ True  True  True  True  True False False False False False False False
 False False False]
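Index.duplicated also takes a keep parameter ("first" by default, "last", or False). This controls which occurrence survives once the mask is inverted. The sketch below reuses the second-level values printed above:

```python
import pandas as pd

# Second-level values from the example, in index order
level1 = pd.Index([2414, 122864, 167863, 418911, 516362, 122864, 167863, 418911,
                   516362, 167863, 418911, 516362, 418911, 516362, 516362])

first = ~level1.duplicated(keep="first")      # keep the first occurrence (default)
last = ~level1.duplicated(keep="last")        # keep the last occurrence instead
unique_only = ~level1.duplicated(keep=False)  # drop every value that repeats

print(first.sum(), last.sum(), unique_only.sum())  # 5 5 1
```

Any of these masks can be passed to df[...] in the same way as in the answer.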


Tags: python, pandas

Auren Ferguson

