How can I increment a level in Pandas MultiIndex?

How can I increment all values in a specific level of a pandas MultiIndex?

ajwood asked Dec 04 '16 18:12


2 Answers

You can build a new index with MultiIndex.from_tuples and assign it to df.index:

import pandas as pd

df = pd.DataFrame({'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[7,8,9],
                   'D':[1,3,5],
                   'E':[5,3,6],
                   'F':[7,4,3]})

df = df.set_index(['A','B'])
print(df)
     C  D  E  F
A B            
1 4  7  1  5  7
2 5  8  3  3  4
3 6  9  5  6  3

# rebuild the index, adding 1 to every value in level 'B'
new_index = list(zip(df.index.get_level_values('A'), df.index.get_level_values('B') + 1))
df.index = pd.MultiIndex.from_tuples(new_index, names=df.index.names)
print(df)
     C  D  E  F
A B            
1 5  7  1  5  7
2 6  8  3  3  4
3 7  9  5  6  3
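
The intermediate list of tuples can be avoided. As a minimal sketch (not part of the original answer), MultiIndex.from_arrays takes one array per level, so the incremented level can be passed in directly. The three-column frame below is a shortened version of the example data, and the names level_a and level_b are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'C': [7, 8, 9]}).set_index(['A', 'B'])

# One array per level: level 'A' unchanged, level 'B' shifted by 1
level_a = df.index.get_level_values('A')
level_b = df.index.get_level_values('B') + 1
df.index = pd.MultiIndex.from_arrays([level_a, level_b], names=df.index.names)

print(df.index.tolist())
```

The level names are passed explicitly so that they survive the rebuild, just as in the from_tuples version.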

Another possible solution with reset_index and set_index:

df = df.reset_index()
df['B'] = df['B'] + 1
df = df.set_index(['A','B'])
print(df)
     C  D  E  F
A B            
1 5  7  1  5  7
2 6  8  3  3  4
3 7  9  5  6  3

Solution with DataFrame.assign:

print(df.reset_index().assign(B=lambda x: x.B+1).set_index(['A','B']))
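
A related one-liner that skips the reset_index round trip is DataFrame.rename with its level argument. This is a different technique from the ones above, shown as a sketch on a shortened copy of the example frame; the name shifted is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'C': [7, 8, 9]}).set_index(['A', 'B'])

# Apply the function to the labels of level 'B' only; rename returns a new frame
shifted = df.rename(index=lambda b: b + 1, level='B')

print(shifted)
```

Because rename returns a new object by default, the original df keeps its old index.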

Timings:

In [26]: %timeit (reset_set(df1))
1 loop, best of 3: 144 ms per loop

In [27]: %timeit (assign_method(df3))
10 loops, best of 3: 161 ms per loop

In [28]: %timeit (jul(df2))
1 loop, best of 3: 543 ms per loop

In [29]: %timeit (tuples_method(df))
1 loop, best of 3: 581 ms per loop

Code for timings:

import numpy as np
import pandas as pd

np.random.seed(100)
N = 1000000
df = pd.DataFrame(np.random.randint(10, size=(N,5)), columns=list('ABCDE'))
print(df)

df = df.set_index(['A','B'])
print(df)
df1 = df.copy()
df2 = df.copy()
df3 = df.copy()

def reset_set(df):
    df = df.reset_index()
    df.B = df.B + 1
    return df.set_index(['A','B'])

def assign_method(df):
    df = df.reset_index().assign(B=lambda x: x.B+1).set_index(['A','B'])
    return df

# note: the next two functions assign to df.index,
# so the frame passed in is modified in place
def tuples_method(df):
    new_index = list(zip(df.index.get_level_values('A'), df.index.get_level_values('B') + 1))
    df.index = pd.MultiIndex.from_tuples(new_index, names=df.index.names)
    return df

def jul(df):
    df.index = pd.MultiIndex.from_tuples([(x[0], x[1]+1) for x in df.index], names=df.index.names)
    return df
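
%timeit only works inside IPython. As a rough sketch of how a comparable measurement could be run as a plain script, the standard-library timeit module can be used instead. The smaller N, the name base, and the set_levels_method function are assumptions made for this sketch, not part of the original benchmark, and no timings are claimed here:

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(100)
N = 100_000  # assumption: smaller than the original N so the sketch runs quickly
base = pd.DataFrame(np.random.randint(10, size=(N, 5)),
                    columns=list('ABCDE')).set_index(['A', 'B'])

def reset_set(df):
    out = df.reset_index()
    out['B'] = out['B'] + 1
    return out.set_index(['A', 'B'])

def set_levels_method(df):
    # hypothetical wrapper around the set_levels idea; works on a copy
    out = df.copy()
    out.index = out.index.set_levels(out.index.levels[1] + 1, level=1)
    return out

for func in (reset_set, set_levels_method):
    # best of 3 single runs, each on the same untouched input frame
    seconds = min(timeit.repeat(lambda: func(base), number=1, repeat=3))
    print(f"{func.__name__}: {seconds * 1000:.1f} ms")
```

Both functions return a new frame rather than modifying their argument, so every repetition starts from the same index.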

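Thank you Jeff for another solution. set_levels returns a new MultiIndex, and its inplace argument was removed in pandas 2.0, so assign the result back:

df.index = df.index.set_levels(df.index.levels[1] + 1, level=1)
print(df)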

     C  D  E  F
A B            
1 5  7  1  5  7
2 6  8  3  3  4
3 7  9  5  6  3
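
Wrapped as a reusable function, the set_levels idea might look like the sketch below. The name increment_level and its parameters are hypothetical, and the function assumes the level is identified by name; it uses the assignment form of set_levels rather than inplace=True:

```python
import pandas as pd

def increment_level(df, level, by=1):
    """Return a copy of df with `by` added to one named MultiIndex level."""
    idx = df.index
    pos = list(idx.names).index(level)  # position of the named level
    out = df.copy()
    # set_levels returns a new MultiIndex; only the unique level labels are replaced
    out.index = idx.set_levels(idx.levels[pos] + by, level=pos)
    return out

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6],
                   'C': [7, 8, 9]}).set_index(['A', 'B'])
shifted = increment_level(df, 'B')
print(shifted)
```

Since only the level labels are rewritten and the row-to-label mapping is reused, this avoids rebuilding the index row by row.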

jezrael answered Oct 22 '22 07:10


Here's a slightly different way:

df.index = pd.MultiIndex.from_tuples([(x[0], x[1]+1) for x in df.index], names=df.index.names)

1000 loops, best of 3: 840 µs per loop

For comparison:

new_index = list(zip(df.index.get_level_values('A'),
                     df.index.get_level_values('B') + 1))
df.index = pd.MultiIndex.from_tuples(new_index, names=df.index.names)

1000 loops, best of 3: 984 µs per loop

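In this test the reset_index method is 10 times slower. The timings in the other answer, taken on a 1,000,000-row frame, show the opposite ordering, so the result depends on the size of the data.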

Julien Marrec answered Oct 22 '22 07:10