I have the following multiindex dataframe:
from io import StringIO
import pandas as pd
datastring = StringIO("""File,no,runtime,value1,value2
    A,0, 0,12,34
    A,0, 1,13,34
    A,0, 2,23,34
    A,1, 6,23,38
    A,1, 7,22,38
    B,0,17,15,35
    B,0,18,17,35
    C,0,34,23,32
    C,0,35,21,32
    """)    
df = pd.read_csv(datastring, sep=',')
df.set_index(['File','no',df.index], inplace=True)
>> df
               runtime  value1  value2
File   no               
A      0    0     0       12      34
            1     1       13      34
            2     2       23      34
       1    3     6       23      38
            4     7       22      38
B      0    5     17      15      35
            6     18      17      35
C      0    7     34      23      32
            8     35      21      32
What I would like to get is just the first values of every entry with a new file and a different number
A 0 34
A 1 38
B 0 35
C 0 32
The most similar questions I could find where these
Resample pandas dataframe only knowing result measurement count
MultiIndex-based indexing in pandas
Select rows in pandas MultiIndex DataFrame
but I was unable to construct a solution from them. The best I got was the ix operation, but as the values technically are still there (just not on display), the result is
idx = pd.IndexSlice
df.loc[idx[:,0],:]
could, for example, filter for the 0 value but would still return the entire rest of the dataframe.
Is a multiindex even the right tool for the task at hand? How to solve this?
Use GroupBy.first by first and second level of MultiIndex:
s = df.groupby(level=[0,1])['value2'].first()
print (s)
File  no
A     0     34
      1     38
B     0     35
C     0     32
Name: value2, dtype: int64
If need one column DataFrame use one element list:
df1 = df.groupby(level=[0,1])[['value2']].first()
print (df1)
         value2
File no        
A    0       34
     1       38
B    0       35
C    0       32
Another idea is remove 3rd level by DataFrame.reset_index and filter by Index.get_level_values with boolean indexing:
df2 = df.reset_index(level=2, drop=True)
s = df2.loc[~df2.index.duplicated(), 'value2']
print (s)
File  no
A     0     34
      1     38
B     0     35
C     0     32
Name: value2, dtype: int64
                        If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With