levels option in pandas concat

Tags:

python

pandas

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                   columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                   columns=['three', 'four'])

>>> df1
   one  two
a    0    1
b    2    3
c    4    5

>>> df2
   three  four
a      5     6
c      7     8


res = pd.concat([df1, df2], axis=1, levels=['level1', 'level2'],
        names=['upper', 'lower'])
>>> res
   one  two  three  four
a    0    1    5.0   6.0
b    2    3    NaN   NaN
c    4    5    7.0   8.0

My question is: why are levels and names not shown in the res output above? Is there a real example of how the levels option is used?

Thanks for your time and help.

asked May 30 '17 12:05 by venkysmarty




1 Answer

Really interesting question.

I searched SO, but never found it used :(

But the docs contain one example, with this note:

Yes, this is fairly esoteric, but is actually necessary for implementing things like GroupBy where the order of a categorical variable is meaningful.
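The GroupBy remark is about order: levels lets you fix the order of the values stored in the outer level instead of letting pandas infer it from keys. A minimal sketch, assuming a recent pandas; the keys 'high' and 'low' are hypothetical labels chosen so that their natural order (low before high) differs from the order in which the frames are passed:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                   columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                   columns=['three', 'four'])

# Hypothetical keys: 'high' labels df1's columns, 'low' labels df2's columns.
keys = ['high', 'low']

# Without levels, the outer level values are inferred from keys.
inferred = pd.concat([df1, df2], axis=1, keys=keys)

# With levels, the outer level is stored in exactly the order given here.
explicit = pd.concat([df1, df2], axis=1, keys=keys, levels=[['low', 'high']])

print(inferred.columns.levels[0].tolist())
print(explicit.columns.levels[0].tolist())
# The labels attached to the columns are the same in both results.
print(explicit.columns.get_level_values(0).tolist())
```

Only the stored level order changes; anything that walks res.columns.levels[0] sees 'low' first in the explicit version.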

The docs also say:

levels : list of sequences, default None. Specific levels (unique values) to use for constructing a MultiIndex. Otherwise they will be inferred from the keys.
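Because levels lists the values explicitly instead of inferring them, every key has to be one of them. A minimal sketch, assuming a recent pandas, that leaves one key out of levels:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                   columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                   columns=['three', 'four'])

# levels is a list of sequences: here one sequence, for the level built from keys.
# It omits 'level2', which is one of the keys.
try:
    pd.concat([df1, df2], axis=1, keys=['level1', 'level2'],
              levels=[['level1']])
    error_message = None
except ValueError as exc:
    # pandas rejects a key that is not among the given level values.
    error_message = str(exc)

print(error_message)
```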

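So levels is only used together with keys: keys is what creates the MultiIndex (and names labels its levels). Without keys, concat builds flat columns, so levels and names have nothing to apply to, which is why they do not show up in the question's output. With keys, levels sets the values stored in the outer level, and they can include values that no key uses: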

res = pd.concat([df1, df2], axis=1,
                keys=['level1','level2'], 
                levels=[['level1', 'level2','level3']], 
                names=['upper', 'lower'])

print (res)
upper level1     level2     
lower    one two  three four
a          0   1    5.0  6.0
b          2   3    NaN  NaN
c          4   5    7.0  8.0

print (res.columns)
MultiIndex(levels=[['level1', 'level2', 'level3'], ['four', 'one', 'three', 'two']],
           labels=[[0, 0, 1, 1], [1, 3, 2, 0]],
           names=['upper', 'lower'])
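The MultiIndex repr printed above comes from an older pandas (labels was later renamed codes, and recent versions print the column tuples instead), so in a recent version it is easier to inspect the levels directly. A minimal sketch, assuming a recent pandas, showing that the unused 'level3' is stored but not attached to any column, and that MultiIndex.remove_unused_levels drops it:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                   columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                   columns=['three', 'four'])

res = pd.concat([df1, df2], axis=1,
                keys=['level1', 'level2'],
                levels=[['level1', 'level2', 'level3']],
                names=['upper', 'lower'])

# 'level3' is stored in the outer level even though no column uses it.
print(res.columns.levels[0].tolist())
# The outer labels that actually appear on the columns.
print(res.columns.get_level_values('upper').unique().tolist())
# Drop stored level values that no column refers to.
trimmed = res.columns.remove_unused_levels()
print(trimmed.levels[0].tolist())
```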

The same without the levels parameter:

res = pd.concat([df1, df2], axis=1,
                keys=['level1','level2'], 
                names=['upper', 'lower'])

print (res)
upper level1     level2     
lower    one two  three four
a          0   1    5.0  6.0
b          2   3    NaN  NaN
c          4   5    7.0  8.0

print (res.columns)
MultiIndex(levels=[['level1', 'level2'], ['four', 'one', 'three', 'two']],
           labels=[[0, 0, 1, 1], [1, 3, 2, 0]],
           names=['upper', 'lower'])
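Putting the two results side by side: the columns and data are identical, and only the stored outer level differs. A minimal sketch, assuming a recent pandas:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                   columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                   columns=['three', 'four'])

with_levels = pd.concat([df1, df2], axis=1, keys=['level1', 'level2'],
                        levels=[['level1', 'level2', 'level3']],
                        names=['upper', 'lower'])
without_levels = pd.concat([df1, df2], axis=1, keys=['level1', 'level2'],
                           names=['upper', 'lower'])

# Same column tuples and same values in both frames.
same_labels = with_levels.columns.tolist() == without_levels.columns.tolist()
same_data = with_levels.equals(without_levels)
print(same_labels, same_data)

# The difference is only in the stored level values.
print(with_levels.columns.levels[0].tolist())
print(without_levels.columns.levels[0].tolist())
```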
answered Sep 18 '22 13:09 by jezrael
