levels option in pandas concat

Tags: python, pandas

import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                   columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                   columns=['three', 'four'])

>>> df1
   one  two
a    0    1
b    2    3
c    4    5

>>> df2
   three  four
a      5     6
c      7     8


res = pd.concat([df1, df2], axis=1, levels=['level1', 'level2'],
        names=['upper', 'lower'])
>>> res
   one  two  three  four
a    0    1      5     6
b    2    3    NaN   NaN
c    4    5      7     8

My question is: why are levels and names not shown in the res output above? Is there a real example of how the levels option is used?

Thanks for your time and help


asked May 30 '17 12:05

venkysmarty

1 Answer

Really interesting question.

I searched SO, but it seems this parameter is never used there :(

The docs do have one example, though, with this note:

Yes, this is fairly esoteric, but is actually necessary for implementing things like GroupBy where the order of a categorical variable is meaningful.

The docs also say:

levels : list of sequences, default None. Specific levels (unique values) to use for constructing a MultiIndex. Otherwise they will be inferred from the keys.

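So it adds new levels to the MultiIndex. Note that keys is needed as well: keys creates the MultiIndex, and levels only sets the unique values of its outer level: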

res = pd.concat([df1, df2], axis=1,
                keys=['level1', 'level2'],
                levels=[['level1', 'level2', 'level3']],
                names=['upper', 'lower'])

print(res)
upper level1     level2     
lower    one two  three four
a          0   1    5.0  6.0
b          2   3    NaN  NaN
c          4   5    7.0  8.0

print(res.columns)
MultiIndex(levels=[['level1', 'level2', 'level3'], ['four', 'one', 'three', 'two']],
           labels=[[0, 0, 1, 1], [1, 3, 2, 0]],
           names=['upper', 'lower'])
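
The extra level is easy to miss in the printed frame, because no column uses it. Here is a minimal sketch of how one might check for it programmatically. It rebuilds df1 and df2 from the question so it runs on its own, and it assumes a pandas version where concat accepts levels together with keys. The names declared and used are only illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                   columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                   columns=['three', 'four'])

res = pd.concat([df1, df2], axis=1,
                keys=['level1', 'level2'],
                levels=[['level1', 'level2', 'level3']],
                names=['upper', 'lower'])

# All values declared for the outer level, including the unused 'level3'
declared = list(res.columns.levels[0])

# Values of the outer level that actually label at least one column
used = list(res.columns.get_level_values('upper').unique())

print(declared)
print(used)
```

So 'level3' exists only in the index metadata, not in any column label.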

The same without the levels parameter:

res = pd.concat([df1, df2], axis=1,
                keys=['level1', 'level2'],
                names=['upper', 'lower'])

print(res)
upper level1     level2     
lower    one two  three four
a          0   1    5.0  6.0
b          2   3    NaN  NaN
c          4   5    7.0  8.0

print(res.columns)
MultiIndex(levels=[['level1', 'level2'], ['four', 'one', 'three', 'two']],
           labels=[[0, 0, 1, 1], [1, 3, 2, 0]],
           names=['upper', 'lower'])
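
The docs' remark about ordering hints at another possible use: levels can set the order of the outer level's values independently of the order of keys. This is a hedged sketch, not something taken from the docs. The reversed order and the variable names are chosen purely for illustration, and df1 and df2 are rebuilt so the snippet is self-contained.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                   columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                   columns=['three', 'four'])

# Outer level inferred from keys
res_default = pd.concat([df1, df2], axis=1, keys=['level1', 'level2'])

# Outer level given explicitly, in a different order than keys
res_ordered = pd.concat([df1, df2], axis=1, keys=['level1', 'level2'],
                        levels=[['level2', 'level1']])

default_order = list(res_default.columns.levels[0])
custom_order = list(res_ordered.columns.levels[0])

# The column labels themselves are the same; only the stored level order differs
same_columns = list(res_default.columns) == list(res_ordered.columns)

print(default_order, custom_order, same_columns)
```

The frame looks identical when printed, but the stored order of the level values follows levels rather than keys.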


answered Sep 18 '22 13:09

jezrael
