import numpy as np
import pandas as pd
from pandas import DataFrame

df1 = DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                columns=['one', 'two'])
df2 = DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                columns=['three', 'four'])
>>> df1
   one  two
a    0    1
b    2    3
c    4    5
>>> df2
   three  four
a      5     6
c      7     8
res = pd.concat([df1, df2], axis=1, levels=['level1', 'level2'],
                names=['upper', 'lower'])
>>> res
   one  two  three  four
a    0    1    5.0   6.0
b    2    3    NaN   NaN
c    4    5    7.0   8.0
My question is: why are the levels and names not shown in the res output above? Is there a real example of how the levels option is used?
Thanks for your time and help.
The levels are parts of the index: only together can they identify a row in a DataFrame or Series. That the levels are parts of the index (as a tuple) can be nicely observed in the Spyder Variable explorer. Having levels gives us the opportunity to aggregate values within groups with respect to an index part (level) of our choice.
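A minimal sketch of both points, using a small made-up Series (the level names outer/inner and the values are hypothetical): a single row is addressed by the full index tuple, and groupby(level=...) aggregates within one level of our choice.

```python
import pandas as pd

# Hypothetical two-level row index: each row is identified by an (outer, inner) tuple
idx = pd.MultiIndex.from_tuples(
    [('x', 'a'), ('x', 'b'), ('y', 'a'), ('y', 'b')],
    names=['outer', 'inner'],
)
s = pd.Series([1, 2, 3, 4], index=idx)

# Only the full tuple pins down a single row
print(s.loc[('x', 'b')])  # 2

# Aggregate within groups defined by the index part (level) of our choice
by_outer = s.groupby(level='outer').sum()
by_inner = s.groupby(level='inner').sum()
print(by_outer.to_dict())  # {'x': 3, 'y': 7}
print(by_inner.to_dict())  # {'a': 4, 'b': 6}
```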
Pandas: Data Manipulation, concat() function. objs: a sequence or mapping of Series or DataFrame objects. If a dict is passed, its keys will be used as the keys argument, unless keys is passed explicitly, in which case only the values for those keys will be selected.
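A short sketch of the dict behaviour described above, reusing the question's df1 and df2 (the variable names from_dict, from_list and only_level2 are just illustrative): a dict's keys play the role of keys, and passing keys explicitly picks out only those entries.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                   columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                   columns=['three', 'four'])

# A dict of frames: its keys act as the `keys` argument
from_dict = pd.concat({'level1': df1, 'level2': df2}, axis=1)
# The equivalent list-plus-keys call
from_list = pd.concat([df1, df2], axis=1, keys=['level1', 'level2'])
print(from_dict.equals(from_list))  # True

# Passing keys explicitly selects only those entries of the dict
only_level2 = pd.concat({'level1': df1, 'level2': df2}, axis=1, keys=['level2'])
print(only_level2.columns.tolist())  # [('level2', 'three'), ('level2', 'four')]
```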
merge() combines data on common columns or indices, join() combines data on a key column or an index, and concat() combines DataFrames across rows or columns.
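To make the distinction concrete, here is a minimal sketch with two hypothetical frames, left and right (names and data invented for illustration), calling each function with its default join behaviour.

```python
import pandas as pd

left = pd.DataFrame({'key': ['a', 'b', 'c'], 'x': [1, 2, 3]})
right = pd.DataFrame({'key': ['a', 'c'], 'y': [10, 30]})

# merge(): match rows on a common column (inner join by default)
merged = pd.merge(left, right, on='key')

# join(): match on the index (left join by default), so 'b' gets NaN for y
joined = left.set_index('key').join(right.set_index('key'))

# concat(): glue objects along an axis without matching on values
stacked = pd.concat([left, right])  # axis=0: rows of right go below left

print(merged['key'].tolist())  # ['a', 'c']
print(joined['y'].isna().tolist())  # [False, True, False]
print(stacked.shape)  # (5, 3)
```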
When time is of the essence, which one is faster? In one benchmark, concatenating multiple DataFrames with pandas.concat was 50 times faster than using DataFrame.append.
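The usual pattern is to collect the pieces in a list and call concat once; note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so this is also the only option on current versions. A sketch with hypothetical one-row pieces (no timing claims are made here):

```python
import pandas as pd

# Hypothetical pieces, e.g. one small frame per file or loop iteration
pieces = [pd.DataFrame({'n': [n], 'square': [n * n]}) for n in range(1000)]

# Collect first, then concatenate once; ignore_index gives a fresh 0..n-1 index
combined = pd.concat(pieces, ignore_index=True)
print(combined.shape)  # (1000, 2)
print(combined['square'].iloc[-1])  # 998001
```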
Really interesting question.
I searched SO, but never found it used :(
But the docs have one example with this note:
Yes, this is fairly esoteric, but it is actually necessary for implementing things like GroupBy where the order of a categorical variable is meaningful.
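A sketch modeled on that docs example (the frame contents are made up; the keys x/y/z and levels z/y/x/w follow the docs): levels fixes both the set and the order of the outer level's values, including 'w', which no piece uses.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(20).reshape(10, 2), columns=['a', 'b'])
pieces = [df.iloc[:3], df.iloc[3:7], df.iloc[7:]]

# keys label each piece; levels fixes the set and order of the outer level's values
result = pd.concat(pieces, keys=['x', 'y', 'z'],
                   levels=[['z', 'y', 'x', 'w']], names=['group_key'])

print(result.index.levels[0].tolist())  # ['z', 'y', 'x', 'w']
print(result.index.codes[0].tolist())   # [2, 2, 2, 1, 1, 1, 1, 0, 0, 0]
print(list(result.index.names))        # ['group_key', None]
```

The codes point into the given level order, so 'x' is stored as position 2 rather than 0.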
The docs also say:
levels : list of sequences, default None. Specific levels (unique values) to use for constructing a MultiIndex. Otherwise they will be inferred from the keys.
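So it adds new levels (level values) to the MultiIndex. Note that levels and names only take effect together with keys, which is why they do not show up in the question's output: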
res = pd.concat([df1, df2], axis=1,
                keys=['level1', 'level2'],
                levels=[['level1', 'level2', 'level3']],
                names=['upper', 'lower'])
print(res)
upper level1     level2
lower    one two  three four
a          0   1    5.0  6.0
b          2   3    NaN  NaN
c          4   5    7.0  8.0
print(res.columns)
MultiIndex(levels=[['level1', 'level2', 'level3'], ['four', 'one', 'three', 'two']],
           labels=[[0, 0, 1, 1], [1, 3, 2, 0]],
           names=['upper', 'lower'])
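A sketch that inspects the result above a bit further (the helper try_levels is hypothetical, written only for this illustration): 'level3' is stored as a level value that no column uses, remove_unused_levels() drops it, and a levels list that is missing one of the keys is rejected with a ValueError.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                   columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                   columns=['three', 'four'])

res = pd.concat([df1, df2], axis=1, keys=['level1', 'level2'],
                levels=[['level1', 'level2', 'level3']], names=['upper', 'lower'])

# 'level3' is a possible value of the 'upper' level, but no column uses it
print(res.columns.levels[0].tolist())  # ['level1', 'level2', 'level3']
print(res.columns.get_level_values('upper').tolist())
# ['level1', 'level1', 'level2', 'level2']

# remove_unused_levels() drops level values that no column refers to
trimmed = res.columns.remove_unused_levels()
print(trimmed.levels[0].tolist())  # ['level1', 'level2']


def try_levels(levels):
    """Hypothetical helper: return the concat result, or None if pandas rejects levels."""
    try:
        return pd.concat([df1, df2], axis=1, keys=['level1', 'level2'],
                         levels=levels, names=['upper', 'lower'])
    except ValueError:
        return None


# Every key must appear in levels; here 'level2' is missing
print(try_levels([['level1']]) is None)  # True
```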
Same without the levels parameter:
res = pd.concat([df1, df2], axis=1,
                keys=['level1', 'level2'],
                names=['upper', 'lower'])
print(res)
upper level1     level2
lower    one two  three four
a          0   1    5.0  6.0
b          2   3    NaN  NaN
c          4   5    7.0  8.0
print(res.columns)
MultiIndex(levels=[['level1', 'level2'], ['four', 'one', 'three', 'two']],
           labels=[[0, 0, 1, 1], [1, 3, 2, 0]],
           names=['upper', 'lower'])
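To tie this back to the question, a minimal sketch (reusing the question's df1 and df2; the names with_levels, without_levels and flat are illustrative) compares the two calls above with a call that has no keys. The column labels are identical with and without levels; only the stored set of level values differs. Without keys there is no MultiIndex for names or levels to describe, so the columns stay flat. The last call does not pass levels, since some newer pandas versions reject levels without keys with a ValueError.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'],
                   columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'],
                   columns=['three', 'four'])

with_levels = pd.concat([df1, df2], axis=1, keys=['level1', 'level2'],
                        levels=[['level1', 'level2', 'level3']],
                        names=['upper', 'lower'])
without_levels = pd.concat([df1, df2], axis=1, keys=['level1', 'level2'],
                           names=['upper', 'lower'])

# Same column labels either way; only the stored level values differ
print(with_levels.columns.tolist() == without_levels.columns.tolist())  # True
print(without_levels.columns.levels[0].tolist())  # ['level1', 'level2']
print(list(without_levels.columns.names))         # ['upper', 'lower']

# No keys: plain side-by-side columns, nothing for names/levels to label
flat = pd.concat([df1, df2], axis=1)
print(flat.columns.nlevels)   # 1
print(flat.columns.tolist())  # ['one', 'two', 'three', 'four']
```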