Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

multi column selection with pandas xs function is failed

I have a following multiindex time-series data.

first                001                                               \
second              open     high      low    close jdiff_vol   value   
date     time                                                           
20150721 90100   2082.18  2082.18  2082.18  2082.18     11970   99466   
         90200   2082.72  2083.01  2082.18  2083.01      4886   40108   
         90300   2083.68  2084.20  2083.68  2083.98      6966   48847   
         90400   2083.63  2084.21  2083.63  2084.00      6817   48020   
         90500   2084.03  2084.71  2083.91  2084.32     10193   58399   
20150721 90100   2084.14  2084.22  2083.59  2083.65      7860   39128   
         90200   2084.08  2084.08  2083.47  2083.50      7171   39147   
         90300   2083.25  2083.65  2083.08  2083.60      4549   34373   
         90400   2084.06  2084.06  2083.66  2083.80      6980   38088   
         90500   2083.61  2084.04  2083.27  2083.89      5292   33466   

The below code works.

opens = data.xs('open', level='second', axis=1, drop_level=True)

But, selecting multi columns using the below code fails.

opens = data.xs(('open','close'), level='second', axis=1, drop_level=True)

How can I modify it in order to select multi columns ?

like image 968
user1913171 Avatar asked Oct 19 '22 12:10

user1913171


1 Answers

Until now, it is not possible to use the pandas xs() function with two column keys at the same level. It would only be possible to use two keys that are from different levels:

opens = data.xs(('001','close'), level=('first','second'), axis=1, drop_level=True)

However, that is not exactly what your are looking for. An alternative solution is to execute two pandas xs() functions and concat them afterwards:

df_xs = pd.concat([df.xs('open', level='second', axis=1, drop_level=True), df.xs('close', level='second', axis=1, drop_level=True)])

Here is a complete example. First, create a dataframe:

import pandas as pd
import numpy as np

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['first', 'second'])
df = pd.DataFrame(np.random.randn(6, 6), index=index[:6], columns=index[:6])

print(df)

first              bar                 baz                 foo          
second             one       two       one       two       one       two
first second                                                            
bar   one     0.699065 -0.283550  0.072595 -0.699627  0.879832 -1.787520
      two    -1.172970  1.381607  1.941370  0.577451 -0.182819  0.215879
baz   one     0.669402 -0.018534  0.775114  1.277079  0.404116 -2.450712
      two     0.066530 -0.509366  1.249981  2.426217  0.409881 -0.178713
foo   one     1.098217  0.399427 -1.423057 -1.261542  1.668202  0.187629
      two     0.827283  0.974239 -1.944796  0.266321  0.700679 -0.371074

Then you can execute the xs() with a concat:

df_xs = pd.concat([df.xs('one', level='second', axis=1, drop_level=True), df.xs('two', level='second', axis=1, drop_level=True)])
print (df_xs)

first              bar       baz       foo
first second                              
bar   one     0.699065  0.072595  0.879832
      two    -1.172970  1.941370 -0.182819
baz   one     0.669402  0.775114  0.404116
      two     0.066530  1.249981  0.409881
foo   one     1.098217 -1.423057  1.668202
      two     0.827283 -1.944796  0.700679
bar   one    -0.283550 -0.699627 -1.787520
      two     1.381607  0.577451  0.215879
baz   one    -0.018534  1.277079 -2.450712
      two    -0.509366  2.426217 -0.178713
foo   one     0.399427 -1.261542  0.187629
      two     0.974239  0.266321 -0.371074
like image 117
Rene B. Avatar answered Nov 02 '22 14:11

Rene B.