Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to select dataframe columns using string keys when the column names are timestamps?

In pandas, time series can be indexed by passing a string that is interpretable as a date. This works for a DataFrame too:

>>> dates = pd.date_range('2000-01-01', periods=8, freq='M')
>>> df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
>>> df
                   A         B         C         D
2000-01-31  0.096115  0.069723 -1.546733 -1.661178
2000-02-29  0.256296  1.838310  0.227132  1.765269
2000-03-31  0.315862  0.167007 -1.340888  1.005260
2000-04-30  1.238728 -2.325420  1.371134 -0.373232
2000-05-31  0.639211 -0.209961 -1.006498  0.005214
2000-06-30  0.091590 -0.664554 -2.037539 -1.335070
2000-07-31  0.275373 -0.398758  0.402848  0.441035
2000-08-31  2.189259 -1.236159 -0.579680  0.878355
>>> df['2000-05']
                   A         B         C         D
2000-05-31  0.639211 -0.209961 -1.006498  0.005214

I am looking for ways to do this when the timestamps are the column names.

>>> df = df.T
>>> df['2000-05']

This yields TypeError: only integer scalar arrays can be converted to a scalar index. The same is true for

>>> df.loc[:, '2000-05']

The most immediate solution that I can think of is

>>> df.T['2000-05'].T
   2000-05-31
A    0.639211
B   -0.209961
C   -1.006498
D    0.005214

but I am wondering if there are any other good solutions. I imagine that for very large DataFrames, doing transpositions may have a performance impact that could be avoided here.

like image 762
Radoslaw Jurga Avatar asked May 11 '18 18:05

Radoslaw Jurga


2 Answers

Well, there is always the filter option.

df = df.T
df.filter(like='2000-05')

   2000-05-31
A    1.884517
B    0.258133
C    0.809360
D   -0.069186

filter gives you greater flexibility, for example, with regular expressions:

df.filter(regex='2000-.*-30')

   2000-04-30  2000-06-30
A   -2.968870    2.064582
B   -0.844370    0.093393
C    0.027328    0.033193
D   -0.270860   -0.455323
like image 100
cs95 Avatar answered Sep 29 '22 04:09

cs95


Maybe you can try str , contains

df[df.index.str.contains('2000-05')].T
Out[163]: 
   2000-05-31
A    0.639211
B   -0.209961
C   -1.006498
D    0.005214
like image 26
BENY Avatar answered Sep 29 '22 05:09

BENY