In pandas, time series can be indexed by passing a string that is interpretable as a date. This works for a DataFrame too:
>>> dates = pd.date_range('2000-01-01', periods=8, freq='M')
>>> df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D'])
>>> df
A B C D
2000-01-31 0.096115 0.069723 -1.546733 -1.661178
2000-02-29 0.256296 1.838310 0.227132 1.765269
2000-03-31 0.315862 0.167007 -1.340888 1.005260
2000-04-30 1.238728 -2.325420 1.371134 -0.373232
2000-05-31 0.639211 -0.209961 -1.006498 0.005214
2000-06-30 0.091590 -0.664554 -2.037539 -1.335070
2000-07-31 0.275373 -0.398758 0.402848 0.441035
2000-08-31 2.189259 -1.236159 -0.579680 0.878355
>>> df['2000-05']
A B C D
2000-05-31 0.639211 -0.209961 -1.006498 0.005214
I am looking for ways to do this when the timestamps are the column names.
>>> df = df.T
>>> df['2000-05']
This yields TypeError: only integer scalar arrays can be converted to a scalar index. The same is true for
>>> df.loc[:, '2000-05']
The most immediate solution that I can think of is
>>> df.T['2000-05'].T
2000-05-31
A 0.639211
B -0.209961
C -1.006498
D 0.005214
but I am wondering whether there are other good solutions. I imagine that for very large DataFrames, the back-and-forth transpositions could have a performance cost that a direct approach would avoid.
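For reference, one transpose-free sketch of what I mean (assuming the columns remain a DatetimeIndex after the initial transpose) is to build a boolean mask over the column labels and select with .loc:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2000-01-01', periods=8, freq='M')
df = pd.DataFrame(np.random.randn(8, 4), index=dates,
                  columns=['A', 'B', 'C', 'D']).T  # timestamps as columns

# Compare the year-month of each column label against the target period;
# no second transpose is needed.
mask = df.columns.strftime('%Y-%m') == '2000-05'
result = df.loc[:, mask]
```

This keeps the data in its column-oriented layout and only touches the (small) column index.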
Well, there is always the filter option.
df = df.T
df.filter(like='2000-05')
2000-05-31
A 1.884517
B 0.258133
C 0.809360
D -0.069186
filter gives you greater flexibility, for example, with regular expressions:
df.filter(regex='2000-.*-30')
2000-04-30 2000-06-30
A -2.968870 2.064582
B -0.844370 0.093393
C 0.027328 0.033193
D -0.270860 -0.455323
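A self-contained sketch of the filter approach, with the axis made explicit (for a DataFrame, filter defaults to the columns axis, but passing axis=1 states the intent):

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2000-01-01', periods=8, freq='M')
df = pd.DataFrame(np.random.randn(8, 4), index=dates,
                  columns=list('ABCD')).T  # timestamps as columns

# like= matches a substring of each column label's string form;
# regex= matches a regular expression against it.
may = df.filter(like='2000-05', axis=1)
month_ends_on_30 = df.filter(regex=r'2000-.*-30', axis=1)
```

filter compares against the string form of each label, so it works even though the column labels are Timestamps rather than strings.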
Maybe you can try str.contains on the index (casting the DatetimeIndex to strings first, since a DatetimeIndex has no .str accessor):
df[df.index.astype(str).str.contains('2000-05')].T
2000-05-31
A 0.639211
B -0.209961
C -1.006498
D 0.005214
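For completeness, partial string indexing also works directly on the row axis with .loc, so a sketch that keeps a single transpose at the end is:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2000-01-01', periods=8, freq='M')
df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=list('ABCD'))

# .loc with a partial date string selects every row in May 2000;
# one transpose then puts the timestamps into the columns.
may = df.loc['2000-05'].T
```

This avoids any string conversion of the index and relies on pandas' built-in partial string indexing for DatetimeIndex rows.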