How to select multiple columns based on their names in python?

Question

I am new to Python, so sorry if this is too obvious.

I have a DataFrame that looks like this:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5, 10))
df.columns = ['date1', 'date2', 'date3', 'name1', 'col1', 'col2', 'col3', 'name2', 'date4', 'date5']

    date1     date2     date3     name1      col1      col2      col3  \
0 -0.177090  0.417442 -0.930226  0.460750  1.062997  0.534942 -1.082967   
1 -0.942154  0.047837 -0.494979  2.437469 -0.446984  0.709556 -0.135978   
2 -1.544783  0.129307 -0.169556 -0.890697  2.650924  0.976610  0.290226   
3 -0.651220 -0.196342  0.712601  0.641927 -0.009921 -0.038450  0.498087   
4 -0.299145 -1.407747  1.914364  0.554330 -0.196702  2.037057 -0.287942   

    name2     date4     date5  
0 -0.318310  0.358619 -0.243150  
1  1.171024  0.277943 -1.584723  
2 -0.546707 -1.951831  0.678125  
3 -0.510261 -0.018574 -0.212684  
4  1.929841  0.995625 -1.125044

I'd like to keep all columns that have, for example, 'date' in their names. That is, I want to keep columns 'date1', 'date2', 'date3', 'date4', 'date5', etc. In some statistical packages I can use * as a wildcard for any characters and use a command like this:

keep date*

Is there an equivalent way of doing this in Python?

Thanks very much for any help.

joris · Accepted Answer

You can use the filter method. To do the equivalent of keep date*:

In [62]: df.filter(like='date')
Out[62]: 
      date1     date2     date3     date4     date5
0  0.091744 -0.431606  1.280286  0.263137  0.444550
1  0.688155  0.583918  0.957041  0.446047  1.654274
2  0.109004  0.608818  0.091498  0.940406  0.476479
3 -0.874016  1.312567  0.326480  1.213292  0.504049
4 -0.203515 -0.979086  0.458790  1.012296 -2.446310
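
One detail worth checking: like= matches the substring anywhere in the column name, whereas keep date* is a prefix match. The sketch below illustrates the difference; the 'update' column is a hypothetical addition (not in the question's data) chosen only because its name contains 'date' without starting with it.

```python
import numpy as np
import pandas as pd

# Column names modelled on the question, plus a hypothetical 'update' column
cols = ['date1', 'date2', 'name1', 'col1', 'update', 'date3']
df = pd.DataFrame(np.random.randn(5, len(cols)), columns=cols)

# like= keeps every column whose name contains the substring 'date'
substring_match = df.filter(like='date')

# regex= with a ^ anchor keeps only names that start with 'date'
prefix_match = df.filter(regex='^date')

# Same prefix selection written as a boolean mask over the column index
prefix_mask = df.loc[:, df.columns.str.startswith('date')]

print(list(substring_match.columns))
print(list(prefix_match.columns))
```

If none of your column names contain 'date' in the middle, the two forms select the same columns and like= is the simpler choice.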

The filter method also has a regex keyword for more complex filtering.
For example, to drop all date columns, you can pass a regular expression that matches only names not starting with a certain string: df.filter(regex="^(?!date).*$")
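
As a minimal sketch of the regex keyword on the question's column names (the second pattern is an illustrative assumption, not part of the original answer):

```python
import numpy as np
import pandas as pd

cols = ['date1', 'date2', 'date3', 'name1', 'col1',
        'col2', 'col3', 'name2', 'date4', 'date5']
df = pd.DataFrame(np.random.randn(5, 10), columns=cols)

# Negative lookahead: keep only names that do NOT start with 'date'
no_dates = df.filter(regex=r'^(?!date).*$')

# Alternation: keep names that start with either 'name' or 'col'
names_and_cols = df.filter(regex=r'^(name|col)')

print(list(no_dates.columns))
print(list(names_and_cols.columns))
```

filter preserves the original column order, so both results here list the columns in the order they appear in df.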

In the upcoming pandas release (0.14), this functionality will also be provided in the drop method, so this will be easier.
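
Independently of what drop itself supports, one way to drop the matching columns is to let filter find the names and pass them to drop. This is a sketch, assuming a pandas version where drop accepts the columns= keyword (0.21 or later); on older versions df.drop(date_cols, axis=1) should behave the same way.

```python
import numpy as np
import pandas as pd

cols = ['date1', 'date2', 'date3', 'name1', 'col1',
        'col2', 'col3', 'name2', 'date4', 'date5']
df = pd.DataFrame(np.random.randn(5, 10), columns=cols)

# Collect the names of the columns to remove
date_cols = df.filter(like='date').columns

# drop returns a new DataFrame; df itself is left unchanged
without_dates = df.drop(columns=date_cols)

print(list(without_dates.columns))
```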

Tags: python, pandas

Zhen Sun
