pandas: best way to select all columns whose names start with X

People also ask

How do I select X columns in pandas?

We can use double square brackets [[]] to select multiple columns from a data frame in Pandas.

How do you get the names of all columns in pandas?

You can get the column names from pandas DataFrame using df. columns. values , and pass this to python list() function to get it as list, once you have the data you can print it using print() statement.

How do you select all columns except the first pandas?

To select all columns except one column in Pandas DataFrame, we can use df. loc[:, df. columns != <column name>].

Just perform a list comprehension to create your columns:

In [28]:

filter_col = [col for col in df if col.startswith('foo')]
filter_col
Out[28]:
['foo.aa', 'foo.bars', 'foo.fighters', 'foo.fox', 'foo.manchu']
In [29]:

df[filter_col]
Out[29]:
   foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu
0     1.0         0             0        2         NA
1     2.1         0             1        4          0
2     NaN         0           NaN        1          0
3     4.7         0             0        0          0
4     5.6         0             0        0          0
5     6.8         1             0        5          0

Another method is to create a series from the columns and use the vectorised str method startswith:

In [33]:

df[df.columns[pd.Series(df.columns).str.startswith('foo')]]
Out[33]:
   foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu
0     1.0         0             0        2         NA
1     2.1         0             1        4          0
2     NaN         0           NaN        1          0
3     4.7         0             0        0          0
4     5.6         0             0        0          0
5     6.8         1             0        5          0

In order to achieve what you want you need to add the following to filter the values that don't meet your ==1 criteria:

In [36]:

df[df[df.columns[pd.Series(df.columns).str.startswith('foo')]]==1]
Out[36]:
   bar.baz  foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu nas.foo
0      NaN       1       NaN           NaN      NaN        NaN     NaN
1      NaN     NaN       NaN             1      NaN        NaN     NaN
2      NaN     NaN       NaN           NaN        1        NaN     NaN
3      NaN     NaN       NaN           NaN      NaN        NaN     NaN
4      NaN     NaN       NaN           NaN      NaN        NaN     NaN
5      NaN     NaN         1           NaN      NaN        NaN     NaN

EDIT

OK after seeing what you want the convoluted answer is this:

In [72]:

df.loc[df[df[df.columns[pd.Series(df.columns).str.startswith('foo')]] == 1].dropna(how='all', axis=0).index]
Out[72]:
   bar.baz  foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu nas.foo
0      5.0     1.0         0             0        2         NA      NA
1      5.0     2.1         0             1        4          0       0
2      6.0     NaN         0           NaN        1          0       1
5      6.8     6.8         1             0        5          0       0

Now that pandas' indexes support string operations, arguably the simplest and best way to select columns beginning with 'foo' is just:

df.loc[:, df.columns.str.startswith('foo')]

Alternatively, you can filter column (or row) labels with df.filter(). To specify a regular expression to match the names beginning with foo.:

>>> df.filter(regex=r'^foo\.', axis=1)
   foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu
0     1.0         0             0        2         NA
1     2.1         0             1        4          0
2     NaN         0           NaN        1          0
3     4.7         0             0        0          0
4     5.6         0             0        0          0
5     6.8         1             0        5          0

To select only the required rows (containing a 1) and the columns, you can use loc, selecting the columns using filter (or any other method) and the rows using any:

>>> df.loc[(df == 1).any(axis=1), df.filter(regex=r'^foo\.', axis=1).columns]
   foo.aa  foo.bars  foo.fighters  foo.fox foo.manchu
0     1.0         0             0        2         NA
1     2.1         0             1        4          0
2     NaN         0           NaN        1          0
5     6.8         1             0        5          0

The simplest way is to use str directly on column names, there is no need for pd.Series

df.loc[:,df.columns.str.startswith("foo")]

In my case I needed a list of prefixes

colsToScale=["production", "test", "development"]
dc[dc.columns[dc.columns.str.startswith(tuple(colsToScale))]]

You can try the regex here to filter out the columns starting with "foo"

df.filter(regex='^foo*')

If you need to have the string foo in your column then

df.filter(regex='foo*')

would be appropriate.

For the next step, you can use

df[df.filter(regex='^foo*').values==1]

to filter out the rows where one of the values of 'foo*' column is 1.

Related questions
                            
                                How to print like printf in Python3?
                            
                                How to suppress py.test internal deprecation warnings
                            
                                How can I split a text into sentences?
                            
                                How to toggle a value?
                            
                                Making heatmap from pandas DataFrame
                            
                                ImportError: No module named Crypto.Cipher
                            
                                Do python projects need a MANIFEST.in, and what should be in it?
                            
                                How does Python manage int and long?
                            
                                How to develop Android app completely using python? [closed]
                            
                                How to re-raise an exception in nested try/except blocks?
                            
                                Running python script inside ipython
                            
                                How to create a tuple with only one element
                            
                                Unpickling a python 2 object with python 3
                            
                                Invalid http_host header
                            
                                How to get autocomplete in jupyter notebook without using tab?
                            
                                Multiple linear regression in Python
                            
                                How do I write data into CSV format as string (not file)?
                            
                                Why does += behave unexpectedly on lists?
                            
                                How do I get Pyflakes to ignore a statement?
                            
                                python list in sql query as parameter [duplicate]

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With

pandas: best way to select all columns whose names start with X

Tags:

python

pandas

dataframe

selection

People also ask

Recent Activity

Donate For Us