We can use double square brackets [[]] to select multiple columns from a DataFrame in pandas.
You can get the column names from a pandas DataFrame using df.columns.values, pass the result to Python's list() function to get them as a list, and then print that list with print().
To select all columns except one column in a pandas DataFrame, we can use df.loc[:, df.columns != <column name>].
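For example, a minimal sketch of these three idioms, assuming a DataFrame df that has columns named 'foo.aa', 'foo.bars' and 'bar.baz' as in the question:

pair = df[['foo.aa', 'foo.bars']]           # double brackets select several columns at once
cols = list(df.columns.values)              # column names as a plain Python list
print(cols)
rest = df.loc[:, df.columns != 'bar.baz']   # every column except 'bar.baz'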
Just perform a list comprehension to create your columns:
In [28]:
filter_col = [col for col in df if col.startswith('foo')]
filter_col
Out[28]:
['foo.aa', 'foo.bars', 'foo.fighters', 'foo.fox', 'foo.manchu']
In [29]:
df[filter_col]
Out[29]:
   foo.aa  foo.bars  foo.fighters  foo.fox  foo.manchu
0     1.0         0             0        2          NA
1     2.1         0             1        4           0
2     NaN         0           NaN        1           0
3     4.7         0             0        0           0
4     5.6         0             0        0           0
5     6.8         1             0        5           0
Another method is to create a series from the columns and use the vectorised str method startswith:
In [33]:
df[df.columns[pd.Series(df.columns).str.startswith('foo')]]
Out[33]:
   foo.aa  foo.bars  foo.fighters  foo.fox  foo.manchu
0     1.0         0             0        2          NA
1     2.1         0             1        4           0
2     NaN         0           NaN        1           0
3     4.7         0             0        0           0
4     5.6         0             0        0           0
5     6.8         1             0        5           0
In order to achieve what you want, you need to add the following to filter out the values that don't meet your == 1 criteria:
In [36]:
df[df[df.columns[pd.Series(df.columns).str.startswith('foo')]]==1]
Out[36]:
   bar.baz  foo.aa  foo.bars  foo.fighters  foo.fox  foo.manchu  nas.foo
0      NaN       1       NaN           NaN      NaN         NaN      NaN
1      NaN     NaN       NaN             1      NaN         NaN      NaN
2      NaN     NaN       NaN           NaN        1         NaN      NaN
3      NaN     NaN       NaN           NaN      NaN         NaN      NaN
4      NaN     NaN       NaN           NaN      NaN         NaN      NaN
5      NaN     NaN         1           NaN      NaN         NaN      NaN
EDIT
OK, after seeing what you want, the convoluted answer is this:
In [72]:
df.loc[df[df[df.columns[pd.Series(df.columns).str.startswith('foo')]] == 1].dropna(how='all', axis=0).index]
Out[72]:
   bar.baz  foo.aa  foo.bars  foo.fighters  foo.fox  foo.manchu  nas.foo
0      5.0     1.0         0             0        2          NA       NA
1      5.0     2.1         0             1        4           0        0
2      6.0     NaN         0           NaN        1           0        1
5      6.8     6.8         1             0        5           0        0
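For readability, the chained expression above can be unpacked into steps. This is just a sketch (assuming the same df as in the question) that selects the foo columns first and then keeps the rows where at least one of them equals 1:

foo_cols = df.columns[df.columns.str.startswith('foo')]   # columns whose names begin with 'foo'
has_one = (df[foo_cols] == 1).any(axis=1)                  # True for rows with a 1 in any foo column
df.loc[has_one]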
Now that pandas' indexes support string operations, arguably the simplest and best way to select columns beginning with 'foo' is just:
df.loc[:, df.columns.str.startswith('foo')]
Alternatively, you can filter column (or row) labels with df.filter(), passing a regular expression that matches the names beginning with "foo.":
>>> df.filter(regex=r'^foo\.', axis=1)
   foo.aa  foo.bars  foo.fighters  foo.fox  foo.manchu
0     1.0         0             0        2          NA
1     2.1         0             1        4           0
2     NaN         0           NaN        1           0
3     4.7         0             0        0           0
4     5.6         0             0        0           0
5     6.8         1             0        5           0
To select only the required rows (containing a 1) and the required columns, you can use loc, selecting the columns using filter (or any other method) and the rows using any:
>>> df.loc[(df == 1).any(axis=1), df.filter(regex=r'^foo\.', axis=1).columns]
   foo.aa  foo.bars  foo.fighters  foo.fox  foo.manchu
0     1.0         0             0        2          NA
1     2.1         0             1        4           0
2     NaN         0           NaN        1           0
5     6.8         1             0        5           0
The simplest way is to use str directly on the column names; there is no need for pd.Series:
df.loc[:,df.columns.str.startswith("foo")]
In my case, I needed to match a list of prefixes (str.startswith also accepts a tuple):
colsToScale = ["production", "test", "development"]
dc[dc.columns[dc.columns.str.startswith(tuple(colsToScale))]]
You can use a regex here to select the columns starting with "foo":
df.filter(regex='^foo')
If you just need column names that contain the string foo, then
df.filter(regex='foo')
would be appropriate.
For the next step, you can use
df[(df.filter(regex='^foo') == 1).any(axis=1)]
to keep only the rows where at least one of the 'foo' columns equals 1.