Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Select multiple columns by labels in pandas

Tags:

python

pandas

People also ask

How do I select only certain columns in pandas?

To select a single column, use square brackets [] with the column name of the column of interest.

How do I select all columns in a data frame?

To select all columns except one column in Pandas DataFrame, we can use df. loc[:, df. columns != <column name>].

How do I see specific columns in pandas?

Selecting columns based on their name This is the most basic way to select a single column from a dataframe, just put the string name of the column in brackets. Returns a pandas series. Passing a list in the brackets lets you select multiple columns at the same time.


Name- or Label-Based (using regular expression syntax)

df.filter(regex='[A-CEG-I]')   # does NOT depend on the column order

Note that any regular expression is allowed here, so this approach can be very general. E.g. if you wanted all columns starting with a capital or lowercase "A" you could use: df.filter(regex='^[Aa]')

Location-Based (depends on column order)

df[ list(df.loc[:,'A':'C']) + ['E'] + list(df.loc[:,'G':'I']) ]

Note that unlike the label-based method, this only works if your columns are alphabetically sorted. This is not necessarily a problem, however. For example, if your columns go ['A','C','B'], then you could replace 'A':'C' above with 'A':'B'.

The Long Way

And for completeness, you always have the option shown by @Magdalena of simply listing each column individually, although it could be much more verbose as the number of columns increases:

df[['A','B','C','E','G','H','I']]   # does NOT depend on the column order

Results for any of the above methods

          A         B         C         E         G         H         I
0 -0.814688 -1.060864 -0.008088  2.697203 -0.763874  1.793213 -0.019520
1  0.549824  0.269340  0.405570 -0.406695 -0.536304 -1.231051  0.058018
2  0.879230 -0.666814  1.305835  0.167621 -1.100355  0.391133  0.317467

Just pick the columns you want directly....

df[['A','E','I','C']]

How do I select multiple columns by labels in pandas?

Multiple label-based range slicing is not easily supported with pandas, but position-based slicing is, so let's try that instead:

loc = df.columns.get_loc
df.iloc[:, np.r_[loc('A'):loc('C')+1, loc('E'), loc('G'):loc('I')+1]]

          A         B         C         E         G         H         I
0 -1.666330  0.321260 -1.768185 -0.034774  0.023294  0.533451 -0.241990
1  0.911498  3.408758  0.419618 -0.462590  0.739092  1.103940  0.116119
2  1.243001 -0.867370  1.058194  0.314196  0.887469  0.471137 -1.361059
3 -0.525165  0.676371  0.325831 -1.152202  0.606079  1.002880  2.032663
4  0.706609 -0.424726  0.308808  1.994626  0.626522 -0.033057  1.725315
5  0.879802 -1.961398  0.131694 -0.931951 -0.242822 -1.056038  0.550346
6  0.199072  0.969283  0.347008 -2.611489  0.282920 -0.334618  0.243583
7  1.234059  1.000687  0.863572  0.412544  0.569687 -0.684413 -0.357968
8 -0.299185  0.566009 -0.859453 -0.564557 -0.562524  0.233489 -0.039145
9  0.937637 -2.171174 -1.940916 -1.553634  0.619965 -0.664284 -0.151388

Note that the +1 is added because when using iloc the rightmost index is exclusive.


Comments on Other Solutions

  • filter is a nice and simple method for OP's headers, but this might not generalise well to arbitrary column names.

  • The "location-based" solution with loc is a little closer to the ideal, but you cannot avoid creating intermediate DataFrames (that are eventually thrown out and garbage collected) to compute the final result range -- something that we would ideally like to avoid.

  • Lastly, "pick your columns directly" is good advice as long as you have a manageably small number of columns to pick. It will, however not be applicable in some cases where ranges span dozens (or possibly hundreds) of columns.


Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!